## Multi-Modal Documentation

### 📚 Tutorial

1. [Human Preference Alignment Training Documentation](human-preference-alignment-training-documentation.md)
2. [LMDeploy Inference Acceleration](LmDeploy-inference-acceleration.md)
3. [vLLM Inference Acceleration](vllm-inference-acceleration.md)
4. [MLLM Deployment Documentation](mutlimodal-deployment.md)

### Multi-Modal Best Practice

A single round of dialogue can contain multiple images (or no images):
1. [Qwen-VL Best Practice](qwen-vl-best-practice.md), [Qwen2-VL Best Practice](qwen2-vl-best-practice.md)
2. [Qwen-Audio Best Practice](qwen-audio-best-practice.md), [Qwen2-Audio Best Practice](https://github.com/modelscope/ms-swift/issues/1653)
3. [LLaVA Best Practice](llava-best-practice.md), [LLaVA-Video Best Practice](llava-video-best-practice.md)
4. [InternVL Series Best Practice](internvl-best-practice.md)
5. [MiniCPM-V Best Practice](minicpm-v-best-practice.md), [MiniCPM-V-2.6 Best Practice](https://github.com/modelscope/ms-swift/issues/1613)
6. [DeepSeek-VL Best Practice](deepseek-vl-best-practice.md)
7. [InternLM-XComposer2 Best Practice](internlm-xcomposer2-best-practice.md)
8. [Phi3-Vision Best Practice](phi3-vision-best-practice.md), [Phi3.5-Vision Best Practice](https://github.com/modelscope/ms-swift/issues/1809)
9. [mPLUG-Owl3 Best Practice](https://github.com/modelscope/ms-swift/issues/1969)
10. [GOT-OCR2 Best Practice](https://github.com/modelscope/ms-swift/issues/2122)

A single round of dialogue can contain only one image:
1. [Yi-VL Best Practice](yi-vl-best-practice.md)
2. [Florence Best Practice](florence-best-pratice.md)

The entire conversation revolves around one image:
1. [CogVLM Best Practice](cogvlm-best-practice.md), [CogVLM2 Best Practice](cogvlm2-best-practice.md), [GLM4V Best Practice](glm4v-best-practice.md), [CogVLM2-Video Best Practice](cogvlm2-video-best-practice.md)
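
The three groupings above differ only in how many images a conversation may carry. As a rough illustration, the sketch below checks a conversation against each grouping. The names `ImageMode`, `Round`, and `check_conversation` are hypothetical and are not part of any library API; it also assumes that "only one image" means at most one image per round, and that "revolves around one image" means at most one distinct image across all rounds.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ImageMode(Enum):
    """Hypothetical labels for the three groupings listed above."""
    MULTI_PER_ROUND = "multi_per_round"            # any number of images per round
    ONE_PER_ROUND = "one_per_round"                # at most one image per round (assumed)
    ONE_PER_CONVERSATION = "one_per_conversation"  # one image shared by all rounds (assumed)


@dataclass
class Round:
    """One round of dialogue: a text query plus zero or more image paths."""
    query: str
    images: List[str] = field(default_factory=list)


def check_conversation(rounds: List[Round], mode: ImageMode) -> List[str]:
    """Return a list of human-readable problems; an empty list means the
    conversation fits the given grouping."""
    problems: List[str] = []
    if mode is ImageMode.MULTI_PER_ROUND:
        return problems  # no restriction on image count
    if mode is ImageMode.ONE_PER_ROUND:
        for i, r in enumerate(rounds):
            if len(r.images) > 1:
                problems.append(f"round {i} has {len(r.images)} images, expected at most 1")
        return problems
    # ONE_PER_CONVERSATION: count distinct images over the whole dialogue
    distinct = {img for r in rounds for img in r.images}
    if len(distinct) > 1:
        problems.append(f"conversation uses {len(distinct)} distinct images, expected at most 1")
    return problems


if __name__ == "__main__":
    dialogue = [Round("Compare these.", ["a.png", "b.png"]), Round("And now?")]
    for mode in ImageMode:
        print(mode.value, check_conversation(dialogue, mode))
```

A check like this can be run over a dataset before choosing a model family, since a dialogue with two images in one round only fits the first grouping.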