## Multi-Modal Documentation

### 📚 Tutorial

1. [Human Preference Alignment Training Documentation](human-preference-alignment-training-documentation.md)
2. [LmDeploy Inference Acceleration](LmDeploy-inference-acceleration.md)
3. [vLLM Inference Acceleration](vllm-inference-acceleration.md)
4. [MLLM Deployment Documentation](mutlimodal-deployment.md)

### Multi-Modal Best Practice

A single round of dialogue can contain multiple images (or no images):

1. [Qwen-VL Best Practice](qwen-vl-best-practice.md), [Qwen2-VL Best Practice](qwen2-vl-best-practice.md)
2. [Qwen-Audio Best Practice](qwen-audio-best-practice.md), [Qwen2-Audio Best Practice](https://github.com/modelscope/ms-swift/issues/1653)
3. [Llava Best Practice](llava-best-practice.md), [Llava Video Best Practice](llava-video-best-practice.md)
4. [InternVL Series Best Practice](internvl-best-practice.md)
5. [MiniCPM-V Best Practice](minicpm-v-best-practice.md), [MiniCPM-V-2.6 Best Practice](https://github.com/modelscope/ms-swift/issues/1613)
6. [Deepseek-VL Best Practice](deepseek-vl-best-practice.md)
7. [InternLM-XComposer2 Best Practice](internlm-xcomposer2-best-practice.md)
8. [Phi3-Vision Best Practice](phi3-vision-best-practice.md), [Phi3.5-Vision Best Practice](https://github.com/modelscope/ms-swift/issues/1809)
9. [mPLUG-Owl3 Best Practice](https://github.com/modelscope/ms-swift/issues/1969)
10. [GOT-OCR2 Best Practice](https://github.com/modelscope/ms-swift/issues/2122)

A single round of dialogue can contain only one image:

1. [Yi-VL Best Practice](yi-vl-best-practice.md)
2. [Florence Best Practice](florence-best-pratice.md)

The entire conversation revolves around one image:

1. [CogVLM Best Practice](cogvlm-best-practice.md), [CogVLM2 Best Practice](cogvlm2-best-practice.md), [GLM4V Best Practice](glm4v-best-practice.md), [CogVLM2-Video Best Practice](cogvlm2-video-best-practice.md)