Supported models and datasets

Table of Contents

Models
- LLM
- MLLM
Datasets

Models

The table below introduces all models supported by SWIFT:

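Model Type: The model_type name registered in SWIFT.
Model ID: The model ID on ModelScope.
Default Lora Target Modules: The default lora_target_modules used by the model.
Default Template: The default template used by the model.
Support Flash Attn: Whether the model supports flash attention to accelerate fine-tuning (sft) and inference (infer).
Support vLLM: Whether the model supports vLLM to accelerate inference and deployment.
Support LMDeploy: Whether the model supports LMDeploy.
Support Megatron: Whether the model supports Megatron.
Requires: The extra dependencies required by the model.
Tags: Tags attached to the model, such as moe or coding.
HF Model ID: The model ID on Hugging Face.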
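
To make the column meanings concrete, here is a minimal sketch that parses one row of the model tables into a typed record. It assumes the rows are tab-separated in the column order of the table header, that ✔/✘ mark support flags, and that an empty cell or "-" means "none". The names ModelRow, parse_row and COLUMNS are hypothetical helpers for illustration and are not part of SWIFT.

```python
from dataclasses import dataclass
from typing import List

# Column order assumed from the table header in this document.
COLUMNS = [
    "model_type", "model_id", "lora_target_modules", "template",
    "flash_attn", "vllm", "lmdeploy", "megatron",
    "requires", "tags", "hf_model_id",
]


@dataclass
class ModelRow:
    """One table row (hypothetical helper type, not a SWIFT class)."""
    model_type: str
    model_id: str
    lora_target_modules: List[str]
    template: str
    flash_attn: bool
    vllm: bool
    lmdeploy: bool
    megatron: bool
    requires: List[str]
    tags: List[str]
    hf_model_id: str


def _split_list(cell: str) -> List[str]:
    """Split a comma-separated cell; an empty cell or '-' yields no entries."""
    cell = cell.strip()
    if cell in ("", "-"):
        return []
    return [item.strip() for item in cell.split(",")]


def parse_row(line: str) -> ModelRow:
    """Parse one tab-separated table row into a ModelRow."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    raw = dict(zip(COLUMNS, cells))
    return ModelRow(
        model_type=raw["model_type"],
        model_id=raw["model_id"],
        lora_target_modules=_split_list(raw["lora_target_modules"]),
        template=raw["template"],
        flash_attn=raw["flash_attn"] == "✔",
        vllm=raw["vllm"] == "✔",
        lmdeploy=raw["lmdeploy"] == "✔",
        megatron=raw["megatron"] == "✔",
        requires=_split_list(raw["requires"]),
        tags=_split_list(raw["tags"]),
        hf_model_id=raw["hf_model_id"],
    )


# Two rows copied from the LLM table, joined with tabs for readability.
SAMPLE_ROWS = [
    "\t".join([
        "qwen-7b-chat", "qwen/Qwen-7B-Chat", "c_attn", "qwen",
        "✔", "✔", "✔", "✘", "", "-", "Qwen/Qwen-7B-Chat",
    ]),
    "\t".join([
        "qwen1half-0_5b-chat-int4", "qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4",
        "q_proj, k_proj, v_proj", "qwen", "✔", "✔", "✘", "✘",
        "auto_gptq>=0.5, transformers>=4.37", "-",
        "Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4",
    ]),
]

rows = [parse_row(line) for line in SAMPLE_ROWS]

# Example query: model types that support both vLLM and LMDeploy.
fast_inference = [r.model_type for r in rows if r.vllm and r.lmdeploy]
print(fast_inference)
```

Parsing the flags into booleans and the Requires cell into a list makes it straightforward to filter the tables, for example to find models that can be deployed with a given inference backend or to collect the extra packages to install before fine-tuning.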

LLM

Model Type	Model ID	Default Lora Target Modules	Default Template	Support Flash Attn	Support vLLM	Support LMDeploy	Support Megatron	Requires	Tags	HF Model ID
qwen-1_8b	qwen/Qwen-1_8B	c_attn	default-generation	✔	✔	✔	✘		-	Qwen/Qwen-1_8B
qwen-1_8b-chat	qwen/Qwen-1_8B-Chat	c_attn	qwen	✔	✔	✔	✘		-	Qwen/Qwen-1_8B-Chat
qwen-1_8b-chat-int4	qwen/Qwen-1_8B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-1_8B-Chat-Int4
qwen-1_8b-chat-int8	qwen/Qwen-1_8B-Chat-Int8	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-1_8B-Chat-Int8
qwen-7b	qwen/Qwen-7B	c_attn	default-generation	✔	✔	✔	✘		-	Qwen/Qwen-7B
qwen-7b-chat	qwen/Qwen-7B-Chat	c_attn	qwen	✔	✔	✔	✘		-	Qwen/Qwen-7B-Chat
qwen-7b-chat-int4	qwen/Qwen-7B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-7B-Chat-Int4
qwen-7b-chat-int8	qwen/Qwen-7B-Chat-Int8	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-7B-Chat-Int8
qwen-14b	qwen/Qwen-14B	c_attn	default-generation	✔	✔	✔	✘		-	Qwen/Qwen-14B
qwen-14b-chat	qwen/Qwen-14B-Chat	c_attn	qwen	✔	✔	✔	✘		-	Qwen/Qwen-14B-Chat
qwen-14b-chat-int4	qwen/Qwen-14B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-14B-Chat-Int4
qwen-14b-chat-int8	qwen/Qwen-14B-Chat-Int8	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-14B-Chat-Int8
qwen-72b	qwen/Qwen-72B	c_attn	default-generation	✔	✔	✔	✘		-	Qwen/Qwen-72B
qwen-72b-chat	qwen/Qwen-72B-Chat	c_attn	qwen	✔	✔	✔	✘		-	Qwen/Qwen-72B-Chat
qwen-72b-chat-int4	qwen/Qwen-72B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-72B-Chat-Int4
qwen-72b-chat-int8	qwen/Qwen-72B-Chat-Int8	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-72B-Chat-Int8
modelscope-agent-7b	iic/ModelScope-Agent-7B	c_attn	modelscope-agent	✔	✘	✘	✘		-	-
modelscope-agent-14b	iic/ModelScope-Agent-14B	c_attn	modelscope-agent	✔	✘	✘	✘		-	-
qwen1half-0_5b	qwen/Qwen1.5-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-0.5B
qwen1half-1_8b	qwen/Qwen1.5-1.8B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-1.8B
qwen1half-4b	qwen/Qwen1.5-4B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-4B
qwen1half-7b	qwen/Qwen1.5-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-7B
qwen1half-14b	qwen/Qwen1.5-14B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-14B
qwen1half-32b	qwen/Qwen1.5-32B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen1.5-32B
qwen1half-72b	qwen/Qwen1.5-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-72B
qwen1half-110b	qwen/Qwen1.5-110B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen1.5-110B
codeqwen1half-7b	qwen/CodeQwen1.5-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/CodeQwen1.5-7B
qwen1half-moe-a2_7b	qwen/Qwen1.5-MoE-A2.7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.40	moe	Qwen/Qwen1.5-MoE-A2.7B
qwen1half-0_5b-chat	qwen/Qwen1.5-0.5B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-0.5B-Chat
qwen1half-1_8b-chat	qwen/Qwen1.5-1.8B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-1.8B-Chat
qwen1half-4b-chat	qwen/Qwen1.5-4B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-4B-Chat
qwen1half-7b-chat	qwen/Qwen1.5-7B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-7B-Chat
qwen1half-14b-chat	qwen/Qwen1.5-14B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-14B-Chat
qwen1half-32b-chat	qwen/Qwen1.5-32B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen1.5-32B-Chat
qwen1half-72b-chat	qwen/Qwen1.5-72B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-72B-Chat
qwen1half-110b-chat	qwen/Qwen1.5-110B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen1.5-110B-Chat
qwen1half-moe-a2_7b-chat	qwen/Qwen1.5-MoE-A2.7B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	transformers>=4.40	moe	Qwen/Qwen1.5-MoE-A2.7B-Chat
codeqwen1half-7b-chat	qwen/CodeQwen1.5-7B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37	-	Qwen/CodeQwen1.5-7B-Chat
qwen1half-0_5b-chat-int4	qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4
qwen1half-1_8b-chat-int4	qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4
qwen1half-4b-chat-int4	qwen/Qwen1.5-4B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-4B-Chat-GPTQ-Int4
qwen1half-7b-chat-int4	qwen/Qwen1.5-7B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-7B-Chat-GPTQ-Int4
qwen1half-14b-chat-int4	qwen/Qwen1.5-14B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-14B-Chat-GPTQ-Int4
qwen1half-32b-chat-int4	qwen/Qwen1.5-32B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-32B-Chat-GPTQ-Int4
qwen1half-72b-chat-int4	qwen/Qwen1.5-72B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
qwen1half-110b-chat-int4	qwen/Qwen1.5-110B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-110B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-int8	qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8
qwen1half-1_8b-chat-int8	qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8
qwen1half-4b-chat-int8	qwen/Qwen1.5-4B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-4B-Chat-GPTQ-Int8
qwen1half-7b-chat-int8	qwen/Qwen1.5-7B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-7B-Chat-GPTQ-Int8
qwen1half-14b-chat-int8	qwen/Qwen1.5-14B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-14B-Chat-GPTQ-Int8
qwen1half-72b-chat-int8	qwen/Qwen1.5-72B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-72B-Chat-GPTQ-Int8
qwen1half-moe-a2_7b-chat-int4	qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✘	✘	✘	auto_gptq>=0.5, transformers>=4.40	moe	Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-awq	qwen/Qwen1.5-0.5B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-0.5B-Chat-AWQ
qwen1half-1_8b-chat-awq	qwen/Qwen1.5-1.8B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-1.8B-Chat-AWQ
qwen1half-4b-chat-awq	qwen/Qwen1.5-4B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-4B-Chat-AWQ
qwen1half-7b-chat-awq	qwen/Qwen1.5-7B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-7B-Chat-AWQ
qwen1half-14b-chat-awq	qwen/Qwen1.5-14B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-14B-Chat-AWQ
qwen1half-32b-chat-awq	qwen/Qwen1.5-32B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-32B-Chat-AWQ
qwen1half-72b-chat-awq	qwen/Qwen1.5-72B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-72B-Chat-AWQ
qwen1half-110b-chat-awq	qwen/Qwen1.5-110B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-110B-Chat-AWQ
codeqwen1half-7b-chat-awq	qwen/CodeQwen1.5-7B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/CodeQwen1.5-7B-Chat-AWQ
qwen2-0_5b	qwen/Qwen2-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-0.5B
qwen2-0_5b-instruct	qwen/Qwen2-0.5B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-0.5B-Instruct
qwen2-0_5b-instruct-int4	qwen/Qwen2-0.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4
qwen2-0_5b-instruct-int8	qwen/Qwen2-0.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8
qwen2-0_5b-instruct-awq	qwen/Qwen2-0.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2-0.5B-Instruct-AWQ
qwen2-1_5b	qwen/Qwen2-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-1.5B
qwen2-1_5b-instruct	qwen/Qwen2-1.5B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-1.5B-Instruct
qwen2-1_5b-instruct-int4	qwen/Qwen2-1.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4
qwen2-1_5b-instruct-int8	qwen/Qwen2-1.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8
qwen2-1_5b-instruct-awq	qwen/Qwen2-1.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2-1.5B-Instruct-AWQ
qwen2-7b	qwen/Qwen2-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-7B
qwen2-7b-instruct	qwen/Qwen2-7B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-7B-Instruct
qwen2-7b-instruct-int4	qwen/Qwen2-7B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-7B-Instruct-GPTQ-Int4
qwen2-7b-instruct-int8	qwen/Qwen2-7B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-7B-Instruct-GPTQ-Int8
qwen2-7b-instruct-awq	qwen/Qwen2-7B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2-7B-Instruct-AWQ
qwen2-72b	qwen/Qwen2-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-72B
qwen2-72b-instruct	qwen/Qwen2-72B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-72B-Instruct
qwen2-72b-instruct-int4	qwen/Qwen2-72B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-72B-Instruct-GPTQ-Int4
qwen2-72b-instruct-int8	qwen/Qwen2-72B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-72B-Instruct-GPTQ-Int8
qwen2-72b-instruct-awq	qwen/Qwen2-72B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2-72B-Instruct-AWQ
qwen2-57b-a14b	qwen/Qwen2-57B-A14B	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.40	moe	Qwen/Qwen2-57B-A14B
qwen2-57b-a14b-instruct	qwen/Qwen2-57B-A14B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	transformers>=4.40	moe	Qwen/Qwen2-57B-A14B-Instruct
qwen2-57b-a14b-instruct-int4	qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.40	moe	Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4
qwen2-math-1_5b	qwen/Qwen2-Math-1.5B	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-1.5B
qwen2-math-1_5b-instruct	qwen/Qwen2-Math-1.5B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-1.5B-Instruct
qwen2-math-7b	qwen/Qwen2-Math-7B	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-7B
qwen2-math-7b-instruct	qwen/Qwen2-Math-7B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-7B-Instruct
qwen2-math-72b	qwen/Qwen2-Math-72B	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-72B
qwen2-math-72b-instruct	qwen/Qwen2-Math-72B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-72B-Instruct
qwen2_5-0_5b	qwen/Qwen2.5-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-0.5B
qwen2_5-1_5b	qwen/Qwen2.5-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-1.5B
qwen2_5-3b	qwen/Qwen2.5-3B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-3B
qwen2_5-7b	qwen/Qwen2.5-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-7B
qwen2_5-14b	qwen/Qwen2.5-14B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-14B
qwen2_5-32b	qwen/Qwen2.5-32B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-32B
qwen2_5-72b	qwen/Qwen2.5-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-72B
qwen2_5-0_5b-instruct	qwen/Qwen2.5-0.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-0.5B-Instruct
qwen2_5-1_5b-instruct	qwen/Qwen2.5-1.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-1.5B-Instruct
qwen2_5-3b-instruct	qwen/Qwen2.5-3B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-3B-Instruct
qwen2_5-7b-instruct	qwen/Qwen2.5-7B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-7B-Instruct
qwen2_5-14b-instruct	qwen/Qwen2.5-14B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-14B-Instruct
qwen2_5-32b-instruct	qwen/Qwen2.5-32B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-32B-Instruct
qwen2_5-72b-instruct	qwen/Qwen2.5-72B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-72B-Instruct
qwen2_5-0_5b-instruct-gptq-int4	qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4
qwen2_5-1_5b-instruct-gptq-int4	qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4
qwen2_5-3b-instruct-gptq-int4	qwen/Qwen2.5-3B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4
qwen2_5-7b-instruct-gptq-int4	qwen/Qwen2.5-7B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4
qwen2_5-14b-instruct-gptq-int4	qwen/Qwen2.5-14B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4
qwen2_5-32b-instruct-gptq-int4	qwen/Qwen2.5-32B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4
qwen2_5-72b-instruct-gptq-int4	qwen/Qwen2.5-72B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4
qwen2_5-0_5b-instruct-gptq-int8	qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8
qwen2_5-1_5b-instruct-gptq-int8	qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8
qwen2_5-3b-instruct-gptq-int8	qwen/Qwen2.5-3B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8
qwen2_5-7b-instruct-gptq-int8	qwen/Qwen2.5-7B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8
qwen2_5-14b-instruct-gptq-int8	qwen/Qwen2.5-14B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8
qwen2_5-32b-instruct-gptq-int8	qwen/Qwen2.5-32B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8
qwen2_5-72b-instruct-gptq-int8	qwen/Qwen2.5-72B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8
qwen2_5-0_5b-instruct-awq	qwen/Qwen2.5-0.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-0.5B-Instruct-AWQ
qwen2_5-1_5b-instruct-awq	qwen/Qwen2.5-1.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-1.5B-Instruct-AWQ
qwen2_5-3b-instruct-awq	qwen/Qwen2.5-3B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-3B-Instruct-AWQ
qwen2_5-7b-instruct-awq	qwen/Qwen2.5-7B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-7B-Instruct-AWQ
qwen2_5-14b-instruct-awq	qwen/Qwen2.5-14B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-14B-Instruct-AWQ
qwen2_5-32b-instruct-awq	qwen/Qwen2.5-32B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-32B-Instruct-AWQ
qwen2_5-72b-instruct-awq	qwen/Qwen2.5-72B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-72B-Instruct-AWQ
qwen2_5-math-1_5b	qwen/Qwen2.5-Math-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-1.5B
qwen2_5-math-7b	qwen/Qwen2.5-Math-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-7B
qwen2_5-math-72b	qwen/Qwen2.5-Math-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-72B
qwen2_5-math-1_5b-instruct	qwen/Qwen2.5-Math-1.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-1.5B-Instruct
qwen2_5-math-7b-instruct	qwen/Qwen2.5-Math-7B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-7B-Instruct
qwen2_5-math-72b-instruct	qwen/Qwen2.5-Math-72B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-72B-Instruct
qwen2_5-coder-0_5b	qwen/Qwen2.5-Coder-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-0.5B
qwen2_5-coder-0_5b-instruct	qwen/Qwen2.5-Coder-0.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-0.5B-Instruct
qwen2_5-coder-0_5b-instruct-gptq-int4	qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4
qwen2_5-coder-0_5b-instruct-gptq-int8	qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8
qwen2_5-coder-0_5b-instruct-awq	qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ
qwen2_5-coder-1_5b	qwen/Qwen2.5-Coder-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-1.5B
qwen2_5-coder-1_5b-instruct	qwen/Qwen2.5-Coder-1.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-1.5B-Instruct
qwen2_5-coder-1_5b-instruct-gptq-int4	qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4
qwen2_5-coder-1_5b-instruct-gptq-int8	qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8
qwen2_5-coder-1_5b-instruct-awq	qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ
qwen2_5-coder-3b	qwen/Qwen2.5-Coder-3B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-3B
qwen2_5-coder-3b-instruct	qwen/Qwen2.5-Coder-3B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-3B-Instruct
qwen2_5-coder-3b-instruct-gptq-int4	qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4
qwen2_5-coder-3b-instruct-gptq-int8	qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8
qwen2_5-coder-3b-instruct-awq	qwen/Qwen2.5-Coder-3B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-3B-Instruct-AWQ
qwen2_5-coder-7b	qwen/Qwen2.5-Coder-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-7B
qwen2_5-coder-7b-instruct	qwen/Qwen2.5-Coder-7B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-7B-Instruct
qwen2_5-coder-7b-instruct-gptq-int4	qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4
qwen2_5-coder-7b-instruct-gptq-int8	qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8
qwen2_5-coder-7b-instruct-awq	qwen/Qwen2.5-Coder-7B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-7B-Instruct-AWQ
qwen2_5-coder-14b	qwen/Qwen2.5-Coder-14B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-14B
qwen2_5-coder-14b-instruct	qwen/Qwen2.5-Coder-14B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-14B-Instruct
qwen2_5-coder-14b-instruct-gptq-int4	qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4
qwen2_5-coder-14b-instruct-gptq-int8	qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8
qwen2_5-coder-14b-instruct-awq	qwen/Qwen2.5-Coder-14B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-14B-Instruct-AWQ
qwen2_5-coder-32b	qwen/Qwen2.5-Coder-32B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-32B
qwen2_5-coder-32b-instruct	qwen/Qwen2.5-Coder-32B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-32B-Instruct
qwen2_5-coder-32b-instruct-gptq-int4	qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4
qwen2_5-coder-32b-instruct-gptq-int8	qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8
qwen2_5-coder-32b-instruct-awq	qwen/Qwen2.5-Coder-32B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-32B-Instruct-AWQ
qwq-32b-preview	Qwen/QwQ-32B-Preview	q_proj, k_proj, v_proj	qwq	✔	✔	✔	✔	transformers>=4.37	-	Qwen/QwQ-32B-Preview
marco-o1	AIDC-AI/Marco-o1	q_proj, k_proj, v_proj	marco_o1	✔	✔	✔	✘	transformers>=4.37	-	AIDC-AI/Marco-o1
chatglm2-6b	ZhipuAI/chatglm2-6b	query_key_value	chatglm2	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm2-6b
chatglm2-6b-32k	ZhipuAI/chatglm2-6b-32k	query_key_value	chatglm2	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm2-6b-32k
chatglm3-6b-base	ZhipuAI/chatglm3-6b-base	query_key_value	chatglm-generation	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm3-6b-base
chatglm3-6b	ZhipuAI/chatglm3-6b	query_key_value	chatglm3	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm3-6b
chatglm3-6b-32k	ZhipuAI/chatglm3-6b-32k	query_key_value	chatglm3	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm3-6b-32k
chatglm3-6b-128k	ZhipuAI/chatglm3-6b-128k	query_key_value	chatglm3	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm3-6b-128k
codegeex2-6b	ZhipuAI/codegeex2-6b	query_key_value	chatglm-generation	✘	✔	✘	✘	transformers<4.34	coding	THUDM/codegeex2-6b
glm4-9b	ZhipuAI/glm-4-9b	query_key_value	chatglm-generation	✔	✔	✔	✘	transformers>=4.42	-	THUDM/glm-4-9b
glm4-9b-chat	ZhipuAI/glm-4-9b-chat	query_key_value	chatglm4	✔	✔	✔	✘	transformers>=4.42	-	THUDM/glm-4-9b-chat
glm4-9b-chat-1m	ZhipuAI/glm-4-9b-chat-1m	query_key_value	chatglm4	✔	✔	✔	✘	transformers>=4.42	-	THUDM/glm-4-9b-chat-1m
codegeex4-9b-chat	ZhipuAI/codegeex4-all-9b	query_key_value	codegeex4	✔	✔	✔	✘	transformers<4.42	coding	THUDM/codegeex4-all-9b
glm-edge-1_5b-chat	ZhipuAI/glm-edge-1.5b-chat	q_proj, k_proj, v_proj	chatglm4	✔	✘	✘	✘	transformers>=4.46	-	THUDM/glm-edge-1.5b-chat
glm-edge-4b-chat	ZhipuAI/glm-edge-4b-chat	q_proj, k_proj, v_proj	chatglm4	✔	✘	✘	✘	transformers>=4.46	-	THUDM/glm-edge-4b-chat
llama2-7b	modelscope/Llama-2-7b-ms	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Llama-2-7b-hf
llama2-7b-chat	modelscope/Llama-2-7b-chat-ms	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	meta-llama/Llama-2-7b-chat-hf
llama2-13b	modelscope/Llama-2-13b-ms	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Llama-2-13b-hf
llama2-13b-chat	modelscope/Llama-2-13b-chat-ms	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	meta-llama/Llama-2-13b-chat-hf
llama2-70b	modelscope/Llama-2-70b-ms	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Llama-2-70b-hf
llama2-70b-chat	modelscope/Llama-2-70b-chat-ms	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	meta-llama/Llama-2-70b-chat-hf
llama2-7b-aqlm-2bit-1x16	AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf	q_proj, k_proj, v_proj	default-generation	✔	✘	✘	✘	transformers>=4.38, aqlm, torch>=2.2.0	-	ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf
llama3-8b	LLM-Research/Meta-Llama-3-8B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Meta-Llama-3-8B
llama3-8b-instruct	LLM-Research/Meta-Llama-3-8B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘		-	meta-llama/Meta-Llama-3-8B-Instruct
llama3-8b-instruct-int4	swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	auto_gptq	-	study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4
llama3-8b-instruct-int8	swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	auto_gptq	-	study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8
llama3-8b-instruct-awq	swift/Meta-Llama-3-8B-Instruct-AWQ	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	autoawq	-	study-hjt/Meta-Llama-3-8B-Instruct-AWQ
llama3-70b	LLM-Research/Meta-Llama-3-70B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Meta-Llama-3-70B
llama3-70b-instruct	LLM-Research/Meta-Llama-3-70B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘		-	meta-llama/Meta-Llama-3-70B-Instruct
llama3-70b-instruct-int4	swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	auto_gptq	-	study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4
llama3-70b-instruct-int8	swift/Meta-Llama-3-70b-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	auto_gptq	-	study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8
llama3-70b-instruct-awq	swift/Meta-Llama-3-70B-Instruct-AWQ	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	autoawq	-	study-hjt/Meta-Llama-3-70B-Instruct-AWQ
llama3_1-8b	LLM-Research/Meta-Llama-3.1-8B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-8B
llama3_1-8b-instruct	LLM-Research/Meta-Llama-3.1-8B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-8B-Instruct
llama3_1-8b-instruct-awq	LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, autoawq	-	hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
llama3_1-8b-instruct-gptq-int4	LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, auto_gptq	-	hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4
llama3_1-8b-instruct-bnb	LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, bitsandbytes	-	hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4
llama3_1-70b	LLM-Research/Meta-Llama-3.1-70B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-70B
llama3_1-70b-instruct	LLM-Research/Meta-Llama-3.1-70B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-70B-Instruct
llama3_1-70b-instruct-fp8	LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-70B-Instruct-FP8
llama3_1-70b-instruct-awq	LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43, autoawq	-	hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
llama3_1-70b-instruct-gptq-int4	LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, auto_gptq	-	hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4
llama3_1-70b-instruct-bnb	LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, bitsandbytes	-	unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit
llama3_1-405b	LLM-Research/Meta-Llama-3.1-405B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-405B
llama3_1-405b-instruct	LLM-Research/Meta-Llama-3.1-405B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-405B-Instruct
llama3_1-405b-instruct-fp8	LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
llama3_1-405b-instruct-awq	LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43, autoawq	-	hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4
llama3_1-405b-instruct-gptq-int4	LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, auto_gptq	-	hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4
llama3_1-405b-instruct-bnb	LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, bitsandbytes	-	hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4
llama-3.1-nemotron-70B-instruct-hf	AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43	-	nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
llama3_2-1b	LLM-Research/Llama-3.2-1B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.45	-	meta-llama/Llama-3.2-1B
llama3_2-1b-instruct	LLM-Research/Llama-3.2-1B-Instruct	q_proj, k_proj, v_proj	llama3_2	✔	✔	✔	✘	transformers>=4.45	-	meta-llama/Llama-3.2-1B-Instruct
llama3_2-3b	LLM-Research/Llama-3.2-3B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.45	-	meta-llama/Llama-3.2-3B
llama3_2-3b-instruct	LLM-Research/Llama-3.2-3B-Instruct	q_proj, k_proj, v_proj	llama3_2	✔	✔	✔	✘	transformers>=4.45	-	meta-llama/Llama-3.2-3B-Instruct
reflection-llama_3_1-70b	LLM-Research/Reflection-Llama-3.1-70B	q_proj, k_proj, v_proj	reflection	✔	✔	✘	✘	transformers>=4.43	-	mattshumer/Reflection-Llama-3.1-70B
longwriter-glm4-9b	ZhipuAI/LongWriter-glm4-9b	query_key_value	chatglm4	✔	✔	✔	✘	transformers>=4.42	-	THUDM/LongWriter-glm4-9b
longwriter-llama3_1-8b	ZhipuAI/LongWriter-llama3.1-8b	q_proj, k_proj, v_proj	longwriter-llama3	✔	✔	✔	✘	transformers>=4.43	-	THUDM/LongWriter-llama3.1-8b
chinese-llama-2-1_3b	AI-ModelScope/chinese-llama-2-1.3b	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-1.3b
chinese-llama-2-7b	AI-ModelScope/chinese-llama-2-7b	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-7b
chinese-llama-2-7b-16k	AI-ModelScope/chinese-llama-2-7b-16k	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-7b-16k
chinese-llama-2-7b-64k	AI-ModelScope/chinese-llama-2-7b-64k	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-7b-64k
chinese-llama-2-13b	AI-ModelScope/chinese-llama-2-13b	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-13b
chinese-llama-2-13b-16k	AI-ModelScope/chinese-llama-2-13b-16k	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-13b-16k
chinese-alpaca-2-1_3b	AI-ModelScope/chinese-alpaca-2-1.3b	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-1.3b
chinese-alpaca-2-7b	AI-ModelScope/chinese-alpaca-2-7b	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-7b
chinese-alpaca-2-7b-16k	AI-ModelScope/chinese-alpaca-2-7b-16k	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-7b-16k
chinese-alpaca-2-7b-64k	AI-ModelScope/chinese-alpaca-2-7b-64k	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-7b-64k
chinese-alpaca-2-13b	AI-ModelScope/chinese-alpaca-2-13b	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-13b
chinese-alpaca-2-13b-16k	AI-ModelScope/chinese-alpaca-2-13b-16k	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-13b-16k
llama-3-chinese-8b	ChineseAlpacaGroup/llama-3-chinese-8b	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/llama-3-chinese-8b
llama-3-chinese-8b-instruct	ChineseAlpacaGroup/llama-3-chinese-8b-instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘		-	hfl/llama-3-chinese-8b-instruct
atom-7b	FlagAlpha/Atom-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘		-	FlagAlpha/Atom-7B
atom-7b-chat	FlagAlpha/Atom-7B-Chat	q_proj, k_proj, v_proj	atom	✔	✔	✘	✘		-	FlagAlpha/Atom-7B-Chat
yi-6b	01ai/Yi-6B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-6B
yi-6b-200k	01ai/Yi-6B-200K	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-6B-200K
yi-6b-chat	01ai/Yi-6B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-6B-Chat
yi-6b-chat-awq	01ai/Yi-6B-Chat-4bits	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	01-ai/Yi-6B-Chat-4bits
yi-6b-chat-int8	01ai/Yi-6B-Chat-8bits	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq	-	01-ai/Yi-6B-Chat-8bits
yi-9b	01ai/Yi-9B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-9B
yi-9b-200k	01ai/Yi-9B-200K	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-9B-200K
yi-34b	01ai/Yi-34B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-34B
yi-34b-200k	01ai/Yi-34B-200K	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-34B-200K
yi-34b-chat	01ai/Yi-34B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-34B-Chat
yi-34b-chat-awq	01ai/Yi-34B-Chat-4bits	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	01-ai/Yi-34B-Chat-4bits
yi-34b-chat-int8	01ai/Yi-34B-Chat-8bits	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq	-	01-ai/Yi-34B-Chat-8bits
yi-1_5-6b	01ai/Yi-1.5-6B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-1.5-6B
yi-1_5-6b-chat	01ai/Yi-1.5-6B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-6B-Chat
yi-1_5-9b	01ai/Yi-1.5-9B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-1.5-9B
yi-1_5-9b-chat	01ai/Yi-1.5-9B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-9B-Chat
yi-1_5-9b-chat-16k	01ai/Yi-1.5-9B-Chat-16K	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-9B-Chat-16K
yi-1_5-34b	01ai/Yi-1.5-34B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-1.5-34B
yi-1_5-34b-chat	01ai/Yi-1.5-34B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-34B-Chat
yi-1_5-34b-chat-16k	01ai/Yi-1.5-34B-Chat-16K	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-34B-Chat-16K
yi-1_5-6b-chat-awq-int4	AI-ModelScope/Yi-1.5-6B-Chat-AWQ	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	modelscope/Yi-1.5-6B-Chat-AWQ
yi-1_5-6b-chat-gptq-int4	AI-ModelScope/Yi-1.5-6B-Chat-GPTQ	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq>=0.5	-	modelscope/Yi-1.5-6B-Chat-GPTQ
yi-1_5-9b-chat-awq-int4	AI-ModelScope/Yi-1.5-9B-Chat-AWQ	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	modelscope/Yi-1.5-9B-Chat-AWQ
yi-1_5-9b-chat-gptq-int4	AI-ModelScope/Yi-1.5-9B-Chat-GPTQ	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq>=0.5	-	modelscope/Yi-1.5-9B-Chat-GPTQ
yi-1_5-34b-chat-awq-int4	AI-ModelScope/Yi-1.5-34B-Chat-AWQ	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	modelscope/Yi-1.5-34B-Chat-AWQ
yi-1_5-34b-chat-gptq-int4	AI-ModelScope/Yi-1.5-34B-Chat-GPTQ	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq>=0.5	-	modelscope/Yi-1.5-34B-Chat-GPTQ
yi-coder-1_5b	01ai/Yi-Coder-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-Coder-1.5B
yi-coder-1_5b-chat	01ai/Yi-Coder-1.5B-Chat	q_proj, k_proj, v_proj	yi-coder	✔	✔	✔	✘		-	01-ai/Yi-Coder-1.5B-Chat
yi-coder-9b	01ai/Yi-Coder-9B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-Coder-9B
yi-coder-9b-chat	01ai/Yi-Coder-9B-Chat	q_proj, k_proj, v_proj	yi-coder	✔	✔	✔	✘		-	01-ai/Yi-Coder-9B-Chat
internlm-7b	Shanghai_AI_Laboratory/internlm-7b	q_proj, k_proj, v_proj	default-generation	✘	✔	✔	✘		-	internlm/internlm-7b
internlm-7b-chat	Shanghai_AI_Laboratory/internlm-chat-7b	q_proj, k_proj, v_proj	internlm	✘	✔	✔	✘		-	internlm/internlm-chat-7b
internlm-7b-chat-8k	Shanghai_AI_Laboratory/internlm-chat-7b-8k	q_proj, k_proj, v_proj	internlm	✘	✔	✔	✘		-	-
internlm-20b	Shanghai_AI_Laboratory/internlm-20b	q_proj, k_proj, v_proj	default-generation	✘	✔	✔	✘		-	internlm/internlm-20b
internlm-20b-chat	Shanghai_AI_Laboratory/internlm-chat-20b	q_proj, k_proj, v_proj	internlm	✘	✔	✔	✘		-	internlm/internlm-chat-20b
internlm2-1_8b	Shanghai_AI_Laboratory/internlm2-1_8b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-1_8b
internlm2-1_8b-sft-chat	Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-1_8b-sft
internlm2-1_8b-chat	Shanghai_AI_Laboratory/internlm2-chat-1_8b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-1_8b
internlm2-7b-base	Shanghai_AI_Laboratory/internlm2-base-7b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-base-7b
internlm2-7b	Shanghai_AI_Laboratory/internlm2-7b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-7b
internlm2-7b-sft-chat	Shanghai_AI_Laboratory/internlm2-chat-7b-sft	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-7b-sft
internlm2-7b-chat	Shanghai_AI_Laboratory/internlm2-chat-7b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-7b
internlm2-20b-base	Shanghai_AI_Laboratory/internlm2-base-20b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-base-20b
internlm2-20b	Shanghai_AI_Laboratory/internlm2-20b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-20b
internlm2-20b-sft-chat	Shanghai_AI_Laboratory/internlm2-chat-20b-sft	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-20b-sft
internlm2-20b-chat	Shanghai_AI_Laboratory/internlm2-chat-20b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-20b
internlm2_5-1_8b	Shanghai_AI_Laboratory/internlm2_5-1_8b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-1_8b
internlm2_5-1_8b-chat	Shanghai_AI_Laboratory/internlm2_5-1_8b-chat	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-1_8b-chat
internlm2_5-7b	Shanghai_AI_Laboratory/internlm2_5-7b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-7b
internlm2_5-7b-chat	Shanghai_AI_Laboratory/internlm2_5-7b-chat	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-7b-chat
internlm2_5-7b-chat-1m	Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-7b-chat-1m
internlm2_5-20b	Shanghai_AI_Laboratory/internlm2_5-20b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-20b
internlm2_5-20b-chat	Shanghai_AI_Laboratory/internlm2_5-20b-chat	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-20b-chat
internlm2-math-7b	Shanghai_AI_Laboratory/internlm2-math-base-7b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	math	internlm/internlm2-math-base-7b
internlm2-math-7b-chat	Shanghai_AI_Laboratory/internlm2-math-7b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	math	internlm/internlm2-math-7b
internlm2-math-20b	Shanghai_AI_Laboratory/internlm2-math-base-20b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	math	internlm/internlm2-math-base-20b
internlm2-math-20b-chat	Shanghai_AI_Laboratory/internlm2-math-20b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	math	internlm/internlm2-math-20b
deepseek-7b	deepseek-ai/deepseek-llm-7b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	deepseek-ai/deepseek-llm-7b-base
deepseek-7b-chat	deepseek-ai/deepseek-llm-7b-chat	q_proj, k_proj, v_proj	deepseek	✔	✔	✔	✘		-	deepseek-ai/deepseek-llm-7b-chat
deepseek-moe-16b	deepseek-ai/deepseek-moe-16b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘		moe	deepseek-ai/deepseek-moe-16b-base
deepseek-moe-16b-chat	deepseek-ai/deepseek-moe-16b-chat	q_proj, k_proj, v_proj	deepseek	✔	✔	✘	✘		moe	deepseek-ai/deepseek-moe-16b-chat
deepseek-67b	deepseek-ai/deepseek-llm-67b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	deepseek-ai/deepseek-llm-67b-base
deepseek-67b-chat	deepseek-ai/deepseek-llm-67b-chat	q_proj, k_proj, v_proj	deepseek	✔	✔	✔	✘		-	deepseek-ai/deepseek-llm-67b-chat
deepseek-coder-1_3b	deepseek-ai/deepseek-coder-1.3b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-1.3b-base
deepseek-coder-1_3b-instruct	deepseek-ai/deepseek-coder-1.3b-instruct	q_proj, k_proj, v_proj	deepseek-coder	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-1.3b-instruct
deepseek-coder-6_7b	deepseek-ai/deepseek-coder-6.7b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-6.7b-base
deepseek-coder-6_7b-instruct	deepseek-ai/deepseek-coder-6.7b-instruct	q_proj, k_proj, v_proj	deepseek-coder	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-6.7b-instruct
deepseek-coder-33b	deepseek-ai/deepseek-coder-33b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-33b-base
deepseek-coder-33b-instruct	deepseek-ai/deepseek-coder-33b-instruct	q_proj, k_proj, v_proj	deepseek-coder	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-33b-instruct
deepseek-coder-v2-instruct	deepseek-ai/DeepSeek-Coder-V2-Instruct	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	✘	✘	transformers>=4.39.3	coding, moe	deepseek-ai/DeepSeek-Coder-V2-Instruct
deepseek-coder-v2-lite-instruct	deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	✘	✘	transformers>=4.39.3	coding, moe	deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
deepseek-coder-v2	deepseek-ai/DeepSeek-Coder-V2-Base	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	default-generation	✔	✔	✘	✘	transformers>=4.39.3	coding, moe	deepseek-ai/DeepSeek-Coder-V2-Base
deepseek-coder-v2-lite	deepseek-ai/DeepSeek-Coder-V2-Lite-Base	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	default-generation	✔	✔	✘	✘	transformers>=4.39.3	coding, moe	deepseek-ai/DeepSeek-Coder-V2-Lite-Base
deepseek-math-7b	deepseek-ai/deepseek-math-7b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		math	deepseek-ai/deepseek-math-7b-base
deepseek-math-7b-instruct	deepseek-ai/deepseek-math-7b-instruct	q_proj, k_proj, v_proj	deepseek	✔	✔	✔	✘		math	deepseek-ai/deepseek-math-7b-instruct
deepseek-math-7b-chat	deepseek-ai/deepseek-math-7b-rl	q_proj, k_proj, v_proj	deepseek	✔	✔	✔	✘		math	deepseek-ai/deepseek-math-7b-rl
numina-math-7b	AI-ModelScope/NuminaMath-7B-TIR	q_proj, k_proj, v_proj	numina-math	✔	✔	✘	✘		math	AI-MO/NuminaMath-7B-TIR
deepseek-v2	deepseek-ai/DeepSeek-V2	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	default-generation	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2
deepseek-v2-chat	deepseek-ai/DeepSeek-V2-Chat	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2-Chat
deepseek-v2-lite	deepseek-ai/DeepSeek-V2-Lite	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	default-generation	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2-Lite
deepseek-v2-lite-chat	deepseek-ai/DeepSeek-V2-Lite-Chat	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2-Lite-Chat
deepseek-v2_5	deepseek-ai/DeepSeek-V2.5	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2_5	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2.5
gemma-2b	AI-ModelScope/gemma-2b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.38	-	google/gemma-2b
gemma-7b	AI-ModelScope/gemma-7b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.38	-	google/gemma-7b
gemma-2b-instruct	AI-ModelScope/gemma-2b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.38	-	google/gemma-2b-it
gemma-7b-instruct	AI-ModelScope/gemma-7b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.38	-	google/gemma-7b-it
gemma2-2b	LLM-Research/gemma-2-2b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-2b
gemma2-9b	LLM-Research/gemma-2-9b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-9b
gemma2-27b	LLM-Research/gemma-2-27b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-27b
gemma2-2b-instruct	LLM-Research/gemma-2-2b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-2b-it
gemma2-9b-instruct	LLM-Research/gemma-2-9b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-9b-it
gemma2-27b-instruct	LLM-Research/gemma-2-27b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-27b-it
minicpm-1b-sft-chat	OpenBMB/MiniCPM-1B-sft-bf16	q_proj, k_proj, v_proj	minicpm	✔	✔	✘	✘	transformers>=4.36.0	-	openbmb/MiniCPM-1B-sft-bf16
minicpm-2b-sft-chat	OpenBMB/MiniCPM-2B-sft-fp32	q_proj, k_proj, v_proj	minicpm	✔	✔	✘	✘		-	openbmb/MiniCPM-2B-sft-fp32
minicpm-2b-chat	OpenBMB/MiniCPM-2B-dpo-fp32	q_proj, k_proj, v_proj	minicpm	✔	✔	✘	✘		-	openbmb/MiniCPM-2B-dpo-fp32
minicpm-2b-128k	OpenBMB/MiniCPM-2B-128k	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	transformers>=4.36.0	-	openbmb/MiniCPM-2B-128k
minicpm-moe-8x2b	OpenBMB/MiniCPM-MoE-8x2B	q_proj, k_proj, v_proj	minicpm	✔	✔	✘	✘	transformers>=4.36.0	moe	openbmb/MiniCPM-MoE-8x2B
minicpm3-4b	OpenBMB/MiniCPM3-4B	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj	chatml	✔	✘	✘	✘	transformers>=4.36	-	openbmb/MiniCPM3-4B
openbuddy-llama-65b-chat	OpenBuddy/openbuddy-llama-65b-v8-bf16	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama-65b-v8-bf16
openbuddy-llama2-13b-chat	OpenBuddy/openbuddy-llama2-13b-v8.1-fp16	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama2-13b-v8.1-fp16
openbuddy-llama2-70b-chat	OpenBuddy/openbuddy-llama2-70b-v10.1-bf16	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama2-70b-v10.1-bf16
openbuddy-llama3-8b-chat	OpenBuddy/openbuddy-llama3-8b-v21.1-8k	q_proj, k_proj, v_proj	openbuddy2	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama3-8b-v21.1-8k
openbuddy-llama3-70b-chat	OpenBuddy/openbuddy-llama3-70b-v21.1-8k	q_proj, k_proj, v_proj	openbuddy2	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama3-70b-v21.1-8k
openbuddy-mistral-7b-chat	OpenBuddy/openbuddy-mistral-7b-v17.1-32k	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘	transformers>=4.34	-	OpenBuddy/openbuddy-mistral-7b-v17.1-32k
openbuddy-zephyr-7b-chat	OpenBuddy/openbuddy-zephyr-7b-v14.1	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘	transformers>=4.34	-	OpenBuddy/openbuddy-zephyr-7b-v14.1
openbuddy-deepseek-67b-chat	OpenBuddy/openbuddy-deepseek-67b-v15.2	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘		-	OpenBuddy/openbuddy-deepseek-67b-v15.2
openbuddy-mixtral-moe-7b-chat	OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k	q_proj, k_proj, v_proj	openbuddy	✔	✔	✘	✘	transformers>=4.36	moe	OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k
openbuddy-llama3_1-8b-chat	OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k	q_proj, k_proj, v_proj	openbuddy2	✔	✔	✔	✘	transformers>=4.43	-	OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k
mistral-7b	AI-ModelScope/Mistral-7B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.34	-	mistralai/Mistral-7B-v0.1
mistral-7b-v2	AI-ModelScope/Mistral-7B-v0.2-hf	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.34	-	alpindale/Mistral-7B-v0.2-hf
mistral-7b-instruct	AI-ModelScope/Mistral-7B-Instruct-v0.1	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘	transformers>=4.34	-	mistralai/Mistral-7B-Instruct-v0.1
mistral-7b-instruct-v2	AI-ModelScope/Mistral-7B-Instruct-v0.2	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘	transformers>=4.34	-	mistralai/Mistral-7B-Instruct-v0.2
mistral-7b-instruct-v3	LLM-Research/Mistral-7B-Instruct-v0.3	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘	transformers>=4.34	-	mistralai/Mistral-7B-Instruct-v0.3
mistral-nemo-base-2407	AI-ModelScope/Mistral-Nemo-Base-2407	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.43	-	mistralai/Mistral-Nemo-Base-2407
mistral-nemo-instruct-2407	AI-ModelScope/Mistral-Nemo-Instruct-2407	q_proj, k_proj, v_proj	mistral-nemo	✔	✔	✘	✘	transformers>=4.43	-	mistralai/Mistral-Nemo-Instruct-2407
mistral-large-instruct-2407	LLM-Research/Mistral-Large-Instruct-2407	q_proj, k_proj, v_proj	mistral-nemo	✔	✔	✘	✘	transformers>=4.43	-	mistralai/Mistral-Large-Instruct-2407
mistral-small-instruct-2409	AI-ModelScope/Mistral-Small-Instruct-2409	q_proj, k_proj, v_proj	mistral-nemo	✔	✔	✘	✘	transformers>=4.43	-	mistralai/Mistral-Small-Instruct-2409
mixtral-moe-7b	AI-ModelScope/Mixtral-8x7B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.36	moe	mistralai/Mixtral-8x7B-v0.1
mixtral-moe-7b-instruct	AI-ModelScope/Mixtral-8x7B-Instruct-v0.1	q_proj, k_proj, v_proj	llama	✔	✔	✘	✘	transformers>=4.36	moe	mistralai/Mixtral-8x7B-Instruct-v0.1
mixtral-moe-7b-aqlm-2bit-1x16	AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf	q_proj, k_proj, v_proj	default-generation	✔	✘	✘	✘	transformers>=4.38, aqlm, torch>=2.2.0	moe	ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf
mixtral-moe-8x22b-v1	AI-ModelScope/Mixtral-8x22B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.36	moe	mistral-community/Mixtral-8x22B-v0.1
ministral-8b-instruct-2410	AI-ModelScope/Ministral-8B-Instruct-2410	q_proj, k_proj, v_proj	mistral-nemo	✔	✔	✘	✘	transformers>=4.46	-	mistralai/Ministral-8B-Instruct-2410
wizardlm2-7b-awq	AI-ModelScope/WizardLM-2-7B-AWQ	q_proj, k_proj, v_proj	wizardlm2-awq	✔	✔	✘	✘	transformers>=4.34	-	MaziyarPanahi/WizardLM-2-7B-AWQ
wizardlm2-8x22b	AI-ModelScope/WizardLM-2-8x22B	q_proj, k_proj, v_proj	wizardlm2	✔	✔	✘	✘	transformers>=4.36	-	alpindale/WizardLM-2-8x22B
baichuan-7b	baichuan-inc/baichuan-7B	W_pack	default-generation	✘	✔	✔	✘	transformers<4.34	-	baichuan-inc/Baichuan-7B
baichuan-13b	baichuan-inc/Baichuan-13B-Base	W_pack	default-generation	✘	✔	✔	✘	transformers<4.34	-	baichuan-inc/Baichuan-13B-Base
baichuan-13b-chat	baichuan-inc/Baichuan-13B-Chat	W_pack	baichuan	✘	✔	✔	✘	transformers<4.34	-	baichuan-inc/Baichuan-13B-Chat
baichuan2-7b	baichuan-inc/Baichuan2-7B-Base	W_pack	default-generation	✘	✔	✔	✘		-	baichuan-inc/Baichuan2-7B-Base
baichuan2-7b-chat	baichuan-inc/Baichuan2-7B-Chat	W_pack	baichuan	✘	✔	✔	✘		-	baichuan-inc/Baichuan2-7B-Chat
baichuan2-7b-chat-int4	baichuan-inc/Baichuan2-7B-Chat-4bits	W_pack	baichuan	✘	✘	✘	✘	bitsandbytes<0.41.2, accelerate<0.26	-	baichuan-inc/Baichuan2-7B-Chat-4bits
baichuan2-13b	baichuan-inc/Baichuan2-13B-Base	W_pack	default-generation	✘	✔	✔	✘		-	baichuan-inc/Baichuan2-13B-Base
baichuan2-13b-chat	baichuan-inc/Baichuan2-13B-Chat	W_pack	baichuan	✘	✔	✔	✘		-	baichuan-inc/Baichuan2-13B-Chat
baichuan2-13b-chat-int4	baichuan-inc/Baichuan2-13B-Chat-4bits	W_pack	baichuan	✘	✘	✘	✘	bitsandbytes<0.41.2, accelerate<0.26	-	baichuan-inc/Baichuan2-13B-Chat-4bits
yuan2-2b-instruct	YuanLLM/Yuan2.0-2B-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		-	IEITYuan/Yuan2-2B-hf
yuan2-2b-janus-instruct	YuanLLM/Yuan2-2B-Janus-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		-	IEITYuan/Yuan2-2B-Janus-hf
yuan2-51b-instruct	YuanLLM/Yuan2.0-51B-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		-	IEITYuan/Yuan2-51B-hf
yuan2-102b-instruct	YuanLLM/Yuan2.0-102B-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		-	IEITYuan/Yuan2-102B-hf
yuan2-m32	YuanLLM/Yuan2-M32-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		moe	IEITYuan/Yuan2-M32-hf
xverse-7b	xverse/XVERSE-7B	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-7B
xverse-7b-chat	xverse/XVERSE-7B-Chat	q_proj, k_proj, v_proj	xverse	✘	✔	✘	✘		-	xverse/XVERSE-7B-Chat
xverse-13b	xverse/XVERSE-13B	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-13B
xverse-13b-chat	xverse/XVERSE-13B-Chat	q_proj, k_proj, v_proj	xverse	✘	✔	✘	✘		-	xverse/XVERSE-13B-Chat
xverse-65b	xverse/XVERSE-65B	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-65B
xverse-65b-v2	xverse/XVERSE-65B-2	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-65B-2
xverse-65b-chat	xverse/XVERSE-65B-Chat	q_proj, k_proj, v_proj	xverse	✘	✔	✘	✘		-	xverse/XVERSE-65B-Chat
xverse-13b-256k	xverse/XVERSE-13B-256K	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-13B-256K
xverse-moe-a4_2b	xverse/XVERSE-MoE-A4.2B	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		moe	xverse/XVERSE-MoE-A4.2B
orion-14b	OrionStarAI/Orion-14B-Base	q_proj, k_proj, v_proj	default-generation	✔	✘	✘	✘		-	OrionStarAI/Orion-14B-Base
orion-14b-chat	OrionStarAI/Orion-14B-Chat	q_proj, k_proj, v_proj	orion	✔	✘	✘	✘		-	OrionStarAI/Orion-14B-Chat
bluelm-7b	vivo-ai/BlueLM-7B-Base	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		-	vivo-ai/BlueLM-7B-Base
bluelm-7b-32k	vivo-ai/BlueLM-7B-Base-32K	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		-	vivo-ai/BlueLM-7B-Base-32K
bluelm-7b-chat	vivo-ai/BlueLM-7B-Chat	q_proj, k_proj, v_proj	bluelm	✘	✘	✘	✘		-	vivo-ai/BlueLM-7B-Chat
bluelm-7b-chat-32k	vivo-ai/BlueLM-7B-Chat-32K	q_proj, k_proj, v_proj	bluelm	✘	✘	✘	✘		-	vivo-ai/BlueLM-7B-Chat-32K
ziya2-13b	Fengshenbang/Ziya2-13B-Base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	IDEA-CCNL/Ziya2-13B-Base
ziya2-13b-chat	Fengshenbang/Ziya2-13B-Chat	q_proj, k_proj, v_proj	ziya	✔	✔	✔	✘		-	IDEA-CCNL/Ziya2-13B-Chat
skywork-13b	skywork/Skywork-13B-base	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		-	Skywork/Skywork-13B-base
skywork-13b-chat	skywork/Skywork-13B-chat	q_proj, k_proj, v_proj	skywork	✘	✘	✘	✘		-	-
zephyr-7b-beta-chat	modelscope/zephyr-7b-beta	q_proj, k_proj, v_proj	zephyr	✔	✔	✔	✘	transformers>=4.34	-	HuggingFaceH4/zephyr-7b-beta
polylm-13b	damo/nlp_polylm_13b_text_generation	c_attn	default-generation	✘	✘	✘	✘		-	DAMO-NLP-MT/polylm-13b
seqgpt-560m	damo/nlp_seqgpt-560m	query_key_value	default-generation	✘	✔	✘	✘		-	DAMO-NLP/SeqGPT-560M
sus-34b-chat	SUSTC/SUS-Chat-34B	q_proj, k_proj, v_proj	sus	✔	✔	✔	✘		-	SUSTech/SUS-Chat-34B
tongyi-finance-14b	TongyiFinance/Tongyi-Finance-14B	c_attn	default-generation	✔	✔	✔	✘		financial	-
tongyi-finance-14b-chat	TongyiFinance/Tongyi-Finance-14B-Chat	c_attn	qwen	✔	✔	✔	✘		financial	jxy/Tongyi-Finance-14B-Chat
tongyi-finance-14b-chat-int4	TongyiFinance/Tongyi-Finance-14B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	financial	jxy/Tongyi-Finance-14B-Chat-Int4
codefuse-codellama-34b-chat	codefuse-ai/CodeFuse-CodeLlama-34B	q_proj, k_proj, v_proj	codefuse-codellama	✔	✔	✔	✘		coding	codefuse-ai/CodeFuse-CodeLlama-34B
codefuse-codegeex2-6b-chat	codefuse-ai/CodeFuse-CodeGeeX2-6B	query_key_value	codefuse	✘	✔	✘	✘	transformers<4.34	coding	codefuse-ai/CodeFuse-CodeGeeX2-6B
codefuse-qwen-14b-chat	codefuse-ai/CodeFuse-QWen-14B	c_attn	codefuse	✔	✔	✔	✘		coding	codefuse-ai/CodeFuse-QWen-14B
phi2-3b	AI-ModelScope/phi-2	Wqkv	default-generation	✔	✔	✘	✘		coding	microsoft/phi-2
phi3-4b-4k-instruct	LLM-Research/Phi-3-mini-4k-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-mini-4k-instruct
phi3-4b-128k-instruct	LLM-Research/Phi-3-mini-128k-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-mini-128k-instruct
phi3-small-8k-instruct	LLM-Research/Phi-3-small-8k-instruct	query_key_value	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-small-8k-instruct
phi3-medium-4k-instruct	LLM-Research/Phi-3-medium-4k-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-medium-4k-instruct
phi3-small-128k-instruct	LLM-Research/Phi-3-small-128k-instruct	query_key_value	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-small-128k-instruct
phi3-medium-128k-instruct	LLM-Research/Phi-3-medium-128k-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-medium-128k-instruct
phi3_5-mini-instruct	LLM-Research/Phi-3.5-mini-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3.5-mini-instruct
phi3_5-moe-instruct	LLM-Research/Phi-3.5-MoE-instruct	q_proj, k_proj, v_proj	phi3	✔	✔	✘	✘	transformers>=4.36	moe	microsoft/Phi-3.5-MoE-instruct
mamba-130m	AI-ModelScope/mamba-130m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-130m-hf
mamba-370m	AI-ModelScope/mamba-370m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-370m-hf
mamba-390m	AI-ModelScope/mamba-390m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-390m-hf
mamba-790m	AI-ModelScope/mamba-790m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-790m-hf
mamba-1.4b	AI-ModelScope/mamba-1.4b-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-1.4b-hf
mamba-2.8b	AI-ModelScope/mamba-2.8b-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-2.8b-hf
telechat-7b	TeleAI/TeleChat-7B	key_value, query	telechat	✔	✘	✘	✘		-	Tele-AI/telechat-7B
telechat-12b	TeleAI/TeleChat-12B	key_value, query	telechat	✔	✘	✘	✘		-	Tele-AI/TeleChat-12B
telechat-12b-v2	TeleAI/TeleChat-12B-v2	key_value, query	telechat	✔	✘	✘	✘		-	Tele-AI/TeleChat-12B-v2
telechat-12b-v2-gptq-int4	swift/TeleChat-12B-V2-GPTQ-Int4	key_value, query	telechat	✔	✘	✘	✘	auto_gptq>=0.5	-	-
telechat2-115b	TeleAI/TeleChat2-115B	key_value, query	telechat2	✔	✘	✘	✘		-	Tele-AI/TeleChat2-115B
grok-1	colossalai/grok-1-pytorch	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		-	hpcai-tech/grok-1
dbrx-instruct	AI-ModelScope/dbrx-instruct	attn.Wqkv	dbrx	✔	✔	✘	✘	transformers>=4.36	moe	databricks/dbrx-instruct
dbrx-base	AI-ModelScope/dbrx-base	attn.Wqkv	dbrx	✔	✔	✘	✘	transformers>=4.36	moe	databricks/dbrx-base
mengzi3-13b-base	langboat/Mengzi3-13B-Base	q_proj, k_proj, v_proj	mengzi	✔	✔	✘	✘		-	Langboat/Mengzi3-13B-Base
c4ai-command-r-v01	AI-ModelScope/c4ai-command-r-v01	q_proj, k_proj, v_proj	c4ai	✔	✔	✘	✘	transformers>=4.39.1	-	CohereForAI/c4ai-command-r-v01
c4ai-command-r-plus	AI-ModelScope/c4ai-command-r-plus	q_proj, k_proj, v_proj	c4ai	✔	✔	✘	✘	transformers>4.39	-	CohereForAI/c4ai-command-r-plus
aya-expanse-8b	AI-ModelScope/aya-expanse-8b	q_proj, k_proj, v_proj	aya	✔	✔	✘	✘	transformers>=4.44.0	-	CohereForAI/aya-expanse-8b
aya-expanse-32b	AI-ModelScope/aya-expanse-32b	q_proj, k_proj, v_proj	aya	✔	✔	✘	✘	transformers>=4.44.0	-	CohereForAI/aya-expanse-32b
codestral-22b	swift/Codestral-22B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.34	-	mistralai/Codestral-22B-v0.1
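
The rows of the LLM table can also be read programmatically. The sketch below is a minimal illustration, not part of SWIFT: it assumes the rows are available as tab-separated text exactly as shown above, and the names `COLUMNS`, `split_list`, and `parse_rows` are hypothetical helpers introduced only for this example. It converts the ✔/✘ support flags to booleans and splits the comma-separated `Requires` and `Default Lora Target Modules` cells into lists.

```python
# Column names copied from the header row of the tables in this document.
COLUMNS = [
    "Model Type", "Model ID", "Default Lora Target Modules", "Default Template",
    "Support Flash Attn", "Support vLLM", "Support LMDeploy", "Support Megatron",
    "Requires", "Tags", "HF Model ID",
]
FLAG_COLUMNS = [c for c in COLUMNS if c.startswith("Support ")]

# Three rows copied from the LLM table; fields are separated by tabs.
SAMPLE_ROWS = "\n".join([
    "yi-6b-chat-awq\t01ai/Yi-6B-Chat-4bits\tq_proj, k_proj, v_proj\tchatml"
    "\t✔\t✔\t✔\t✘\tautoawq\t-\t01-ai/Yi-6B-Chat-4bits",
    "gemma2-9b-instruct\tLLM-Research/gemma-2-9b-it\tq_proj, k_proj, v_proj\tgemma"
    "\t✔\t✔\t✘\t✘\ttransformers>=4.42\t-\tgoogle/gemma-2-9b-it",
    "baichuan2-7b-chat-int4\tbaichuan-inc/Baichuan2-7B-Chat-4bits\tW_pack\tbaichuan"
    "\t✘\t✘\t✘\t✘\tbitsandbytes<0.41.2, accelerate<0.26\t-"
    "\tbaichuan-inc/Baichuan2-7B-Chat-4bits",
])


def split_list(cell):
    """Split a comma-separated cell into a list, dropping empty entries."""
    return [item.strip() for item in cell.split(",") if item.strip()]


def parse_rows(text):
    """Hypothetical helper: turn tab-separated table rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip blank lines, malformed rows, and the header row itself.
        if len(fields) != len(COLUMNS) or fields[0] == COLUMNS[0]:
            continue
        row = dict(zip(COLUMNS, fields))
        for flag in FLAG_COLUMNS:
            row[flag] = row[flag] == "✔"
        row["Default Lora Target Modules"] = split_list(
            row["Default Lora Target Modules"])
        row["Requires"] = split_list(row["Requires"])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE_ROWS)
vllm_ready = [r["Model Type"] for r in rows if r["Support vLLM"]]
print("vLLM-capable:", vllm_ready)
for r in rows:
    print(r["Model Type"], "requires", r["Requires"] or "nothing extra")
```

An empty `Requires` cell becomes an empty list, so a caller can distinguish models with no extra dependencies from those that pin a library version.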

MLLM

Model Type	Model ID	Default Lora Target Modules	Default Template	Support Flash Attn	Support vLLM	Support LMDeploy	Support Megatron	Requires	Tags	HF Model ID
qwen-vl	qwen/Qwen-VL	^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen-vl-generation	✔	✔	✔	✘		vision	Qwen/Qwen-VL
qwen-vl-chat	qwen/Qwen-VL-Chat	^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen-vl	✔	✔	✔	✘		vision	Qwen/Qwen-VL-Chat
qwen-vl-chat-int4	qwen/Qwen-VL-Chat-Int4	^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen-vl	✔	✔	✘	✘	auto_gptq>=0.5	vision	Qwen/Qwen-VL-Chat-Int4
qwen-audio	qwen/Qwen-Audio	^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen-audio-generation	✔	✘	✘	✘		audio	Qwen/Qwen-Audio
qwen-audio-chat	qwen/Qwen-Audio-Chat	^(transformer.h)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen-audio	✔	✘	✘	✘		audio	Qwen/Qwen-Audio-Chat
qwen2-audio-7b	qwen/Qwen2-Audio-7B	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-audio-generation	✔	✘	✘	✘	librosa, transformers>=4.45	audio	Qwen/Qwen2-Audio-7B
qwen2-audio-7b-instruct	qwen/Qwen2-Audio-7B-Instruct	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-audio	✔	✘	✘	✘	librosa, transformers>=4.45	audio	Qwen/Qwen2-Audio-7B-Instruct
qwen2-vl-2b	qwen/Qwen2-VL-2B	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl-generation	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-2B
qwen2-vl-2b-instruct	qwen/Qwen2-VL-2B-Instruct	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-2B-Instruct
qwen2-vl-2b-instruct-gptq-int4	qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4
qwen2-vl-2b-instruct-gptq-int8	qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
qwen2-vl-2b-instruct-awq	qwen/Qwen2-VL-2B-Instruct-AWQ	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, autoawq	vision, video	Qwen/Qwen2-VL-2B-Instruct-AWQ
qwen2-vl-7b	qwen/Qwen2-VL-7B	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl-generation	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-7B
qwen2-vl-7b-instruct	qwen/Qwen2-VL-7B-Instruct	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-7B-Instruct
qwen2-vl-7b-instruct-gptq-int4	qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4
qwen2-vl-7b-instruct-gptq-int8	qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8
qwen2-vl-7b-instruct-awq	qwen/Qwen2-VL-7B-Instruct-AWQ	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, autoawq	vision, video	Qwen/Qwen2-VL-7B-Instruct-AWQ
qwen2-vl-72b	qwen/Qwen2-VL-72B	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl-generation	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-72B
qwen2-vl-72b-instruct	qwen/Qwen2-VL-72B-Instruct	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-72B-Instruct
qwen2-vl-72b-instruct-gptq-int4	qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4
qwen2-vl-72b-instruct-gptq-int8	qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8
qwen2-vl-72b-instruct-awq	qwen/Qwen2-VL-72B-Instruct-AWQ	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, autoawq	vision, video	Qwen/Qwen2-VL-72B-Instruct-AWQ
glm4v-9b-chat	ZhipuAI/glm-4v-9b	^(transformer.encoder)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	glm4v	✘	✘	✘	✘	transformers>=4.42	vision	THUDM/glm-4v-9b
glm-edge-v-2b	ZhipuAI/glm-edge-v-2b	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	glm-edge-v	✔	✘	✘	✘	transformers>=4.46	vision	THUDM/glm-edge-v-2b
glm-edge-v-5b	ZhipuAI/glm-edge-v-5b	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	glm-edge-v	✔	✘	✘	✘	transformers>=4.46	vision	THUDM/glm-edge-v-5b
llama3_2-11b-vision	LLM-Research/Llama-3.2-11B-Vision	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_2-vision-generation	✔	✔	✘	✘	transformers>=4.45	vision	meta-llama/Llama-3.2-11B-Vision
llama3_2-11b-vision-instruct	LLM-Research/Llama-3.2-11B-Vision-Instruct	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_2-vision	✔	✔	✘	✘	transformers>=4.45	vision	meta-llama/Llama-3.2-11B-Vision-Instruct
llama3_2-90b-vision	LLM-Research/Llama-3.2-90B-Vision	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_2-vision-generation	✔	✔	✘	✘	transformers>=4.45	vision	meta-llama/Llama-3.2-90B-Vision
llama3_2-90b-vision-instruct	LLM-Research/Llama-3.2-90B-Vision-Instruct	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_2-vision	✔	✔	✘	✘	transformers>=4.45	vision	meta-llama/Llama-3.2-90B-Vision-Instruct
llama3_1-8b-omni	ICTNLP/Llama-3.1-8B-Omni	^(model.layers\|model.speech_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_1-omni	✔	✘	✘	✘	whisper, openai-whisper	audio	ICTNLP/Llama-3.1-8B-Omni
idefics3-8b-llama3	AI-ModelScope/Idefics3-8B-Llama3	^(model.text_model\|model.connector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	idefics3	✔	✘	✘	✘	transformers>=4.45	vision	HuggingFaceM4/Idefics3-8B-Llama3
llava1_5-7b-instruct	swift/llava-1.5-7b-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava1_5	✔	✔	✘	✘	transformers>=4.36	vision	llava-hf/llava-1.5-7b-hf
llava1_5-13b-instruct	swift/llava-1.5-13b-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava1_5	✔	✔	✘	✘	transformers>=4.36	vision	llava-hf/llava-1.5-13b-hf
llava1_6-mistral-7b-instruct	swift/llava-v1.6-mistral-7b-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-mistral	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-v1.6-mistral-7b-hf
llava1_6-vicuna-7b-instruct	swift/llava-v1.6-vicuna-7b-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-vicuna	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-v1.6-vicuna-7b-hf
llava1_6-vicuna-13b-instruct	swift/llava-v1.6-vicuna-13b-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-vicuna	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-v1.6-vicuna-13b-hf
llava1_6-llama3_1-8b-instruct	swift/llava-llama3.1-8b	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-next-llama3	✔	✘	✘	✘	transformers>=4.41	vision	-
llava1_6-yi-34b-instruct	swift/llava-v1.6-34b-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-yi	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-v1.6-34b-hf
llama3-llava-next-8b-hf	swift/llama3-llava-next-8b-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llama-llava-next-hf	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llama3-llava-next-8b-hf
llava-next-72b-hf	AI-ModelScope/llava-next-72b-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llama-qwen-hf	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-next-72b-hf
llava-next-110b-hf	AI-ModelScope/llava-next-110b-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llama-qwen-hf	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-next-110b-hf
llava-onevision-qwen2-0_5b-ov	AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-onevision-qwen	✔	✘	✘	✘	transformers>=4.45	vision, video	llava-hf/llava-onevision-qwen2-0.5b-ov-hf
llava-onevision-qwen2-7b-ov	AI-ModelScope/llava-onevision-qwen2-7b-ov-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-onevision-qwen	✔	✘	✘	✘	transformers>=4.45	vision, video	llava-hf/llava-onevision-qwen2-7b-ov-hf
llava-onevision-qwen2-72b-ov	AI-ModelScope/llava-onevision-qwen2-72b-ov-hf	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-onevision-qwen	✔	✘	✘	✘	transformers>=4.45	vision, video	llava-hf/llava-onevision-qwen2-72b-ov-hf
llama3-llava-next-8b	AI-Modelscope/llama3-llava-next-8b	^(model.layers\|model.mm_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llama3-llava-next	✔	✘	✘	✘		vision	lmms-lab/llama3-llava-next-8b
llava-next-72b	AI-Modelscope/llava-next-72b	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-qwen	✔	✘	✘	✘		vision	lmms-lab/llava-next-72b
llava-next-110b	AI-Modelscope/llava-next-110b	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-qwen	✔	✘	✘	✘		vision	lmms-lab/llava-next-110b
llava-next-video-7b-instruct	swift/LLaVA-NeXT-Video-7B-hf	^(language_model\|multi_modal_projector\|vision_resampler)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-next-video	✔	✔	✘	✘	transformers>=4.42, av	video	llava-hf/LLaVA-NeXT-Video-7B-hf
llava-next-video-7b-32k-instruct	swift/LLaVA-NeXT-Video-7B-32K-hf	^(language_model\|multi_modal_projector\|vision_resampler)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-next-video	✔	✔	✘	✘	transformers>=4.42, av	video	llava-hf/LLaVA-NeXT-Video-7B-32K-hf
llava-next-video-7b-dpo-instruct	swift/LLaVA-NeXT-Video-7B-DPO-hf	^(language_model\|multi_modal_projector\|vision_resampler)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-next-video	✔	✔	✘	✘	transformers>=4.42, av	video	llava-hf/LLaVA-NeXT-Video-7B-DPO-hf
llava-next-video-34b-instruct	swift/LLaVA-NeXT-Video-34B-hf	^(language_model\|multi_modal_projector\|vision_resampler)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-next-video-yi	✔	✔	✘	✘	transformers>=4.42, av	video	llava-hf/LLaVA-NeXT-Video-34B-hf
yi-vl-6b-chat	01ai/Yi-VL-6B	^(model.layers\|model.mm_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	yi-vl	✔	✘	✘	✘	transformers>=4.34	vision	01-ai/Yi-VL-6B
yi-vl-34b-chat	01ai/Yi-VL-34B	^(model.layers\|model.mm_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	yi-vl	✔	✘	✘	✘	transformers>=4.34	vision	01-ai/Yi-VL-34B
llava-llama3-8b-v1_1	AI-ModelScope/llava-llama-3-8b-v1_1-transformers	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	llava-llama-instruct	✔	✔	✘	✘	transformers>=4.36	vision	xtuner/llava-llama-3-8b-v1_1-transformers
internlm-xcomposer2-7b-chat	Shanghai_AI_Laboratory/internlm-xcomposer2-7b	attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3	internlm-xcomposer2	✔	✘	✔	✘		vision	internlm/internlm-xcomposer2-7b
internlm-xcomposer2-4khd-7b-chat	Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b	attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3	internlm-xcomposer2-4khd	✔	✘	✔	✘		vision	internlm/internlm-xcomposer2-4khd-7b
internlm-xcomposer2_5-7b-chat	Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b	attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3	internlm-xcomposer2_5	✔	✘	✔	✘		vision	internlm/internlm-xcomposer2d5-7b
internvl-chat-v1_5	AI-ModelScope/InternVL-Chat-V1-5	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl	✔	✔	✔	✘	transformers>=4.35, timm	vision	OpenGVLab/InternVL-Chat-V1-5
internvl-chat-v1_5-int8	AI-ModelScope/InternVL-Chat-V1-5-int8	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl	✔	✘	✘	✘	transformers>=4.35, timm	vision	OpenGVLab/InternVL-Chat-V1-5-int8
mini-internvl-chat-2b-v1_5	OpenGVLab/Mini-InternVL-Chat-2B-V1-5	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl	✔	✔	✔	✘	transformers>=4.35, timm	vision	OpenGVLab/Mini-InternVL-Chat-2B-V1-5
mini-internvl-chat-4b-v1_5	OpenGVLab/Mini-InternVL-Chat-4B-V1-5	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl-phi3	✔	✔	✘	✘	transformers>=4.35,<4.42, timm	vision	OpenGVLab/Mini-InternVL-Chat-4B-V1-5
internvl2-1b	OpenGVLab/InternVL2-1B	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-1B
internvl2-2b	OpenGVLab/InternVL2-2B	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-2B
internvl2-4b	OpenGVLab/InternVL2-4B	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2-phi3	✔	✔	✔	✘	transformers>=4.36,<4.42, timm	vision, video	OpenGVLab/InternVL2-4B
internvl2-8b	OpenGVLab/InternVL2-8B	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-8B
internvl2-26b	OpenGVLab/InternVL2-26B	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-26B
internvl2-40b	OpenGVLab/InternVL2-40B	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-40B
internvl2-llama3-76b	OpenGVLab/InternVL2-Llama3-76B	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-Llama3-76B
internvl2-2b-awq	OpenGVLab/InternVL2-2B-AWQ	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-2B-AWQ
internvl2-8b-awq	OpenGVLab/InternVL2-8B-AWQ	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-8B-AWQ
internvl2-26b-awq	OpenGVLab/InternVL2-26B-AWQ	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-26B-AWQ
internvl2-40b-awq	OpenGVLab/InternVL2-40B-AWQ	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-40B-AWQ
internvl2-llama3-76b-awq	OpenGVLab/InternVL2-Llama3-76B-AWQ	^(language_model\|mlp1)(?!.(lm_head\|output\|emb\|wte\|shared)).	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-Llama3-76B-AWQ
deepseek-janus-1_3b	deepseek-ai/Janus-1.3B	^(language_model\|aligner)(?!.(lm_head\|output\|emb\|wte\|shared)).	deepseek-janus	✔	✘	✘	✘		vision	deepseek-ai/Janus-1.3B
deepseek-vl-1_3b-chat	deepseek-ai/deepseek-vl-1.3b-chat	^(language_model\|aligner)(?!.(lm_head\|output\|emb\|wte\|shared)).	deepseek-vl	✔	✘	✔	✘		vision	deepseek-ai/deepseek-vl-1.3b-chat
deepseek-vl-7b-chat	deepseek-ai/deepseek-vl-7b-chat	^(language_model\|aligner)(?!.(lm_head\|output\|emb\|wte\|shared)).	deepseek-vl	✔	✘	✔	✘		vision	deepseek-ai/deepseek-vl-7b-chat
ovis1_6-gemma2-9b	AIDC-AI/Ovis1.6-Gemma2-9B	^(llm)(?!.(lm_head\|output\|emb\|wte\|shared)).	ovis1_6	✔	✘	✘	✘	transformers>=4.42	vision	AIDC-AI/Ovis1.6-Gemma2-9B
paligemma-3b-pt-224	AI-ModelScope/paligemma-3b-pt-224	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-pt-224
paligemma-3b-pt-448	AI-ModelScope/paligemma-3b-pt-448	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-pt-448
paligemma-3b-pt-896	AI-ModelScope/paligemma-3b-pt-896	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-pt-896
paligemma-3b-mix-224	AI-ModelScope/paligemma-3b-mix-224	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-mix-224
paligemma-3b-mix-448	AI-ModelScope/paligemma-3b-mix-448	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-mix-448
minicpm-v-3b-chat	OpenBMB/MiniCPM-V	^(llm\|resampler)(?!.(lm_head\|output\|emb\|wte\|shared)).	minicpm-v	✔	✘	✘	✘	timm, transformers<4.42	vision	openbmb/MiniCPM-V
minicpm-v-v2-chat	OpenBMB/MiniCPM-V-2	^(llm\|resampler)(?!.(lm_head\|output\|emb\|wte\|shared)).	minicpm-v	✔	✘	✘	✘	timm, transformers<4.42	vision	openbmb/MiniCPM-V-2
minicpm-v-v2_5-chat	OpenBMB/MiniCPM-Llama3-V-2_5	^(llm\|resampler)(?!.(lm_head\|output\|emb\|wte\|shared)).	minicpm-v-v2_5	✔	✔	✘	✘	timm, transformers>=4.36	vision	openbmb/MiniCPM-Llama3-V-2_5
minicpm-v-v2_6-chat	OpenBMB/MiniCPM-V-2_6	^(llm\|resampler)(?!.(lm_head\|output\|emb\|wte\|shared)).	minicpm-v-v2_6	✔	✔	✘	✘	timm, transformers>=4.36	vision, video	openbmb/MiniCPM-V-2_6
pixtral-12b	AI-ModelScope/pixtral-12b	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	pixtral	✘	✘	✘	✘	transformers>=4.45	vision	mistral-community/pixtral-12b
mplug-owl2-chat	iic/mPLUG-Owl2	q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1	mplug-owl2	✔	✘	✘	✘	transformers<4.35, icecream	vision	MAGAer13/mplug-owl2-llama2-7b
mplug-owl2_1-chat	iic/mPLUG-Owl2.1	c_attn.multiway.0, c_attn.multiway.1	mplug-owl2	✔	✘	✘	✘	transformers<4.35, icecream	vision	Mizukiluke/mplug_owl_2_1
mplug-owl3-1b-chat	iic/mPLUG-Owl3-1B-241014	^(language_model\|vision2text_model)(?!.(lm_head\|output\|emb\|wte\|shared)).	mplug_owl3	✔	✘	✘	✘	transformers>=4.36, icecream	vision, video	mPLUG/mPLUG-Owl3-1B-241014
mplug-owl3-2b-chat	iic/mPLUG-Owl3-2B-241014	^(language_model\|vision2text_model)(?!.(lm_head\|output\|emb\|wte\|shared)).	mplug_owl3	✔	✘	✘	✘	transformers>=4.36, icecream	vision, video	mPLUG/mPLUG-Owl3-2B-241014
mplug-owl3-7b-chat	iic/mPLUG-Owl3-7B-240728	^(language_model\|vision2text_model)(?!.(lm_head\|output\|emb\|wte\|shared)).	mplug_owl3	✔	✘	✘	✘	transformers>=4.36, icecream	vision, video	mPLUG/mPLUG-Owl3-7B-240728
mplug-owl3v-7b-chat	iic/mPLUG-Owl3-7B-241101	^(language_model\|vision2text_model)(?!.(lm_head\|output\|emb\|wte\|shared)).	mplug_owl3v	✔	✘	✘	✘	transformers>=4.36, icecream	vision, video	mPLUG/mPLUG-Owl3-7B-241101
phi3-vision-128k-instruct	LLM-Research/Phi-3-vision-128k-instruct	^(model.layers\|model.vision_embed_tokens.img_projection)(?!.(lm_head\|output\|emb\|wte\|shared)).	phi3-vl	✔	✔	✘	✘	transformers>=4.36	vision	microsoft/Phi-3-vision-128k-instruct
phi3_5-vision-instruct	LLM-Research/Phi-3.5-vision-instruct	^(model.layers\|model.vision_embed_tokens.img_projection)(?!.(lm_head\|output\|emb\|wte\|shared)).	phi3-vl	✔	✔	✘	✘	transformers>=4.36	vision	microsoft/Phi-3.5-vision-instruct
cogvlm-17b-chat	ZhipuAI/cogvlm-chat	^(model.layers)(?!.(lm_head\|output\|emb\|wte\|shared)).	cogvlm	✘	✘	✘	✘	transformers<4.42	vision	THUDM/cogvlm-chat-hf
cogvlm2-19b-chat	ZhipuAI/cogvlm2-llama3-chinese-chat-19B	^(model.layers)(?!.(lm_head\|output\|emb\|wte\|shared)).	cogvlm	✘	✘	✔	✘	transformers<4.42	vision	THUDM/cogvlm2-llama3-chinese-chat-19B
cogvlm2-en-19b-chat	ZhipuAI/cogvlm2-llama3-chat-19B	^(model.layers)(?!.(lm_head\|output\|emb\|wte\|shared)).	cogvlm	✘	✘	✔	✘	transformers<4.42	vision	THUDM/cogvlm2-llama3-chat-19B
cogvlm2-video-13b-chat	ZhipuAI/cogvlm2-video-llama3-chat	^(model.layers)(?!.(lm_head\|output\|emb\|wte\|shared)).	cogvlm2-video	✘	✘	✘	✘	decord, pytorchvideo, transformers>=4.42	vision, video	THUDM/cogvlm2-video-llama3-chat
cogagent-18b-chat	ZhipuAI/cogagent-chat	^(model.layers)(?!.(lm_head\|output\|emb\|wte\|shared)).	cogagent-chat	✘	✘	✘	✘	timm	vision	THUDM/cogagent-chat-hf
cogagent-18b-instruct	ZhipuAI/cogagent-vqa	^(model.layers)(?!.(lm_head\|output\|emb\|wte\|shared)).	cogagent-instruct	✘	✘	✘	✘	timm	vision	THUDM/cogagent-vqa-hf
molmoe-1b	LLM-Research/MolmoE-1B-0924	^(model.transformer)(?!.(lm_head\|output\|emb\|wte\|shared)).	molmo	✔	✘	✘	✘	transformers>=4.45.0	vision	allenai/MolmoE-1B-0924
molmo-7b-o	LLM-Research/Molmo-7B-O-0924	^(model.transformer)(?!.(lm_head\|output\|emb\|wte\|shared)).	molmo	✔	✘	✘	✘	transformers>=4.45.0	vision	allenai/Molmo-7B-O-0924
molmo-7b-d	LLM-Research/Molmo-7B-D-0924	^(model.transformer)(?!.(lm_head\|output\|emb\|wte\|shared)).	molmo	✔	✘	✘	✘	transformers>=4.45.0	vision	allenai/Molmo-7B-D-0924
molmo-72b	LLM-Research/Molmo-72B-0924	^(model.transformer)(?!.(lm_head\|output\|emb\|wte\|shared)).	molmo	✔	✘	✘	✘	transformers>=4.45.0	vision	allenai/Molmo-72B-0924
emu3-chat	BAAI/Emu3-Chat	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	emu3-chat	✔	✘	✘	✘	transformers>=4.44.0	vision	BAAI/Emu3-Chat
florence-2-base	AI-ModelScope/Florence-2-base	^(language_model\|image_projection)(?!.(lm_head\|output\|emb\|wte\|shared)).	florence	✔	✘	✘	✘		vision	microsoft/Florence-2-base
florence-2-base-ft	AI-ModelScope/Florence-2-base-ft	^(language_model\|image_projection)(?!.(lm_head\|output\|emb\|wte\|shared)).	florence	✔	✘	✘	✘		vision	microsoft/Florence-2-base-ft
florence-2-large	AI-ModelScope/Florence-2-large	^(language_model\|image_projection)(?!.(lm_head\|output\|emb\|wte\|shared)).	florence	✔	✘	✘	✘		vision	microsoft/Florence-2-large
florence-2-large-ft	AI-ModelScope/Florence-2-large-ft	^(language_model\|image_projection)(?!.(lm_head\|output\|emb\|wte\|shared)).	florence	✔	✘	✘	✘		vision	microsoft/Florence-2-large-ft
got-ocr2	stepfun-ai/GOT-OCR2_0	^(model.layers\|model.mm_projector_vary)(?!.(lm_head\|output\|emb\|wte\|shared)).	got_ocr2	✔	✘	✘	✘		audio	stepfun-ai/GOT-OCR2_0
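
Entries in the Default Lora Target Modules column that begin with `^(` are regular expressions over module names rather than plain module suffixes. The sketch below shows how such a pattern could select modules. It assumes the intended form is `^(prefix)(?!.*(lm_head|output|emb|wte|shared)).*`, with `*` quantifiers and unescaped `|`, where the table shows `.` and `\|`. The helper `select_lora_targets` and the module names are hypothetical and are not taken from SWIFT or from any real checkpoint.

```python
import re
from typing import Iterable, List

# Assumed full form of the internvl2 entry in the table above.
PATTERN = r"^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*"


def select_lora_targets(pattern: str, module_names: Iterable[str]) -> List[str]:
    """Return the module names that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in module_names if regex.fullmatch(name)]


# Hypothetical module names, for illustration only.
names = [
    "language_model.model.layers.0.attention.wqkv",
    "language_model.model.tok_embeddings",     # rejected: contains "emb"
    "language_model.output",                   # rejected: contains "output"
    "mlp1.1",
    "vision_model.encoder.layers.0.attn.qkv",  # rejected: prefix does not match
]

selected = select_lora_targets(PATTERN, names)
print(selected)
```

Under this assumption, the leading group keeps only modules under the listed prefixes (here the language model and the projector), and the negative lookahead drops embedding and output-head modules.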

Datasets

The table below lists the datasets supported by SWIFT:

Dataset Name: The dataset name registered in SWIFT.
Dataset ID: The dataset ID on ModelScope.
Dataset Size: The number of rows in the dataset.
Statistic: Token-count statistics for the dataset, reported as mean±std, min, and max, which help when choosing the max_length hyperparameter. The statistics are computed over the concatenated training and validation sets, tokenized with qwen's tokenizer. Different tokenizers produce different statistics; to obtain token statistics for another model's tokenizer, you can run the script yourself.
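
As a rough illustration of the Statistic format, the sketch below computes a summary string of the same shape from a list of texts. It is not the SWIFT script: the function `token_stats` is hypothetical, whitespace splitting stands in for a real tokenizer, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev
from typing import Callable, Iterable, List


def token_stats(texts: Iterable[str], tokenize: Callable[[str], List]) -> str:
    """Summarize token counts as 'mean±std, min=..., max=...'."""
    lengths = [len(tokenize(text)) for text in texts]
    # Population standard deviation is an assumption made for this sketch.
    return (f"{fmean(lengths):.1f}±{pstdev(lengths):.1f}, "
            f"min={min(lengths)}, max={max(lengths)}")


# Whitespace splitting is only a placeholder; pass a real tokenizer's
# encode function to get statistics comparable to the table.
samples = ["a b c", "a b c d e"]
summary = token_stats(samples, str.split)
print(summary)
```

A summary like this gives a quick sense of how much data a given max_length would truncate: a max far above the mean, as in several rows of the table, points to a small number of very long samples.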

Dataset Name	Dataset ID	Subsets	Dataset Size	Statistic (token)	Tags	HF Dataset ID
🔥ms-bench	iic/ms_bench		316820	346.9±443.2, min=22, max=30960	chat, general, multi-round	-
🔥alpaca-en	AI-ModelScope/alpaca-gpt4-data-en		52002	176.2±125.8, min=26, max=740	chat, general	vicgalle/alpaca-gpt4
🔥alpaca-zh	AI-ModelScope/alpaca-gpt4-data-zh		48818	162.1±93.9, min=26, max=856	chat, general	llm-wizard/alpaca-gpt4-data-zh
multi-alpaca	damo/nlp_polylm_multialpaca_sft	ar de es fr id ja ko pt ru th vi	131867	112.9±50.6, min=26, max=1226	chat, general, multilingual	-
instinwild	wyj123456/instinwild	default subset	103695	145.4±60.7, min=28, max=1434	-	-
cot-en	YorickHe/CoT		74771	122.7±64.8, min=51, max=8320	chat, general	-
cot-zh	YorickHe/CoT_zh		74771	117.5±70.8, min=43, max=9636	chat, general	-
instruct-en	wyj123456/instruct		888970	269.1±331.5, min=26, max=7254	chat, general	-
firefly-zh	AI-ModelScope/firefly-train-1.1M		1649399	178.1±260.4, min=26, max=12516	chat, general	YeungNLP/firefly-train-1.1M
gpt4all-en	wyj123456/GPT4all		806199	302.7±384.5, min=27, max=7391	chat, general	-
sharegpt	swift/sharegpt	common-zh computer-zh unknow-zh common-en computer-en	96566	933.3±864.8, min=21, max=66412	chat, general, multi-round	-
tulu-v2-sft-mixture	AI-ModelScope/tulu-v2-sft-mixture		5119	520.7±437.6, min=68, max=2549	chat, multilingual, general, multi-round	allenai/tulu-v2-sft-mixture
wikipedia-zh	AI-ModelScope/wikipedia-cn-20230720-filtered		254547	568.4±713.2, min=37, max=78678	text-generation, general, pretrained	pleisto/wikipedia-cn-20230720-filtered
open-orca	AI-ModelScope/OpenOrca		994896	382.3±417.4, min=31, max=8740	chat, multilingual, general	-
🔥sharegpt-gpt4	AI-ModelScope/sharegpt_gpt4	default V3_format zh_38K_format	72684	1047.6±1313.1, min=22, max=66412	chat, multilingual, general, multi-round, gpt4	-
deepctrl-sft	AI-ModelScope/deepctrl-sft-data	default en	14149024	389.8±628.6, min=21, max=626237	chat, general, sft, multi-round	-
🔥coig-cqia	AI-ModelScope/COIG-CQIA	chinese_traditional coig_pc exam finance douban human_value logi_qa ruozhiba segmentfault wiki wikihow xhs zhihu	44694	703.8±654.2, min=33, max=19288	general	-
🔥ruozhiba	AI-ModelScope/ruozhiba	post-annual title-good title-norm	85658	39.9±13.1, min=21, max=559	pretrain	-
long-alpaca-12k	AI-ModelScope/LongAlpaca-12k		11998	9619.0±8295.8, min=36, max=78925	longlora, QA	Yukang/LongAlpaca-12k
lmsys-chat-1m	AI-ModelScope/lmsys-chat-1m		-	Dataset is too huge, please click the original link to view the dataset stat.	chat, em	lmsys/lmsys-chat-1m
🔥ms-agent	iic/ms_agent		26336	650.9±217.2, min=209, max=2740	chat, agent, multi-round	-
🔥ms-agent-for-agentfabric	AI-ModelScope/ms_agent_for_agentfabric	default addition	30000	617.8±199.1, min=251, max=2657	chat, agent, multi-round	-
ms-agent-multirole	iic/MSAgent-MultiRole		9500	447.6±84.9, min=145, max=1101	chat, agent, multi-round, role-play, multi-agent	-
🔥toolbench-for-alpha-umi	shenweizhou/alpha-umi-toolbench-processed-v2	backbone caller planner summarizer	1448337	1439.7±853.9, min=123, max=18467	chat, agent	-
damo-agent-zh	damo/MSAgent-Bench		386984	956.5±407.3, min=326, max=19001	chat, agent, multi-round	-
damo-agent-zh-mini	damo/MSAgent-Bench		20845	1326.4±329.6, min=571, max=4304	chat, agent, multi-round	-
agent-instruct-all-en	huangjintao/AgentInstruct_copy	alfworld db kg mind2web os webshop	1866	1144.3±635.5, min=206, max=6412	chat, agent, multi-round	-
🔥msagent-pro	iic/MSAgent-Pro		21905	1524.5±921.3, min=64, max=16770	chat, agent, multi-round	-
toolbench	swift/ToolBench		124345	3669.5±1600.9, min=1047, max=22581	chat, agent, multi-round	-
code-alpaca-en	wyj123456/code_alpaca_en		20016	100.2±60.1, min=29, max=1776	-	sahil2801/CodeAlpaca-20k
🔥leetcode-python-en	AI-ModelScope/leetcode-solutions-python		2359	727.1±235.9, min=259, max=2146	chat, coding	-
🔥codefuse-python-en	codefuse-ai/CodeExercise-Python-27k		27224	483.6±193.9, min=45, max=3082	chat, coding	-
🔥codefuse-evol-instruction-zh	codefuse-ai/Evol-instruction-66k		66862	439.6±206.3, min=37, max=2983	chat, coding	-
medical-en	swift/medical_zh	en	117617	257.4±89.1, min=36, max=2564	chat, medical	-
medical-zh	swift/medical_zh	zh	1950972	167.2±219.7, min=26, max=27351	chat, medical	-
🔥disc-med-sft-zh	AI-ModelScope/DISC-Med-SFT		441767	354.1±193.1, min=25, max=2231	chat, medical	Flmc/DISC-Med-SFT
lawyer-llama-zh	AI-ModelScope/lawyer_llama_data		21476	194.4±91.7, min=27, max=924	chat, law	Skepsun/lawyer_llama_data
tigerbot-law-zh	AI-ModelScope/tigerbot-law-plugin		55895	109.9±126.4, min=37, max=18878	text-generation, law, pretrained	TigerResearch/tigerbot-law-plugin
🔥disc-law-sft-zh	AI-ModelScope/DISC-Law-SFT		166758	533.7±495.4, min=30, max=15169	chat, law	ShengbinYue/DISC-Law-SFT
🔥blossom-math-zh	AI-ModelScope/blossom-math-v2		10000	169.3±58.7, min=35, max=563	chat, math	Azure99/blossom-math-v2
school-math-zh	AI-ModelScope/school_math_0.25M		248480	157.7±72.2, min=33, max=3450	chat, math, quality	BelleGroup/school_math_0.25M
open-platypus-en	AI-ModelScope/Open-Platypus		24926	367.9±254.8, min=30, max=3951	chat, math, quality	garage-bAInd/Open-Platypus
text2sql-en	AI-ModelScope/texttosqlv2_25000_v2		25000	274.6±326.4, min=38, max=1975	chat, sql	Clinton/texttosqlv2_25000_v2
🔥sql-create-context-en	AI-ModelScope/sql-create-context		78577	80.2±17.8, min=36, max=456	chat, sql	b-mc2/sql-create-context
synthetic-text-to-sql	AI-ModelScope/synthetic_text_to_sql	default	100000	283.4±115.8, min=61, max=1356	nl2sql, en	gretelai/synthetic_text_to_sql
🔥advertise-gen-zh	lvjianjin/AdvertiseGen		98399	130.6±21.7, min=51, max=241	text-generation	shibing624/AdvertiseGen
🔥dureader-robust-zh	modelscope/DuReader_robust-QG		17899	241.1±137.4, min=60, max=1416	text-generation	-
cmnli-zh	modelscope/clue	cmnli	404024	82.6±16.6, min=51, max=199	text-generation, classification	clue
🔥jd-sentiment-zh	DAMO_NLP/jd		50000	66.0±83.2, min=39, max=4039	text-generation, classification	-
🔥hc3-zh	simpleai/HC3-Chinese	baike open_qa nlpcc_dbqa finance medicine law psychology	39781	176.8±81.5, min=57, max=3051	text-generation, classification	Hello-SimpleAI/HC3-Chinese
🔥hc3-en	simpleai/HC3	finance medicine	11021	298.3±138.7, min=65, max=2267	text-generation, classification	Hello-SimpleAI/HC3
dolly-15k	AI-ModelScope/databricks-dolly-15k	default	15011	199.2±267.8, min=22, max=8615	multi-task, en, quality	databricks/databricks-dolly-15k
zhihu-kol	OmniData/Zhihu-KOL	default	-	Dataset is too huge, please click the original link to view the dataset stat.	zhihu, qa	wangrui6/Zhihu-KOL
zhihu-kol-filtered	OmniData/Zhihu-KOL-More-Than-100-Upvotes	default	271261	952.0±1727.2, min=25, max=98658	zhihu, qa	bzb2023/Zhihu-KOL-More-Than-100-Upvotes
finance-en	wyj123456/finance_en		68911	135.6±134.3, min=26, max=3525	chat, financial	ssbuild/alpaca_finance_en
poetry-zh	modelscope/chinese-poetry-collection		390309	55.2±9.4, min=23, max=83	text-generation, poetry	-
webnovel-zh	AI-ModelScope/webnovel_cn		50000	1478.9±11526.1, min=100, max=490484	chat, novel	zxbsmk/webnovel_cn
generated-chat-zh	AI-ModelScope/generated_chat_0.4M		396004	273.3±52.0, min=32, max=873	chat, character-dialogue	BelleGroup/generated_chat_0.4M
🔥self-cognition	swift/self-cognition		134	53.6±18.6, min=29, max=121	chat, self-cognition	modelscope/self-cognition
🔥swift-mix	swift/swift-sft-mixture	sharegpt firefly codefuse metamathqa	-	Dataset is too huge, please click the original link to view the dataset stat.	chat, sft, general	-
cls-fudan-news-zh	damo/zh_cls_fudan-news		4959	3234.4±2547.5, min=91, max=19548	chat, classification	-
ner-jave-zh	damo/zh_ner-JAVE		1266	118.3±45.5, min=44, max=223	chat, ner	-
coco-en	modelscope/coco_2014_caption	coco_2014_caption	454617	299.8±2.8, min=295, max=352	chat, multi-modal, vision	-
🔥coco-en-mini	modelscope/coco_2014_caption	coco_2014_caption	40504	299.8±2.6, min=295, max=338	chat, multi-modal, vision	-
coco-en-2	modelscope/coco_2014_caption	coco_2014_caption	454617	36.8±2.8, min=32, max=89	chat, multi-modal, vision	-
🔥coco-en-2-mini	modelscope/coco_2014_caption	coco_2014_caption	40504	36.8±2.6, min=32, max=75	chat, multi-modal, vision	-
capcha-images	AI-ModelScope/captcha-images		8000	31.0±0.0, min=31, max=31	chat, multi-modal, vision	-
latex-ocr-print	AI-ModelScope/LaTeX_OCR	default	17918	362.7±34.8, min=294, max=528	chat, ocr, multi-modal, vision	linxy/LaTeX_OCR
latex-ocr-handwrite	AI-ModelScope/LaTeX_OCR	synthetic_handwrite	95424	375.1±59.4, min=292, max=2115	chat, ocr, multi-modal, vision	linxy/LaTeX_OCR
aishell1-zh	speech_asr/speech_asr_aishell1_trainsets		141600	152.2±36.8, min=63, max=419	chat, multi-modal, audio	-
🔥aishell1-zh-mini	speech_asr/speech_asr_aishell1_trainsets		14526	152.2±35.6, min=74, max=359	chat, multi-modal, audio	-
🔥video-chatgpt	swift/VideoChatGPT	Generic Temporal Consistency	3206	88.4±48.3, min=32, max=399	chat, multi-modal, video	lmms-lab/VideoChatGPT
egoschema	AI-ModelScope/egoschema	Subset	101	191.6±80.7, min=96, max=435	chat, multi-modal, video	lmms-lab/egoschema
llava-video-178k	lmms-lab/LLaVA-Video-178K	0_30_s_academic_v0_1 0_30_s_youtube_v0_1 1_2_m_academic_v0_1 1_2_m_youtube_v0_1 2_3_m_academic_v0_1 2_3_m_youtube_v0_1 30_60_s_academic_v0_1 30_60_s_youtube_v0_1	-	Dataset is too huge, please click the original link to view the dataset stat.	chat, multi-modal, video	lmms-lab/LLaVA-Video-178K
moviechat-1k-test	AI-ModelScope/MovieChat-1K-test		486	36.1±4.3, min=27, max=42	chat, multi-modal, video	Enxin/MovieChat-1K-test
hh-rlhf	AI-ModelScope/hh-rlhf	harmless-base helpful-base helpful-online helpful-rejection-sampled	127459	245.4±190.7, min=22, max=1999	rlhf, dpo, pairwise	-
🔥hh-rlhf-cn	AI-ModelScope/hh_rlhf_cn	hh_rlhf harmless_base_cn harmless_base_en helpful_base_cn helpful_base_en	355920	171.2±122.7, min=22, max=3078	rlhf, dpo, pairwise	-
orpo-dpo-mix-40k	AI-ModelScope/orpo-dpo-mix-40k	default	43666	548.3±397.4, min=28, max=8483	dpo, orpo, en, quality	mlabonne/orpo-dpo-mix-40k
stack-exchange-paired	AI-ModelScope/stack-exchange-paired		4483004	534.5±594.6, min=31, max=56588	hfrl, dpo, pairwise	lvwerra/stack-exchange-paired
shareai-llama3-dpo-zh-en-emoji	hjh0119/shareAI-Llama3-DPO-zh-en-emoji	default	2449	334.0±162.8, min=36, max=1801	rlhf, dpo, pairwise	-
ultrafeedback-kto	AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto	default	230720	11.0±0.0, min=11, max=11	rlhf, kto	-
rlaif-v	swift/RLAIF-V-Dataset	default	83132	119.8±52.6, min=28, max=556	rlhf, dpo, multi-modal, en	openbmb/RLAIF-V-Dataset
pileval	swift/pile-val-backup		214670	1612.3±8856.2, min=11, max=1208955	text-generation, awq	mit-han-lab/pile-val-backup
mantis-instruct	swift/Mantis-Instruct	birds-to-words chartqa coinstruct contrastive_caption docvqa dreamsim dvqa iconqa imagecode llava_665k_multi lrv_multi multi_vqa nextqa nlvr2 spot-the-diff star visual_story_telling	655351	825.7±812.5, min=284, max=13563	chat, multi-modal, vision, quality	TIGER-Lab/Mantis-Instruct
llava-data-instruct	swift/llava-data	llava_instruct	364100	189.0±142.1, min=33, max=5183	sft, multi-modal, quality	TIGER-Lab/llava-data
midefics	swift/MideficsDataset		3800	201.3±70.2, min=60, max=454	medical, en, vqa	WinterSchool/MideficsDataset
gqa	None	train_all_instructions	-	Dataset is too huge, please click the original link to view the dataset stat.	multi-modal, en, vqa, quality	lmms-lab/GQA
text-caps	swift/TextCaps		18145	38.2±4.4, min=31, max=73	multi-modal, en, caption, quality	HuggingFaceM4/TextCaps
refcoco-unofficial-caption	swift/refcoco		46215	44.7±3.2, min=36, max=71	multi-modal, en, caption	jxu124/refcoco
refcoco-unofficial-grounding	swift/refcoco		46215	45.2±3.1, min=37, max=69	multi-modal, en, grounding	jxu124/refcoco
refcocog-unofficial-caption	swift/refcocog		44799	49.7±4.7, min=37, max=88	multi-modal, en, caption	jxu124/refcocog
refcocog-unofficial-grounding	swift/refcocog		44799	50.1±4.7, min=37, max=90	multi-modal, en, grounding	jxu124/refcocog
a-okvqa	swift/A-OKVQA		18201	45.8±7.9, min=32, max=100	multi-modal, en, vqa, quality	HuggingFaceM4/A-OKVQA
okvqa	swift/OK-VQA_train		9009	34.4±3.3, min=28, max=59	multi-modal, en, vqa, quality	Multimodal-Fatima/OK-VQA_train
ocr-vqa	swift/OCR-VQA		186753	35.6±6.6, min=29, max=193	multi-modal, en, ocr-vqa	howard-hou/OCR-VQA
grit	swift/GRIT		-	Dataset is too huge, please click the original link to view the dataset stat.	multi-modal, en, caption-grounding, quality	zzliang/GRIT
llava-instruct-mix	swift/llava-instruct-mix-vsft		13640	179.8±120.2, min=30, max=962	multi-modal, en, vqa, quality	HuggingFaceH4/llava-instruct-mix-vsft
lnqa	swift/lnqa		-	Dataset is too huge, please click the original link to view the dataset stat.	multi-modal, en, ocr-vqa, quality	vikhyatk/lnqa
science-qa	swift/ScienceQA		8315	100.3±59.5, min=38, max=638	multi-modal, science, vqa, quality	derek-thomas/ScienceQA
guanaco	AI-ModelScope/GuanacoDataset	default	31561	250.1±70.3, min=89, max=1436	chat, zh	JosephusCheung/GuanacoDataset
mind2web	swift/Multimodal-Mind2Web		1009	297522.4±325496.2, min=8592, max=3499715	agent, multi-modal	osunlp/Multimodal-Mind2Web
sharegpt-4o-image	AI-ModelScope/ShareGPT-4o	image_caption	57289	638.7±157.9, min=47, max=4640	vqa, multi-modal	OpenGVLab/ShareGPT-4o
pixelprose	swift/pixelprose		-	Dataset is too large; see the original link for dataset statistics.	caption, multi-modal, vision	tomg-group-umd/pixelprose
m3it	AI-ModelScope/M3IT	coco vqa-v2 shapes shapes-rephrased coco-goi-rephrased snli-ve snli-ve-rephrased okvqa a-okvqa viquae textcap docvqa science-qa imagenet imagenet-open-ended imagenet-rephrased coco-goi clevr clevr-rephrased nlvr coco-itm coco-itm-rephrased vsr vsr-rephrased mocheg mocheg-rephrased coco-text fm-iqa activitynet-qa msrvtt ss coco-cn refcoco refcoco-rephrased multi30k image-paragraph-captioning visual-dialog visual-dialog-rephrased iqa vcr visual-mrc ivqa msrvtt-qa msvd-qa gqa text-vqa ocr-vqa st-vqa flickr8k-cn	-	Dataset is too large; see the original link for dataset statistics.	chat, multi-modal, vision	-
sharegpt4v	AI-ModelScope/ShareGPT4V	ShareGPT4V ShareGPT4V-PT	-	Dataset is too large; see the original link for dataset statistics.	chat, multi-modal, vision	-
llava-instruct-150k	AI-ModelScope/LLaVA-Instruct-150K		624610	490.4±180.2, min=288, max=5438	chat, multi-modal, vision	-
llava-pretrain	AI-ModelScope/LLaVA-Pretrain	default	-	Dataset is too large; see the original link for dataset statistics.	vqa, multi-modal, quality	liuhaotian/LLaVA-Pretrain
sa1b-dense-caption	Tongyi-DataEngine/SA1B-Dense-Caption		-	Dataset is too large; see the original link for dataset statistics.	zh, multi-modal, vqa	-
sa1b-paired-caption	Tongyi-DataEngine/SA1B-Paired-Captions-Images		-	Dataset is too large; see the original link for dataset statistics.	zh, multi-modal, vqa	-
alpaca-cleaned	AI-ModelScope/alpaca-cleaned		51760	177.9±126.4, min=26, max=1044	chat, general, bench, quality	yahma/alpaca-cleaned
aya-collection	swift/aya_collection	aya_dataset	202364	494.0±6911.3, min=21, max=3044268	multi-lingual, qa	CohereForAI/aya_collection
belle-generated-chat-0.4M	AI-ModelScope/generated_chat_0.4M		396004	273.3±52.0, min=32, max=873	common, zh	BelleGroup/generated_chat_0.4M
belle-math-0.25M	AI-ModelScope/school_math_0.25M		248480	157.7±72.2, min=33, max=3450	math, zh	BelleGroup/school_math_0.25M
belle-train-0.5M-CN	AI-ModelScope/train_0.5M_CN		519255	129.1±91.5, min=27, max=6507	common, zh, quality	BelleGroup/train_0.5M_CN
belle-train-1M-CN	AI-ModelScope/train_1M_CN		-	Dataset is too large; see the original link for dataset statistics.	common, zh, quality	BelleGroup/train_1M_CN
belle-train-2M-CN	AI-ModelScope/train_2M_CN		-	Dataset is too large; see the original link for dataset statistics.	common, zh, quality	BelleGroup/train_2M_CN
belle-train-3.5M-CN	swift/train_3.5M_CN		-	Dataset is too large; see the original link for dataset statistics.	common, zh, quality	BelleGroup/train_3.5M_CN
c4	-		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	allenai/c4
chart-qa	swift/ChartQA		28299	43.1±5.5, min=29, max=77	en, vqa, quality	HuggingFaceM4/ChartQA
chinese-c4	swift/chinese-c4		-	Dataset is too large; see the original link for dataset statistics.	pretrain, zh, quality	shjwudp/chinese-c4
cinepile	swift/cinepile		-	Dataset is too large; see the original link for dataset statistics.	vqa, en, youtube, video	tomg-group-umd/cinepile
classical-chinese-translate	swift/classical_chinese_translate		6655	344.0±76.4, min=61, max=815	chat, play-ground	-
codealpaca-20k	AI-ModelScope/CodeAlpaca-20k		20016	100.2±60.1, min=29, max=1776	code, en	HuggingFaceH4/CodeAlpaca_20K
cosmopedia	-	auto_math_text khanacademy openstax stanford stories web_samples_v1 web_samples_v2 wikihow	-	Dataset is too large; see the original link for dataset statistics.	multi-domain, en, qa	HuggingFaceTB/cosmopedia
cosmopedia-100k	swift/cosmopedia-100k		100000	1024.5±243.1, min=239, max=2981	multi-domain, en, qa	HuggingFaceTB/cosmopedia-100k
dolma	swift/dolma	v1_7	-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	allenai/dolma
dolphin	swift/dolphin	flan1m-alpaca-uncensored flan5m-alpaca-uncensored	-	Dataset is too large; see the original link for dataset statistics.	en	cognitivecomputations/dolphin
duet	AI-ModelScope/Duet-v0.5		5000	1157.4±189.3, min=657, max=2344	CoT, en	G-reen/Duet-v0.5
evol-instruct-v2	AI-ModelScope/WizardLM_evol_instruct_V2_196k		109184	480.9±333.1, min=26, max=4942	chat, en	WizardLM/WizardLM_evol_instruct_V2_196k
fineweb	-		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	HuggingFaceFW/fineweb
gen-qa	swift/GenQA		-	Dataset is too large; see the original link for dataset statistics.	qa, quality, multi-task	tomg-group-umd/GenQA
github-code	swift/github-code		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	codeparrot/github-code
gpt4v-dataset	swift/gpt4v-dataset		12356	217.9±68.3, min=35, max=596	en, caption, multi-modal, quality	laion/gpt4v-dataset
guanaco-belle-merge	AI-ModelScope/guanaco_belle_merge_v1.0		693987	134.2±92.0, min=24, max=6507	QA, zh	Chinese-Vicuna/guanaco_belle_merge_v1.0
infinity-instruct	swift/Infinity-Instruct		-	Dataset is too large; see the original link for dataset statistics.	qa, quality, multi-task	BAAI/Infinity-Instruct
llava-med-zh-instruct	swift/llava-med-zh-instruct-60k		56649	207.7±67.6, min=37, max=657	zh, medical, vqa	BUAADreamer/llava-med-zh-instruct-60k
🔥longwriter-6k	ZhipuAI/LongWriter-6k		6000	4887.2±2879.2, min=117, max=30354	long, chat, sft	THUDM/LongWriter-6k
🔥longwriter-6k-filtered	swift/longwriter-6k-filtered		666	4108.9±2636.9, min=1190, max=17050	long, chat, sft	-
math-instruct	AI-ModelScope/MathInstruct		262283	254.4±183.5, min=11, max=4383	math, cot, en, quality	TIGER-Lab/MathInstruct
math-plus	TIGER-Lab/MATH-plus	train	893929	287.1±158.7, min=24, max=2919	qa, math, en, quality	TIGER-Lab/MATH-plus
moondream2-coyo-5M	swift/moondream2-coyo-5M-captions		-	Dataset is too large; see the original link for dataset statistics.	caption, pretrain, quality	isidentical/moondream2-coyo-5M-captions
no-robots	swift/no_robots		9485	298.7±246.4, min=40, max=6739	multi-task, quality, human-annotated	HuggingFaceH4/no_robots
open-hermes	swift/OpenHermes-2.5		-	Dataset is too large; see the original link for dataset statistics.	cot, en, quality	teknium/OpenHermes-2.5
open-o1	AI-ModelScope/OpenO1-SFT	default	203579	615.5±659.6, min=11, max=27509	chat, general, o1	O1-OPEN/OpenO1-SFT
open-orca-chinese	AI-ModelScope/OpenOrca-Chinese		-	Dataset is too large; see the original link for dataset statistics.	QA, zh, general, quality	yys/OpenOrca-Chinese
orca_dpo_pairs	swift/orca_dpo_pairs		12859	366.9±251.9, min=30, max=2010	rlhf, quality	Intel/orca_dpo_pairs
path-vqa	swift/path-vqa		19654	34.8±7.3, min=27, max=85	multi-modal, vqa, medical	flaviagiammarino/path-vqa
pile	AI-ModelScope/pile		-	Dataset is too large; see the original link for dataset statistics.	pretrain	EleutherAI/pile
poison-mpts	iic/100PoisonMpts		906	150.6±80.8, min=39, max=656	poison-management, zh	-
🔥qwen2-pro-en	AI-ModelScope/Magpie-Qwen2-Pro-200K-English		200000	605.4±287.3, min=221, max=4267	chat, sft, en	Magpie-Align/Magpie-Qwen2-Pro-200K-English
🔥qwen2-pro-filtered	AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered		300000	555.8±286.6, min=148, max=4267	chat, sft	Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
🔥qwen2-pro-zh	AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese		200000	446.2±246.4, min=74, max=4101	chat, sft, zh	Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
redpajama-data-1t	swift/RedPajama-Data-1T		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	togethercomputer/RedPajama-Data-1T
redpajama-data-v2	swift/RedPajama-Data-V2		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	togethercomputer/RedPajama-Data-V2
refinedweb	-		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	tiiuae/falcon-refinedweb
rwkv-pretrain-web	mapjack/openwebtext_dataset		-	Dataset is too large; see the original link for dataset statistics.	pretrain, zh, quality	-
sft-nectar	AI-ModelScope/SFT-Nectar		131192	396.4±272.1, min=44, max=10732	cot, en, quality	AstraMindAI/SFT-Nectar
skypile	AI-ModelScope/SkyPile-150B		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality, zh	Skywork/SkyPile-150B
slim-orca	swift/SlimOrca		517982	399.1±370.2, min=35, max=8756	quality, en	Open-Orca/SlimOrca
slim-pajama-627b	-		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	cerebras/SlimPajama-627B
starcoder	AI-ModelScope/starcoderdata		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	bigcode/starcoderdata
tagengo-gpt4	swift/tagengo-gpt4		78057	472.3±292.9, min=22, max=3521	chat, multi-lingual, quality	lightblue/tagengo-gpt4
the-stack	AI-ModelScope/the-stack		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	bigcode/the-stack
ultrachat-200k	swift/ultrachat_200k		207865	1195.4±573.7, min=76, max=4470	chat, en, quality	HuggingFaceH4/ultrachat_200k
vqa-v2	swift/VQAv2		443757	31.8±2.2, min=27, max=58	en, vqa, quality	HuggingFaceM4/VQAv2
web-instruct-sub	swift/WebInstructSub		-	Dataset is too large; see the original link for dataset statistics.	qa, en, math, quality, multi-domain, science	TIGER-Lab/WebInstructSub
wikipedia	swift/wikipedia		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	wikipedia
wikipedia-cn-filtered	AI-ModelScope/wikipedia-cn-20230720-filtered		-	Dataset is too large; see the original link for dataset statistics.	pretrain, quality	pleisto/wikipedia-cn-20230720-filtered
zhihu-rlhf	AI-ModelScope/zhihu_rlhf_3k		3460	594.5±365.9, min=31, max=1716	rlhf, dpo, zh	liyucheng/zhihu_rlhf_3k
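
The numeric statistics cells in the table above appear to follow a regular `mean±std, min=…, max=…` pattern, so they can be read mechanically when filtering datasets by sample length. The following is a minimal sketch, assuming that pattern holds for every numeric cell and treating any other text (for example, the placeholder used for very large datasets) as missing. The names `Stat` and `parse_stat` are hypothetical helpers for illustration and are not part of SWIFT.

```python
import re
from typing import NamedTuple, Optional


class Stat(NamedTuple):
    """Length statistics parsed from one table cell (hypothetical helper type)."""
    mean: float
    std: float
    min: int
    max: int


# Assumed cell layout: "<mean>±<std>, min=<int>, max=<int>"
_STAT_RE = re.compile(r"^\s*([\d.]+)±([\d.]+),\s*min=(\d+),\s*max=(\d+)\s*$")


def parse_stat(cell: str) -> Optional[Stat]:
    """Return the parsed statistics, or None if the cell is not numeric."""
    match = _STAT_RE.match(cell)
    if match is None:
        # Placeholder text or any unexpected format is treated as "no statistics".
        return None
    mean, std, lo, hi = match.groups()
    return Stat(float(mean), float(std), int(lo), int(hi))


if __name__ == "__main__":
    # Statistics cell copied from the zhihu-rlhf row.
    print(parse_stat("594.5±365.9, min=31, max=1716"))
    # A non-numeric cell yields None.
    print(parse_stat("-"))
```

Returning `None` for non-numeric cells keeps the caller in charge of how to handle datasets without published statistics, rather than guessing a default value.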