Supported models and datasets

Models

The table below lists all models supported by SWIFT:

  • Model Type: The model_type name registered in SWIFT.

  • Default Lora Target Modules: The default lora_target_modules for the model.

  • Default Template: The default template for the model.

  • Support Flash Attn: Whether the model supports flash attention to accelerate fine-tuning (SFT) and inference.

  • Support vLLM: Whether the model supports vLLM to accelerate inference and deployment.

  • Requires: Additional dependencies required by the model.
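The Requires column holds comma-separated entries such as `transformers>=4.37, autoawq`. As a minimal sketch of how such a cell could be checked against an environment, the hypothetical helper below (`unmet_requirements` is an illustrative name, not part of SWIFT) compares each entry with a mapping of installed versions. It assumes plain numeric versions only and is not how SWIFT itself validates dependencies.

```python
import operator
import re
from typing import Dict, List, Tuple

# One entry of a Requires cell: a package name, optionally followed by an
# operator and a plain numeric version, e.g. "transformers>=4.37" or "autoawq".
_ENTRY = re.compile(
    r"^\s*([A-Za-z0-9_.\-]+)\s*(?:(>=|<=|==|<|>)\s*([0-9]+(?:\.[0-9]+)*))?\s*$"
)
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        "<": operator.lt, ">": operator.gt}


def _parse(version: str) -> Tuple[int, ...]:
    # "4.37.2" -> (4, 37, 2). Assumption: no pre-release or local suffixes.
    return tuple(int(part) for part in version.split("."))


def unmet_requirements(requires: str, installed: Dict[str, str]) -> List[str]:
    """Return the entries of a Requires cell that `installed` does not satisfy.

    Hypothetical helper for illustration; `installed` maps package names,
    spelled as in the table, to version strings.
    """
    if requires.strip() in ("", "-"):
        return []  # the model has no extra requirements
    unmet = []
    for entry in requires.split(","):
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognised requirement: {entry!r}")
        name, op, wanted = match.groups()
        have = installed.get(name)
        if have is None:
            unmet.append(entry.strip())  # package missing entirely
            continue
        if op is None:
            continue  # bare name: any installed version is accepted
        have_v, wanted_v = _parse(have), _parse(wanted)
        # Pad with zeros so that 4.37 and 4.37.0 compare as equal.
        width = max(len(have_v), len(wanted_v))
        have_v += (0,) * (width - len(have_v))
        wanted_v += (0,) * (width - len(wanted_v))
        if not _OPS[op](have_v, wanted_v):
            unmet.append(entry.strip())
    return unmet


print(unmet_requirements("transformers>=4.37, autoawq", {"transformers": "4.40.1"}))
# ['autoawq']
```

In a real environment the `installed` mapping could be filled with `importlib.metadata.version`, keeping in mind that a distribution's published name may be spelled differently from the name shown in the table.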

LLM

Model Type

Model ID

Default Lora Target Modules

Default Template

Support Flash Attn

Support vLLM

Support LMDeploy

Support Megatron

Requires

Tags

HF Model ID

qwen-1_8b

qwen/Qwen-1_8B

c_attn

default-generation

✔

✔

✔

✘

-

Qwen/Qwen-1_8B

qwen-1_8b-chat

qwen/Qwen-1_8B-Chat

c_attn

qwen

✔

✔

✔

✘

-

Qwen/Qwen-1_8B-Chat

qwen-1_8b-chat-int4

qwen/Qwen-1_8B-Chat-Int4

c_attn

qwen

✔

✔

✘

✘

auto_gptq>=0.5

-

Qwen/Qwen-1_8B-Chat-Int4

qwen-1_8b-chat-int8

qwen/Qwen-1_8B-Chat-Int8

c_attn

qwen

✔

✔

✘

✘

auto_gptq>=0.5

-

Qwen/Qwen-1_8B-Chat-Int8

qwen-7b

qwen/Qwen-7B

c_attn

default-generation

✔

✔

✔

✘

-

Qwen/Qwen-7B

qwen-7b-chat

qwen/Qwen-7B-Chat

c_attn

qwen

✔

✔

✔

✘

-

Qwen/Qwen-7B-Chat

qwen-7b-chat-int4

qwen/Qwen-7B-Chat-Int4

c_attn

qwen

✔

✔

✘

✘

auto_gptq>=0.5

-

Qwen/Qwen-7B-Chat-Int4

qwen-7b-chat-int8

qwen/Qwen-7B-Chat-Int8

c_attn

qwen

✔

✔

✘

✘

auto_gptq>=0.5

-

Qwen/Qwen-7B-Chat-Int8

qwen-14b

qwen/Qwen-14B

c_attn

default-generation

✔

✔

✔

✘

-

Qwen/Qwen-14B

qwen-14b-chat

qwen/Qwen-14B-Chat

c_attn

qwen

✔

✔

✔

✘

-

Qwen/Qwen-14B-Chat

qwen-14b-chat-int4

qwen/Qwen-14B-Chat-Int4

c_attn

qwen

✔

✔

✘

✘

auto_gptq>=0.5

-

Qwen/Qwen-14B-Chat-Int4

qwen-14b-chat-int8

qwen/Qwen-14B-Chat-Int8

c_attn

qwen

✔

✔

✘

✘

auto_gptq>=0.5

-

Qwen/Qwen-14B-Chat-Int8

qwen-72b

qwen/Qwen-72B

c_attn

default-generation

✔

✔

✔

✘

-

Qwen/Qwen-72B

qwen-72b-chat

qwen/Qwen-72B-Chat

c_attn

qwen

✔

✔

✔

✘

-

Qwen/Qwen-72B-Chat

qwen-72b-chat-int4

qwen/Qwen-72B-Chat-Int4

c_attn

qwen

✔

✔

✘

✘

auto_gptq>=0.5

-

Qwen/Qwen-72B-Chat-Int4

qwen-72b-chat-int8

qwen/Qwen-72B-Chat-Int8

c_attn

qwen

✔

✔

✘

✘

auto_gptq>=0.5

-

Qwen/Qwen-72B-Chat-Int8

modelscope-agent-7b

iic/ModelScope-Agent-7B

c_attn

modelscope-agent

✔

✘

✘

✘

-

-

modelscope-agent-14b

iic/ModelScope-Agent-14B

c_attn

modelscope-agent

✔

✘

✘

✘

-

-

qwen1half-0_5b

qwen/Qwen1.5-0.5B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-0.5B

qwen1half-1_8b

qwen/Qwen1.5-1.8B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-1.8B

qwen1half-4b

qwen/Qwen1.5-4B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-4B

qwen1half-7b

qwen/Qwen1.5-7B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-7B

qwen1half-14b

qwen/Qwen1.5-14B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-14B

qwen1half-32b

qwen/Qwen1.5-32B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen1.5-32B

qwen1half-72b

qwen/Qwen1.5-72B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-72B

qwen1half-110b

qwen/Qwen1.5-110B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen1.5-110B

codeqwen1half-7b

qwen/CodeQwen1.5-7B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/CodeQwen1.5-7B

qwen1half-moe-a2_7b

qwen/Qwen1.5-MoE-A2.7B

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.40

moe

Qwen/Qwen1.5-MoE-A2.7B

qwen1half-0_5b-chat

qwen/Qwen1.5-0.5B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-0.5B-Chat

qwen1half-1_8b-chat

qwen/Qwen1.5-1.8B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-1.8B-Chat

qwen1half-4b-chat

qwen/Qwen1.5-4B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-4B-Chat

qwen1half-7b-chat

qwen/Qwen1.5-7B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-7B-Chat

qwen1half-14b-chat

qwen/Qwen1.5-14B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-14B-Chat

qwen1half-32b-chat

qwen/Qwen1.5-32B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen1.5-32B-Chat

qwen1half-72b-chat

qwen/Qwen1.5-72B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen1.5-72B-Chat

qwen1half-110b-chat

qwen/Qwen1.5-110B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen1.5-110B-Chat

qwen1half-moe-a2_7b-chat

qwen/Qwen1.5-MoE-A2.7B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

transformers>=4.40

moe

Qwen/Qwen1.5-MoE-A2.7B-Chat

codeqwen1half-7b-chat

qwen/CodeQwen1.5-7B-Chat

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37

-

Qwen/CodeQwen1.5-7B-Chat

qwen1half-0_5b-chat-int4

qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4

qwen1half-1_8b-chat-int4

qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4

qwen1half-4b-chat-int4

qwen/Qwen1.5-4B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-4B-Chat-GPTQ-Int4

qwen1half-7b-chat-int4

qwen/Qwen1.5-7B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-7B-Chat-GPTQ-Int4

qwen1half-14b-chat-int4

qwen/Qwen1.5-14B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-14B-Chat-GPTQ-Int4

qwen1half-32b-chat-int4

qwen/Qwen1.5-32B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-32B-Chat-GPTQ-Int4

qwen1half-72b-chat-int4

qwen/Qwen1.5-72B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-72B-Chat-GPTQ-Int4

qwen1half-110b-chat-int4

qwen/Qwen1.5-110B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-110B-Chat-GPTQ-Int4

qwen1half-0_5b-chat-int8

qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8

qwen1half-1_8b-chat-int8

qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8

qwen1half-4b-chat-int8

qwen/Qwen1.5-4B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-4B-Chat-GPTQ-Int8

qwen1half-7b-chat-int8

qwen/Qwen1.5-7B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-7B-Chat-GPTQ-Int8

qwen1half-14b-chat-int8

qwen/Qwen1.5-14B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-14B-Chat-GPTQ-Int8

qwen1half-72b-chat-int8

qwen/Qwen1.5-72B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-72B-Chat-GPTQ-Int8

qwen1half-moe-a2_7b-chat-int4

qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✘

✘

✘

auto_gptq>=0.5, transformers>=4.40

moe

Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4

qwen1half-0_5b-chat-awq

qwen/Qwen1.5-0.5B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-0.5B-Chat-AWQ

qwen1half-1_8b-chat-awq

qwen/Qwen1.5-1.8B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-1.8B-Chat-AWQ

qwen1half-4b-chat-awq

qwen/Qwen1.5-4B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-4B-Chat-AWQ

qwen1half-7b-chat-awq

qwen/Qwen1.5-7B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-7B-Chat-AWQ

qwen1half-14b-chat-awq

qwen/Qwen1.5-14B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-14B-Chat-AWQ

qwen1half-32b-chat-awq

qwen/Qwen1.5-32B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-32B-Chat-AWQ

qwen1half-72b-chat-awq

qwen/Qwen1.5-72B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-72B-Chat-AWQ

qwen1half-110b-chat-awq

qwen/Qwen1.5-110B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-110B-Chat-AWQ

codeqwen1half-7b-chat-awq

qwen/CodeQwen1.5-7B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/CodeQwen1.5-7B-Chat-AWQ

qwen2-0_5b

qwen/Qwen2-0.5B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-0.5B

qwen2-0_5b-instruct

qwen/Qwen2-0.5B-Instruct

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-0.5B-Instruct

qwen2-0_5b-instruct-int4

qwen/Qwen2-0.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4

qwen2-0_5b-instruct-int8

qwen/Qwen2-0.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8

qwen2-0_5b-instruct-awq

qwen/Qwen2-0.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2-0.5B-Instruct-AWQ

qwen2-1_5b

qwen/Qwen2-1.5B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-1.5B

qwen2-1_5b-instruct

qwen/Qwen2-1.5B-Instruct

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-1.5B-Instruct

qwen2-1_5b-instruct-int4

qwen/Qwen2-1.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4

qwen2-1_5b-instruct-int8

qwen/Qwen2-1.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8

qwen2-1_5b-instruct-awq

qwen/Qwen2-1.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2-1.5B-Instruct-AWQ

qwen2-7b

qwen/Qwen2-7B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-7B

qwen2-7b-instruct

qwen/Qwen2-7B-Instruct

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-7B-Instruct

qwen2-7b-instruct-int4

qwen/Qwen2-7B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-7B-Instruct-GPTQ-Int4

qwen2-7b-instruct-int8

qwen/Qwen2-7B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-7B-Instruct-GPTQ-Int8

qwen2-7b-instruct-awq

qwen/Qwen2-7B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2-7B-Instruct-AWQ

qwen2-72b

qwen/Qwen2-72B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-72B

qwen2-72b-instruct

qwen/Qwen2-72B-Instruct

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-72B-Instruct

qwen2-72b-instruct-int4

qwen/Qwen2-72B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-72B-Instruct-GPTQ-Int4

qwen2-72b-instruct-int8

qwen/Qwen2-72B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-72B-Instruct-GPTQ-Int8

qwen2-72b-instruct-awq

qwen/Qwen2-72B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2-72B-Instruct-AWQ

qwen2-57b-a14b

qwen/Qwen2-57B-A14B

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.40

moe

Qwen/Qwen2-57B-A14B

qwen2-57b-a14b-instruct

qwen/Qwen2-57B-A14B-Instruct

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

transformers>=4.40

moe

Qwen/Qwen2-57B-A14B-Instruct

qwen2-57b-a14b-instruct-int4

qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.40

moe

Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4

qwen2-math-1_5b

qwen/Qwen2-Math-1.5B

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-Math-1.5B

qwen2-math-1_5b-instruct

qwen/Qwen2-Math-1.5B-Instruct

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-Math-1.5B-Instruct

qwen2-math-7b

qwen/Qwen2-Math-7B

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-Math-7B

qwen2-math-7b-instruct

qwen/Qwen2-Math-7B-Instruct

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-Math-7B-Instruct

qwen2-math-72b

qwen/Qwen2-Math-72B

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-Math-72B

qwen2-math-72b-instruct

qwen/Qwen2-Math-72B-Instruct

q_proj, k_proj, v_proj

qwen

✔

✔

✔

✔

transformers>=4.37

-

Qwen/Qwen2-Math-72B-Instruct

qwen2_5-0_5b

qwen/Qwen2.5-0.5B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-0.5B

qwen2_5-1_5b

qwen/Qwen2.5-1.5B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-1.5B

qwen2_5-3b

qwen/Qwen2.5-3B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-3B

qwen2_5-7b

qwen/Qwen2.5-7B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-7B

qwen2_5-14b

qwen/Qwen2.5-14B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-14B

qwen2_5-32b

qwen/Qwen2.5-32B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-32B

qwen2_5-72b

qwen/Qwen2.5-72B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-72B

qwen2_5-0_5b-instruct

qwen/Qwen2.5-0.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-0.5B-Instruct

qwen2_5-1_5b-instruct

qwen/Qwen2.5-1.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-1.5B-Instruct

qwen2_5-3b-instruct

qwen/Qwen2.5-3B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-3B-Instruct

qwen2_5-7b-instruct

qwen/Qwen2.5-7B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-7B-Instruct

qwen2_5-14b-instruct

qwen/Qwen2.5-14B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-14B-Instruct

qwen2_5-32b-instruct

qwen/Qwen2.5-32B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-32B-Instruct

qwen2_5-72b-instruct

qwen/Qwen2.5-72B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-72B-Instruct

qwen2_5-0_5b-instruct-gptq-int4

qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4

qwen2_5-1_5b-instruct-gptq-int4

qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4

qwen2_5-3b-instruct-gptq-int4

qwen/Qwen2.5-3B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4

qwen2_5-7b-instruct-gptq-int4

qwen/Qwen2.5-7B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4

qwen2_5-14b-instruct-gptq-int4

qwen/Qwen2.5-14B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4

qwen2_5-32b-instruct-gptq-int4

qwen/Qwen2.5-32B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4

qwen2_5-72b-instruct-gptq-int4

qwen/Qwen2.5-72B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4

qwen2_5-0_5b-instruct-gptq-int8

qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8

qwen2_5-1_5b-instruct-gptq-int8

qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8

qwen2_5-3b-instruct-gptq-int8

qwen/Qwen2.5-3B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8

qwen2_5-7b-instruct-gptq-int8

qwen/Qwen2.5-7B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8

qwen2_5-14b-instruct-gptq-int8

qwen/Qwen2.5-14B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8

qwen2_5-32b-instruct-gptq-int8

qwen/Qwen2.5-32B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8

qwen2_5-72b-instruct-gptq-int8

qwen/Qwen2.5-72B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8

qwen2_5-0_5b-instruct-awq

qwen/Qwen2.5-0.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-0.5B-Instruct-AWQ

qwen2_5-1_5b-instruct-awq

qwen/Qwen2.5-1.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-1.5B-Instruct-AWQ

qwen2_5-3b-instruct-awq

qwen/Qwen2.5-3B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-3B-Instruct-AWQ

qwen2_5-7b-instruct-awq

qwen/Qwen2.5-7B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-7B-Instruct-AWQ

qwen2_5-14b-instruct-awq

qwen/Qwen2.5-14B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-14B-Instruct-AWQ

qwen2_5-32b-instruct-awq

qwen/Qwen2.5-32B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-32B-Instruct-AWQ

qwen2_5-72b-instruct-awq

qwen/Qwen2.5-72B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-72B-Instruct-AWQ

qwen2_5-math-1_5b

qwen/Qwen2.5-Math-1.5B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Math-1.5B

qwen2_5-math-7b

qwen/Qwen2.5-Math-7B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Math-7B

qwen2_5-math-72b

qwen/Qwen2.5-Math-72B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Math-72B

qwen2_5-math-1_5b-instruct

qwen/Qwen2.5-Math-1.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Math-1.5B-Instruct

qwen2_5-math-7b-instruct

qwen/Qwen2.5-Math-7B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Math-7B-Instruct

qwen2_5-math-72b-instruct

qwen/Qwen2.5-Math-72B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Math-72B-Instruct

qwen2_5-coder-0_5b

qwen/Qwen2.5-Coder-0.5B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-0.5B

qwen2_5-coder-0_5b-instruct

qwen/Qwen2.5-Coder-0.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-0.5B-Instruct

qwen2_5-coder-0_5b-instruct-gptq-int4

qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4

qwen2_5-coder-0_5b-instruct-gptq-int8

qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8

qwen2_5-coder-0_5b-instruct-awq

qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ

qwen2_5-coder-1_5b

qwen/Qwen2.5-Coder-1.5B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-1.5B

qwen2_5-coder-1_5b-instruct

qwen/Qwen2.5-Coder-1.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-1.5B-Instruct

qwen2_5-coder-1_5b-instruct-gptq-int4

qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4

qwen2_5-coder-1_5b-instruct-gptq-int8

qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8

qwen2_5-coder-1_5b-instruct-awq

qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ

qwen2_5-coder-3b

qwen/Qwen2.5-Coder-3B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-3B

qwen2_5-coder-3b-instruct

qwen/Qwen2.5-Coder-3B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-3B-Instruct

qwen2_5-coder-3b-instruct-gptq-int4

qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4

qwen2_5-coder-3b-instruct-gptq-int8

qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8

qwen2_5-coder-3b-instruct-awq

qwen/Qwen2.5-Coder-3B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-3B-Instruct-AWQ

qwen2_5-coder-7b

qwen/Qwen2.5-Coder-7B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-7B

qwen2_5-coder-7b-instruct

qwen/Qwen2.5-Coder-7B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-7B-Instruct

qwen2_5-coder-7b-instruct-gptq-int4

qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4

qwen2_5-coder-7b-instruct-gptq-int8

qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8

qwen2_5-coder-7b-instruct-awq

qwen/Qwen2.5-Coder-7B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-7B-Instruct-AWQ

qwen2_5-coder-14b

qwen/Qwen2.5-Coder-14B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-14B

qwen2_5-coder-14b-instruct

qwen/Qwen2.5-Coder-14B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-14B-Instruct

qwen2_5-coder-14b-instruct-gptq-int4

qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4

qwen2_5-coder-14b-instruct-gptq-int8

qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8

qwen2_5-coder-14b-instruct-awq

qwen/Qwen2.5-Coder-14B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-14B-Instruct-AWQ

qwen2_5-coder-32b

qwen/Qwen2.5-Coder-32B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-32B

qwen2_5-coder-32b-instruct

qwen/Qwen2.5-Coder-32B-Instruct

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37

-

Qwen/Qwen2.5-Coder-32B-Instruct

qwen2_5-coder-32b-instruct-gptq-int4

qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4

qwen2_5-coder-32b-instruct-gptq-int8

qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✘

✘

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8

qwen2_5-coder-32b-instruct-awq

qwen/Qwen2.5-Coder-32B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

✔

✔

✔

✘

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-32B-Instruct-AWQ

qwq-32b-preview

Qwen/QwQ-32B-Preview

q_proj, k_proj, v_proj

qwq

✔

✔

✔

✔

transformers>=4.37

-

Qwen/QwQ-32B-Preview

marco-o1

AIDC-AI/Marco-o1

q_proj, k_proj, v_proj

marco_o1

✔

✔

✔

✘

transformers>=4.37

-

AIDC-AI/Marco-o1

chatglm2-6b

ZhipuAI/chatglm2-6b

query_key_value

chatglm2

✘

✔

✘

✘

transformers<4.42

-

THUDM/chatglm2-6b

chatglm2-6b-32k

ZhipuAI/chatglm2-6b-32k

query_key_value

chatglm2

✘

✔

✘

✘

transformers<4.42

-

THUDM/chatglm2-6b-32k

chatglm3-6b-base

ZhipuAI/chatglm3-6b-base

query_key_value

chatglm-generation

✘

✔

✘

✘

transformers<4.42

-

THUDM/chatglm3-6b-base

chatglm3-6b

ZhipuAI/chatglm3-6b

query_key_value

chatglm3

✘

✔

✘

✘

transformers<4.42

-

THUDM/chatglm3-6b

chatglm3-6b-32k

ZhipuAI/chatglm3-6b-32k

query_key_value

chatglm3

✘

✔

✘

✘

transformers<4.42

-

THUDM/chatglm3-6b-32k

chatglm3-6b-128k

ZhipuAI/chatglm3-6b-128k

query_key_value

chatglm3

✘

✔

✘

✘

transformers<4.42

-

THUDM/chatglm3-6b-128k

codegeex2-6b

ZhipuAI/codegeex2-6b

query_key_value

chatglm-generation

✘

✔

✘

✘

transformers<4.34

coding

THUDM/codegeex2-6b

glm4-9b

ZhipuAI/glm-4-9b

query_key_value

chatglm-generation

✔

✔

✔

✘

transformers>=4.42

-

THUDM/glm-4-9b

glm4-9b-chat

ZhipuAI/glm-4-9b-chat

query_key_value

chatglm4

✔

✔

✔

✘

transformers>=4.42

-

THUDM/glm-4-9b-chat

glm4-9b-chat-1m

ZhipuAI/glm-4-9b-chat-1m

query_key_value

chatglm4

✔

✔

✔

✘

transformers>=4.42

-

THUDM/glm-4-9b-chat-1m

codegeex4-9b-chat

ZhipuAI/codegeex4-all-9b

query_key_value

codegeex4

✔

✔

✔

✘

transformers<4.42

coding

THUDM/codegeex4-all-9b

glm-edge-1_5b-chat

ZhipuAI/glm-edge-1.5b-chat

q_proj, k_proj, v_proj

chatglm4

✔

✘

✘

✘

transformers>=4.46

-

THUDM/glm-edge-1.5b-chat

glm-edge-4b-chat

ZhipuAI/glm-edge-4b-chat

q_proj, k_proj, v_proj

chatglm4

✔

✘

✘

✘

transformers>=4.46

-

THUDM/glm-edge-4b-chat

llama2-7b

modelscope/Llama-2-7b-ms

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

meta-llama/Llama-2-7b-hf

llama2-7b-chat

modelscope/Llama-2-7b-chat-ms

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

-

meta-llama/Llama-2-7b-chat-hf

llama2-13b

modelscope/Llama-2-13b-ms

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

meta-llama/Llama-2-13b-hf

llama2-13b-chat

modelscope/Llama-2-13b-chat-ms

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

-

meta-llama/Llama-2-13b-chat-hf

llama2-70b

modelscope/Llama-2-70b-ms

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

meta-llama/Llama-2-70b-hf

llama2-70b-chat

modelscope/Llama-2-70b-chat-ms

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

-

meta-llama/Llama-2-70b-chat-hf

llama2-7b-aqlm-2bit-1x16

AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf

q_proj, k_proj, v_proj

default-generation

✔

✘

✘

✘

transformers>=4.38, aqlm, torch>=2.2.0

-

ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf

llama3-8b

LLM-Research/Meta-Llama-3-8B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

meta-llama/Meta-Llama-3-8B

llama3-8b-instruct

LLM-Research/Meta-Llama-3-8B-Instruct

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

-

meta-llama/Meta-Llama-3-8B-Instruct

llama3-8b-instruct-int4

swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

auto_gptq

-

study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4

llama3-8b-instruct-int8

swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

auto_gptq

-

study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8

llama3-8b-instruct-awq

swift/Meta-Llama-3-8B-Instruct-AWQ

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

autoawq

-

study-hjt/Meta-Llama-3-8B-Instruct-AWQ

llama3-70b

LLM-Research/Meta-Llama-3-70B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

meta-llama/Meta-Llama-3-70B

llama3-70b-instruct

LLM-Research/Meta-Llama-3-70B-Instruct

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

-

meta-llama/Meta-Llama-3-70B-Instruct

llama3-70b-instruct-int4

swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

auto_gptq

-

study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4

llama3-70b-instruct-int8

swift/Meta-Llama-3-70b-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

auto_gptq

-

study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8

llama3-70b-instruct-awq

swift/Meta-Llama-3-70B-Instruct-AWQ

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

autoawq

-

study-hjt/Meta-Llama-3-70B-Instruct-AWQ

llama3_1-8b

LLM-Research/Meta-Llama-3.1-8B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-8B

llama3_1-8b-instruct

LLM-Research/Meta-Llama-3.1-8B-Instruct

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-8B-Instruct

llama3_1-8b-instruct-awq

LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

transformers>=4.43, autoawq

-

hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4

llama3_1-8b-instruct-gptq-int4

LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

transformers>=4.43, auto_gptq

-

hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4

llama3_1-8b-instruct-bnb

LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

transformers>=4.43, bitsandbytes

-

hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4

llama3_1-70b

LLM-Research/Meta-Llama-3.1-70B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-70B

llama3_1-70b-instruct

LLM-Research/Meta-Llama-3.1-70B-Instruct

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-70B-Instruct

llama3_1-70b-instruct-fp8

LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-70B-Instruct-FP8

llama3_1-70b-instruct-awq

LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

transformers>=4.43, autoawq

-

hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4

llama3_1-70b-instruct-gptq-int4

LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

transformers>=4.43, auto_gptq

-

hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4

llama3_1-70b-instruct-bnb

LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

transformers>=4.43, bitsandbytes

-

unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit

llama3_1-405b

LLM-Research/Meta-Llama-3.1-405B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-405B

llama3_1-405b-instruct

LLM-Research/Meta-Llama-3.1-405B-Instruct

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-405B-Instruct

llama3_1-405b-instruct-fp8

LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-405B-Instruct-FP8

llama3_1-405b-instruct-awq

LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

transformers>=4.43, autoawq

-

hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4

llama3_1-405b-instruct-gptq-int4

LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

transformers>=4.43, auto_gptq

-

hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4

llama3_1-405b-instruct-bnb

LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4

q_proj, k_proj, v_proj

llama3

✔

✔

✘

✘

transformers>=4.43, bitsandbytes

-

hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4

llama-3.1-nemotron-70B-instruct-hf

AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

transformers>=4.43

-

nvidia/Llama-3.1-Nemotron-70B-Instruct-HF

llama3_2-1b

LLM-Research/Llama-3.2-1B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.45

-

meta-llama/Llama-3.2-1B

llama3_2-1b-instruct

LLM-Research/Llama-3.2-1B-Instruct

q_proj, k_proj, v_proj

llama3_2

✔

✔

✔

✘

transformers>=4.45

-

meta-llama/Llama-3.2-1B-Instruct

llama3_2-3b

LLM-Research/Llama-3.2-3B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.45

-

meta-llama/Llama-3.2-3B

llama3_2-3b-instruct

LLM-Research/Llama-3.2-3B-Instruct

q_proj, k_proj, v_proj

llama3_2

✔

✔

✔

✘

transformers>=4.45

-

meta-llama/Llama-3.2-3B-Instruct

reflection-llama_3_1-70b

LLM-Research/Reflection-Llama-3.1-70B

q_proj, k_proj, v_proj

reflection

✔

✔

✘

✘

transformers>=4.43

-

mattshumer/Reflection-Llama-3.1-70B

longwriter-glm4-9b

ZhipuAI/LongWriter-glm4-9b

query_key_value

chatglm4

✔

✔

✔

✘

transformers>=4.42

-

THUDM/LongWriter-glm4-9b

longwriter-llama3_1-8b

ZhipuAI/LongWriter-llama3.1-8b

q_proj, k_proj, v_proj

longwriter-llama3

✔

✔

✔

✘

transformers>=4.43

-

THUDM/LongWriter-llama3.1-8b

chinese-llama-2-1_3b

AI-ModelScope/chinese-llama-2-1.3b

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

hfl/chinese-llama-2-1.3b

chinese-llama-2-7b

AI-ModelScope/chinese-llama-2-7b

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

hfl/chinese-llama-2-7b

chinese-llama-2-7b-16k

AI-ModelScope/chinese-llama-2-7b-16k

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

hfl/chinese-llama-2-7b-16k

chinese-llama-2-7b-64k

AI-ModelScope/chinese-llama-2-7b-64k

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

hfl/chinese-llama-2-7b-64k

chinese-llama-2-13b

AI-ModelScope/chinese-llama-2-13b

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

hfl/chinese-llama-2-13b

chinese-llama-2-13b-16k

AI-ModelScope/chinese-llama-2-13b-16k

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

hfl/chinese-llama-2-13b-16k

chinese-alpaca-2-1_3b

AI-ModelScope/chinese-alpaca-2-1.3b

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

-

hfl/chinese-alpaca-2-1.3b

chinese-alpaca-2-7b

AI-ModelScope/chinese-alpaca-2-7b

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

-

hfl/chinese-alpaca-2-7b

chinese-alpaca-2-7b-16k

AI-ModelScope/chinese-alpaca-2-7b-16k

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

-

hfl/chinese-alpaca-2-7b-16k

chinese-alpaca-2-7b-64k

AI-ModelScope/chinese-alpaca-2-7b-64k

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

-

hfl/chinese-alpaca-2-7b-64k

chinese-alpaca-2-13b

AI-ModelScope/chinese-alpaca-2-13b

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

-

hfl/chinese-alpaca-2-13b

chinese-alpaca-2-13b-16k

AI-ModelScope/chinese-alpaca-2-13b-16k

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

-

hfl/chinese-alpaca-2-13b-16k

llama-3-chinese-8b

ChineseAlpacaGroup/llama-3-chinese-8b

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

hfl/llama-3-chinese-8b

llama-3-chinese-8b-instruct

ChineseAlpacaGroup/llama-3-chinese-8b-instruct

q_proj, k_proj, v_proj

llama3

✔

✔

✔

✘

-

hfl/llama-3-chinese-8b-instruct

atom-7b

FlagAlpha/Atom-7B

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

-

FlagAlpha/Atom-7B

atom-7b-chat

FlagAlpha/Atom-7B-Chat

q_proj, k_proj, v_proj

atom

✔

✔

✘

✘

-

FlagAlpha/Atom-7B-Chat

yi-6b

01ai/Yi-6B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-6B

yi-6b-200k

01ai/Yi-6B-200K

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-6B-200K

yi-6b-chat

01ai/Yi-6B-Chat

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

-

01-ai/Yi-6B-Chat

yi-6b-chat-awq

01ai/Yi-6B-Chat-4bits

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

autoawq

-

01-ai/Yi-6B-Chat-4bits

yi-6b-chat-int8

01ai/Yi-6B-Chat-8bits

q_proj, k_proj, v_proj

chatml

✔

✔

✘

✘

auto_gptq

-

01-ai/Yi-6B-Chat-8bits

yi-9b

01ai/Yi-9B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-9B

yi-9b-200k

01ai/Yi-9B-200K

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-9B-200K

yi-34b

01ai/Yi-34B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-34B

yi-34b-200k

01ai/Yi-34B-200K

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-34B-200K

yi-34b-chat

01ai/Yi-34B-Chat

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

-

01-ai/Yi-34B-Chat

yi-34b-chat-awq

01ai/Yi-34B-Chat-4bits

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

autoawq

-

01-ai/Yi-34B-Chat-4bits

yi-34b-chat-int8

01ai/Yi-34B-Chat-8bits

q_proj, k_proj, v_proj

chatml

✔

✔

✘

✘

auto_gptq

-

01-ai/Yi-34B-Chat-8bits

yi-1_5-6b

01ai/Yi-1.5-6B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-1.5-6B

yi-1_5-6b-chat

01ai/Yi-1.5-6B-Chat

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

-

01-ai/Yi-1.5-6B-Chat

yi-1_5-9b

01ai/Yi-1.5-9B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-1.5-9B

yi-1_5-9b-chat

01ai/Yi-1.5-9B-Chat

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

-

01-ai/Yi-1.5-9B-Chat

yi-1_5-9b-chat-16k

01ai/Yi-1.5-9B-Chat-16K

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

-

01-ai/Yi-1.5-9B-Chat-16K

yi-1_5-34b

01ai/Yi-1.5-34B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-1.5-34B

yi-1_5-34b-chat

01ai/Yi-1.5-34B-Chat

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

-

01-ai/Yi-1.5-34B-Chat

yi-1_5-34b-chat-16k

01ai/Yi-1.5-34B-Chat-16K

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

-

01-ai/Yi-1.5-34B-Chat-16K

yi-1_5-6b-chat-awq-int4

AI-ModelScope/Yi-1.5-6B-Chat-AWQ

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

autoawq

-

modelscope/Yi-1.5-6B-Chat-AWQ

yi-1_5-6b-chat-gptq-int4

AI-ModelScope/Yi-1.5-6B-Chat-GPTQ

q_proj, k_proj, v_proj

chatml

✔

✔

✘

✘

auto_gptq>=0.5

-

modelscope/Yi-1.5-6B-Chat-GPTQ

yi-1_5-9b-chat-awq-int4

AI-ModelScope/Yi-1.5-9B-Chat-AWQ

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

autoawq

-

modelscope/Yi-1.5-9B-Chat-AWQ

yi-1_5-9b-chat-gptq-int4

AI-ModelScope/Yi-1.5-9B-Chat-GPTQ

q_proj, k_proj, v_proj

chatml

✔

✔

✘

✘

auto_gptq>=0.5

-

modelscope/Yi-1.5-9B-Chat-GPTQ

yi-1_5-34b-chat-awq-int4

AI-ModelScope/Yi-1.5-34B-Chat-AWQ

q_proj, k_proj, v_proj

chatml

✔

✔

✔

✘

autoawq

-

modelscope/Yi-1.5-34B-Chat-AWQ

yi-1_5-34b-chat-gptq-int4

AI-ModelScope/Yi-1.5-34B-Chat-GPTQ

q_proj, k_proj, v_proj

chatml

✔

✔

✘

✘

auto_gptq>=0.5

-

modelscope/Yi-1.5-34B-Chat-GPTQ

yi-coder-1_5b

01ai/Yi-Coder-1.5B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-Coder-1.5B

yi-coder-1_5b-chat

01ai/Yi-Coder-1.5B-Chat

q_proj, k_proj, v_proj

yi-coder

✔

✔

✔

✘

-

01-ai/Yi-Coder-1.5B-Chat

yi-coder-9b

01ai/Yi-Coder-9B

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

01-ai/Yi-Coder-9B

yi-coder-9b-chat

01ai/Yi-Coder-9B-Chat

q_proj, k_proj, v_proj

yi-coder

✔

✔

✔

✘

-

01-ai/Yi-Coder-9B-Chat

internlm-7b

Shanghai_AI_Laboratory/internlm-7b

q_proj, k_proj, v_proj

default-generation

✘

✔

✔

✘

-

internlm/internlm-7b

internlm-7b-chat

Shanghai_AI_Laboratory/internlm-chat-7b

q_proj, k_proj, v_proj

internlm

✘

✔

✔

✘

-

internlm/internlm-chat-7b

internlm-7b-chat-8k

Shanghai_AI_Laboratory/internlm-chat-7b-8k

q_proj, k_proj, v_proj

internlm

✘

✔

✔

✘

-

-

internlm-20b

Shanghai_AI_Laboratory/internlm-20b

q_proj, k_proj, v_proj

default-generation

✘

✔

✔

✘

-

internlm/internlm-20b

internlm-20b-chat

Shanghai_AI_Laboratory/internlm-chat-20b

q_proj, k_proj, v_proj

internlm

✘

✔

✔

✘

-

internlm/internlm-chat-20b

internlm2-1_8b

Shanghai_AI_Laboratory/internlm2-1_8b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-1_8b

internlm2-1_8b-sft-chat

Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-chat-1_8b-sft

internlm2-1_8b-chat

Shanghai_AI_Laboratory/internlm2-chat-1_8b

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-chat-1_8b

internlm2-7b-base

Shanghai_AI_Laboratory/internlm2-base-7b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-base-7b

internlm2-7b

Shanghai_AI_Laboratory/internlm2-7b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-7b

internlm2-7b-sft-chat

Shanghai_AI_Laboratory/internlm2-chat-7b-sft

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-chat-7b-sft

internlm2-7b-chat

Shanghai_AI_Laboratory/internlm2-chat-7b

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-chat-7b

internlm2-20b-base

Shanghai_AI_Laboratory/internlm2-base-20b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-base-20b

internlm2-20b

Shanghai_AI_Laboratory/internlm2-20b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-20b

internlm2-20b-sft-chat

Shanghai_AI_Laboratory/internlm2-chat-20b-sft

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-chat-20b-sft

internlm2-20b-chat

Shanghai_AI_Laboratory/internlm2-chat-20b

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2-chat-20b

internlm2_5-1_8b

Shanghai_AI_Laboratory/internlm2_5-1_8b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2_5-1_8b

internlm2_5-1_8b-chat

Shanghai_AI_Laboratory/internlm2_5-1_8b-chat

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2_5-1_8b-chat

internlm2_5-7b

Shanghai_AI_Laboratory/internlm2_5-7b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2_5-7b

internlm2_5-7b-chat

Shanghai_AI_Laboratory/internlm2_5-7b-chat

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2_5-7b-chat

internlm2_5-7b-chat-1m

Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2_5-7b-chat-1m

internlm2_5-20b

Shanghai_AI_Laboratory/internlm2_5-20b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2_5-20b

internlm2_5-20b-chat

Shanghai_AI_Laboratory/internlm2_5-20b-chat

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

-

internlm/internlm2_5-20b-chat

internlm2-math-7b

Shanghai_AI_Laboratory/internlm2-math-base-7b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

math

internlm/internlm2-math-base-7b

internlm2-math-7b-chat

Shanghai_AI_Laboratory/internlm2-math-7b

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

math

internlm/internlm2-math-7b

internlm2-math-20b

Shanghai_AI_Laboratory/internlm2-math-base-20b

wqkv

default-generation

✔

✔

✔

✘

transformers>=4.38

math

internlm/internlm2-math-base-20b

internlm2-math-20b-chat

Shanghai_AI_Laboratory/internlm2-math-20b

wqkv

internlm2

✔

✔

✔

✘

transformers>=4.38

math

internlm/internlm2-math-20b

deepseek-7b

deepseek-ai/deepseek-llm-7b-base

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

deepseek-ai/deepseek-llm-7b-base

deepseek-7b-chat

deepseek-ai/deepseek-llm-7b-chat

q_proj, k_proj, v_proj

deepseek

✔

✔

✔

✘

-

deepseek-ai/deepseek-llm-7b-chat

deepseek-moe-16b

deepseek-ai/deepseek-moe-16b-base

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

moe

deepseek-ai/deepseek-moe-16b-base

deepseek-moe-16b-chat

deepseek-ai/deepseek-moe-16b-chat

q_proj, k_proj, v_proj

deepseek

✔

✔

✘

✘

moe

deepseek-ai/deepseek-moe-16b-chat

deepseek-67b

deepseek-ai/deepseek-llm-67b-base

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

deepseek-ai/deepseek-llm-67b-base

deepseek-67b-chat

deepseek-ai/deepseek-llm-67b-chat

q_proj, k_proj, v_proj

deepseek

✔

✔

✔

✘

-

deepseek-ai/deepseek-llm-67b-chat

deepseek-coder-1_3b

deepseek-ai/deepseek-coder-1.3b-base

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

coding

deepseek-ai/deepseek-coder-1.3b-base

deepseek-coder-1_3b-instruct

deepseek-ai/deepseek-coder-1.3b-instruct

q_proj, k_proj, v_proj

deepseek-coder

✔

✔

✔

✘

coding

deepseek-ai/deepseek-coder-1.3b-instruct

deepseek-coder-6_7b

deepseek-ai/deepseek-coder-6.7b-base

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

coding

deepseek-ai/deepseek-coder-6.7b-base

deepseek-coder-6_7b-instruct

deepseek-ai/deepseek-coder-6.7b-instruct

q_proj, k_proj, v_proj

deepseek-coder

✔

✔

✔

✘

coding

deepseek-ai/deepseek-coder-6.7b-instruct

deepseek-coder-33b

deepseek-ai/deepseek-coder-33b-base

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

coding

deepseek-ai/deepseek-coder-33b-base

deepseek-coder-33b-instruct

deepseek-ai/deepseek-coder-33b-instruct

q_proj, k_proj, v_proj

deepseek-coder

✔

✔

✔

✘

coding

deepseek-ai/deepseek-coder-33b-instruct

deepseek-coder-v2-instruct

deepseek-ai/DeepSeek-Coder-V2-Instruct

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2

✔

✔

✘

✘

transformers>=4.39.3

coding, moe

deepseek-ai/DeepSeek-Coder-V2-Instruct

deepseek-coder-v2-lite-instruct

deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2

✔

✔

✘

✘

transformers>=4.39.3

coding, moe

deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct

deepseek-coder-v2

deepseek-ai/DeepSeek-Coder-V2-Base

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

default-generation

✔

✔

✘

✘

transformers>=4.39.3

coding, moe

deepseek-ai/DeepSeek-Coder-V2-Base

deepseek-coder-v2-lite

deepseek-ai/DeepSeek-Coder-V2-Lite-Base

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

default-generation

✔

✔

✘

✘

transformers>=4.39.3

coding, moe

deepseek-ai/DeepSeek-Coder-V2-Lite-Base

deepseek-math-7b

deepseek-ai/deepseek-math-7b-base

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

math

deepseek-ai/deepseek-math-7b-base

deepseek-math-7b-instruct

deepseek-ai/deepseek-math-7b-instruct

q_proj, k_proj, v_proj

deepseek

✔

✔

✔

✘

math

deepseek-ai/deepseek-math-7b-instruct

deepseek-math-7b-chat

deepseek-ai/deepseek-math-7b-rl

q_proj, k_proj, v_proj

deepseek

✔

✔

✔

✘

math

deepseek-ai/deepseek-math-7b-rl

numina-math-7b

AI-ModelScope/NuminaMath-7B-TIR

q_proj, k_proj, v_proj

numina-math

✔

✔

✘

✘

math

AI-MO/NuminaMath-7B-TIR

deepseek-v2

deepseek-ai/DeepSeek-V2

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

default-generation

✔

✔

✘

✘

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2

deepseek-v2-chat

deepseek-ai/DeepSeek-V2-Chat

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2

✔

✔

✘

✘

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2-Chat

deepseek-v2-lite

deepseek-ai/DeepSeek-V2-Lite

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

default-generation

✔

✔

✘

✘

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2-Lite

deepseek-v2-lite-chat

deepseek-ai/DeepSeek-V2-Lite-Chat

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2

✔

✔

✘

✘

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2-Lite-Chat

deepseek-v2_5

deepseek-ai/DeepSeek-V2.5

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2_5

✔

✔

✘

✘

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2.5

gemma-2b

AI-ModelScope/gemma-2b

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.38

-

google/gemma-2b

gemma-7b

AI-ModelScope/gemma-7b

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.38

-

google/gemma-7b

gemma-2b-instruct

AI-ModelScope/gemma-2b-it

q_proj, k_proj, v_proj

gemma

✔

✔

✘

✘

transformers>=4.38

-

google/gemma-2b-it

gemma-7b-instruct

AI-ModelScope/gemma-7b-it

q_proj, k_proj, v_proj

gemma

✔

✔

✘

✘

transformers>=4.38

-

google/gemma-7b-it

gemma2-2b

LLM-Research/gemma-2-2b

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.42

-

google/gemma-2-2b

gemma2-9b

LLM-Research/gemma-2-9b

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.42

-

google/gemma-2-9b

gemma2-27b

LLM-Research/gemma-2-27b

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.42

-

google/gemma-2-27b

gemma2-2b-instruct

LLM-Research/gemma-2-2b-it

q_proj, k_proj, v_proj

gemma

✔

✔

✘

✘

transformers>=4.42

-

google/gemma-2-2b-it

gemma2-9b-instruct

LLM-Research/gemma-2-9b-it

q_proj, k_proj, v_proj

gemma

✔

✔

✘

✘

transformers>=4.42

-

google/gemma-2-9b-it

gemma2-27b-instruct

LLM-Research/gemma-2-27b-it

q_proj, k_proj, v_proj

gemma

✔

✔

✘

✘

transformers>=4.42

-

google/gemma-2-27b-it

minicpm-1b-sft-chat

OpenBMB/MiniCPM-1B-sft-bf16

q_proj, k_proj, v_proj

minicpm

✔

✔

✘

✘

transformers>=4.36.0

-

openbmb/MiniCPM-1B-sft-bf16

minicpm-2b-sft-chat

OpenBMB/MiniCPM-2B-sft-fp32

q_proj, k_proj, v_proj

minicpm

✔

✔

✘

✘

-

openbmb/MiniCPM-2B-sft-fp32

minicpm-2b-chat

OpenBMB/MiniCPM-2B-dpo-fp32

q_proj, k_proj, v_proj

minicpm

✔

✔

✘

✘

-

openbmb/MiniCPM-2B-dpo-fp32

minicpm-2b-128k

OpenBMB/MiniCPM-2B-128k

q_proj, k_proj, v_proj

chatml

✔

✔

✘

✘

transformers>=4.36.0

-

openbmb/MiniCPM-2B-128k

minicpm-moe-8x2b

OpenBMB/MiniCPM-MoE-8x2B

q_proj, k_proj, v_proj

minicpm

✔

✔

✘

✘

transformers>=4.36.0

moe

openbmb/MiniCPM-MoE-8x2B

minicpm3-4b

OpenBMB/MiniCPM3-4B

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj

chatml

✔

✘

✘

✘

transformers>=4.36

-

openbmb/MiniCPM3-4B

openbuddy-llama-65b-chat

OpenBuddy/openbuddy-llama-65b-v8-bf16

q_proj, k_proj, v_proj

openbuddy

✔

✔

✔

✘

-

OpenBuddy/openbuddy-llama-65b-v8-bf16

openbuddy-llama2-13b-chat

OpenBuddy/openbuddy-llama2-13b-v8.1-fp16

q_proj, k_proj, v_proj

openbuddy

✔

✔

✔

✘

-

OpenBuddy/openbuddy-llama2-13b-v8.1-fp16

openbuddy-llama2-70b-chat

OpenBuddy/openbuddy-llama2-70b-v10.1-bf16

q_proj, k_proj, v_proj

openbuddy

✔

✔

✔

✘

-

OpenBuddy/openbuddy-llama2-70b-v10.1-bf16

openbuddy-llama3-8b-chat

OpenBuddy/openbuddy-llama3-8b-v21.1-8k

q_proj, k_proj, v_proj

openbuddy2

✔

✔

✔

✘

-

OpenBuddy/openbuddy-llama3-8b-v21.1-8k

openbuddy-llama3-70b-chat

OpenBuddy/openbuddy-llama3-70b-v21.1-8k

q_proj, k_proj, v_proj

openbuddy2

✔

✔

✔

✘

-

OpenBuddy/openbuddy-llama3-70b-v21.1-8k

openbuddy-mistral-7b-chat

OpenBuddy/openbuddy-mistral-7b-v17.1-32k

q_proj, k_proj, v_proj

openbuddy

✔

✔

✔

✘

transformers>=4.34

-

OpenBuddy/openbuddy-mistral-7b-v17.1-32k

openbuddy-zephyr-7b-chat

OpenBuddy/openbuddy-zephyr-7b-v14.1

q_proj, k_proj, v_proj

openbuddy

✔

✔

✔

✘

transformers>=4.34

-

OpenBuddy/openbuddy-zephyr-7b-v14.1

openbuddy-deepseek-67b-chat

OpenBuddy/openbuddy-deepseek-67b-v15.2

q_proj, k_proj, v_proj

openbuddy

✔

✔

✔

✘

-

OpenBuddy/openbuddy-deepseek-67b-v15.2

openbuddy-mixtral-moe-7b-chat

OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k

q_proj, k_proj, v_proj

openbuddy

✔

✔

✘

✘

transformers>=4.36

moe

OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k

openbuddy-llama3_1-8b-chat

OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k

q_proj, k_proj, v_proj

openbuddy2

✔

✔

✔

✘

transformers>=4.43

-

OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k

mistral-7b

AI-ModelScope/Mistral-7B-v0.1

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.34

-

mistralai/Mistral-7B-v0.1

mistral-7b-v2

AI-ModelScope/Mistral-7B-v0.2-hf

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

transformers>=4.34

-

alpindale/Mistral-7B-v0.2-hf

mistral-7b-instruct

AI-ModelScope/Mistral-7B-Instruct-v0.1

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

transformers>=4.34

-

mistralai/Mistral-7B-Instruct-v0.1

mistral-7b-instruct-v2

AI-ModelScope/Mistral-7B-Instruct-v0.2

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

transformers>=4.34

-

mistralai/Mistral-7B-Instruct-v0.2

mistral-7b-instruct-v3

LLM-Research/Mistral-7B-Instruct-v0.3

q_proj, k_proj, v_proj

llama

✔

✔

✔

✘

transformers>=4.34

-

mistralai/Mistral-7B-Instruct-v0.3

mistral-nemo-base-2407

AI-ModelScope/Mistral-Nemo-Base-2407

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.43

-

mistralai/Mistral-Nemo-Base-2407

mistral-nemo-instruct-2407

AI-ModelScope/Mistral-Nemo-Instruct-2407

q_proj, k_proj, v_proj

mistral-nemo

✔

✔

✘

✘

transformers>=4.43

-

mistralai/Mistral-Nemo-Instruct-2407

mistral-large-instruct-2407

LLM-Research/Mistral-Large-Instruct-2407

q_proj, k_proj, v_proj

mistral-nemo

✔

✔

✘

✘

transformers>=4.43

-

mistralai/Mistral-Large-Instruct-2407

mistral-small-instruct-2409

AI-ModelScope/Mistral-Small-Instruct-2409

q_proj, k_proj, v_proj

mistral-nemo

✔

✔

✘

✘

transformers>=4.43

-

mistralai/Mistral-Small-Instruct-2409

mixtral-moe-7b

AI-ModelScope/Mixtral-8x7B-v0.1

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.36

moe

mistralai/Mixtral-8x7B-v0.1

mixtral-moe-7b-instruct

AI-ModelScope/Mixtral-8x7B-Instruct-v0.1

q_proj, k_proj, v_proj

llama

✔

✔

✘

✘

transformers>=4.36

moe

mistralai/Mixtral-8x7B-Instruct-v0.1

mixtral-moe-7b-aqlm-2bit-1x16

AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf

q_proj, k_proj, v_proj

default-generation

✔

✘

✘

✘

transformers>=4.38, aqlm, torch>=2.2.0

moe

ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf

mixtral-moe-8x22b-v1

AI-ModelScope/Mixtral-8x22B-v0.1

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.36

moe

mistral-community/Mixtral-8x22B-v0.1

ministral-8b-instruct-2410

AI-ModelScope/Ministral-8B-Instruct-2410

q_proj, k_proj, v_proj

mistral-nemo

✔

✔

✘

✘

transformers>=4.46

-

mistralai/Ministral-8B-Instruct-2410

wizardlm2-7b-awq

AI-ModelScope/WizardLM-2-7B-AWQ

q_proj, k_proj, v_proj

wizardlm2-awq

✔

✔

✘

✘

transformers>=4.34

-

MaziyarPanahi/WizardLM-2-7B-AWQ

wizardlm2-8x22b

AI-ModelScope/WizardLM-2-8x22B

q_proj, k_proj, v_proj

wizardlm2

✔

✔

✘

✘

transformers>=4.36

-

alpindale/WizardLM-2-8x22B

baichuan-7b

baichuan-inc/baichuan-7B

W_pack

default-generation

✘

✔

✔

✘

transformers<4.34

-

baichuan-inc/Baichuan-7B

baichuan-13b

baichuan-inc/Baichuan-13B-Base

W_pack

default-generation

✘

✔

✔

✘

transformers<4.34

-

baichuan-inc/Baichuan-13B-Base

baichuan-13b-chat

baichuan-inc/Baichuan-13B-Chat

W_pack

baichuan

✘

✔

✔

✘

transformers<4.34

-

baichuan-inc/Baichuan-13B-Chat

baichuan2-7b

baichuan-inc/Baichuan2-7B-Base

W_pack

default-generation

✘

✔

✔

✘

-

baichuan-inc/Baichuan2-7B-Base

baichuan2-7b-chat

baichuan-inc/Baichuan2-7B-Chat

W_pack

baichuan

✘

✔

✔

✘

-

baichuan-inc/Baichuan2-7B-Chat

baichuan2-7b-chat-int4

baichuan-inc/Baichuan2-7B-Chat-4bits

W_pack

baichuan

✘

✘

✘

✘

bitsandbytes<0.41.2, accelerate<0.26

-

baichuan-inc/Baichuan2-7B-Chat-4bits

baichuan2-13b

baichuan-inc/Baichuan2-13B-Base

W_pack

default-generation

✘

✔

✔

✘

-

baichuan-inc/Baichuan2-13B-Base

baichuan2-13b-chat

baichuan-inc/Baichuan2-13B-Chat

W_pack

baichuan

✘

✔

✔

✘

-

baichuan-inc/Baichuan2-13B-Chat

baichuan2-13b-chat-int4

baichuan-inc/Baichuan2-13B-Chat-4bits

W_pack

baichuan

✘

✘

✘

✘

bitsandbytes<0.41.2, accelerate<0.26

-

baichuan-inc/Baichuan2-13B-Chat-4bits

yuan2-2b-instruct

YuanLLM/Yuan2.0-2B-hf

q_proj, k_proj, v_proj

yuan

✔

✘

✘

✘

-

IEITYuan/Yuan2-2B-hf

yuan2-2b-janus-instruct

YuanLLM/Yuan2-2B-Janus-hf

q_proj, k_proj, v_proj

yuan

✔

✘

✘

✘

-

IEITYuan/Yuan2-2B-Janus-hf

yuan2-51b-instruct

YuanLLM/Yuan2.0-51B-hf

q_proj, k_proj, v_proj

yuan

✔

✘

✘

✘

-

IEITYuan/Yuan2-51B-hf

yuan2-102b-instruct

YuanLLM/Yuan2.0-102B-hf

q_proj, k_proj, v_proj

yuan

✔

✘

✘

✘

-

IEITYuan/Yuan2-102B-hf

yuan2-m32

YuanLLM/Yuan2-M32-hf

q_proj, k_proj, v_proj

yuan

✔

✘

✘

✘

moe

IEITYuan/Yuan2-M32-hf

xverse-7b

xverse/XVERSE-7B

q_proj, k_proj, v_proj

default-generation

✘

✔

✘

✘

-

xverse/XVERSE-7B

xverse-7b-chat

xverse/XVERSE-7B-Chat

q_proj, k_proj, v_proj

xverse

✘

✔

✘

✘

-

xverse/XVERSE-7B-Chat

xverse-13b

xverse/XVERSE-13B

q_proj, k_proj, v_proj

default-generation

✘

✔

✘

✘

-

xverse/XVERSE-13B

xverse-13b-chat

xverse/XVERSE-13B-Chat

q_proj, k_proj, v_proj

xverse

✘

✔

✘

✘

-

xverse/XVERSE-13B-Chat

xverse-65b

xverse/XVERSE-65B

q_proj, k_proj, v_proj

default-generation

✘

✔

✘

✘

-

xverse/XVERSE-65B

xverse-65b-v2

xverse/XVERSE-65B-2

q_proj, k_proj, v_proj

default-generation

✘

✔

✘

✘

-

xverse/XVERSE-65B-2

xverse-65b-chat

xverse/XVERSE-65B-Chat

q_proj, k_proj, v_proj

xverse

✘

✔

✘

✘

-

xverse/XVERSE-65B-Chat

xverse-13b-256k

xverse/XVERSE-13B-256K

q_proj, k_proj, v_proj

default-generation

✘

✔

✘

✘

-

xverse/XVERSE-13B-256K

xverse-moe-a4_2b

xverse/XVERSE-MoE-A4.2B

q_proj, k_proj, v_proj

default-generation

✘

✘

✘

✘

moe

xverse/XVERSE-MoE-A4.2B

orion-14b

OrionStarAI/Orion-14B-Base

q_proj, k_proj, v_proj

default-generation

✔

✘

✘

✘

-

OrionStarAI/Orion-14B-Base

orion-14b-chat

OrionStarAI/Orion-14B-Chat

q_proj, k_proj, v_proj

orion

✔

✘

✘

✘

-

OrionStarAI/Orion-14B-Chat

bluelm-7b

vivo-ai/BlueLM-7B-Base

q_proj, k_proj, v_proj

default-generation

✘

✘

✘

✘

-

vivo-ai/BlueLM-7B-Base

bluelm-7b-32k

vivo-ai/BlueLM-7B-Base-32K

q_proj, k_proj, v_proj

default-generation

✘

✘

✘

✘

-

vivo-ai/BlueLM-7B-Base-32K

bluelm-7b-chat

vivo-ai/BlueLM-7B-Chat

q_proj, k_proj, v_proj

bluelm

✘

✘

✘

✘

-

vivo-ai/BlueLM-7B-Chat

bluelm-7b-chat-32k

vivo-ai/BlueLM-7B-Chat-32K

q_proj, k_proj, v_proj

bluelm

✘

✘

✘

✘

-

vivo-ai/BlueLM-7B-Chat-32K

ziya2-13b

Fengshenbang/Ziya2-13B-Base

q_proj, k_proj, v_proj

default-generation

✔

✔

✔

✘

-

IDEA-CCNL/Ziya2-13B-Base

ziya2-13b-chat

Fengshenbang/Ziya2-13B-Chat

q_proj, k_proj, v_proj

ziya

✔

✔

✔

✘

-

IDEA-CCNL/Ziya2-13B-Chat

skywork-13b

skywork/Skywork-13B-base

q_proj, k_proj, v_proj

default-generation

✘

✘

✘

✘

-

Skywork/Skywork-13B-base

skywork-13b-chat

skywork/Skywork-13B-chat

q_proj, k_proj, v_proj

skywork

✘

✘

✘

✘

-

-

zephyr-7b-beta-chat

modelscope/zephyr-7b-beta

q_proj, k_proj, v_proj

zephyr

✔

✔

✔

✘

transformers>=4.34

-

HuggingFaceH4/zephyr-7b-beta

polylm-13b

damo/nlp_polylm_13b_text_generation

c_attn

default-generation

✘

✘

✘

✘

-

DAMO-NLP-MT/polylm-13b

seqgpt-560m

damo/nlp_seqgpt-560m

query_key_value

default-generation

✘

✔

✘

✘

-

DAMO-NLP/SeqGPT-560M

sus-34b-chat

SUSTC/SUS-Chat-34B

q_proj, k_proj, v_proj

sus

✔

✔

✔

✘

-

SUSTech/SUS-Chat-34B

tongyi-finance-14b

TongyiFinance/Tongyi-Finance-14B

c_attn

default-generation

✔

✔

✔

✘

financial

-

tongyi-finance-14b-chat

TongyiFinance/Tongyi-Finance-14B-Chat

c_attn

qwen

✔

✔

✔

✘

financial

jxy/Tongyi-Finance-14B-Chat

tongyi-finance-14b-chat-int4

TongyiFinance/Tongyi-Finance-14B-Chat-Int4

c_attn

qwen

✔

✔

✘

✘

auto_gptq>=0.5

financial

jxy/Tongyi-Finance-14B-Chat-Int4

codefuse-codellama-34b-chat

codefuse-ai/CodeFuse-CodeLlama-34B

q_proj, k_proj, v_proj

codefuse-codellama

✔

✔

✔

✘

coding

codefuse-ai/CodeFuse-CodeLlama-34B

codefuse-codegeex2-6b-chat

codefuse-ai/CodeFuse-CodeGeeX2-6B

query_key_value

codefuse

✘

✔

✘

✘

transformers<4.34

coding

codefuse-ai/CodeFuse-CodeGeeX2-6B

codefuse-qwen-14b-chat

codefuse-ai/CodeFuse-QWen-14B

c_attn

codefuse

✔

✔

✔

✘

coding

codefuse-ai/CodeFuse-QWen-14B

phi2-3b

AI-ModelScope/phi-2

Wqkv

default-generation

✔

✔

✘

✘

coding

microsoft/phi-2

phi3-4b-4k-instruct

LLM-Research/Phi-3-mini-4k-instruct

qkv_proj

phi3

✔

✔

✘

✘

transformers>=4.36

-

microsoft/Phi-3-mini-4k-instruct

phi3-4b-128k-instruct

LLM-Research/Phi-3-mini-128k-instruct

qkv_proj

phi3

✔

✔

✘

✘

transformers>=4.36

-

microsoft/Phi-3-mini-128k-instruct

phi3-small-8k-instruct

LLM-Research/Phi-3-small-8k-instruct

query_key_value

phi3

✔

✔

✘

✘

transformers>=4.36

-

microsoft/Phi-3-small-8k-instruct

phi3-medium-4k-instruct

LLM-Research/Phi-3-medium-4k-instruct

qkv_proj

phi3

✔

✔

✘

✘

transformers>=4.36

-

microsoft/Phi-3-medium-4k-instruct

phi3-small-128k-instruct

LLM-Research/Phi-3-small-128k-instruct

query_key_value

phi3

✔

✔

✘

✘

transformers>=4.36

-

microsoft/Phi-3-small-128k-instruct

phi3-medium-128k-instruct

LLM-Research/Phi-3-medium-128k-instruct

qkv_proj

phi3

✔

✔

✘

✘

transformers>=4.36

-

microsoft/Phi-3-medium-128k-instruct

phi3_5-mini-instruct

LLM-Research/Phi-3.5-mini-instruct

qkv_proj

phi3

✔

✔

✘

✘

transformers>=4.36

-

microsoft/Phi-3.5-mini-instruct

phi3_5-moe-instruct

LLM-Research/Phi-3.5-MoE-instruct

q_proj, k_proj, v_proj

phi3

✔

✔

✘

✘

transformers>=4.36

moe

microsoft/Phi-3.5-MoE-instruct

mamba-130m

AI-ModelScope/mamba-130m-hf

in_proj, x_proj, embeddings, out_proj

default-generation

✘

✘

✘

✘

transformers>=4.39.0

-

state-spaces/mamba-130m-hf

mamba-370m

AI-ModelScope/mamba-370m-hf

in_proj, x_proj, embeddings, out_proj

default-generation

✘

✘

✘

✘

transformers>=4.39.0

-

state-spaces/mamba-370m-hf

mamba-390m

AI-ModelScope/mamba-390m-hf

in_proj, x_proj, embeddings, out_proj

default-generation

✘

✘

✘

✘

transformers>=4.39.0

-

state-spaces/mamba-390m-hf

mamba-790m

AI-ModelScope/mamba-790m-hf

in_proj, x_proj, embeddings, out_proj

default-generation

✘

✘

✘

✘

transformers>=4.39.0

-

state-spaces/mamba-790m-hf

mamba-1.4b

AI-ModelScope/mamba-1.4b-hf

in_proj, x_proj, embeddings, out_proj

default-generation

✘

✘

✘

✘

transformers>=4.39.0

-

state-spaces/mamba-1.4b-hf

mamba-2.8b

AI-ModelScope/mamba-2.8b-hf

in_proj, x_proj, embeddings, out_proj

default-generation

✘

✘

✘

✘

transformers>=4.39.0

-

state-spaces/mamba-2.8b-hf

telechat-7b

TeleAI/TeleChat-7B

key_value, query

telechat

✔

✘

✘

✘

-

Tele-AI/telechat-7B

telechat-12b

TeleAI/TeleChat-12B

key_value, query

telechat

✔

✘

✘

✘

-

Tele-AI/TeleChat-12B

telechat-12b-v2

TeleAI/TeleChat-12B-v2

key_value, query

telechat

✔

✘

✘

✘

-

Tele-AI/TeleChat-12B-v2

telechat-12b-v2-gptq-int4

swift/TeleChat-12B-V2-GPTQ-Int4

key_value, query

telechat

✔

✘

✘

✘

auto_gptq>=0.5

-

-

telechat2-115b

TeleAI/TeleChat2-115B

key_value, query

telechat2

✔

✘

✘

✘

-

Tele-AI/TeleChat2-115B

grok-1

colossalai/grok-1-pytorch

q_proj, k_proj, v_proj

default-generation

✘

✘

✘

✘

-

hpcai-tech/grok-1

dbrx-instruct

AI-ModelScope/dbrx-instruct

attn.Wqkv

dbrx

✔

✔

✘

✘

transformers>=4.36

moe

databricks/dbrx-instruct

dbrx-base

AI-ModelScope/dbrx-base

attn.Wqkv

dbrx

✔

✔

✘

✘

transformers>=4.36

moe

databricks/dbrx-base

mengzi3-13b-base

langboat/Mengzi3-13B-Base

q_proj, k_proj, v_proj

mengzi

✔

✔

✘

✘

-

Langboat/Mengzi3-13B-Base

c4ai-command-r-v01

AI-ModelScope/c4ai-command-r-v01

q_proj, k_proj, v_proj

c4ai

✔

✔

✘

✘

transformers>=4.39.1

-

CohereForAI/c4ai-command-r-v01

c4ai-command-r-plus

AI-ModelScope/c4ai-command-r-plus

q_proj, k_proj, v_proj

c4ai

✔

✔

✘

✘

transformers>4.39

-

CohereForAI/c4ai-command-r-plus

aya-expanse-8b

AI-ModelScope/aya-expanse-8b

q_proj, k_proj, v_proj

aya

✔

✔

✘

✘

transformers>=4.44.0

-

CohereForAI/aya-expanse-8b

aya-expanse-32b

AI-ModelScope/aya-expanse-32b

q_proj, k_proj, v_proj

aya

✔

✔

✘

✘

transformers>=4.44.0

-

CohereForAI/aya-expanse-32b

codestral-22b

swift/Codestral-22B-v0.1

q_proj, k_proj, v_proj

default-generation

✔

✔

✘

✘

transformers>=4.34

-

mistralai/Codestral-22B-v0.1

MLLM

Model Type

Model ID

Default Lora Target Modules

Default Template

Support Flash Attn

Support vLLM

Support LMDeploy

Support Megatron

Requires

Tags

HF Model ID

qwen-vl

qwen/Qwen-VL

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-vl-generation

✔

✔

✔

✘

vision

Qwen/Qwen-VL

qwen-vl-chat

qwen/Qwen-VL-Chat

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-vl

✔

✔

✔

✘

vision

Qwen/Qwen-VL-Chat

qwen-vl-chat-int4

qwen/Qwen-VL-Chat-Int4

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-vl

✔

✔

✘

✘

auto_gptq>=0.5

vision

Qwen/Qwen-VL-Chat-Int4

qwen-audio

qwen/Qwen-Audio

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-audio-generation

✔

✘

✘

✘

audio

Qwen/Qwen-Audio

qwen-audio-chat

qwen/Qwen-Audio-Chat

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-audio

✔

✘

✘

✘

audio

Qwen/Qwen-Audio-Chat

qwen2-audio-7b

qwen/Qwen2-Audio-7B

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-audio-generation

✔

✘

✘

✘

librosa, transformers>=4.45

audio

Qwen/Qwen2-Audio-7B

qwen2-audio-7b-instruct

qwen/Qwen2-Audio-7B-Instruct

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-audio

✔

✘

✘

✘

librosa, transformers>=4.45

audio

Qwen/Qwen2-Audio-7B-Instruct

qwen2-vl-2b

qwen/Qwen2-VL-2B

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl-generation

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-2B

qwen2-vl-2b-instruct

qwen/Qwen2-VL-2B-Instruct

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-2B-Instruct

qwen2-vl-2b-instruct-gptq-int4

qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4

qwen2-vl-2b-instruct-gptq-int8

qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8

qwen2-vl-2b-instruct-awq

qwen/Qwen2-VL-2B-Instruct-AWQ

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils, autoawq

vision, video

Qwen/Qwen2-VL-2B-Instruct-AWQ

qwen2-vl-7b

qwen/Qwen2-VL-7B

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl-generation

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-7B

qwen2-vl-7b-instruct

qwen/Qwen2-VL-7B-Instruct

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-7B-Instruct

qwen2-vl-7b-instruct-gptq-int4

qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4

qwen2-vl-7b-instruct-gptq-int8

qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8

qwen2-vl-7b-instruct-awq

qwen/Qwen2-VL-7B-Instruct-AWQ

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils, autoawq

vision, video

Qwen/Qwen2-VL-7B-Instruct-AWQ

qwen2-vl-72b

qwen/Qwen2-VL-72B

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl-generation

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-72B

qwen2-vl-72b-instruct

qwen/Qwen2-VL-72B-Instruct

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-72B-Instruct

qwen2-vl-72b-instruct-gptq-int4

qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4

qwen2-vl-72b-instruct-gptq-int8

qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8

qwen2-vl-72b-instruct-awq

qwen/Qwen2-VL-72B-Instruct-AWQ

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

✔

✔

✘

✘

transformers>=4.45.dev.0, qwen_vl_utils, autoawq

vision, video

Qwen/Qwen2-VL-72B-Instruct-AWQ

glm4v-9b-chat

ZhipuAI/glm-4v-9b

^(transformer.encoder)(?!.*(lm_head|output|emb|wte|shared)).*

glm4v

✘

✘

✘

✘

transformers>=4.42

vision

THUDM/glm-4v-9b

glm-edge-v-2b

ZhipuAI/glm-edge-v-2b

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

glm-edge-v

✔

✘

✘

✘

transformers>=4.46

vision

THUDM/glm-edge-v-2b

glm-edge-v-5b

ZhipuAI/glm-edge-v-5b

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

glm-edge-v

✔

✘

✘

✘

transformers>=4.46

vision

THUDM/glm-edge-v-5b

llama3_2-11b-vision

LLM-Research/Llama-3.2-11B-Vision

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_2-vision-generation

✔

✔

✘

✘

transformers>=4.45

vision

meta-llama/Llama-3.2-11B-Vision

llama3_2-11b-vision-instruct

LLM-Research/Llama-3.2-11B-Vision-Instruct

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_2-vision

✔

✔

✘

✘

transformers>=4.45

vision

meta-llama/Llama-3.2-11B-Vision-Instruct

llama3_2-90b-vision

LLM-Research/Llama-3.2-90B-Vision

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_2-vision-generation

✔

✔

✘

✘

transformers>=4.45

vision

meta-llama/Llama-3.2-90B-Vision

llama3_2-90b-vision-instruct

LLM-Research/Llama-3.2-90B-Vision-Instruct

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_2-vision

✔

✔

✘

✘

transformers>=4.45

vision

meta-llama/Llama-3.2-90B-Vision-Instruct

llama3_1-8b-omni

ICTNLP/Llama-3.1-8B-Omni

^(model.layers|model.speech_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_1-omni

✔

✘

✘

✘

whisper, openai-whisper

audio

ICTNLP/Llama-3.1-8B-Omni

idefics3-8b-llama3

AI-ModelScope/Idefics3-8B-Llama3

^(model.text_model|model.connector)(?!.*(lm_head|output|emb|wte|shared)).*

idefics3

✔

✘

✘

✘

transformers>=4.45

vision

HuggingFaceM4/Idefics3-8B-Llama3

llava1_5-7b-instruct

swift/llava-1.5-7b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava1_5

✔

✔

✘

✘

transformers>=4.36

vision

llava-hf/llava-1.5-7b-hf

llava1_5-13b-instruct

swift/llava-1.5-13b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava1_5

✔

✔

✘

✘

transformers>=4.36

vision

llava-hf/llava-1.5-13b-hf

llava1_6-mistral-7b-instruct

swift/llava-v1.6-mistral-7b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-mistral

✔

✔

✘

✘

transformers>=4.39

vision

llava-hf/llava-v1.6-mistral-7b-hf

llava1_6-vicuna-7b-instruct

swift/llava-v1.6-vicuna-7b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-vicuna

✔

✔

✘

✘

transformers>=4.39

vision

llava-hf/llava-v1.6-vicuna-7b-hf

llava1_6-vicuna-13b-instruct

swift/llava-v1.6-vicuna-13b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-vicuna

✔

✔

✘

✘

transformers>=4.39

vision

llava-hf/llava-v1.6-vicuna-13b-hf

llava1_6-llama3_1-8b-instruct

swift/llava-llama3.1-8b

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-llama3

✔

✘

✘

✘

transformers>=4.41

vision

-

llava1_6-yi-34b-instruct

swift/llava-v1.6-34b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-yi

✔

✔

✘

✘

transformers>=4.39

vision

llava-hf/llava-v1.6-34b-hf

llama3-llava-next-8b-hf

swift/llama3-llava-next-8b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama-llava-next-hf

✔

✔

✘

✘

transformers>=4.39

vision

llava-hf/llama3-llava-next-8b-hf

llava-next-72b-hf

AI-ModelScope/llava-next-72b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama-qwen-hf

✔

✔

✘

✘

transformers>=4.39

vision

llava-hf/llava-next-72b-hf

llava-next-110b-hf

AI-ModelScope/llava-next-110b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama-qwen-hf

✔

✔

✘

✘

transformers>=4.39

vision

llava-hf/llava-next-110b-hf

llava-onevision-qwen2-0_5b-ov

AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-onevision-qwen

✔

✘

✘

✘

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-0.5b-ov-hf

llava-onevision-qwen2-7b-ov

AI-ModelScope/llava-onevision-qwen2-7b-ov-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-onevision-qwen

✔

✘

✘

✘

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-7b-ov-hf

llava-onevision-qwen2-72b-ov

AI-ModelScope/llava-onevision-qwen2-72b-ov-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-onevision-qwen

✔

✘

✘

✘

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-72b-ov-hf

llama3-llava-next-8b

AI-Modelscope/llama3-llava-next-8b

^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3-llava-next

✔

✘

✘

✘

vision

lmms-lab/llama3-llava-next-8b

llava-next-72b

AI-Modelscope/llava-next-72b

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-qwen

✔

✘

✘

✘

vision

lmms-lab/llava-next-72b

llava-next-110b

AI-Modelscope/llava-next-110b

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-qwen

✔

✘

✘

✘

vision

lmms-lab/llava-next-110b

llava-next-video-7b-instruct

swift/LLaVA-NeXT-Video-7B-hf

^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-video

✔

✔

✘

✘

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-hf

llava-next-video-7b-32k-instruct

swift/LLaVA-NeXT-Video-7B-32K-hf

^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-video

✔

✔

✘

✘

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-32K-hf

llava-next-video-7b-dpo-instruct

swift/LLaVA-NeXT-Video-7B-DPO-hf

^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-video

✔

✔

✘

✘

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-DPO-hf

llava-next-video-34b-instruct

swift/LLaVA-NeXT-Video-34B-hf

^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-video-yi

✔

✔

✘

✘

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-34B-hf

yi-vl-6b-chat

01ai/Yi-VL-6B

^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).*

yi-vl

✔

✘

✘

✘

transformers>=4.34

vision

01-ai/Yi-VL-6B

yi-vl-34b-chat

01ai/Yi-VL-34B

^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).*

yi-vl

✔

✘

✘

✘

transformers>=4.34

vision

01-ai/Yi-VL-34B

llava-llama3-8b-v1_1

AI-ModelScope/llava-llama-3-8b-v1_1-transformers

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-llama-instruct

✔

✔

✘

✘

transformers>=4.36

vision

xtuner/llava-llama-3-8b-v1_1-transformers

internlm-xcomposer2-7b-chat

Shanghai_AI_Laboratory/internlm-xcomposer2-7b

attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3

internlm-xcomposer2

✔

✘

✔

✘

vision

internlm/internlm-xcomposer2-7b

internlm-xcomposer2-4khd-7b-chat

Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b

attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3

internlm-xcomposer2-4khd

✔

✘

✔

✘

vision

internlm/internlm-xcomposer2-4khd-7b

internlm-xcomposer2_5-7b-chat

Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b

attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3

internlm-xcomposer2_5

✔

✘

✔

✘

vision

internlm/internlm-xcomposer2d5-7b

internvl-chat-v1_5

AI-ModelScope/InternVL-Chat-V1-5

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl

✔

✔

✔

✘

transformers>=4.35, timm

vision

OpenGVLab/InternVL-Chat-V1-5

internvl-chat-v1_5-int8

AI-ModelScope/InternVL-Chat-V1-5-int8

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl

✔

✘

✘

✘

transformers>=4.35, timm

vision

OpenGVLab/InternVL-Chat-V1-5-int8

mini-internvl-chat-2b-v1_5

OpenGVLab/Mini-InternVL-Chat-2B-V1-5

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl

✔

✔

✔

✘

transformers>=4.35, timm

vision

OpenGVLab/Mini-InternVL-Chat-2B-V1-5

mini-internvl-chat-4b-v1_5

OpenGVLab/Mini-InternVL-Chat-4B-V1-5

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl-phi3

✔

✔

✘

✘

transformers>=4.35,<4.42, timm

vision

OpenGVLab/Mini-InternVL-Chat-4B-V1-5

internvl2-1b

OpenGVLab/InternVL2-1B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-1B

internvl2-2b

OpenGVLab/InternVL2-2B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-2B

internvl2-4b

OpenGVLab/InternVL2-4B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2-phi3

✔

✔

✔

✘

transformers>=4.36,<4.42, timm

vision, video

OpenGVLab/InternVL2-4B

internvl2-8b

OpenGVLab/InternVL2-8B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B

internvl2-26b

OpenGVLab/InternVL2-26B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-26B

internvl2-40b

OpenGVLab/InternVL2-40B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-40B

internvl2-llama3-76b

OpenGVLab/InternVL2-Llama3-76B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Llama3-76B

internvl2-2b-awq

OpenGVLab/InternVL2-2B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-2B-AWQ

internvl2-8b-awq

OpenGVLab/InternVL2-8B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B-AWQ

internvl2-26b-awq

OpenGVLab/InternVL2-26B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-26B-AWQ

internvl2-40b-awq

OpenGVLab/InternVL2-40B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-40B-AWQ

internvl2-llama3-76b-awq

OpenGVLab/InternVL2-Llama3-76B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

✔

✔

✔

✘

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Llama3-76B-AWQ

deepseek-janus-1_3b

deepseek-ai/Janus-1.3B

^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).*

deepseek-janus

✔

✘

✘

✘

vision

deepseek-ai/Janus-1.3B

deepseek-vl-1_3b-chat

deepseek-ai/deepseek-vl-1.3b-chat

^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).*

deepseek-vl

✔

✘

✔

✘

vision

deepseek-ai/deepseek-vl-1.3b-chat

deepseek-vl-7b-chat

deepseek-ai/deepseek-vl-7b-chat

^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).*

deepseek-vl

✔

✘

✔

✘

vision

deepseek-ai/deepseek-vl-7b-chat

ovis1_6-gemma2-9b

AIDC-AI/Ovis1.6-Gemma2-9B

^(llm)(?!.*(lm_head|output|emb|wte|shared)).*

ovis1_6

✔

✘

✘

✘

transformers>=4.42

vision

AIDC-AI/Ovis1.6-Gemma2-9B

paligemma-3b-pt-224

AI-ModelScope/paligemma-3b-pt-224

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

✔

✔

✘

✘

transformers>=4.41

vision

google/paligemma-3b-pt-224

paligemma-3b-pt-448

AI-ModelScope/paligemma-3b-pt-448

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

✔

✔

✘

✘

transformers>=4.41

vision

google/paligemma-3b-pt-448

paligemma-3b-pt-896

AI-ModelScope/paligemma-3b-pt-896

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

✔

✔

✘

✘

transformers>=4.41

vision

google/paligemma-3b-pt-896

paligemma-3b-mix-224

AI-ModelScope/paligemma-3b-mix-224

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

✔

✔

✘

✘

transformers>=4.41

vision

google/paligemma-3b-mix-224

paligemma-3b-mix-448

AI-ModelScope/paligemma-3b-mix-448

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

✔

✔

✘

✘

transformers>=4.41

vision

google/paligemma-3b-mix-448

minicpm-v-3b-chat

OpenBMB/MiniCPM-V

^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).*

minicpm-v

✔

✘

✘

✘

timm, transformers<4.42

vision

openbmb/MiniCPM-V

minicpm-v-v2-chat

OpenBMB/MiniCPM-V-2

^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).*

minicpm-v

✔

✘

✘

✘

timm, transformers<4.42

vision

openbmb/MiniCPM-V-2

minicpm-v-v2_5-chat

OpenBMB/MiniCPM-Llama3-V-2_5

^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).*

minicpm-v-v2_5

✔

✔

✘

✘

timm, transformers>=4.36

vision

openbmb/MiniCPM-Llama3-V-2_5

minicpm-v-v2_6-chat

OpenBMB/MiniCPM-V-2_6

^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).*

minicpm-v-v2_6

✔

✔

✘

✘

timm, transformers>=4.36

vision, video

openbmb/MiniCPM-V-2_6

pixtral-12b

AI-ModelScope/pixtral-12b

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

pixtral

✘

✘

✘

✘

transformers>=4.45

vision

mistral-community/pixtral-12b

mplug-owl2-chat

iic/mPLUG-Owl2

q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1

mplug-owl2

✔

✘

✘

✘

transformers<4.35, icecream

vision

MAGAer13/mplug-owl2-llama2-7b

mplug-owl2_1-chat

iic/mPLUG-Owl2.1

c_attn.multiway.0, c_attn.multiway.1

mplug-owl2

✔

✘

✘

✘

transformers<4.35, icecream

vision

Mizukiluke/mplug_owl_2_1

mplug-owl3-1b-chat

iic/mPLUG-Owl3-1B-241014

^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).*

mplug_owl3

✔

✘

✘

✘

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-1B-241014

mplug-owl3-2b-chat

iic/mPLUG-Owl3-2B-241014

^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).*

mplug_owl3

✔

✘

✘

✘

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-2B-241014

mplug-owl3-7b-chat

iic/mPLUG-Owl3-7B-240728

^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).*

mplug_owl3

✔

✘

✘

✘

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-7B-240728

mplug-owl3v-7b-chat

iic/mPLUG-Owl3-7B-241101

^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).*

mplug_owl3v

✔

✘

✘

✘

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-7B-241101

phi3-vision-128k-instruct

LLM-Research/Phi-3-vision-128k-instruct

^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).*

phi3-vl

✔

✔

✘

✘

transformers>=4.36

vision

microsoft/Phi-3-vision-128k-instruct

phi3_5-vision-instruct

LLM-Research/Phi-3.5-vision-instruct

^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).*

phi3-vl

✔

✔

✘

✘

transformers>=4.36

vision

microsoft/Phi-3.5-vision-instruct

cogvlm-17b-chat

ZhipuAI/cogvlm-chat

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogvlm

✘

✘

✘

✘

transformers<4.42

vision

THUDM/cogvlm-chat-hf

cogvlm2-19b-chat

ZhipuAI/cogvlm2-llama3-chinese-chat-19B

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogvlm

✘

✘

✔

✘

transformers<4.42

vision

THUDM/cogvlm2-llama3-chinese-chat-19B

cogvlm2-en-19b-chat

ZhipuAI/cogvlm2-llama3-chat-19B

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogvlm

✘

✘

✔

✘

transformers<4.42

vision

THUDM/cogvlm2-llama3-chat-19B

cogvlm2-video-13b-chat

ZhipuAI/cogvlm2-video-llama3-chat

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogvlm2-video

✘

✘

✘

✘

decord, pytorchvideo, transformers>=4.42

vision, video

THUDM/cogvlm2-video-llama3-chat

cogagent-18b-chat

ZhipuAI/cogagent-chat

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogagent-chat

✘

✘

✘

✘

timm

vision

THUDM/cogagent-chat-hf

cogagent-18b-instruct

ZhipuAI/cogagent-vqa

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogagent-instruct

✘

✘

✘

✘

timm

vision

THUDM/cogagent-vqa-hf

molmoe-1b

LLM-Research/MolmoE-1B-0924

^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).*

molmo

✔

✘

✘

✘

transformers>=4.45.0

vision

allenai/MolmoE-1B-0924

molmo-7b-o

LLM-Research/Molmo-7B-O-0924

^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).*

molmo

✔

✘

✘

✘

transformers>=4.45.0

vision

allenai/Molmo-7B-O-0924

molmo-7b-d

LLM-Research/Molmo-7B-D-0924

^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).*

molmo

✔

✘

✘

✘

transformers>=4.45.0

vision

allenai/Molmo-7B-D-0924

molmo-72b

LLM-Research/Molmo-72B-0924

^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).*

molmo

✔

✘

✘

✘

transformers>=4.45.0

vision

allenai/Molmo-72B-0924

emu3-chat

BAAI/Emu3-Chat

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

emu3-chat

✔

✘

✘

✘

transformers>=4.44.0

vision

BAAI/Emu3-Chat

florence-2-base

AI-ModelScope/Florence-2-base

^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).*

florence

✔

✘

✘

✘

vision

microsoft/Florence-2-base

florence-2-base-ft

AI-ModelScope/Florence-2-base-ft

^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).*

florence

✔

✘

✘

✘

vision

microsoft/Florence-2-base-ft

florence-2-large

AI-ModelScope/Florence-2-large

^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).*

florence

✔

✘

✘

✘

vision

microsoft/Florence-2-large

florence-2-large-ft

AI-ModelScope/Florence-2-large-ft

^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).*

florence

✔

✘

✘

✘

vision

microsoft/Florence-2-large-ft

got-ocr2

stepfun-ai/GOT-OCR2_0

^(model.layers|model.mm_projector_vary)(?!.*(lm_head|output|emb|wte|shared)).*

got_ocr2

✔

✘

✘

✘

audio

stepfun-ai/GOT-OCR2_0
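
Many Default Lora Target Modules entries in the MLLM table are regular expressions rather than lists of module names. The sketch below shows how one such pattern from the table selects modules. It assumes the pattern is matched against the full dotted module name (the way such patterns are applied is not stated on this page), and the module names are hypothetical examples rather than the layout of any specific model.

```python
import re
from typing import List

# Pattern copied from the table above (used by the llava and llama3_2-vision rows, among others).
PATTERN = r"^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*"

# Hypothetical module names, for illustration only.
MODULE_NAMES = [
    "language_model.model.layers.0.self_attn.q_proj",
    "multi_modal_projector.linear_1",
    "language_model.lm_head",
    "language_model.model.embed_tokens",
    "vision_tower.encoder.layers.0.self_attn.q_proj",
]


def select_modules(pattern: str, names: List[str]) -> List[str]:
    """Return the names that the pattern matches in full (assumed matching rule)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


selected = select_modules(PATTERN, MODULE_NAMES)
for name in selected:
    print(name)
```

Read this way, the first group restricts the match to the language model and the projector, and the negative lookahead drops any module whose name contains `lm_head`, `output`, `emb`, `wte`, or `shared`, so the vision tower, the embeddings, and the output head are left out.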

Datasets

The table below introduces the datasets supported by SWIFT:

  • Dataset Name: The dataset name registered in SWIFT.

  • Dataset ID: The dataset ID in ModelScope.

  • Dataset Size: The number of rows in the dataset.

  • Statistic (token): Token-count statistics for the dataset, which help when tuning the max_length hyperparameter. The training and validation sets are concatenated before the statistics are computed. The dataset is tokenized with Qwen’s tokenizer; other tokenizers produce different statistics. To obtain token statistics for another model’s tokenizer, you can run the script yourself.
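
A minimal sketch of how a statistic string such as `176.2±125.8, min=26, max=740` could be produced from per-sample token counts. This is not the SWIFT script. The whitespace tokenizer is a hypothetical stand-in for Qwen's tokenizer, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import Callable, Iterable, List


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer; a real run would use the model's tokenizer."""
    return text.split()


def token_length_stats(
    samples: Iterable[str],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> str:
    """Format token-count statistics as 'mean±std, min=..., max=...'."""
    lengths = [len(tokenize(sample)) for sample in samples]
    if not lengths:
        raise ValueError("at least one sample is required")
    mean = statistics.fmean(lengths)
    std = statistics.pstdev(lengths)  # population std; an assumption
    return f"{mean:.1f}±{std:.1f}, min={min(lengths)}, max={max(lengths)}"


# Concatenate the training and validation samples before computing statistics.
train_samples = ["a b c", "a b c d e"]
val_samples = ["a"]
print(token_length_stats(train_samples + val_samples))
```

Comparing the reported mean and max against the intended max_length gives a rough idea of how many samples would be truncated or dropped at that setting.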

Dataset Name Dataset ID Subsets Dataset Size Statistic (token) Tags HF Dataset ID
🔥ms-bench iic/ms_bench 316820 346.9±443.2, min=22, max=30960 chat, general, multi-round -
🔥alpaca-en AI-ModelScope/alpaca-gpt4-data-en 52002 176.2±125.8, min=26, max=740 chat, general vicgalle/alpaca-gpt4
🔥alpaca-zh AI-ModelScope/alpaca-gpt4-data-zh 48818 162.1±93.9, min=26, max=856 chat, general llm-wizard/alpaca-gpt4-data-zh
multi-alpaca damo/nlp_polylm_multialpaca_sft ar
de
es
fr
id
ja
ko
pt
ru
th
vi
131867 112.9±50.6, min=26, max=1226 chat, general, multilingual -
instinwild wyj123456/instinwild default
subset
103695 145.4±60.7, min=28, max=1434 - -
cot-en YorickHe/CoT 74771 122.7±64.8, min=51, max=8320 chat, general -
cot-zh YorickHe/CoT_zh 74771 117.5±70.8, min=43, max=9636 chat, general -
instruct-en wyj123456/instruct 888970 269.1±331.5, min=26, max=7254 chat, general -
firefly-zh AI-ModelScope/firefly-train-1.1M 1649399 178.1±260.4, min=26, max=12516 chat, general YeungNLP/firefly-train-1.1M
gpt4all-en wyj123456/GPT4all 806199 302.7±384.5, min=27, max=7391 chat, general -
sharegpt swift/sharegpt common-zh
computer-zh
unknow-zh
common-en
computer-en
96566 933.3±864.8, min=21, max=66412 chat, general, multi-round -
tulu-v2-sft-mixture AI-ModelScope/tulu-v2-sft-mixture 5119 520.7±437.6, min=68, max=2549 chat, multilingual, general, multi-round allenai/tulu-v2-sft-mixture
wikipedia-zh AI-ModelScope/wikipedia-cn-20230720-filtered 254547 568.4±713.2, min=37, max=78678 text-generation, general, pretrained pleisto/wikipedia-cn-20230720-filtered
open-orca AI-ModelScope/OpenOrca 994896 382.3±417.4, min=31, max=8740 chat, multilingual, general -
🔥sharegpt-gpt4 AI-ModelScope/sharegpt_gpt4 default
V3_format
zh_38K_format
72684 1047.6±1313.1, min=22, max=66412 chat, multilingual, general, multi-round, gpt4 -
deepctrl-sft AI-ModelScope/deepctrl-sft-data default
en
14149024 389.8±628.6, min=21, max=626237 chat, general, sft, multi-round -
🔥coig-cqia AI-ModelScope/COIG-CQIA chinese_traditional
coig_pc
exam
finance
douban
human_value
logi_qa
ruozhiba
segmentfault
wiki
wikihow
xhs
zhihu
44694 703.8±654.2, min=33, max=19288 general -
🔥ruozhiba AI-ModelScope/ruozhiba post-annual
title-good
title-norm
85658 39.9±13.1, min=21, max=559 pretrain -
long-alpaca-12k AI-ModelScope/LongAlpaca-12k 11998 9619.0±8295.8, min=36, max=78925 longlora, QA Yukang/LongAlpaca-12k
lmsys-chat-1m AI-ModelScope/lmsys-chat-1m - The dataset is too large; see the original link for its statistics. chat, em lmsys/lmsys-chat-1m
🔥ms-agent iic/ms_agent 26336 650.9±217.2, min=209, max=2740 chat, agent, multi-round -
🔥ms-agent-for-agentfabric AI-ModelScope/ms_agent_for_agentfabric default
addition
30000 617.8±199.1, min=251, max=2657 chat, agent, multi-round -
ms-agent-multirole iic/MSAgent-MultiRole 9500 447.6±84.9, min=145, max=1101 chat, agent, multi-round, role-play, multi-agent -
🔥toolbench-for-alpha-umi shenweizhou/alpha-umi-toolbench-processed-v2 backbone
caller
planner
summarizer
1448337 1439.7±853.9, min=123, max=18467 chat, agent -
damo-agent-zh damo/MSAgent-Bench 386984 956.5±407.3, min=326, max=19001 chat, agent, multi-round -
damo-agent-zh-mini damo/MSAgent-Bench 20845 1326.4±329.6, min=571, max=4304 chat, agent, multi-round -
agent-instruct-all-en huangjintao/AgentInstruct_copy alfworld
db
kg
mind2web
os
webshop
1866 1144.3±635.5, min=206, max=6412 chat, agent, multi-round -
🔥msagent-pro iic/MSAgent-Pro 21905 1524.5±921.3, min=64, max=16770 chat, agent, multi-round -
toolbench swift/ToolBench 124345 3669.5±1600.9, min=1047, max=22581 chat, agent, multi-round -
code-alpaca-en wyj123456/code_alpaca_en 20016 100.2±60.1, min=29, max=1776 - sahil2801/CodeAlpaca-20k
🔥leetcode-python-en AI-ModelScope/leetcode-solutions-python 2359 727.1±235.9, min=259, max=2146 chat, coding -
🔥codefuse-python-en codefuse-ai/CodeExercise-Python-27k 27224 483.6±193.9, min=45, max=3082 chat, coding -
🔥codefuse-evol-instruction-zh codefuse-ai/Evol-instruction-66k 66862 439.6±206.3, min=37, max=2983 chat, coding -
medical-en swift/medical_zh en 117617 257.4±89.1, min=36, max=2564 chat, medical -
medical-zh swift/medical_zh zh 1950972 167.2±219.7, min=26, max=27351 chat, medical -
🔥disc-med-sft-zh AI-ModelScope/DISC-Med-SFT 441767 354.1±193.1, min=25, max=2231 chat, medical Flmc/DISC-Med-SFT
lawyer-llama-zh AI-ModelScope/lawyer_llama_data 21476 194.4±91.7, min=27, max=924 chat, law Skepsun/lawyer_llama_data
tigerbot-law-zh AI-ModelScope/tigerbot-law-plugin 55895 109.9±126.4, min=37, max=18878 text-generation, law, pretrained TigerResearch/tigerbot-law-plugin
🔥disc-law-sft-zh AI-ModelScope/DISC-Law-SFT 166758 533.7±495.4, min=30, max=15169 chat, law ShengbinYue/DISC-Law-SFT
🔥blossom-math-zh AI-ModelScope/blossom-math-v2 10000 169.3±58.7, min=35, max=563 chat, math Azure99/blossom-math-v2
school-math-zh AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 chat, math, quality BelleGroup/school_math_0.25M
open-platypus-en AI-ModelScope/Open-Platypus 24926 367.9±254.8, min=30, max=3951 chat, math, quality garage-bAInd/Open-Platypus
text2sql-en AI-ModelScope/texttosqlv2_25000_v2 25000 274.6±326.4, min=38, max=1975 chat, sql Clinton/texttosqlv2_25000_v2
🔥sql-create-context-en AI-ModelScope/sql-create-context 78577 80.2±17.8, min=36, max=456 chat, sql b-mc2/sql-create-context
synthetic-text-to-sql AI-ModelScope/synthetic_text_to_sql default 100000 283.4±115.8, min=61, max=1356 nl2sql, en gretelai/synthetic_text_to_sql
🔥advertise-gen-zh lvjianjin/AdvertiseGen 98399 130.6±21.7, min=51, max=241 text-generation shibing624/AdvertiseGen
🔥dureader-robust-zh modelscope/DuReader_robust-QG 17899 241.1±137.4, min=60, max=1416 text-generation -
cmnli-zh modelscope/clue cmnli 404024 82.6±16.6, min=51, max=199 text-generation, classification clue
🔥jd-sentiment-zh DAMO_NLP/jd 50000 66.0±83.2, min=39, max=4039 text-generation, classification -
🔥hc3-zh simpleai/HC3-Chinese baike
open_qa
nlpcc_dbqa
finance
medicine
law
psychology
39781 176.8±81.5, min=57, max=3051 text-generation, classification Hello-SimpleAI/HC3-Chinese
🔥hc3-en simpleai/HC3 finance
medicine
11021 298.3±138.7, min=65, max=2267 text-generation, classification Hello-SimpleAI/HC3
dolly-15k AI-ModelScope/databricks-dolly-15k default 15011 199.2±267.8, min=22, max=8615 multi-task, en, quality databricks/databricks-dolly-15k
zhihu-kol OmniData/Zhihu-KOL default - Dataset is too huge, please click the original link to view the dataset stat. zhihu, qa wangrui6/Zhihu-KOL
zhihu-kol-filtered OmniData/Zhihu-KOL-More-Than-100-Upvotes default 271261 952.0±1727.2, min=25, max=98658 zhihu, qa bzb2023/Zhihu-KOL-More-Than-100-Upvotes
finance-en wyj123456/finance_en 68911 135.6±134.3, min=26, max=3525 chat, financial ssbuild/alpaca_finance_en
poetry-zh modelscope/chinese-poetry-collection 390309 55.2±9.4, min=23, max=83 text-generation, poetry -
webnovel-zh AI-ModelScope/webnovel_cn 50000 1478.9±11526.1, min=100, max=490484 chat, novel zxbsmk/webnovel_cn
generated-chat-zh AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 chat, character-dialogue BelleGroup/generated_chat_0.4M
🔥self-cognition swift/self-cognition 134 53.6±18.6, min=29, max=121 chat, self-cognition modelscope/self-cognition
🔥swift-mix swift/swift-sft-mixture sharegpt
firefly
codefuse
metamathqa
- Dataset is too huge, please click the original link to view the dataset stat. chat, sft, general -
cls-fudan-news-zh damo/zh_cls_fudan-news 4959 3234.4±2547.5, min=91, max=19548 chat, classification -
ner-jave-zh damo/zh_ner-JAVE 1266 118.3±45.5, min=44, max=223 chat, ner -
coco-en modelscope/coco_2014_caption coco_2014_caption 454617 299.8±2.8, min=295, max=352 chat, multi-modal, vision -
🔥coco-en-mini modelscope/coco_2014_caption coco_2014_caption 40504 299.8±2.6, min=295, max=338 chat, multi-modal, vision -
coco-en-2 modelscope/coco_2014_caption coco_2014_caption 454617 36.8±2.8, min=32, max=89 chat, multi-modal, vision -
🔥coco-en-2-mini modelscope/coco_2014_caption coco_2014_caption 40504 36.8±2.6, min=32, max=75 chat, multi-modal, vision -
capcha-images AI-ModelScope/captcha-images 8000 31.0±0.0, min=31, max=31 chat, multi-modal, vision -
latex-ocr-print AI-ModelScope/LaTeX_OCR default 17918 362.7±34.8, min=294, max=528 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
latex-ocr-handwrite AI-ModelScope/LaTeX_OCR synthetic_handwrite 95424 375.1±59.4, min=292, max=2115 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
aishell1-zh speech_asr/speech_asr_aishell1_trainsets 141600 152.2±36.8, min=63, max=419 chat, multi-modal, audio -
🔥aishell1-zh-mini speech_asr/speech_asr_aishell1_trainsets 14526 152.2±35.6, min=74, max=359 chat, multi-modal, audio -
🔥video-chatgpt swift/VideoChatGPT Generic
Temporal
Consistency
3206 88.4±48.3, min=32, max=399 chat, multi-modal, video lmms-lab/VideoChatGPT
egoschema AI-ModelScope/egoschema Subset 101 191.6±80.7, min=96, max=435 chat, multi-modal, video lmms-lab/egoschema
llava-video-178k lmms-lab/LLaVA-Video-178K 0_30_s_academic_v0_1
0_30_s_youtube_v0_1
1_2_m_academic_v0_1
1_2_m_youtube_v0_1
2_3_m_academic_v0_1
2_3_m_youtube_v0_1
30_60_s_academic_v0_1
30_60_s_youtube_v0_1
- Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, video lmms-lab/LLaVA-Video-178K
moviechat-1k-test AI-ModelScope/MovieChat-1K-test 486 36.1±4.3, min=27, max=42 chat, multi-modal, video Enxin/MovieChat-1K-test
hh-rlhf AI-ModelScope/hh-rlhf harmless-base
helpful-base
helpful-online
helpful-rejection-sampled
127459 245.4±190.7, min=22, max=1999 rlhf, dpo, pairwise -
🔥hh-rlhf-cn AI-ModelScope/hh_rlhf_cn hh_rlhf
harmless_base_cn
harmless_base_en
helpful_base_cn
helpful_base_en
355920 171.2±122.7, min=22, max=3078 rlhf, dpo, pairwise -
orpo-dpo-mix-40k AI-ModelScope/orpo-dpo-mix-40k default 43666 548.3±397.4, min=28, max=8483 dpo, orpo, en, quality mlabonne/orpo-dpo-mix-40k
stack-exchange-paired AI-ModelScope/stack-exchange-paired 4483004 534.5±594.6, min=31, max=56588 hfrl, dpo, pairwise lvwerra/stack-exchange-paired
shareai-llama3-dpo-zh-en-emoji hjh0119/shareAI-Llama3-DPO-zh-en-emoji default 2449 334.0±162.8, min=36, max=1801 rlhf, dpo, pairwise -
ultrafeedback-kto AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto default 230720 11.0±0.0, min=11, max=11 rlhf, kto -
rlaif-v swift/RLAIF-V-Dataset default 83132 119.8±52.6, min=28, max=556 rlhf, dpo, multi-modal, en openbmb/RLAIF-V-Dataset
pileval swift/pile-val-backup 214670 1612.3±8856.2, min=11, max=1208955 text-generation, awq mit-han-lab/pile-val-backup
mantis-instruct swift/Mantis-Instruct birds-to-words
chartqa
coinstruct
contrastive_caption
docvqa
dreamsim
dvqa
iconqa
imagecode
llava_665k_multi
lrv_multi
multi_vqa
nextqa
nlvr2
spot-the-diff
star
visual_story_telling
655351 825.7±812.5, min=284, max=13563 chat, multi-modal, vision, quality TIGER-Lab/Mantis-Instruct
llava-data-instruct swift/llava-data llava_instruct 364100 189.0±142.1, min=33, max=5183 sft, multi-modal, quality TIGER-Lab/llava-data
midefics swift/MideficsDataset 3800 201.3±70.2, min=60, max=454 medical, en, vqa WinterSchool/MideficsDataset
gqa None train_all_instructions - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, vqa, quality lmms-lab/GQA
text-caps swift/TextCaps 18145 38.2±4.4, min=31, max=73 multi-modal, en, caption, quality HuggingFaceM4/TextCaps
refcoco-unofficial-caption swift/refcoco 46215 44.7±3.2, min=36, max=71 multi-modal, en, caption jxu124/refcoco
refcoco-unofficial-grounding swift/refcoco 46215 45.2±3.1, min=37, max=69 multi-modal, en, grounding jxu124/refcoco
refcocog-unofficial-caption swift/refcocog 44799 49.7±4.7, min=37, max=88 multi-modal, en, caption jxu124/refcocog
refcocog-unofficial-grounding swift/refcocog 44799 50.1±4.7, min=37, max=90 multi-modal, en, grounding jxu124/refcocog
a-okvqa swift/A-OKVQA 18201 45.8±7.9, min=32, max=100 multi-modal, en, vqa, quality HuggingFaceM4/A-OKVQA
okvqa swift/OK-VQA_train 9009 34.4±3.3, min=28, max=59 multi-modal, en, vqa, quality Multimodal-Fatima/OK-VQA_train
ocr-vqa swift/OCR-VQA 186753 35.6±6.6, min=29, max=193 multi-modal, en, ocr-vqa howard-hou/OCR-VQA
grit swift/GRIT - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, caption-grounding, quality zzliang/GRIT
llava-instruct-mix swift/llava-instruct-mix-vsft 13640 179.8±120.2, min=30, max=962 multi-modal, en, vqa, quality HuggingFaceH4/llava-instruct-mix-vsft
lnqa swift/lnqa - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, ocr-vqa, quality vikhyatk/lnqa
science-qa swift/ScienceQA 8315 100.3±59.5, min=38, max=638 multi-modal, science, vqa, quality derek-thomas/ScienceQA
guanaco AI-ModelScope/GuanacoDataset default 31561 250.1±70.3, min=89, max=1436 chat, zh JosephusCheung/GuanacoDataset
mind2web swift/Multimodal-Mind2Web 1009 297522.4±325496.2, min=8592, max=3499715 agent, multi-modal osunlp/Multimodal-Mind2Web
sharegpt-4o-image AI-ModelScope/ShareGPT-4o image_caption 57289 638.7±157.9, min=47, max=4640 vqa, multi-modal OpenGVLab/ShareGPT-4o
pixelprose swift/pixelprose - Dataset is too huge, please click the original link to view the dataset stat. caption, multi-modal, vision tomg-group-umd/pixelprose
m3it AI-ModelScope/M3IT coco
vqa-v2
shapes
shapes-rephrased
coco-goi-rephrased
snli-ve
snli-ve-rephrased
okvqa
a-okvqa
viquae
textcap
docvqa
science-qa
imagenet
imagenet-open-ended
imagenet-rephrased
coco-goi
clevr
clevr-rephrased
nlvr
coco-itm
coco-itm-rephrased
vsr
vsr-rephrased
mocheg
mocheg-rephrased
coco-text
fm-iqa
activitynet-qa
msrvtt
ss
coco-cn
refcoco
refcoco-rephrased
multi30k
image-paragraph-captioning
visual-dialog
visual-dialog-rephrased
iqa
vcr
visual-mrc
ivqa
msrvtt-qa
msvd-qa
gqa
text-vqa
ocr-vqa
st-vqa
flickr8k-cn
- Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
sharegpt4v AI-ModelScope/ShareGPT4V ShareGPT4V
ShareGPT4V-PT
- Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
llava-instruct-150k AI-ModelScope/LLaVA-Instruct-150K 624610 490.4±180.2, min=288, max=5438 chat, multi-modal, vision -
llava-pretrain AI-ModelScope/LLaVA-Pretrain default - Dataset is too huge, please click the original link to view the dataset stat. vqa, multi-modal, quality liuhaotian/LLaVA-Pretrain
sa1b-dense-caption Tongyi-DataEngine/SA1B-Dense-Caption - Dataset is too huge, please click the original link to view the dataset stat. zh, multi-modal, vqa -
sa1b-paired-caption Tongyi-DataEngine/SA1B-Paired-Captions-Images - Dataset is too huge, please click the original link to view the dataset stat. zh, multi-modal, vqa -
alpaca-cleaned AI-ModelScope/alpaca-cleaned 51760 177.9±126.4, min=26, max=1044 chat, general, bench, quality yahma/alpaca-cleaned
aya-collection swift/aya_collection aya_dataset 202364 494.0±6911.3, min=21, max=3044268 multi-lingual, qa CohereForAI/aya_collection
belle-generated-chat-0.4M AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 common, zh BelleGroup/generated_chat_0.4M
belle-math-0.25M AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 math, zh BelleGroup/school_math_0.25M
belle-train-0.5M-CN AI-ModelScope/train_0.5M_CN 519255 129.1±91.5, min=27, max=6507 common, zh, quality BelleGroup/train_0.5M_CN
belle-train-1M-CN AI-ModelScope/train_1M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_1M_CN
belle-train-2M-CN AI-ModelScope/train_2M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_2M_CN
belle-train-3.5M-CN swift/train_3.5M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_3.5M_CN
c4 None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/c4
chart-qa swift/ChartQA 28299 43.1±5.5, min=29, max=77 en, vqa, quality HuggingFaceM4/ChartQA
chinese-c4 swift/chinese-c4 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality shjwudp/chinese-c4
cinepile swift/cinepile - Dataset is too huge, please click the original link to view the dataset stat. vqa, en, youtube, video tomg-group-umd/cinepile
classical-chinese-translate swift/classical_chinese_translate 6655 344.0±76.4, min=61, max=815 chat, play-ground -
codealpaca-20k AI-ModelScope/CodeAlpaca-20k 20016 100.2±60.1, min=29, max=1776 code, en HuggingFaceH4/CodeAlpaca_20K
cosmopedia None auto_math_text
khanacademy
openstax
stanford
stories
web_samples_v1
web_samples_v2
wikihow
- Dataset is too huge, please click the original link to view the dataset stat. multi-domain, en, qa HuggingFaceTB/cosmopedia
cosmopedia-100k swift/cosmopedia-100k 100000 1024.5±243.1, min=239, max=2981 multi-domain, en, qa HuggingFaceTB/cosmopedia-100k
dolma swift/dolma v1_7 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/dolma
dolphin swift/dolphin flan1m-alpaca-uncensored
flan5m-alpaca-uncensored
- Dataset is too huge, please click the original link to view the dataset stat. en cognitivecomputations/dolphin
duet AI-ModelScope/Duet-v0.5 5000 1157.4±189.3, min=657, max=2344 CoT, en G-reen/Duet-v0.5
evol-instruct-v2 AI-ModelScope/WizardLM_evol_instruct_V2_196k 109184 480.9±333.1, min=26, max=4942 chat, en WizardLM/WizardLM_evol_instruct_V2_196k
fineweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality HuggingFaceFW/fineweb
gen-qa swift/GenQA - Dataset is too huge, please click the original link to view the dataset stat. qa, quality, multi-task tomg-group-umd/GenQA
github-code swift/github-code - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality codeparrot/github-code
gpt4v-dataset swift/gpt4v-dataset 12356 217.9±68.3, min=35, max=596 en, caption, multi-modal, quality laion/gpt4v-dataset
guanaco-belle-merge AI-ModelScope/guanaco_belle_merge_v1.0 693987 134.2±92.0, min=24, max=6507 QA, zh Chinese-Vicuna/guanaco_belle_merge_v1.0
infinity-instruct swift/Infinity-Instruct - Dataset is too huge, please click the original link to view the dataset stat. qa, quality, multi-task BAAI/Infinity-Instruct
llava-med-zh-instruct swift/llava-med-zh-instruct-60k 56649 207.7±67.6, min=37, max=657 zh, medical, vqa BUAADreamer/llava-med-zh-instruct-60k
🔥longwriter-6k ZhipuAI/LongWriter-6k 6000 4887.2±2879.2, min=117, max=30354 long, chat, sft THUDM/LongWriter-6k
🔥longwriter-6k-filtered swift/longwriter-6k-filtered 666 4108.9±2636.9, min=1190, max=17050 long, chat, sft -
math-instruct AI-ModelScope/MathInstruct 262283 254.4±183.5, min=11, max=4383 math, cot, en, quality TIGER-Lab/MathInstruct
math-plus TIGER-Lab/MATH-plus train 893929 287.1±158.7, min=24, max=2919 qa, math, en, quality TIGER-Lab/MATH-plus
moondream2-coyo-5M swift/moondream2-coyo-5M-captions - Dataset is too huge, please click the original link to view the dataset stat. caption, pretrain, quality isidentical/moondream2-coyo-5M-captions
no-robots swift/no_robots 9485 298.7±246.4, min=40, max=6739 multi-task, quality, human-annotated HuggingFaceH4/no_robots
open-hermes swift/OpenHermes-2.5 - Dataset is too huge, please click the original link to view the dataset stat. cot, en, quality teknium/OpenHermes-2.5
open-o1 AI-ModelScope/OpenO1-SFT default 203579 615.5±659.6, min=11, max=27509 chat, general, o1 O1-OPEN/OpenO1-SFT
open-orca-chinese AI-ModelScope/OpenOrca-Chinese - Dataset is too huge, please click the original link to view the dataset stat. QA, zh, general, quality yys/OpenOrca-Chinese
orca_dpo_pairs swift/orca_dpo_pairs 12859 366.9±251.9, min=30, max=2010 rlhf, quality Intel/orca_dpo_pairs
path-vqa swift/path-vqa 19654 34.8±7.3, min=27, max=85 multi-modal, vqa, medical flaviagiammarino/path-vqa
pile AI-ModelScope/pile - Dataset is too huge, please click the original link to view the dataset stat. pretrain EleutherAI/pile
poison-mpts iic/100PoisonMpts 906 150.6±80.8, min=39, max=656 poison-management, zh -
🔥qwen2-pro-en AI-ModelScope/Magpie-Qwen2-Pro-200K-English 200000 605.4±287.3, min=221, max=4267 chat, sft, en Magpie-Align/Magpie-Qwen2-Pro-200K-English
🔥qwen2-pro-filtered AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered 300000 555.8±286.6, min=148, max=4267 chat, sft Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
🔥qwen2-pro-zh AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese 200000 446.2±246.4, min=74, max=4101 chat, sft, zh Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
redpajama-data-1t swift/RedPajama-Data-1T - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-1T
redpajama-data-v2 swift/RedPajama-Data-V2 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-V2
refinedweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality tiiuae/falcon-refinedweb
rwkv-pretrain-web mapjack/openwebtext_dataset - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality -
sft-nectar AI-ModelScope/SFT-Nectar 131192 396.4±272.1, min=44, max=10732 cot, en, quality AstraMindAI/SFT-Nectar
skypile AI-ModelScope/SkyPile-150B - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality, zh Skywork/SkyPile-150B
slim-orca swift/SlimOrca 517982 399.1±370.2, min=35, max=8756 quality, en Open-Orca/SlimOrca
slim-pajama-627b None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality cerebras/SlimPajama-627B
starcoder AI-ModelScope/starcoderdata - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/starcoderdata
tagengo-gpt4 swift/tagengo-gpt4 78057 472.3±292.9, min=22, max=3521 chat, multi-lingual, quality lightblue/tagengo-gpt4
the-stack AI-ModelScope/the-stack - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/the-stack
ultrachat-200k swift/ultrachat_200k 207865 1195.4±573.7, min=76, max=4470 chat, en, quality HuggingFaceH4/ultrachat_200k
vqa-v2 swift/VQAv2 443757 31.8±2.2, min=27, max=58 en, vqa, quality HuggingFaceM4/VQAv2
web-instruct-sub swift/WebInstructSub - Dataset is too huge, please click the original link to view the dataset stat. qa, en, math, quality, multi-domain, science TIGER-Lab/WebInstructSub
wikipedia swift/wikipedia - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality wikipedia
wikipedia-cn-filtered AI-ModelScope/wikipedia-cn-20230720-filtered - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality pleisto/wikipedia-cn-20230720-filtered
zhihu-rlhf AI-ModelScope/zhihu_rlhf_3k 3460 594.5±365.9, min=31, max=1716 rlhf, dpo, zh liyucheng/zhihu_rlhf_3k