OLLaMA Export Documentation

SWIFT now supports exporting OLLaMA Model files, integrated into the swift export command.

Contents

Environment Setup

# Set pip global mirror (to speed up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

No additional modules are needed for OLLaMA export, as SWIFT only exports the ModelFile. Users can handle subsequent operations.

Export

The OLLaMA export command line is as follows:

# model_type
swift export --model_type llama3-8b-instruct --to_ollama true --ollama_output_dir llama3-8b-instruct-ollama
# ckpt_dir, note that for lora training, add --merge_lora true
swift export --ckpt_dir /mnt/workspace/yzhao/tastelikefeet/swift/output/qwen-7b-chat/v141-20240331-110833/checkpoint-10942 --to_ollama true --ollama_output_dir qwen-7b-chat-ollama --merge_lora true

After execution, the following log will be printed:

[INFO:swift] Exporting to ollama:
[INFO:swift] If you have a gguf file, try to pass the file by :--gguf_file /xxx/xxx.gguf, else SWIFT will use the original(merged) model dir
[INFO:swift] Downloading the model from ModelScope Hub, model_id: LLM-Research/Meta-Llama-3-8B-Instruct
[WARNING:modelscope] Authentication has expired, please re-login with modelscope login --token "YOUR_SDK_TOKEN" if you need to access private models or datasets.
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /mnt/workspace/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct
[INFO:swift] Save Modelfile done, you can start ollama by:
[INFO:swift] > ollama serve
[INFO:swift] In another terminal:
[INFO:swift] > ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/llama3-8b-instruct-ollama/Modelfile
[INFO:swift] > ollama run my-custom-model
[INFO:swift] End time of running main: 2024-08-09 17:17:48.768722

Check the Modelfile:

FROM /mnt/workspace/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct
TEMPLATE """{{ if .System }}<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ else }}<|begin_of_text|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ end }}{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|eot_id|>"
PARAMETER temperature 0.3
PARAMETER top_k 20
PARAMETER top_p 0.7
PARAMETER repeat_penalty 1.0

Users can modify the generated file for subsequent inference.

Using OLLaMA

To use the above file, install OLLaMA:

# https://github.com/ollama/ollama
curl -fsSL https://ollama.com/install.sh | sh

Start OLLaMA:

ollama serve

In another terminal, run:

ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/llama3-8b-instruct-ollama/Modelfile

The following log will be printed after execution:

transferring model data
unpacking model metadata
processing tensors
converting model
creating new layer sha256:37b0404fb276acb2e5b75f848673566ce7048c60280470d96009772594040706
creating new layer sha256:2ecd014a372da71016e575822146f05d89dc8864522fdc88461c1e7f1532ba06
creating new layer sha256:ddc2a243c4ec10db8aed5fbbc5ac82a4f8425cdc4bd3f0c355373a45bc9b6cb0
creating new layer sha256:fc776bf39fa270fa5e2ef7c6782068acd858826e544fce2df19a7a8f74f3f9df
writing manifest
success

You can then use the command name for inference:

ollama run my-custom-model
>>> who are you?
I'm LLaMA, a large language model trained by a team of researchers at Meta AI. My primary function is to understand and respond to human
input in a helpful and informative way. I'm a type of AI designed to simulate conversation, answer questions, and even generate text based
on a given prompt or topic.

I'm not a human, but rather a computer program designed to mimic human-like conversation. I don't have personal experiences, emotions, or
physical presence, but I'm here to provide information, answer your questions, and engage in conversation to the best of my abilities.

I'm constantly learning and improving my responses based on the interactions I have with users like you, so please bear with me if I make
any mistakes or don't quite understand what you're asking. I'm here to help and provide assistance, so feel free to ask me anything!

Points to Note

  1. Some models may report an error during:

ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/qwen-7b-chat-ollama/Modelfile

Error message:

Error: Models based on 'QWenLMHeadModel' are not yet supported

This is because the conversion in OLLaMA does not support all types of models. You can perform gguf export yourself and modify the FROM field in the Modelfile:

# Detailed conversion steps can be found at: https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/README.md
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# The model directory can be found in the `swift export` command log, similar to:
# Using model_dir: /mnt/workspace/yzhao/tastelikefeet/swift/output/qwen-7b-chat/v141-20240331-110833/checkpoint-10942-merged
python convert_hf_to_gguf.py /mnt/workspace/yzhao/tastelikefeet/swift/output/qwen-7b-chat/v141-20240331-110833/checkpoint-10942-merged

Then re-execute:

ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/qwen-7b-chat-ollama/Modelfile