# LMDeploy Inference Acceleration and Deployment

LMDeploy GitHub: [https://github.com/InternLM/lmdeploy](https://github.com/InternLM/lmdeploy).

Multimodal LLMs (MLLMs) that support inference acceleration with lmdeploy are listed in [Supported Models](../Instruction/Supported-models-datasets.md#MLLM).

## Table of Contents

- [Environment Preparation](#environment-preparation)
- [Inference Acceleration](#inference-acceleration)
- [Deployment](#deployment)

## Environment Preparation

GPU devices: A10, 3090, V100, and A100 are all supported.

```bash
# Set pip global mirror (speeds up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

# lmdeploy releases are tied to specific CUDA versions.
# Follow the installation instructions at `https://github.com/InternLM/lmdeploy#installation`.
pip install lmdeploy
```

## Inference Acceleration

### Using Python

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# from swift.hub import HubApi
# _api = HubApi()
# _api.login('')  # https://modelscope.cn/my/myaccesstoken
from swift.llm import (
    ModelType, get_lmdeploy_engine, get_default_template_type,
    get_template, inference_lmdeploy, inference_stream_lmdeploy
)

# ModelType.qwen_vl_chat, ModelType.deepseek_vl_1_3b_chat
# ModelType.internlm_xcomposer2_5_7b_chat, ModelType.minicpm_v_v2_5_chat
model_type = ModelType.internvl2_2b
model_id_or_path = None
lmdeploy_engine = get_lmdeploy_engine(model_type, model_id_or_path=model_id_or_path)
template_type = get_default_template_type(model_type)
template = get_template(template_type, lmdeploy_engine.hf_tokenizer)
lmdeploy_engine.generation_config.max_new_tokens = 256
generation_info = {}

request_list = [{'query': 'Describe the image.',
                 'images': ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']},
                {'query': 'who are you?'},
                {'query': (
                    'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png'
                    'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png'
                    'What is the difference between the two images?'
                )}]
resp_list = inference_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
for request, resp in zip(request_list, resp_list):
    print(f"query: {request['query']}")
    print(f"response: {resp['response']}")
print(generation_info)

# stream
request_list = [{'query': '