diff --git a/docs/en/get_started/index.rst b/docs/en/get_started/index.rst
index 4343ee9ab1..a611f93cae 100644
--- a/docs/en/get_started/index.rst
+++ b/docs/en/get_started/index.rst
@@ -6,3 +6,9 @@ On Other Platforms
    :caption: NPU(Huawei)
 
    ascend/get_started.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: PPU
+
+   ppu/get_started.md
diff --git a/docs/en/get_started/ppu/get_started.md b/docs/en/get_started/ppu/get_started.md
new file mode 100644
index 0000000000..ab636990f8
--- /dev/null
+++ b/docs/en/get_started/ppu/get_started.md
@@ -0,0 +1,74 @@
+# Get Started with PPU
+
+We have added support for Alibaba T-Head PPU devices based on LMDeploy's PytorchEngine, so using LMDeploy on a PPU device is almost the same as using the PytorchEngine backend on CUDA devices.
+Please read the original [Get Started](../get_started.md) guide before reading this tutorial.
+
+## Installation
+
+Please refer to the [dlinfer installation guide](https://github.com/DeepLink-org/dlinfer#%E5%AE%89%E8%A3%85%E6%96%B9%E6%B3%95).
+
+## Offline batch inference
+
+> \[!TIP\]
+> Graph mode is supported on PPU.
+> Users can set `eager_mode=False` to enable graph mode, or set `eager_mode=True` to disable graph mode.
+
+### LLM inference
+
+Set `device_type="ppu"` in the `PytorchEngineConfig`:
+
+```python
+from lmdeploy import pipeline
+from lmdeploy import PytorchEngineConfig
+
+pipe = pipeline("internlm/internlm2_5-7b-chat",
+                backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=False))
+question = ['Hi, pls intro yourself', 'Shanghai is']
+response = pipe(question)
+print(response)
+```
+
+### VLM inference
+
+Set `device_type="ppu"` in the `PytorchEngineConfig`:
+
+```python
+from lmdeploy import pipeline, PytorchEngineConfig
+from lmdeploy.vl import load_image
+
+pipe = pipeline('OpenGVLab/InternVL2-2B',
+                backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=False))
+image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
+response = pipe(('describe this image', image))
+print(response)
+```
+
+## Online serving
+
+> \[!TIP\]
+> Graph mode is supported on PPU.
+> Graph mode is enabled by default in online serving. Users can add `--eager-mode` to disable graph mode.
+
+### Serve an LLM model
+
+Add `--device ppu` to the serve command.
+
+```bash
+lmdeploy serve api_server --backend pytorch --device ppu --eager-mode internlm/internlm2_5-7b-chat
+```
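+
+Once the server is up, it exposes an OpenAI-compatible HTTP API, so it can be queried from any OpenAI-style client. The snippet below is a minimal sketch rather than PPU-specific behavior: it assumes the server listens on the default port `23333` and that the `openai` Python package is installed.
+
+```python
+from openai import OpenAI
+
+# Point an OpenAI-compatible client at the local api_server (assumed default port 23333).
+client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='YOUR_API_KEY')
+
+# Use the same model name that the server was launched with.
+response = client.chat.completions.create(
+    model='internlm/internlm2_5-7b-chat',
+    messages=[{'role': 'user', 'content': 'Hi, pls intro yourself'}],
+    temperature=0.8,
+)
+print(response.choices[0].message.content)
+```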
+
+### Serve a VLM model
+
+Add `--device ppu` to the serve command.
+
+```bash
+lmdeploy serve api_server --backend pytorch --device ppu --eager-mode OpenGVLab/InternVL2-2B
+```
+
+## Inference with Command Line Interface
+
+Add `--device ppu` to the chat command.
+
+```bash
+lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ppu --eager-mode
+```
diff --git a/docs/en/supported_models/supported_models.md b/docs/en/supported_models/supported_models.md
index 1724615573..2a57c3d3e8 100644
--- a/docs/en/supported_models/supported_models.md
+++ b/docs/en/supported_models/supported_models.md
@@ -141,3 +141,23 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 | InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes |
 | CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - |
 | GLM4V | 9B | MLLM | Yes | No | - | - | - |
+
+## PyTorchEngine on PPU
+
+| Model          | Size      | Type | FP16/BF16(eager) | FP16/BF16(graph) |
+| :------------: | :-------: | :--: | :--------------: | :--------------: |
+| Llama2         | 7B - 70B  | LLM  | Yes              | Yes              |
+| Llama3         | 8B        | LLM  | Yes              | Yes              |
+| Llama3.1       | 8B        | LLM  | Yes              | Yes              |
+| InternLM2      | 7B - 20B  | LLM  | Yes              | Yes              |
+| InternLM2.5    | 7B - 20B  | LLM  | Yes              | Yes              |
+| InternLM3      | 8B        | LLM  | Yes              | Yes              |
+| Mixtral        | 8x7B      | LLM  | Yes              | Yes              |
+| QWen1.5-MoE    | A2.7B     | LLM  | Yes              | Yes              |
+| QWen2(.5)      | 7B        | LLM  | Yes              | Yes              |
+| QWen2-MoE      | A14.57B   | LLM  | Yes              | Yes              |
+| QWen3          | 0.6B-235B | LLM  | Yes              | Yes              |
+| InternVL(v1.5) | 2B-26B    | MLLM | Yes              | Yes              |
+| InternVL2      | 1B-40B    | MLLM | Yes              | Yes              |
+| InternVL2.5    | 1B-78B    | MLLM | Yes              | Yes              |
+| InternVL3      | 1B-78B    | MLLM | Yes              | Yes              |
diff --git a/docs/zh_cn/get_started/ascend/get_started.md b/docs/zh_cn/get_started/ascend/get_started.md
index e076e09fe5..6f1d9fc824 100644
--- a/docs/zh_cn/get_started/ascend/get_started.md
+++ b/docs/zh_cn/get_started/ascend/get_started.md
@@ -1,6 +1,6 @@
 # 华为昇腾(Atlas 800T A2 & Atlas 300I Duo)
 
-我们基于 LMDeploy 的 PytorchEngine,增加了华为昇腾设备的支持。所以,在华为昇腾上使用 LDMeploy 的方法与在英伟达 GPU 上使用 PytorchEngine 后端的方法几乎相同。在阅读本教程之前,请先阅读原版的[快速开始](../get_started.md)。
+我们基于 LMDeploy 的 PytorchEngine,增加了华为昇腾设备的支持。所以,在华为昇腾上使用 LMDeploy 的方法与在英伟达 GPU 上使用 PytorchEngine 后端的方法几乎相同。在阅读本教程之前,请先阅读原版的[快速开始](../get_started.md)。
 
 支持的模型列表在[这里](../../supported_models/supported_models.md#PyTorchEngine-华为昇腾平台).
 
diff --git a/docs/zh_cn/get_started/index.rst b/docs/zh_cn/get_started/index.rst
index 35affc13ce..e1e91f8408 100644
--- a/docs/zh_cn/get_started/index.rst
+++ b/docs/zh_cn/get_started/index.rst
@@ -6,3 +6,9 @@
    :caption: NPU(Huawei)
 
    ascend/get_started.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: PPU
+
+   ppu/get_started.md
diff --git a/docs/zh_cn/get_started/ppu/get_started.md b/docs/zh_cn/get_started/ppu/get_started.md
new file mode 100644
index 0000000000..256f89e2a6
--- /dev/null
+++ b/docs/zh_cn/get_started/ppu/get_started.md
@@ -0,0 +1,72 @@
+# 在阿里平头哥上快速开始
+
+我们基于 LMDeploy 的 PytorchEngine,增加了平头哥设备的支持。所以,在平头哥上使用 LMDeploy 的方法与在英伟达 GPU 上使用 PytorchEngine 后端的方法几乎相同。在阅读本教程之前,请先阅读原版的[快速开始](../get_started.md)。
+
+## 安装
+
+安装请参考 [dlinfer 安装方法](https://github.com/DeepLink-org/dlinfer#%E5%AE%89%E8%A3%85%E6%96%B9%E6%B3%95)。
+
+## 离线批处理
+
+> \[!TIP\]
+> 图模式已支持。用户可以设定`eager_mode=False`来开启图模式,或者设定`eager_mode=True`来关闭图模式。
+
+### LLM 推理
+
+将`device_type="ppu"`加入`PytorchEngineConfig`的参数中。
+
+```python
+from lmdeploy import pipeline
+from lmdeploy import PytorchEngineConfig
+
+pipe = pipeline("internlm/internlm2_5-7b-chat",
+                backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=True))
+question = ["Shanghai is", "Please introduce China", "How are you?"]
+response = pipe(question)
+print(response)
+```
+
+### VLM 推理
+
+将`device_type="ppu"`加入`PytorchEngineConfig`的参数中。
+
+```python
+from lmdeploy import pipeline, PytorchEngineConfig
+from lmdeploy.vl import load_image
+
+pipe = pipeline('OpenGVLab/InternVL2-2B',
+                backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=True))
+image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
+response = pipe(('describe this image', image))
+print(response)
+```
+
+## 在线服务
+
+> \[!TIP\]
+> 图模式已支持。
+> 在线服务时,图模式默认开启,用户可以添加`--eager-mode`来关闭图模式。
+
+### LLM 模型服务
+
+将`--device ppu`加入到服务启动命令中。
+
+```bash
+lmdeploy serve api_server --backend pytorch --device ppu --eager-mode internlm/internlm2_5-7b-chat
+```
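+
+服务启动后会暴露兼容 OpenAI 的 HTTP 接口,可以用任意 OpenAI 风格的客户端访问。下面的代码仅是一个参考示例,并非平头哥特有的用法:假设服务监听默认端口 `23333`,且已安装 `openai` Python 包。
+
+```python
+from openai import OpenAI
+
+# 将 OpenAI 兼容客户端指向本地 api_server(假设使用默认端口 23333)
+client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='YOUR_API_KEY')
+
+# 模型名与服务启动时使用的模型保持一致
+response = client.chat.completions.create(
+    model='internlm/internlm2_5-7b-chat',
+    messages=[{'role': 'user', 'content': '介绍一下上海'}],
+    temperature=0.8,
+)
+print(response.choices[0].message.content)
+```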
+
+### VLM 模型服务
+
+将`--device ppu`加入到服务启动命令中。
+
+```bash
+lmdeploy serve api_server --backend pytorch --device ppu --eager-mode OpenGVLab/InternVL2-2B
+```
+
+## 使用命令行与LLM模型对话
+
+将`--device ppu`加入到命令中。
+
+```bash
+lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ppu --eager-mode
+```
diff --git a/docs/zh_cn/supported_models/supported_models.md b/docs/zh_cn/supported_models/supported_models.md
index 76e0cfb38f..903f4bfb23 100644
--- a/docs/zh_cn/supported_models/supported_models.md
+++ b/docs/zh_cn/supported_models/supported_models.md
@@ -141,3 +141,23 @@
 | InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes |
 | CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - |
 | GLM4V | 9B | MLLM | Yes | No | - | - | - |
+
+## PyTorchEngine 阿里平头哥平台
+
+| Model          | Size      | Type | FP16/BF16(eager) | FP16/BF16(graph) |
+| :------------: | :-------: | :--: | :--------------: | :--------------: |
+| Llama2         | 7B - 70B  | LLM  | Yes              | Yes              |
+| Llama3         | 8B        | LLM  | Yes              | Yes              |
+| Llama3.1       | 8B        | LLM  | Yes              | Yes              |
+| InternLM2      | 7B - 20B  | LLM  | Yes              | Yes              |
+| InternLM2.5    | 7B - 20B  | LLM  | Yes              | Yes              |
+| InternLM3      | 8B        | LLM  | Yes              | Yes              |
+| Mixtral        | 8x7B      | LLM  | Yes              | Yes              |
+| QWen1.5-MoE    | A2.7B     | LLM  | Yes              | Yes              |
+| QWen2(.5)      | 7B        | LLM  | Yes              | Yes              |
+| QWen2-MoE      | A14.57B   | LLM  | Yes              | Yes              |
+| QWen3          | 0.6B-235B | LLM  | Yes              | Yes              |
+| InternVL(v1.5) | 2B-26B    | MLLM | Yes              | Yes              |
+| InternVL2      | 1B-40B    | MLLM | Yes              | Yes              |
+| InternVL2.5    | 1B-78B    | MLLM | Yes              | Yes              |
+| InternVL3      | 1B-78B    | MLLM | Yes              | Yes              |