diff --git a/README.md b/README.md
index faaa1cba3..f81f15a0c 100644
--- a/README.md
+++ b/README.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
 2024
+- \[2024/07\] Support llama3.1
 - \[2024/07\] Support [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/serving/api_server_tools.md) of InternLM2.5
 - \[2024/06\] PyTorch engine support DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LlaVA-Next
 - \[2024/05\] Balance vision model when deploying VLMs with multiple GPUs
@@ -110,6 +111,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
 • Llama (7B - 65B)
 • Llama2 (7B - 70B)
 • Llama3 (8B, 70B)
+• Llama3.1 (8B)
 • InternLM (7B - 20B)
 • InternLM2 (7B - 20B)
 • InternLM2.5 (7B)
diff --git a/README_zh-CN.md b/README_zh-CN.md
index 4805f970c..6d029954e 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
 2024
+- \[2024/07\] 支持 llama3.1
 - \[2024/07\] 支持 [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) 全系列模型,[InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md) 模型和 InternLM2.5 的 [function call 功能](docs/zh_cn/serving/api_server_tools.md)
 - \[2024/06\] PyTorch engine 支持了 DeepSeek-V2 和若干 VLM 模型推理, 比如 CogVLM2,Mini-InternVL,LlaVA-Next
 - \[2024/05\] 在多 GPU 上部署 VLM 模型时,支持把视觉部分的模型均分到多卡上
@@ -111,6 +112,7 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
 • Llama (7B - 65B)
 • Llama2 (7B - 70B)
 • Llama3 (8B, 70B)
+• Llama3.1 (8B)
 • InternLM (7B - 20B)
 • InternLM2 (7B - 20B)
 • InternLM2.5 (7B)
diff --git a/docs/en/supported_models/supported_models.md b/docs/en/supported_models/supported_models.md
index 13f805d9d..63aa94c64 100644
--- a/docs/en/supported_models/supported_models.md
+++ b/docs/en/supported_models/supported_models.md
@@ -7,6 +7,7 @@
 | Llama | 7B - 65B | Yes | Yes | Yes | Yes |
 | Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
 | Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B | Yes | Yes | Yes | Yes |
 | InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
 | InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
 | InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
@@ -44,6 +45,7 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
 | Llama | 7B - 65B | Yes | No | Yes |
 | Llama2 | 7B - 70B | Yes | No | Yes |
 | Llama3 | 8B, 70B | Yes | No | Yes |
+| Llama3.1 | 8B | Yes | No | - |
 | InternLM | 7B - 20B | Yes | No | Yes |
 | InternLM2 | 7B - 20B | Yes | No | - |
 | InternLM2.5 | 7B | Yes | No | - |
diff --git a/docs/zh_cn/supported_models/supported_models.md b/docs/zh_cn/supported_models/supported_models.md
index cac611acb..72628558a 100644
--- a/docs/zh_cn/supported_models/supported_models.md
+++ b/docs/zh_cn/supported_models/supported_models.md
@@ -7,6 +7,7 @@
 | Llama | 7B - 65B | Yes | Yes | Yes | Yes |
 | Llama2 | 7B - 70B | Yes | Yes | Yes | Yes |
 | Llama3 | 8B, 70B | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B | Yes | Yes | Yes | Yes |
 | InternLM | 7B - 20B | Yes | Yes | Yes | Yes |
 | InternLM2 | 7B - 20B | Yes | Yes | Yes | Yes |
 | InternLM2.5 | 7B | Yes | Yes | Yes | Yes |
@@ -44,6 +45,7 @@ turbomind 引擎不支持 window attention。所以,对于应用了 window att
 | Llama | 7B - 65B | Yes | No | Yes |
 | Llama2 | 7B - 70B | Yes | No | Yes |
 | Llama3 | 8B, 70B | Yes | No | Yes |
+| Llama3.1 | 8B | Yes | No | - |
 | InternLM | 7B - 20B | Yes | No | Yes |
 | InternLM2 | 7B - 20B | Yes | No | - |
 | InternLM2.5 | 7B | Yes | No | - |