
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. LightLLM harnesses the strengths of numerous well-regarded open-source implementations, including but not limited to FasterTransformer, TGI, vLLM, and FlashAttention.
English Docs | 中文文档 (Chinese Docs) | Blogs
- [2025/02] 🔥 LightLLM v1.0.0 released, achieving the fastest DeepSeek-R1 serving performance on a single H200 machine.
Learn more in the release blogs: v1.0.0 blog.
Please refer to the FAQ for more information.
We welcome any cooperation and contribution. If a project requires LightLLM's support, please contact us via email or create a pull request.
- LazyLLM: Easiest and laziest way for building multi-agent LLM applications.
Once you have installed `lightllm` and `lazyllm`, you can use the following code to build your own chatbot:

```python
from lazyllm import TrainableModule, deploy, WebModule

# Model will be downloaded automatically if you have an internet connection
m = TrainableModule('internlm2-chat-7b').deploy_method(deploy.lightllm)
WebModule(m).start().wait()
```
Documents: https://lazyllm.readthedocs.io/
For further information and discussion, join our Discord server. New members are welcome, and we look forward to your contributions!
This repository is released under the Apache-2.0 license.
We learned a lot from the following projects when developing LightLLM.