
Commit ed34935

authored
Add SGLang Ascend doc (#90)
* Add sglang doc
1 parent 69985a3 commit ed34935

File tree

5 files changed

+325
-1
lines changed


_static/images/sglang.png

393 KB

index.rst

Lines changed: 20 additions & 1 deletion
@@ -39,6 +39,7 @@
    sources/lm_deploy/index.rst
    sources/torchchat/index.rst
    sources/torchtitan/index.rst
+   sources/sglang/index.rst
 
 
 Choose your preference and follow the installation instructions in :doc:`Quickly install the Ascend environment <sources/ascend/quick_install>`.
@@ -392,6 +393,24 @@
         <span class="split">|</span>
         <a href="sources/torchtitan/quick_start.html">Quick Start</a>
       </div>
-    </div>
+    </div>
+    <!-- Card 20 -->
+    <div class="box rounded-lg p-4 flex flex-col items-center">
+      <div class="flex items-center mb-4">
+        <div class="img w-16 h-16 rounded-md mr-4" style="background-image: url('_static/images/sglang.png')"></div>
+        <div>
+          <h2 class="text-lg font-semibold">SGLang</h2>
+          <p class="text-gray-600 desc">A high-speed serving framework for LLMs and VLMs</p>
+        </div>
+      </div>
+      <div class="flex-grow"></div>
+      <div class="flex space-x-4 text-blue-600">
+        <a href="https://github.com/sgl-project/sglang">Official Link</a>
+        <span class="split">|</span>
+        <a href="sources/sglang/install.html">Installation Guide</a>
+        <span class="split">|</span>
+        <a href="sources/sglang/quick_start.html">Quick Start</a>
+      </div>
+    </div>
   </div>
 </div>

sources/sglang/index.rst

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
SGLang
============

.. toctree::
   :maxdepth: 2

   install.rst
   quick_start.rst

sources/sglang/install.rst

Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
Installation Guide
==================

This tutorial is for developers using SGLang with Ascend and walks through installing SGLang in an Ascend environment. As of September 2025, the components involved are under active development; use the latest versions and pay attention to version and device compatibility.

Installing the Ascend environment
---------------------------------

Follow the :doc:`Ascend environment quick-install guide <../ascend/quick_install>` according to your Ascend product model, CPU architecture, and so on.

.. warning::
   CANN 8.2.RC1 or later is recommended. When installing CANN, also install the Kernel operator package and the nnal acceleration library package for the ARM platform.

Installing SGLang
-----------------

Method 1: Install SGLang from source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Creating a Python environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell
   :linenos:

   # Create a new conda environment; only Python 3.11 is supported
   conda create --name sglang_npu python=3.11
   # Activate the virtual environment
   conda activate sglang_npu

Installing Python dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell
   :linenos:

   pip install attrs==24.2.0 numpy==1.26.4 scipy==1.13.1 decorator==5.1.1 psutil==6.0.0 pytest==8.3.2 pytest-xdist==3.6.1 pyyaml
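As an optional sanity check (a stdlib sketch, not part of the official steps; the package names simply mirror the pip command above), you can confirm that the pinned dependencies resolved:

```python
from importlib.metadata import version, PackageNotFoundError

# Illustrative helper: look up the installed version of each dependency
# pinned by the pip command above; returns None if a package is missing.
def installed_version(pkg: str):
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

for pkg in ["attrs", "numpy", "scipy", "decorator", "psutil", "pytest", "pyyaml"]:
    print(pkg, installed_version(pkg) or "not installed")
```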
MemFabric Adaptor installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

MemFabric Adaptor is the replacement for the Mooncake Transfer Engine for KV cache transfer on Ascend NPU clusters.

MemFabric Adaptor currently supports only aarch64 devices. Install according to your actual architecture:

.. code-block:: shell
   :linenos:

   MF_WHL_NAME="mf_adapter-1.0.0-cp311-cp311-linux_aarch64.whl"
   MEMFABRIC_URL="https://sglang-ascend.obs.cn-east-3.myhuaweicloud.com/sglang/${MF_WHL_NAME}"
   wget -O "${MF_WHL_NAME}" "${MEMFABRIC_URL}" && pip install "./${MF_WHL_NAME}"
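Because the wheel above is built for ``linux_aarch64`` only, it can help to check the host architecture before downloading. A minimal stdlib sketch (the helper name is ours, not part of the project):

```python
import platform

# The mf_adapter wheel above is built for linux_aarch64 only; this
# illustrative helper checks the host architecture before installing.
def mf_adapter_supported() -> bool:
    return platform.machine() == "aarch64"

print("host arch:", platform.machine())
print("mf_adapter wheel supported:", mf_adapter_supported())
```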
torch-npu installation
^^^^^^^^^^^^^^^^^^^^^^

Follow the :doc:`torch-npu installation guide <../pytorch/install>`. Due to constraints from NPUGraph and Triton-Ascend, this project currently supports only torch and torch_npu version 2.6.0; a more general versioning scheme will follow.

.. code-block:: shell
   :linenos:

   # Install CPU-only torch 2.6.0 and torchvision 0.21.0
   PYTORCH_VERSION=2.6.0
   TORCHVISION_VERSION=0.21.0
   pip install torch==$PYTORCH_VERSION torchvision==$TORCHVISION_VERSION --index-url https://download.pytorch.org/whl/cpu

   # Install torch_npu 2.6.0 (or simply: pip install torch_npu==2.6.0)
   PTA_VERSION="v7.1.0.2-pytorch2.6.0"
   PTA_NAME="torch_npu-2.6.0.post2-cp311-cp311-manylinux_2_28_aarch64.whl"
   PTA_URL="https://gitcode.com/ascend/pytorch/releases/download/${PTA_VERSION}/${PTA_NAME}"
   wget -O "${PTA_NAME}" "${PTA_URL}" && pip install "./${PTA_NAME}"

After installation, you can verify that torch_npu was installed successfully with the following code:

.. code-block:: python
   :linenos:

   import torch
   # import torch_npu  # With torch 2.6.0 there is no need to import torch_npu explicitly

   x = torch.randn(2, 2).npu()
   y = torch.randn(2, 2).npu()
   z = x.mm(y)

   print(z)

If the program prints the value of matrix z, the installation succeeded.
vLLM installation
^^^^^^^^^^^^^^^^^

vLLM is currently still a key prerequisite on Ascend NPUs. To match torch==2.6.0, vLLM v0.8.5 must be compiled and installed from source.

.. code-block:: shell
   :linenos:

   VLLM_TAG=v0.8.5
   git clone --depth 1 https://github.com/vllm-project/vllm.git --branch $VLLM_TAG
   cd vllm
   VLLM_TARGET_DEVICE="empty" pip install -v -e .
   cd ..
Triton-Ascend installation
^^^^^^^^^^^^^^^^^^^^^^^^^^

Triton-Ascend is still updated frequently. To get the latest features, it is recommended to build and install it from source; see the `installation guide <https://gitcode.com/Ascend/triton-ascend/blob/master/docs/sources/getting-started/installation.md>`_ for detailed steps.

Alternatively, install the Triton-Ascend nightly package:

.. code-block:: shell
   :linenos:

   pip install -i https://test.pypi.org/simple/ "triton-ascend<3.2.0rc" --pre --no-cache-dir
Installing deep-ep and sgl-kernel-npu
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell
   :linenos:

   pip install wheel==0.45.1
   git clone https://github.com/sgl-project/sgl-kernel-npu.git

   # Add environment variables
   export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/runtime/lib64/stub:$LD_LIBRARY_PATH
   source /usr/local/Ascend/ascend-toolkit/set_env.sh
   cd sgl-kernel-npu

   # Compile and install deep-ep and sgl-kernel-npu
   bash build.sh
   pip install output/deep_ep*.whl output/sgl_kernel_npu*.whl --no-cache-dir
   cd ..
   rm -rf sgl-kernel-npu

   # Symlink the deep_ep_cpp.*.so file into place
   cd "$(pip show deep-ep | grep -E '^Location:' | awk '{print $2}')" && ln -s deep_ep/deep_ep_cpp*.so
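The final symlink step changes into the ``Location:`` reported by ``pip show deep-ep``. In a default environment this is the interpreter's site-packages directory, which you can also query from Python (an illustrative stdlib sketch; compiled wheels may land in the platform-specific path instead of the pure-Python one):

```python
import sysconfig

# Print the paths where pip installs packages; "purelib" is the usual
# site-packages directory, "platlib" holds packages with compiled
# extensions on some layouts. Either may match pip show's "Location:".
site_packages = sysconfig.get_paths()["purelib"]
plat_packages = sysconfig.get_paths()["platlib"]
print("purelib:", site_packages)
print("platlib:", plat_packages)
```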
Installing SGLang from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell
   :linenos:

   # Use the latest release branch
   git clone -b v0.5.3rc0 https://github.com/sgl-project/sglang.git
   cd sglang

   pip install --upgrade pip
   # Install SGLang with NPU support (quoted so shells do not glob the extras)
   pip install -e "python[srt_npu]"
   cd ..
Method 2: Install SGLang from a Docker image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note: ``--privileged`` and ``--network=host`` are required for RDMA, which is itself usually a required component of Ascend NPU clusters.

The Docker commands below target the Atlas 800I A3. On an Atlas 800I A2, make sure to map only davinci [0-7] into the container.

.. code-block:: shell
   :linenos:

   # Clone the SGLang repository
   git clone https://github.com/sgl-project/sglang.git
   cd sglang/docker

   # Build the docker image
   docker build -t <image_name> -f Dockerfile.npu .

   alias drun='docker run -it --rm --privileged --network=host --ipc=host --shm-size=16g \
       --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 \
       --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
       --device=/dev/davinci8 --device=/dev/davinci9 --device=/dev/davinci10 --device=/dev/davinci11 \
       --device=/dev/davinci12 --device=/dev/davinci13 --device=/dev/davinci14 --device=/dev/davinci15 \
       --device=/dev/davinci_manager --device=/dev/hisi_hdc \
       --volume /usr/local/sbin:/usr/local/sbin --volume /usr/local/Ascend/driver:/usr/local/Ascend/driver \
       --volume /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
       --volume /etc/ascend_install.info:/etc/ascend_install.info \
       --volume /var/queue_schedule:/var/queue_schedule --volume ~/.cache/:/root/.cache/'

   # Run the docker container and start the SGLang server
   drun --env "HF_TOKEN=<secret>" \
       <image_name> \
       python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --attention-backend ascend --host 0.0.0.0 --port 30000

sources/sglang/quick_start.rst

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
Quick Start
===========

.. note::

   Before reading this guide, make sure you have prepared the Ascend environment and SGLang following the :doc:`installation guide <./install>`!

This tutorial shows how to get started with SGLang quickly, helping Ascend developers serve LLM inference with SGLang on Ascend. See `the official documentation <https://docs.sglang.ai/>`_ for more information.

Overview
------------------------

SGLang is a high-speed serving framework for LLMs and VLMs. By co-designing the backend runtime and the frontend language, it makes interaction with models faster and more controllable.
Launching a service with SGLang
-------------------------------

The following example shows how to launch a simple chat-generation service with SGLang.

Start a server:

.. code-block:: shell
   :linenos:

   # Launch the SGLang server on NPU
   python -m sglang.launch_server --model Qwen/Qwen2.5-0.5B-Instruct \
       --device npu --port 8000 --attention-backend ascend \
       --host 0.0.0.0 --trust-remote-code

On success, you will see log output similar to:

.. code-block:: shell
   :linenos:

   INFO:     Started server process [89394]
   INFO:     Waiting for application startup.
   INFO:     Application startup complete.
   INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
   INFO:     127.0.0.1:40106 - "GET /get_model_info HTTP/1.1" 200 OK
   Prefill batch. #new-seq: 1, #new-token: 128, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0,
   INFO:     127.0.0.1:40108 - "POST /generate HTTP/1.1" 200 OK
   The server is fired up and ready to roll!
Test it with curl:

.. code-block:: shell
   :linenos:

   curl -s http://localhost:8000/v1/chat/completions \
       -H "Content-Type: application/json" \
       -d '{
         "model": "qwen/qwen2.5-0.5b-instruct",
         "messages": [
           {
             "role": "user",
             "content": "What is the capital of France?"
           }
         ]
       }'

You should see a response similar to:

.. code-block:: json
   :linenos:

   {"id":"3f2f1aa779b544c19f01c08b803bf4ef","object":"chat.completion","created":1759136880,"model":"qwen/qwen2.5-0.5b-instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The capital of France is Paris.","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":"stop","matched_stop":151645}],"usage":{"prompt_tokens":36,"total_tokens":44,"completion_tokens":8,"prompt_tokens_details":null,"reasoning_tokens":0},"metadata":{"weight_version":"default"}}
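The endpoint is OpenAI-compatible, so the assistant's reply sits in the ``choices`` array. A minimal stdlib sketch parsing an abbreviated copy of the sample response above:

```python
import json

# Abbreviated copy of the sample response above, keeping only the
# fields used below.
sample = (
    '{"object":"chat.completion","model":"qwen/qwen2.5-0.5b-instruct",'
    '"choices":[{"index":0,"message":{"role":"assistant",'
    '"content":"The capital of France is Paris."},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":36,"total_tokens":44,"completion_tokens":8}}'
)

resp = json.loads(sample)
answer = resp["choices"][0]["message"]["content"]
print(answer)  # -> The capital of France is Paris.
print("completion tokens:", resp["usage"]["completion_tokens"])  # -> 8
```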
Verifying inference with SGLang
-------------------------------

The following code shows how to verify inference with SGLang:

.. code-block:: python
   :linenos:

   # example.py
   import torch

   import sglang as sgl

   def main():
       prompts = [
           "Hello, my name is",
           "The Independence Day of the United States is",
           "The capital of Germany is",
           "The full form of AI is",
       ]

       llm = sgl.Engine(model_path="/Qwen2.5/Qwen2.5-0.5B-Instruct", device="npu", attention_backend="ascend")

       sampling_params = {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 100}

       outputs = llm.generate(prompts, sampling_params)
       for prompt, output in zip(prompts, outputs):
           print("===============================")
           print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

   if __name__ == "__main__":
       main()

Run example.py; if each prompt produces generated text, SGLang is installed and working.
