安装指南 (Installation Guide)
============================

This guide helps developers using SGLang with Ascend set up SGLang in an Ascend environment. As of September 2025, the components involved below are under active development; use the latest releases and watch for version and device compatibility.

Setting up the Ascend environment
---------------------------------

Install the Ascend environment by following the :doc:`Ascend quick-installation guide <../ascend/quick_install>` for your Ascend product model and CPU architecture.

.. warning::
    CANN 8.2.RC1 or later is recommended. When installing CANN, also install the Kernel operator package and the NNAL acceleration library package for the ARM platform.
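
To confirm which CANN toolkit version is installed, you can inspect its install info file. A minimal sketch, assuming the default install prefix ``/usr/local/Ascend`` on an aarch64 host; adjust the path if your prefix or architecture differs:

```shell
# Assumption: CANN was installed to the default prefix on an aarch64 host;
# change this path for other prefixes or architectures.
CANN_INFO="/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info"
if [ -f "$CANN_INFO" ]; then
    # Print the recorded toolkit version
    grep -i 'version' "$CANN_INFO"
else
    echo "CANN toolkit not found at $CANN_INFO" >&2
fi
```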

Installing SGLang
-----------------

Method 1: install SGLang from source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a Python environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell
    :linenos:

    # Create a new conda environment; only Python 3.11 is supported
    conda create --name sglang_npu python=3.11
    # Activate the virtual environment
    conda activate sglang_npu

Install Python dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell
    :linenos:

    pip install attrs==24.2.0 numpy==1.26.4 scipy==1.13.1 decorator==5.1.1 psutil==6.0.0 pytest==8.3.2 pytest-xdist==3.6.1 pyyaml
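
If you prefer tracking these pins in a requirements file instead of one long command, the list can be generated programmatically. A small sketch; the file name ``requirements-npu.txt`` is hypothetical, not part of the repository:

```python
# Turn the pinned dependency list above into requirements-file lines.
# The target file name mentioned below is hypothetical.
pins = {
    "attrs": "24.2.0",
    "numpy": "1.26.4",
    "scipy": "1.13.1",
    "decorator": "5.1.1",
    "psutil": "6.0.0",
    "pytest": "8.3.2",
    "pytest-xdist": "3.6.1",
}
lines = [f"{name}=={version}" for name, version in pins.items()] + ["pyyaml"]
content = "\n".join(lines)
print(content)
# To persist: pathlib.Path("requirements-npu.txt").write_text(content + "\n")
```

Installing with ``pip install -r requirements-npu.txt`` then yields the same set of packages.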

Install the MemFabric Adaptor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The MemFabric Adaptor is the alternative to the Mooncake Transfer Engine for KV cache transfer on Ascend NPU clusters.

Currently, the MemFabric Adaptor only supports aarch64 devices. Install the wheel that matches your architecture:

.. code-block:: shell
    :linenos:

    MF_WHL_NAME="mf_adapter-1.0.0-cp311-cp311-linux_aarch64.whl"
    MEMFABRIC_URL="https://sglang-ascend.obs.cn-east-3.myhuaweicloud.com/sglang/${MF_WHL_NAME}"
    wget -O "${MF_WHL_NAME}" "${MEMFABRIC_URL}" && pip install "./${MF_WHL_NAME}"
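
Because only aarch64 wheels are published, it can help to guard the download with an architecture check. A minimal sketch; the helper name ``mf_wheel_name`` is ours, not part of any tool:

```shell
# Hypothetical helper: map a machine architecture to the published wheel name,
# failing loudly for architectures with no published wheel.
mf_wheel_name() {
    arch="$1"
    if [ "$arch" != "aarch64" ]; then
        echo "MemFabric Adaptor publishes no wheel for ${arch}" >&2
        return 1
    fi
    echo "mf_adapter-1.0.0-cp311-cp311-linux_${arch}.whl"
}

mf_wheel_name aarch64   # prints: mf_adapter-1.0.0-cp311-cp311-linux_aarch64.whl
```

On the target machine you would call ``mf_wheel_name "$(uname -m)"`` before running ``wget``.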

Install torch-npu
^^^^^^^^^^^^^^^^^

Follow the :doc:`torch-npu installation guide <../pytorch/install>`. Due to the limitations of NPUGraph and Triton-Ascend, this project currently supports only torch and torch-npu 2.6.0; a more broadly compatible versioning scheme is planned.

.. code-block:: shell
    :linenos:

    # Install CPU-only torch 2.6.0 and torchvision 0.21.0
    PYTORCH_VERSION=2.6.0
    TORCHVISION_VERSION=0.21.0
    pip install torch==$PYTORCH_VERSION torchvision==$TORCHVISION_VERSION --index-url https://download.pytorch.org/whl/cpu

    # Install torch_npu 2.6.0, or simply run: pip install torch_npu==2.6.0
    PTA_VERSION="v7.1.0.2-pytorch2.6.0"
    PTA_NAME="torch_npu-2.6.0.post2-cp311-cp311-manylinux_2_28_aarch64.whl"
    PTA_URL="https://gitcode.com/ascend/pytorch/releases/download/${PTA_VERSION}/${PTA_NAME}"
    wget -O "${PTA_NAME}" "${PTA_URL}" && pip install "./${PTA_NAME}"

After installation, verify that torch_npu works:

.. code-block:: python
    :linenos:

    import torch
    # import torch_npu  # with torch 2.6.0 there is no need to import torch_npu explicitly

    x = torch.randn(2, 2).npu()
    y = torch.randn(2, 2).npu()
    z = x.mm(y)

    print(z)

If the program prints the value of ``z``, the installation succeeded.

Install vLLM
^^^^^^^^^^^^

vLLM is currently still a prerequisite on Ascend NPU. With torch==2.6.0, vLLM v0.8.5 must be built and installed from source.

.. code-block:: shell
    :linenos:

    VLLM_TAG=v0.8.5
    git clone --depth 1 https://github.com/vllm-project/vllm.git --branch $VLLM_TAG
    cd vllm
    VLLM_TARGET_DEVICE="empty" pip install -v -e .
    cd ..

Install Triton-Ascend
^^^^^^^^^^^^^^^^^^^^^

Triton-Ascend is still updated frequently. To use the latest features, build it from source; see its `installation guide <https://gitcode.com/Ascend/triton-ascend/blob/master/docs/sources/getting-started/installation.md>`_ for detailed steps.

Alternatively, install the Triton-Ascend nightly package:

.. code-block:: shell
    :linenos:

    pip install -i https://test.pypi.org/simple/ "triton-ascend<3.2.0rc" --pre --no-cache-dir

Install DeepEP and sgl-kernel-npu
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell
    :linenos:

    pip install wheel==0.45.1
    git clone https://github.com/sgl-project/sgl-kernel-npu.git

    # Add environment variables
    export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/runtime/lib64/stub:$LD_LIBRARY_PATH
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    cd sgl-kernel-npu

    # Compile and install deep-ep and sgl-kernel-npu
    bash build.sh
    pip install output/deep_ep*.whl output/sgl_kernel_npu*.whl --no-cache-dir
    cd ..
    rm -rf sgl-kernel-npu

    # Link to the deep_ep_cpp.*.so file
    cd "$(pip show deep-ep | grep -E '^Location:' | awk '{print $2}')" && ln -s deep_ep/deep_ep_cpp*.so
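
The final ``cd`` above extracts the site-packages path from ``pip show`` output. A minimal sketch of that extraction on canned output, so you can see what the ``grep``/``awk`` pipeline yields (the sample path is made up):

```shell
# Demonstrate the Location-extraction pipeline on canned `pip show` output;
# the sample site-packages path below is hypothetical.
sample_pip_show() {
    printf 'Name: deep-ep\nVersion: 1.0.0\nLocation: /opt/venv/lib/python3.11/site-packages\n'
}

# Same pipeline as the symlink step, applied to the canned output
location="$(sample_pip_show | grep -E '^Location:' | awk '{print $2}')"
echo "$location"   # prints: /opt/venv/lib/python3.11/site-packages
```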

Install SGLang from source
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell
    :linenos:

    # Use the latest release branch
    git clone -b v0.5.3rc0 https://github.com/sgl-project/sglang.git
    cd sglang

    pip install --upgrade pip
    # Install SGLang with NPU support
    pip install -e "python[srt_npu]"
    cd ..

Method 2: install SGLang from a Docker image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note: ``--privileged`` and ``--network=host`` are required for RDMA, which is usually also a prerequisite of Ascend NPU clusters.

The Docker commands below target the Atlas 800I A3. On an Atlas 800I A2, make sure to map only davinci0 through davinci7 into the container.
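
The long ``--device`` list passed to ``docker run`` can be generated instead of typed by hand. A small sketch; the helper name ``davinci_flags`` is ours, not part of any tool:

```shell
# Hypothetical helper: emit --device flags for the first N davinci devices
# (N=8 on Atlas 800I A2, N=16 on Atlas 800I A3).
davinci_flags() {
    n="$1"
    i=0
    flags=""
    while [ "$i" -lt "$n" ]; do
        flags="${flags} --device=/dev/davinci${i}"
        i=$((i + 1))
    done
    # Strip the single leading space before printing
    echo "${flags# }"
}

davinci_flags 2   # prints: --device=/dev/davinci0 --device=/dev/davinci1
```

For example, ``davinci_flags 8`` produces the eight flags needed on an Atlas 800I A2.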

.. code-block:: shell
    :linenos:

    # Clone the SGLang repository
    git clone https://github.com/sgl-project/sglang.git
    cd sglang/docker

    # Build the docker image
    docker build -t <image_name> -f Dockerfile.npu .

    alias drun='docker run -it --rm --privileged --network=host --ipc=host --shm-size=16g \
        --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 \
        --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
        --device=/dev/davinci8 --device=/dev/davinci9 --device=/dev/davinci10 --device=/dev/davinci11 \
        --device=/dev/davinci12 --device=/dev/davinci13 --device=/dev/davinci14 --device=/dev/davinci15 \
        --device=/dev/davinci_manager --device=/dev/hisi_hdc \
        --volume /usr/local/sbin:/usr/local/sbin --volume /usr/local/Ascend/driver:/usr/local/Ascend/driver \
        --volume /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
        --volume /etc/ascend_install.info:/etc/ascend_install.info \
        --volume /var/queue_schedule:/var/queue_schedule --volume ~/.cache/:/root/.cache/'

    # Run the docker container and start the SGLang server
    drun --env "HF_TOKEN=<secret>" \
        <image_name> \
        python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --attention-backend ascend --host 0.0.0.0 --port 30000