[Bug] Exception: Capture cuda graph failed: Triton Error [CUDA]: device kernel image is invalid #1558

a136214808 · 2024-10-03T04:46:23Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

My environment has 8x A100 GPUs with CUDA 11.8. After installing sglang by following the installation steps, I cannot get it to run. I am not the owner of the server, so I cannot change the CUDA environment. Are there any special installation requirements for cu118? (I tried two servers and both fail.)

The commands I ran are as follows:
pip install --upgrade pip
pip install "sglang[all]"

# Install FlashInfer CUDA kernels
pip install flashinfer -i https://flashinfer.ai/whl/cu118/torch2.4/

Reproduction

command: CUDA_VISIBLE_DEVICES=3 python -m sglang.launch_server --model-path /disk1/qwen2.5/Qwen2.5-7B-Instruct --port 30000 --enable-torch-compile --attention-backend triton --sampling-backend pytorch

bug:
......
File "/disk1/young/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 468, in init_cuda_graphs
self.cuda_graph_runner = CudaGraphRunner(self)
File "/disk1/young/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 153, in __init__
raise Exception(
Exception: Capture cuda graph failed: Triton Error [CUDA]: device kernel image is invalid
Possible solutions:
1. disable cuda graph by --disable-cuda-graph
2. set --mem-fraction-static to a smaller value (e.g., 0.8 or 0.7)
3. disable torch compile by not using --enable-torch-compile
Open an issue on GitHub https://github.com/sgl-project/sglang/issues/new/choose
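
For anyone who wants to try the suggestions from the error message one at a time, here is a minimal sketch that builds one command line per suggestion from the reproduction command. The flags are taken from the error message and the command above. The helper name `fallback_commands`, the dictionary keys, and the value 0.7 for --mem-fraction-static are illustrative assumptions, and I have not confirmed that any of these variants avoids the error.

```python
import shlex

# Base command copied from the reproduction step in this report.
BASE = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "/disk1/qwen2.5/Qwen2.5-7B-Instruct",
    "--port", "30000",
    "--enable-torch-compile",
    "--attention-backend", "triton",
    "--sampling-backend", "pytorch",
]


def fallback_commands(base):
    """Return one shell command line per suggestion in the error message.

    Hypothetical helper for illustration; it only builds strings and
    does not launch anything.
    """
    no_graph = base + ["--disable-cuda-graph"]
    # 0.7 is one of the example values given in the error message.
    less_mem = base + ["--mem-fraction-static", "0.7"]
    no_compile = [arg for arg in base if arg != "--enable-torch-compile"]
    return {
        "disable_cuda_graph": shlex.join(no_graph),
        "smaller_mem_fraction": shlex.join(less_mem),
        "no_torch_compile": shlex.join(no_compile),
    }


if __name__ == "__main__":
    for name, cmd in fallback_commands(BASE).items():
        # Same GPU selection as in the original command.
        print(f"{name}: CUDA_VISIBLE_DEVICES=3 {cmd}")
```

Each printed line can be pasted into a shell as-is, which makes it easier to report which of the three variants, if any, gets past CUDA graph capture.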

Environment

Python: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5: NVIDIA A100 80GB PCIe
GPU 0,1,2,3,4,5 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
CUDA Driver Version: 515.105.01
PyTorch: 2.4.0+cu118
sglang: 0.3.2
flashinfer: 0.1.6+cu118torch2.4
triton: 3.0.0
transformers: 4.45.1
requests: 2.32.3
tqdm: 4.66.5
numpy: 1.26.4
aiohttp: 3.10.8
fastapi: 0.115.0
hf_transfer: 0.1.8
huggingface_hub: 0.25.1
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.0.0
pydantic: 2.9.2
uvicorn: 0.31.0
uvloop: 0.20.0
zmq: 26.2.0
vllm: 0.5.5
multipart: 0.0.12
openai: 1.51.0
anthropic: 0.34.2
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 CPU Affinity NUMA Affinity
GPU0 X PIX PXB PXB PXB PXB 0-15,32-47 0
GPU1 PIX X PXB PXB PXB PXB 0-15,32-47 0
GPU2 PXB PXB X PXB PXB PXB 0-15,32-47 0
GPU3 PXB PXB PXB X PXB PXB 0-15,32-47 0
GPU4 PXB PXB PXB PXB X PXB 0-15,32-47 0
GPU5 PXB PXB PXB PXB PXB X 0-15,32-47 0

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

ulimit soft: 1048576
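
As a sanity check on the versions listed above, here is a minimal sketch that compares the CUDA tag embedded in each reported package version against the toolkit release reported by NVCC. The version strings are copied from this report. The helper names `cuda_tag` and `toolkit_tag` are hypothetical. The check only compares version strings and says nothing about what the packages bundle internally or about the root cause of the error.

```python
import re


def cuda_tag(version):
    """Extract a CUDA tag such as 'cu118' from a version string, or None."""
    match = re.search(r"\+(cu\d+)", version)
    return match.group(1) if match else None


def toolkit_tag(nvcc_release):
    """Turn an NVCC release such as '11.8' into the wheel-style tag 'cu118'."""
    major, minor = nvcc_release.split(".")[:2]
    return f"cu{major}{minor}"


# Values copied from the Environment section of this report.
reported = {
    "PyTorch": "2.4.0+cu118",
    "flashinfer": "0.1.6+cu118torch2.4",
    "triton": "3.0.0",
}
expected = toolkit_tag("11.8")

for name, version in reported.items():
    tag = cuda_tag(version)
    if tag is None:
        status = "no CUDA tag in version string"
    elif tag == expected:
        status = f"matches {expected}"
    else:
        status = f"MISMATCH (expected {expected})"
    print(f"{name} {version}: {status}")
```

The PyTorch and flashinfer versions both carry the cu118 tag, while the triton version string carries no CUDA tag at all, so this check cannot tell which CUDA toolchain that triton build targets.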
