
Update pyproject.toml: add dependency "ninja" #2522

Closed · wants to merge 1 commit

Conversation

adarshxs (Contributor)

Motivation

Fixes #2514 and the following error:

[2024-12-19 14:44:19 TP0] Load weight end. type=LlamaForCausalLM, dtype=torch.bfloat16, avail mem=41.70 GB
[2024-12-19 14:44:20 TP0] Memory pool end. avail mem=5.26 GB
[2024-12-19 14:44:20 TP0] Capture cuda graph begin. This can take up to several minutes.
  0%|                                                                                                                                              | 0/6 [00:00<?, ?it/s]2024-12-19 14:44:21,421 - INFO - flashinfer.jit: Loading JIT ops: batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_64_posenc_0_use_swa_False_use_logits_cap_False
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
  0%|                                                                                                                                              | 0/6 [00:01<?, ?it/s]
[2024-12-19 14:44:21 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 209, in __init__
    self.capture()
  File "/workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 275, in capture
    ) = self.capture_one_batch_size(bs, forward)
  File "/workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 304, in capture_one_batch_size
    self.model_runner.attn_backend.init_forward_metadata_capture_cuda_graph(
  File "/workspace/sglang/python/sglang/srt/layers/attention/flashinfer_backend.py", line 194, in init_forward_metadata_capture_cuda_graph
    self.indices_updater_decode.update(
  File "/workspace/sglang/python/sglang/srt/layers/attention/flashinfer_backend.py", line 378, in update_single_wrapper
    self.call_begin_forward(
  File "/workspace/sglang/python/sglang/srt/layers/attention/flashinfer_backend.py", line 478, in call_begin_forward
    wrapper.begin_forward(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/decode.py", line 788, in plan
    self._cached_module = get_batch_decode_module(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/decode.py", line 148, in get_batch_decode_module
    mod = gen_batch_decode_module(*args)
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/jit/attention.py", line 173, in gen_batch_decode_module
    return load_cuda_ops(uri, source_paths)
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/jit/core.py", line 112, in load_cuda_ops
    module = torch_cpp_ext.load(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1312, in load
    return _jit_compile(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1722, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1804, in _write_ninja_file_and_build_library
    verify_ninja_availability()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1853, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1531, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/workspace/sglang/python/sglang/srt/managers/scheduler.py", line 192, in __init__
    self.tp_worker = TpWorkerClass(
  File "/workspace/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 62, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 62, in __init__
    self.model_runner = ModelRunner(
  File "/workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 184, in __init__
    self.init_cuda_graphs()
  File "/workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 639, in init_cuda_graphs
    self.cuda_graph_runner = CudaGraphRunner(self)
  File "/workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 211, in __init__
    raise Exception(
Exception: Capture cuda graph failed: Ninja is required to load C++ extensions
Possible solutions:
1. disable cuda graph by --disable-cuda-graph
2. set --mem-fraction-static to a smaller value (e.g., 0.8 or 0.7)
3. disable torch compile by not using --enable-torch-compile
Open an issue on GitHub https://github.com/sgl-project/sglang/issues/new/choose 
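The proposed fix is a one-line dependency addition. A minimal sketch of the intended pyproject.toml change, with section layout and surrounding entries assumed rather than copied from the actual file:

```toml
[project]
# ...existing metadata...
dependencies = [
    # ...existing runtime dependencies...
    "ninja",  # needed by flashinfer's JIT path via torch.utils.cpp_extension
]
```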
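The root cause is torch's `verify_ninja_availability()`, which raises when the `ninja` build tool cannot be found while flashinfer JIT-compiles its CUDA kernels. As a rough sketch (not the actual torch implementation), the check amounts to looking for a working `ninja` executable on PATH:

```python
import shutil
import subprocess


def ninja_available() -> bool:
    """Return True if the `ninja` build tool can be found and run.

    A rough approximation of the check torch.utils.cpp_extension
    performs before JIT-compiling C++/CUDA extensions; the real
    implementation raises RuntimeError instead of returning False.
    """
    ninja = shutil.which("ninja")
    if ninja is None:
        return False
    try:
        # Running `ninja --version` confirms the binary actually works.
        subprocess.run([ninja, "--version"], check=True, capture_output=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True
```

Installing ninja (e.g. `pip install ninja`) puts the executable on PATH, which is why declaring it as a package dependency resolves the crash.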

zhyncs (Member) commented on Dec 19, 2024

Duplicate of #2501.

zhyncs added the duplicate label (This issue or pull request already exists) on Dec 19, 2024
zhyncs (Member) commented on Dec 19, 2024

Thank you for your contribution, even though a similar feature has already been implemented in another PR!

zhyncs closed this on Dec 19, 2024
adarshxs (Contributor, Author)

My bad, I hadn't synced my fork. Thanks!

Development

Successfully merging this pull request may close these issues.

[Bug] RuntimeError: Ninja is required to load C++ extensions