Multi-GPU vLLM inference error in the web UI #6468

CingyQ · 2024-12-28T16:00:03Z

Reminder

I have read the README and searched the existing issues.

System Info

llamafactory version: 0.9.1.dev0
Platform: Linux-6.8.0-45-generic-x86_64-with-glibc2.35
Python version: 3.11.10
PyTorch version: 2.4.0+cu121 (GPU)
Transformers version: 4.45.0
Datasets version: 2.21.0
Accelerate version: 0.34.2
PEFT version: 0.12.0
TRL version: 0.9.6
GPU type: NVIDIA GeForce RTX 4090
DeepSpeed version: 0.15.1
vLLM version: 0.6.1.dev238+ge2c6e0a82

Reproduction

WARNING 12-28 23:56:52 multiproc_gpu_executor.py:53] Reducing Torch parallelism from 72 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 12-28 23:56:52 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
(VllmWorkerProcess pid=2861291) INFO 12-28 23:56:52 multiproc_worker_utils.py:218] Worker ready; awaiting tasks
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method, Traceback (most recent call last):
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 226, in _run_worker_process
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/vllm/worker/worker.py", line 166, in init_device
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] torch.cuda.set_device(self.device)
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/torch/cuda/__init__.py", line 420, in set_device
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] torch._C._cuda_setDevice(device)
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/torch/cuda/__init__.py", line 300, in _lazy_init
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] raise RuntimeError(
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
(VllmWorkerProcess pid=2861291) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233]
(VllmWorkerProcess pid=2861290) INFO 12-28 23:56:52 multiproc_worker_utils.py:218] Worker ready; awaiting tasks
(VllmWorkerProcess pid=2861292) INFO 12-28 23:56:52 multiproc_worker_utils.py:218] Worker ready; awaiting tasks
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method, Traceback (most recent call last):
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 226, in _run_worker_process
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/vllm/worker/worker.py", line 166, in init_device
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] torch.cuda.set_device(self.device)
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/torch/cuda/__init__.py", line 420, in set_device
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] torch._C._cuda_setDevice(device)
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/torch/cuda/__init__.py", line 300, in _lazy_init
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] raise RuntimeError(
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
(VllmWorkerProcess pid=2861290) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233]
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method, Traceback (most recent call last):
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 226, in _run_worker_process
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/vllm/worker/worker.py", line 166, in init_device
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] torch.cuda.set_device(self.device)
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/torch/cuda/__init__.py", line 420, in set_device
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] torch._C._cuda_setDevice(device)
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] File "/home/zyx/anaconda3/envs/llama/lib/python3.11/site-packages/torch/cuda/__init__.py", line 300, in _lazy_init
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] raise RuntimeError(
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
(VllmWorkerProcess pid=2861292) ERROR 12-28 23:56:52 multiproc_worker_utils.py:233]
^CKeyboard interruption in main thread... closing server.
(VllmWorkerProcess pid=2861291) INFO 12-28 23:58:41 multiproc_worker_utils.py:244] Worker exiting
(VllmWorkerProcess pid=2861292) INFO 12-28 23:58:41 multiproc_worker_utils.py:244] Worker exiting
(VllmWorkerProcess pid=2861290) INFO 12-28 23:58:41 multiproc_worker_utils.py:244] Worker exiting
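
The RuntimeError above says that CUDA was already initialized in the parent process before the vLLM workers were forked, and that the workers need the 'spawn' start method instead. A minimal sketch of one possible workaround follows. It assumes that the installed vLLM build reads the `VLLM_WORKER_MULTIPROC_METHOD` environment variable to choose the worker start method; this should be checked against the vLLM version in use. The helper names `build_launch_env` and `get_spawn_context` are hypothetical and are not part of LLaMA-Factory or vLLM.

```python
import multiprocessing as mp
import os

# Assumption: vLLM consults this variable when creating worker processes.
# Verify the name against the installed vLLM version before relying on it.
VLLM_START_METHOD_VAR = "VLLM_WORKER_MULTIPROC_METHOD"


def build_launch_env(base_env=None):
    """Return a copy of the environment that requests 'spawn' workers.

    The input mapping is not modified, so the caller decides whether the
    setting applies to the current process or only to a child process.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[VLLM_START_METHOD_VAR] = "spawn"
    return env


def get_spawn_context():
    """Return a standard-library multiprocessing context that uses 'spawn'.

    Processes created from this context start a fresh interpreter instead of
    inheriting the parent's state, so no already-initialized CUDA state is
    carried into the child.
    """
    return mp.get_context("spawn")
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting the web UI, or the variable can simply be exported in the shell before launch. Whether this resolves the error for this particular vLLM development build is not confirmed by the report.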

Expected behavior

No response

Others

No response


github-actions bot added the "pending" label (This problem is yet to be addressed) on Dec 28, 2024
