Bug: When --parallel 4 is enabled, the inference output is nonsensical, but when --parallel 4 is disabled everything is OK #8935
Labels
bug-unconfirmed
high severity
stale
What happened?
##### CMD that works normally:

```shell
CUDA_VISIBLE_DEVICES=0 ./llama-server -m /home/ubuntu/.cache/huggingface/hub/models--MaziyarPanahi--Meta-Llama-3.1-8B-Instruct-GGUF/snapshots/1f301d86d760b435a11a56de3863bc0121bfb98f/Meta-Llama-3.1-8B-Instruct.Q8_0.gguf --gpu-layers 33 -cb --ctx-size 16128 --flash-attn --batch-size 512 --chat-template llama3 --port 8866 --host 0.0.0.0
```
##### CMD that does NOT work normally:

```shell
CUDA_VISIBLE_DEVICES=0 ./llama-server -m /home/ubuntu/.cache/huggingface/hub/models--MaziyarPanahi--Meta-Llama-3.1-8B-Instruct-GGUF/snapshots/1f301d86d760b435a11a56de3863bc0121bfb98f/Meta-Llama-3.1-8B-Instruct.Q8_0.gguf --gpu-layers 33 -cb --parallel 4 --ctx-size 16128 --flash-attn --batch-size 512 --chat-template llama3 --port 8866 --host 0.0.0.0
```
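A possible factor worth ruling out (an assumption on my part, not confirmed by logs): `llama-server` divides `--ctx-size` evenly across the `--parallel` slots, so the effective context per request shrinks when parallel slots are enabled. A quick sketch of that arithmetic for the two commands above:

```python
# Sketch (assumption): per-slot context = ctx_size // n_parallel,
# which is how llama-server partitions the KV cache across slots.
ctx_size = 16128

# Without --parallel (default 1 slot): each request gets the full context.
ctx_per_slot_default = ctx_size // 1

# With --parallel 4: each of the 4 slots gets only a quarter of the context.
ctx_per_slot_parallel = ctx_size // 4

print(ctx_per_slot_default)   # 16128
print(ctx_per_slot_parallel)  # 4032
```

If prompts plus generated tokens approach the 4032-token per-slot budget, truncation could degrade output quality, though that would not by itself explain fully nonsensical text on short prompts.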
```
ubuntu@VM-0-16-ubuntu:~$ nvidia-smi
Thu Aug  8 21:22:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           Off | 00000000:00:08.0 Off |                    0 |
| N/A   34C    P0              39W / 300W |  10194MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     35134      C   ./llama-server                            10192MiB |
+---------------------------------------------------------------------------------------+
```
Name and Version
```
ubuntu@VM-0-16-ubuntu:~/llama.cpp$ ./llama-cli --version
version: 3549 (afd27f0)
built with cc (Ubuntu 9.5.0-1ubuntu1~22.04) 9.5.0 for x86_64-linux-gnu
```
What operating system are you seeing the problem on?
Linux
Relevant log output