
Model deepseek-r1-distill-qwen-14b does not work on NVidia RTX A6000 48GB #4710

Open
huksley opened this issue Jan 28, 2025 · 5 comments
Labels: bug (Something isn't working), unconfirmed


huksley commented Jan 28, 2025

LocalAI version:

d9204ea

Environment, CPU architecture, OS, and Version:

x86, Ubuntu 24.04, CUDA 12.6

Describe the bug

Installed via docker compose and downloaded the model; gpt-4 and gpt-4o work.
The deepseek-r1-distill-qwen-14b model downloads successfully, but when I go to Chat => select it and write to the chat, no response is ever generated.

To Reproduce

  1. Install with docker compose
  2. Download the model
  3. Go to Chat => select model deepseek-r1-distill-qwen-14b
  4. Write to the chat
  5. No response, no loading progress indicator

Expected behavior

Chat works.

Logs

Additional context

Running it using DollarDeploy and this docker compose setup: https://github.com/dollardeploy/templates/tree/main/local-ai-nvidia-cuda-12
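
For reference, a minimal sketch of what such a compose service typically looks like (the image tag, port, and volume path here are assumptions based on a standard LocalAI CUDA 12 setup, not copied from the template above):

```yaml
# Minimal sketch of a LocalAI + NVIDIA CUDA 12 compose service.
# Image tag, port, and paths are assumptions; the linked template is authoritative.
services:
  localai:
    image: localai/localai:latest-gpu-nvidia-cuda-12
    ports:
      - "8080:8080"
    volumes:
      - ./models:/build/models   # matches the /build/models path seen in the logs below
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```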

@testingNetqa

Same here.

localai  | 10:15PM INF [llama-cpp] Attempting to load
localai  | 10:15PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend llama-cpp
localai  | 10:15PM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc = 
localai  | 10:15PM INF [llama-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc = 
localai  | 10:15PM INF [llama-ggml] Attempting to load
localai  | 10:15PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend llama-ggml
localai  | 10:15PM INF [llama-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = failed loading model
localai  | 10:15PM INF [llama-cpp-fallback] Attempting to load
localai  | 10:15PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend llama-cpp-fallback
localai  | 10:15PM INF [llama-cpp-fallback] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc = 
localai  | 10:15PM INF [piper] Attempting to load
localai  | 10:15PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend piper
localai  | 10:16PM INF [piper] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = unsupported model type /build/models/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf (should end with .onnx)
localai  | 10:16PM INF [stablediffusion] Attempting to load
localai  | 10:16PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend stablediffusion
localai  | 10:16PM INF [stablediffusion] Loads OK
localai  | Error rpc error: code = Unknown desc = unimplemented


cientista commented Jan 29, 2025

Hi, same here but using deepseek-r1-distill-qwen-7b.

21:07PM INF [llama-cpp] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend llama-cpp
21:07PM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
21:07PM INF [llama-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
21:07PM INF [llama-ggml] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend llama-ggml
21:07PM INF [llama-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = failed loading model
21:07PM INF [llama-cpp-fallback] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend llama-cpp-fallback
21:07PM INF [llama-cpp-fallback] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
21:07PM INF [silero-vad] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend silero-vad
21:07PM INF [silero-vad] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = create silero detector: failed to create session: Load model from /build/models/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf failed:Protobuf parsing failed.
21:07PM INF [stablediffusion] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend stablediffusion
21:07PM INF [stablediffusion] Loads OK
Error rpc error: code = Unknown desc = unimplemented

@etlweather

Using LM Studio with model DeepSeek-R1-Distill-Qwen-14B-GGUF/DeepSeek-R1-Distill-Qwen-14B-Q8_0.gguf, this works fine.

Using bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF with LocalAI 2.25.0 in Docker with cublas-cuda12, I get the same: the model loads and then Error rpc error: code = Unknown desc = unimplemented

@scimitar4444

Maybe this will help:

abetlen/llama-cpp-python#1900

I came across it through this bug report:

oobabooga/text-generation-webui#6679


huksley commented Feb 7, 2025

Looks like the configuration of the model is wrong: "...gguf (should end with .onnx)". (That message comes from the piper backend being tried as a fallback; the primary llama-cpp load fails earlier with the "Canceled" error.)
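
One thing worth trying, if the gallery config is off: pinning the backend in the model's YAML so the loader doesn't fall through to unrelated backends like piper and stablediffusion. A minimal sketch, assuming LocalAI's standard model config format and the file name from the logs above:

```yaml
# models/deepseek-r1-distill-qwen-14b.yaml - minimal sketch, assuming the
# standard LocalAI model config format; the file name is taken from the logs.
name: deepseek-r1-distill-qwen-14b
backend: llama-cpp      # pin the backend so the loader stops cycling through fallbacks
context_size: 4096      # assumed value; adjust as needed
parameters:
  model: DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
```

If llama-cpp still fails with the same "Canceled" error, the issues linked above suggest the bundled llama.cpp version may simply be too old for the R1 distills.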
