
Model deepseek-r1-distill-qwen-14b does not work on NVidia RTX A6000 48GB #4710

Open
huksley opened this issue Jan 28, 2025 · 5 comments
Labels: bug (Something isn't working), unconfirmed


huksley commented Jan 28, 2025

LocalAI version:

d9204ea

Environment, CPU architecture, OS, and Version:

x86, Ubuntu 24.04, CUDA 12.6

Describe the bug

Installed via docker compose and downloaded the model; gpt-4 and gpt-4o work.
The deepseek-r1-distill-qwen-14b model downloads successfully, but when I go to Chat => select it and write to the chat, no response is ever generated.

To Reproduce

  1. Install with docker compose
  2. Download the model
  3. Go to Chat => select model deepseek-r1-distill-qwen-14b
  4. Write to the chat
  5. No response, no loading progress indicator

Expected behavior

Chat works.

Logs

Additional context

Running it using DollarDeploy and this docker compose setup: https://github.com/dollardeploy/templates/tree/main/local-ai-nvidia-cuda-12
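
For reference, a minimal sketch of what such a compose service typically looks like (the image tag, port, and volume path here are assumptions based on a standard LocalAI CUDA 12 setup, not copied from the template above):

```yaml
# Minimal sketch of a LocalAI + NVIDIA CUDA 12 compose service.
# Image tag, port, and paths are assumptions; the linked template is authoritative.
services:
  localai:
    image: localai/localai:latest-gpu-nvidia-cuda-12
    ports:
      - "8080:8080"
    volumes:
      - ./models:/build/models   # matches the /build/models path seen in the logs below
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```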

@testingNetqa

Same here.

localai  | 10:15PM INF [llama-cpp] Attempting to load
localai  | 10:15PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend llama-cpp
localai  | 10:15PM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc = 
localai  | 10:15PM INF [llama-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc = 
localai  | 10:15PM INF [llama-ggml] Attempting to load
localai  | 10:15PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend llama-ggml
localai  | 10:15PM INF [llama-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = failed loading model
localai  | 10:15PM INF [llama-cpp-fallback] Attempting to load
localai  | 10:15PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend llama-cpp-fallback
localai  | 10:15PM INF [llama-cpp-fallback] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc = 
localai  | 10:15PM INF [piper] Attempting to load
localai  | 10:15PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend piper
localai  | 10:16PM INF [piper] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = unsupported model type /build/models/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf (should end with .onnx)
localai  | 10:16PM INF [stablediffusion] Attempting to load
localai  | 10:16PM INF Loading model 'deepseek-r1-distill-qwen-14b' with backend stablediffusion
localai  | 10:16PM INF [stablediffusion] Loads OK
localai  | Error rpc error: code = Unknown desc = unimplemented


cientista commented Jan 29, 2025

Hi, same here but using deepseek-r1-distill-qwen-7b.

21:07PM INF [llama-cpp] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend llama-cpp
21:07PM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
21:07PM INF [llama-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
21:07PM INF [llama-ggml] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend llama-ggml
21:07PM INF [llama-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = failed loading model
21:07PM INF [llama-cpp-fallback] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend llama-cpp-fallback
21:07PM INF [llama-cpp-fallback] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
21:07PM INF [silero-vad] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend silero-vad
21:07PM INF [silero-vad] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = create silero detector: failed to create session: Load model from /build/models/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf failed:Protobuf parsing failed.
21:07PM INF [stablediffusion] Attempting to load
21:07PM INF Loading model 'deepseek-r1-distill-qwen-7b' with backend stablediffusion
21:07PM INF [stablediffusion] Loads OK
Error rpc error: code = Unknown desc = unimplemented

@etlweather

Using LM Studio with model DeepSeek-R1-Distill-Qwen-14B-GGUF/DeepSeek-R1-Distill-Qwen-14B-Q8_0.gguf, this works fine.

Using bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF with LocalAI 2.25.0 in Docker with cublas-cuda12, I get the same: the model loads and then Error rpc error: code = Unknown desc = unimplemented

@scimitar4444

Maybe this will help:

abetlen/llama-cpp-python#1900

I came across it through this bug report:

oobabooga/text-generation-webui#6679


huksley commented Feb 7, 2025

Looks like the configuration of the model is wrong: "...gguf (should end with .onnx)". (That message comes from the piper backend being tried as a fallback; the primary llama-cpp load fails earlier with the "Canceled" error.)
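
One thing worth trying, if the gallery config is off: pinning the backend in the model's YAML so the loader doesn't fall through to unrelated backends like piper and stablediffusion. A minimal sketch, assuming LocalAI's standard model config format and the file name from the logs above:

```yaml
# models/deepseek-r1-distill-qwen-14b.yaml - minimal sketch, assuming the
# standard LocalAI model config format; the file name is taken from the logs.
name: deepseek-r1-distill-qwen-14b
backend: llama-cpp      # pin the backend so the loader stops cycling through fallbacks
context_size: 4096      # assumed value; adjust as needed
parameters:
  model: DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
```

If llama-cpp still fails with the same "Canceled" error, the issues linked above suggest the bundled llama.cpp version may simply be too old for the R1 distills.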
