
Deepseek on Onnxruntime-genai python #1243

Closed

suyash-narain opened this issue Feb 12, 2025 · 9 comments

@suyash-narain

Hi,

I am trying to execute the DeepSeek model using the Python examples on an aarch64 device, following the same methodology as for phi3.
I source the ONNX model from: https://huggingface.co/onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX/tree/main/onnx

I use model_quantized.onnx since this model is optimized for CPU. It is a qwen2 model type.
Since there is no genai_config file, I generated one myself; it is attached.

genai_config.json

Then I try to follow the instructions mentioned in this PR: #1215

Do note, in this PR it is mentioned to pull the model from the Hugging Face repo using:

# Download the model directly using the huggingface cli
huggingface-cli download onnxruntime/DeepSeek-R1-Distill-ONNX --include 'deepseek-r1-distill-qwen-1.5B/*' --local-dir .

But this Hugging Face path does not exist.

Nevertheless, when using the ONNX model from the first source mentioned above together with the attached genai_config.json, and trying to execute the model-chat.py or model-qa.py scripts from the onnxruntime-genai Python examples, I get the error below:

python3 model-qa.py -m deepseek/ -e cpu
Traceback (most recent call last):
  File "/home/model-qa.py", line 150, in <module>
    main(args)
  File "/home/model-qa.py", line 20, in main
    tokenizer = og.Tokenizer(model)
RuntimeError: [json.exception.type_error.302] type must be string, but is null

I am not sure what exactly needs to be a string and what is null. My config file has all strings in proper order and no value is null.

I get the same error when executing model-chat.py as well.
My model and JSON files are inside the deepseek directory, which I pass as the model path.

I face no issues when executing phi3/phi3.5 models, but deepseek errors out at the tokenizer.

When I enable verbose mode, it does show that the model has been loaded, so where does this JSON error crop up from?

Is there any other Hugging Face repository we need to source the deepseek models from?

thanks

@kunal-vaishnavi
Contributor

But this Hugging Face path does not exist.

The Hugging Face repo is private because we are still waiting on some internal checks before we can make it public.

I am not sure what exactly needs to be a string and what is null. My config file has all strings in proper order and no value is null.

The tokenizer error is explained here. Using the latest transformers version and the latest pre-release version of ONNX Runtime GenAI should fix this.

Is there any other Hugging Face repository we need to source the deepseek models from?

Until the Hugging Face repo is made public, you can use the following steps to build and run the DeepSeek distilled models with ONNX Runtime GenAI.

  1. Install the pre-release version of ONNX Runtime GenAI. To do this, add --pre to your pip install command.
  2. Run the model builder to produce the ONNX model and genai_config.json. Here is an example.
# Replace `deepseek-ai/DeepSeek-R1-Distill-*` with one of the official DeepSeek distilled Hugging Face repo names.
# The list of options can be found here: https://huggingface.co/deepseek-ai/DeepSeek-R1#deepseek-r1-distill-models
python -m onnxruntime_genai.models.builder -m deepseek-ai/DeepSeek-R1-Distill-* -o ./deepseek_r1_distill_int4_cpu -p int4 -e cpu
  3. Run an example inference script such as model-chat.py or model-qa.py. A minimal sketch of what these scripts do is shown below.
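
For reference, here is a minimal sketch of the generation loop those example scripts implement, assuming the pre-release onnxruntime-genai Python API (exact method names such as append_tokens may vary between versions; the prompt is illustrative):

import onnxruntime_genai as og

# Load the model and tokenizer from the folder produced by the model builder
model = og.Model("./deepseek_r1_distill_int4_cpu")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# Configure generation
params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is an ONNX model?"))

# Stream decoded tokens until the generator signals completion
while not generator.is_done():
    generator.generate_next_token()
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()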

@suyash-narain
Author

suyash-narain commented Feb 13, 2025

Hi @kunal-vaishnavi

Do note, I am using onnxruntime-genai v0.5.2 and onnxruntime v1.20.1.
My transformers version on my aarch64 target is v4.48.3.

I built the model using the model builder and genai_config.json was generated. But I still get the same error:

python3 model-qa.py -m deepseek_r1_distill_int4_cpu/ -e cpu
Traceback (most recent call last):
  File "/home/model-qa.py", line 150, in <module>
    main(args)
  File "/home/model-qa.py", line 20, in main
    tokenizer = og.Tokenizer(model)
RuntimeError: [json.exception.type_error.302] type must be string, but is null

What am I missing here? Based on the tokenizer error you mentioned before, this issue was supposed to be resolved by onnxruntime-genai v0.5.0.

Do I need to build from mainline to obtain the onnxruntime-genai pre-release? Is v0.5.2 not enough?

thanks

@suyash-narain
Author

suyash-narain commented Feb 13, 2025

Update:
I rebuilt onnxruntime-genai from mainline and got it to work.
Thanks for your help.

I do want to know if there is a way to disable the `<think>` output for deepseek in ORT? The real output starts after the `<think>` statements are complete.
Would providing a chat template help remove `<think>`?

Btw, will we see an Android app example for deepseek with ORT as well?

thanks

@kunal-vaishnavi
Contributor

Update: I rebuilt onnxruntime-genai from mainline and got it to work. Thanks for your help.

Glad to hear it works!

I do want to know if there is a way to disable the `<think>` output for deepseek in ORT? The real output starts after the `<think>` statements are complete. Would providing a chat template help remove `<think>`?

Yes, providing a chat template will help improve the output quality.

btw will we see an android app example for deepseek for ORT as well?

There is an example for running LLMs such as Phi-3 or DeepSeek on mobile here. You can use DeepSeek instead of Phi-3 in the example. It appears to be out of date, as the ONNX Runtime GenAI version mentioned is v0.4.0 and computeLogits is deprecated as of v0.6.0. You may need to update the code or raise an issue there.

In the meantime, there is a chat app example here that you can use.

@suyash-narain
Author

suyash-narain commented Feb 14, 2025

Hi @kunal-vaishnavi

Is there a chat template you can suggest for deepseek?

I tried using the chat_template "<|begin▁of▁sentence|><|User|>{input}<|Assistant|>"
but it still ends up showing `<think>` etc.

Is there a chat template for deepseek which would make the outputs similar to phi3?

Secondly, is there any way to create an int8 deepseek model using the model builder? The model builder only supports int4, fp16 and fp32.

thanks

@kunal-vaishnavi
Contributor

The chat template you tried is the one that is used here for DeepSeek. Here are the recommended usage suggestions for DeepSeek. There appears to be a discussion here about its thought process. You can also raise an issue in the Hugging Face repo about this as the creators of DeepSeek may know more.
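
As a minimal sketch, applying that template per turn before encoding is just string formatting (user_text is a hypothetical variable holding the user's input):

# Apply the DeepSeek chat template quoted above before tokenizing
chat_template = "<|begin▁of▁sentence|><|User|>{input}<|Assistant|>"
user_text = "What is an ONNX model?"  # hypothetical user input
prompt = chat_template.format(input=user_text)
# The formatted prompt is then encoded and passed to the generator,
# e.g. generator.append_tokens(tokenizer.encode(prompt))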

INT8 precision has not been a focus in the model builder because INT4 precision has made the models smaller while preserving accuracy. You can create an FP32 model and then use ONNX Runtime's quantization tools to produce an INT8 model.
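
As a rough sketch of that FP32-to-INT8 flow with ONNX Runtime's dynamic quantization API (the input and output paths are illustrative, and very large models may additionally need use_external_data_format=True):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the FP32 model's weights to INT8; paths are illustrative
quantize_dynamic(
    model_input="./deepseek_r1_distill_fp32_cpu/model.onnx",
    model_output="./deepseek_r1_distill_int8_cpu/model.onnx",
    weight_type=QuantType.QInt8,
)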

@suyash-narain
Author

@kunal-vaishnavi
Thanks for your response.
The ONNX quantization tools take only model.onnx into account.
How will the weights present in model.onnx.data be quantized?

@kunal-vaishnavi
Contributor

The model.onnx file contains references to the weights stored in the model.onnx.data file. If you load model.onnx in Netron and click on a node that contains weights (e.g. MatMul), you should see a reference to model.onnx.data there. When model.onnx is loaded (e.g. onnx.load), the weights referenced in model.onnx.data will be loaded as well. If onnx.load(..., load_external_data=False) is used, then the weights in model.onnx.data will not be loaded.
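
A quick sketch that shows the two loading behaviors described above:

import onnx

# By default, onnx.load also resolves the external weights
# referenced in model.onnx.data
model = onnx.load("model.onnx")

# With load_external_data=False, only the graph structure is loaded;
# the weights in model.onnx.data stay on disk
graph_only = onnx.load("model.onnx", load_external_data=False)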

For example, the quantize_dynamic method calls load_model_with_shape_infer and that method calls onnx.load, which will load the external weights.

@ddiddi

ddiddi commented Feb 23, 2025

Will this support the new Perplexity R1 1776 model?
