Deepseek on Onnxruntime-genai python #1243
The Hugging Face repo is private because we are still waiting on some internal checks before we can make it public.
The tokenizer error is explained here. Using the latest release of ONNX Runtime GenAI should resolve it.
Until the Hugging Face repo is made public, you can use the following steps to build and run the DeepSeek distilled models with ONNX Runtime GenAI.
# Replace `deepseek-ai/DeepSeek-R1-Distill-*` with one of the official DeepSeek distilled Hugging Face repo names.
# The list of options can be found here: https://huggingface.co/deepseek-ai/DeepSeek-R1#deepseek-r1-distill-models
python -m onnxruntime_genai.models.builder -m deepseek-ai/DeepSeek-R1-Distill-* -o ./deepseek_r1_distill_int4_cpu -p int4 -e cpu
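Once the build finishes, a minimal sketch for loading and running the exported model with the ONNX Runtime GenAI Python API could look like the following. It mirrors the pattern in the model-qa.py example; the generator API has changed between recent releases, so calls such as `append_tokens` and `get_next_tokens` are assumptions to verify against the example script that ships with your installed version:

```python
import onnxruntime_genai as og

# Folder produced by the model builder command above.
model = og.Model("./deepseek_r1_distill_int4_cpu")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is the capital of France?"))

# Stream tokens as they are generated.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```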
Do note, I am using onnxruntime-genai v0.5.2 and onnxruntime v1.20.1. I built the model using the model builder and genai_config.json was generated, but I still get the same error:
What am I missing here? Based on the tokenizer error you mentioned before, this issue was supposed to be resolved by onnxruntime-genai v0.5.0. Do I need to build from the mainline branch to obtain an onnxruntime-genai pre-release, or is v0.5.2 not enough? Thanks
Update: I do want to know if there is a way to disable the reasoning output for DeepSeek in ORT? The real output only starts after those statements are complete. By the way, will we see an Android app example for DeepSeek for ORT as well? Thanks
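One way to hide that is to post-process the generated text: the distilled R1 models typically wrap their chain of thought in `<think>`/`</think>` tags, so you can drop everything up to the closing tag. A minimal sketch, assuming that tag convention holds for your model's output:

```python
import re

def strip_reasoning(text: str) -> str:
    # Remove the <think>...</think> block that DeepSeek-R1 distilled models
    # emit before the final answer (assumed tag convention).
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_reasoning("<think>Let me add 2 and 2...</think>The answer is 4."))
```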
Glad to hear it works!
Yes, providing a chat template will help improve the output quality.
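For reference, here is a chat-template sketch for the distilled R1 models. The `<｜User｜>` and `<｜Assistant｜>` markers below are assumptions based on DeepSeek-R1's tokenizer conventions; verify them against the tokenizer_config.json of the repo you actually use:

```python
def apply_deepseek_chat_template(user_prompt: str) -> str:
    # Assumed single-turn DeepSeek-R1 chat format; check tokenizer_config.json.
    return f"<｜User｜>{user_prompt}<｜Assistant｜>"

prompt = apply_deepseek_chat_template("Summarize the plot of Hamlet in two sentences.")
```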
There is an example for running LLMs such as Phi-3 or DeepSeek on mobile here. You can use DeepSeek instead of Phi-3 in the example. It appears out of date, as the ONNX Runtime GenAI version mentioned is v0.4.0. In the meantime, there is a chat app example here that you can use.
Is there a chat template you can suggest for DeepSeek? I tried using one, but is there a chat template for DeepSeek that would make the outputs similar to Phi-3? Secondly, is there any way to create an INT8 DeepSeek model using the model builder? The model builder only supports INT4, FP16, and FP32. Thanks
The chat template you tried is the one that is used here for DeepSeek. Here are the recommended usage suggestions for DeepSeek. There appears to be a discussion here about its thought process. You can also raise an issue in the Hugging Face repo about this, as the creators of DeepSeek may know more.
INT8 precision has not been a focus in the model builder because INT4 precision has made the models smaller while preserving accuracy. You can create an FP32 model and then use ONNX Runtime's quantization tools to produce an INT8 model.
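For the INT8 route, a minimal sketch using ONNX Runtime's dynamic quantization; the paths are placeholders for an FP32 model produced by the model builder, and large LLM graphs with external data may need extra handling:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the FP32 weights to INT8 (placeholder paths).
quantize_dynamic(
    model_input="./deepseek_r1_distill_fp32_cpu/model.onnx",
    model_output="./deepseek_r1_distill_int8_cpu/model.onnx",
    weight_type=QuantType.QInt8,
)
```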
@kunal-vaishnavi
The For example, the
Will this support the new Perplexity R1 1776 model?
Hi,
I am trying to execute a DeepSeek model using the Python examples on an aarch64 device, following the same methodology as for Phi-3.
I source the ONNX model from: https://huggingface.co/onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX/tree/main/onnx
I use model_quantized.onnx since this model is optimized for CPU. This is a qwen2 model type.
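For reference, a minimal sketch for pulling just the quantized graph and the JSON config/tokenizer files from that repo with huggingface_hub; the file patterns are assumptions, so check the repo's file listing:

```python
from huggingface_hub import snapshot_download

# Download only the CPU-quantized ONNX graph plus JSON config/tokenizer files.
snapshot_download(
    repo_id="onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX",
    local_dir="./deepseek",
    allow_patterns=["onnx/model_quantized.onnx*", "*.json"],
)
```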
Since there is no genai_config file, I generated one myself, which is attached:
genai_config.json
Then I try to follow the instructions mentioned in this PR: #1215
Do note, in this PR it is mentioned to pull the model from a Hugging Face repo using:
But that Hugging Face path does not exist.
Nevertheless, when using the ONNX model from the first source mentioned above together with the attached genai_config.json, and running the model-chat.py or model-qa.py scripts from the Python examples (onnxruntime-genai examples), I get the error below:
I am not sure what exactly needs to be a string and what is null. My config file has all strings in the proper order and no value is null.
I get the same error when executing model-chat.py as well.
My model and JSON files are inside a deepseek directory, which I give as input to the model.
I face no issues when executing Phi-3/Phi-3.5 models, but when using DeepSeek it errors out at the tokenizer.
When I enable verbose mode, it does show that the model has been loaded, so where does this JSON error crop up from?
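One way to narrow it down is to scan the config for null or non-string values before handing the folder to ONNX Runtime GenAI. A minimal sketch, assuming the file sits at deepseek/genai_config.json (the helper name is hypothetical):

```python
import json

def report_nulls(obj, path="genai_config"):
    # Recursively print any null (None) values that a strict parser might reject.
    if obj is None:
        print(f"null value at {path}")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            report_nulls(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            report_nulls(value, f"{path}[{index}]")

with open("deepseek/genai_config.json") as f:
    report_nulls(json.load(f))
```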
Is there any other Hugging Face repository we need to source the DeepSeek models from?
thanks