Labels
module: xnnpack — Issues related to xnnpack delegation and the code under backends/xnnpack/
Description
I have followed the quantization documentation for the XNNPACK backend.
Using that document, I successfully quantized a MatMul ExecuTorch model (a sketch of that flow is included below for reference).
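For reference, the documented XNNPACK PT2E flow I followed looks roughly like this, reduced to a standalone MatMul module. This is a simplified sketch, not my exact script; import paths and the capture API (e.g. torch.export.export_for_training vs. torch.export.export) differ slightly between PyTorch/ExecuTorch versions.

```python
import torch
from torch.export import export
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower


class MatMul(torch.nn.Module):
    def forward(self, x, y):
        return torch.matmul(x, y)


model = MatMul().eval()
example_inputs = (torch.randn(1, 32, 64), torch.randn(1, 64, 32))

# 1. Capture the model and annotate it with the XNNPACK quantizer (PT2E flow).
#    Some versions of the docs use torch.export.export_for_training(...) here.
captured = export(model, example_inputs).module()
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config(is_per_channel=True))
prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)  # calibration with representative inputs
quantized = convert_pt2e(prepared)

# 2. Lower the quantized graph to the XNNPACK delegate and serialize a .pte file.
program = to_edge_transform_and_lower(
    export(quantized, example_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("matmul_xnnpack_int8.pte", "wb") as f:
    f.write(program.buffer)
```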
I then moved on to an LLM model with the XNNPACK backend. For that, I am using the following export script:
quantization_script.py
From the above script, I successfully generated a quantized model.
As a next step, I ran inference with both the FP32 model and the int8 quantized model; the output logs are as follows.
Inference code: inference.py
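For context, the standard ExecuTorch pybindings inference flow looks roughly like the sketch below (the full script is in the attached inference.py; file names and input shapes here are illustrative, and the pybindings module path can vary between ExecuTorch versions):

```python
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

# Load the serialized .pte program and run it with example inputs.
# The path and shapes below match the MatMul sketch above, not the LLM model.
module = _load_for_executorch("matmul_xnnpack_int8.pte")
x, y = torch.randn(1, 32, 64), torch.randn(1, 64, 32)
outputs = module.forward((x, y))
print(outputs[0])
```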
For FP32:

For the int8 quantized model:
Here, I can't understand why I am getting this error for the quantized model. Can you please suggest a solution?