Issues with torch_quant_to_onnx.py #724

@HaohuiHU

Description

Describe the bug

I am testing the torch_quant_to_onnx.py script from the Model Optimizer tool in PyCharm on Windows 11, attempting to quantize the vit_base_patch16_224 model to int8.
While debugging, I hit an error in the export_to_onnx function when exporting to ONNX format:

    onnx_bytes, _ = get_onnx_bytes_and_metadata(
        model=model,
        dummy_input=(input_tensor,),
        weights_dtype=weights_dtype,
        model_name=model_name,
    )

The run produced the following warnings:
D:\Software\Miniconda\envs\BasicEnv-copy\lib\site-packages\modelopt\torch\utils\cpp_extension.py:88: UserWarning: Ninja is required to load C++ extensions
Unable to load extension modelopt_cuda_ext and falling back to CPU version.
warnings.warn(fail_msg)
D:\Software\Miniconda\envs\BasicEnv-copy\lib\site-packages\torch\__init__.py:2150: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert condition, message
D:\Software\Miniconda\envs\BasicEnv-copy\lib\site-packages\modelopt\torch\quantization\nn\modules\tensor_quantizer.py:878: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if inputs.numel() == 0:
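
As I understand it, these TracerWarnings come from torch.jit.trace hitting data-dependent Python branches (such as the `if inputs.numel() == 0:` check in tensor_quantizer.py). A minimal illustration of the same warning:

    import torch

    def f(x):
        # Converting a tensor to a Python bool: the traced graph records
        # only the branch taken for this particular example input.
        if x.numel() == 0:
            return x
        return x * 2

    # Tracing f emits the same TracerWarning as seen above.
    traced = torch.jit.trace(f, torch.randn(3))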

Steps/Code to reproduce bug
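
Roughly, a minimal sketch of the failing path (the function name and keyword arguments are taken from the snippet above; the import path, the timm calibration loop, and the weights_dtype/model_name values are my assumptions, and the exact arguments used by torch_quant_to_onnx.py may differ):

    import timm
    import torch
    import modelopt.torch.quantization as mtq
    from modelopt.torch._deploy.utils import get_onnx_bytes_and_metadata

    model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()
    input_tensor = torch.randn(1, 3, 224, 224)

    def forward_loop(m):
        # Calibration pass; a real run should use representative data.
        m(input_tensor)

    # PTQ with the default INT8 config from ModelOpt.
    model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

    # The call that fails inside export_to_onnx.
    onnx_bytes, _ = get_onnx_bytes_and_metadata(
        model=model,
        dummy_input=(input_tensor,),
        weights_dtype="int8",              # assumed value
        model_name="vit_base_patch16_224", # assumed value
    )
    with open("vit_base_patch16_224.onnx", "wb") as f:
        f.write(onnx_bytes)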

Expected behavior

I am trying to perform post-training quantization (PTQ) and quantization-aware training (QAT) on a CNN, save the result in ONNX format, build an INT8 engine with TensorRT, and finally deploy it on an RTX 3060 Ti for inference.
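
For reference, the downstream step I have in mind looks roughly like this (a sketch assuming the ONNX export above succeeds; file names are illustrative):

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)  # explicit batch (the default in TensorRT 10)
    parser = trt.OnnxParser(network, logger)

    with open("vit_base_patch16_224.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)  # the Q/DQ nodes in the graph drive quantization

    engine_bytes = builder.build_serialized_network(network, config)
    with open("vit_base_patch16_224.engine", "wb") as f:
        f.write(engine_bytes)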

Who can help?

@ajrasane @cjluo-nv Could you please help me solve this problem?

System information

  • Container used (if applicable): No
  • OS: Windows 11
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): NVIDIA GeForce RTX 3060 Ti
  • GPU memory size: 8.0 GB
  • Number of GPUs: 1
  • Library versions (if applicable):
    • Python: 3.10.18
    • ModelOpt version or commit hash: 0.40.0
    • CUDA: 11.8
    • PyTorch: 2.7.1+cu118
    • Transformers: 4.57.3
    • TensorRT-LLM: ?
    • ONNXRuntime: 1.23.0
    • TensorRT: 10.8.1
  • Any other details that may help: ?

Metadata

Labels: bug (Something isn't working)