Describe the bug
I am testing the torch_quant_to_onnx.py script from Model Optimizer in PyCharm on a Windows 11 system, attempting to quantize the vit_base_patch16_224 model to INT8.
While debugging, I found that the ONNX export fails inside the export_to_onnx function at this call:
onnx_bytes, _ = get_onnx_bytes_and_metadata(
    model=model,
    dummy_input=(input_tensor,),
    weights_dtype=weights_dtype,
    model_name=model_name,
)
The following problem occurred:
D:\Software\Miniconda\envs\BasicEnv-copy\lib\site-packages\modelopt\torch\utils\cpp_extension.py:88: UserWarning: Ninja is required to load C++ extensions
Unable to load extension modelopt_cuda_ext and falling back to CPU version.
warnings.warn(fail_msg)
D:\Software\Miniconda\envs\BasicEnv-copy\lib\site-packages\torch\__init__.py:2150: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert condition, message
D:\Software\Miniconda\envs\BasicEnv-copy\lib\site-packages\modelopt\torch\quantization\nn\modules\tensor_quantizer.py:878: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if inputs.numel() == 0:
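For context, these TracerWarnings are usually benign when the model's control flow does not depend on the input data. A minimal standalone sketch of what the warning means (my own example, not ModelOpt code):

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    if x.sum() > 0:  # tensor -> Python bool: emits a TracerWarning
        return x * 2
    return x

# Tracing records only the branch taken for this example input.
traced = torch.jit.trace(f, torch.ones(3))

print(traced(torch.ones(3)))   # tensor([2., 2., 2.])
print(traced(-torch.ones(3)))  # tensor([-2., -2., -2.]) -- the "sum > 0" branch was baked in
```

The second call shows why the trace "might not generalize": the untraced branch is gone from the graph.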
Steps/Code to reproduce bug
Expected behavior
My goal is to perform post-training quantization (PTQ) or quantization-aware training (QAT) on a CNN, export it to ONNX, build an INT8 engine with TensorRT, and finally deploy it on an RTX 3060 Ti for inference.
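Once the ONNX export succeeds, the TensorRT step of the pipeline can be sketched with trtexec. Paths are placeholders; since the exported model carries Q/DQ nodes from ModelOpt, this assumes explicit quantization and would be run on the target RTX 3060 Ti machine:

```shell
# Build and time an INT8 engine from the exported Q/DQ ONNX model.
trtexec --onnx=vit_base_patch16_224.int8.onnx \
        --int8 --fp16 \
        --saveEngine=vit_base_patch16_224.engine
```

The saved engine would then be loaded with the TensorRT runtime for inference.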
Who can help?
@ajrasane @cjluo-nv Could you please help me solve this problem?
System information
- Container used (if applicable): No
- OS: Windows 11
- CPU architecture (x86_64, aarch64): x86_64
- GPU name (e.g. H100, A100, L40S): NVIDIA GeForce RTX 3060 Ti
- GPU memory size: 8.0 GB
- Number of GPUs: 1
- Library versions (if applicable):
- Python: 3.10.18
- ModelOpt version or commit hash: 0.40.0
- CUDA: 11.8
- PyTorch: 2.7.1+cu118
- Transformers: 4.57.3
- TensorRT-LLM: ?
- ONNXRuntime: 1.23.0
- TensorRT: 10.8.1
- Any other details that may help: ?