Issues with torch_quant_to_onnx.py #724

@HaohuiHU

Description

Describe the bug

I am testing the torch_quant_to_onnx.py script from the Model Optimizer tool in PyCharm on Windows 11, attempting to quantize the vit_base_patch16_224 model to int8.
While debugging, I hit an error in the export_to_onnx function when exporting to ONNX format:

    onnx_bytes, _ = get_onnx_bytes_and_metadata(
        model=model,
        dummy_input=(input_tensor,),
        weights_dtype=weights_dtype,
        model_name=model_name,
    )

The run produced the following warnings:
D:\Software\Miniconda\envs\BasicEnv-copy\lib\site-packages\modelopt\torch\utils\cpp_extension.py:88: UserWarning: Ninja is required to load C++ extensions
Unable to load extension modelopt_cuda_ext and falling back to CPU version.
warnings.warn(fail_msg)
D:\Software\Miniconda\envs\BasicEnv-copy\lib\site-packages\torch\__init__.py:2150: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert condition, message
D:\Software\Miniconda\envs\BasicEnv-copy\lib\site-packages\modelopt\torch\quantization\nn\modules\tensor_quantizer.py:878: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if inputs.numel() == 0:
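
As I understand it, these TracerWarnings come from torch.jit.trace hitting data-dependent Python branches (such as the `if inputs.numel() == 0:` check in tensor_quantizer.py). A minimal illustration of the same warning:

    import torch

    def f(x):
        # Converting a tensor to a Python bool: the traced graph records
        # only the branch taken for this particular example input.
        if x.numel() == 0:
            return x
        return x * 2

    # Tracing f emits the same TracerWarning as seen above.
    traced = torch.jit.trace(f, torch.randn(3))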

Steps/Code to reproduce bug
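
Roughly, a minimal sketch of the failing path (the function name and keyword arguments are taken from the snippet above; the import path, the timm calibration loop, and the weights_dtype/model_name values are my assumptions, and the exact arguments used by torch_quant_to_onnx.py may differ):

    import timm
    import torch
    import modelopt.torch.quantization as mtq
    from modelopt.torch._deploy.utils import get_onnx_bytes_and_metadata

    model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()
    input_tensor = torch.randn(1, 3, 224, 224)

    def forward_loop(m):
        # Calibration pass; a real run should use representative data.
        m(input_tensor)

    # PTQ with the default INT8 config from ModelOpt.
    model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

    # The call that fails inside export_to_onnx.
    onnx_bytes, _ = get_onnx_bytes_and_metadata(
        model=model,
        dummy_input=(input_tensor,),
        weights_dtype="int8",              # assumed value
        model_name="vit_base_patch16_224", # assumed value
    )
    with open("vit_base_patch16_224.onnx", "wb") as f:
        f.write(onnx_bytes)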

Expected behavior

I am trying to perform post-training quantization (PTQ) and quantization-aware training (QAT) on a CNN, save the result in ONNX format, build an INT8 engine with TensorRT, and finally deploy it on an RTX 3060 Ti for inference.
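
For reference, the downstream step I have in mind looks roughly like this (a sketch assuming the ONNX export above succeeds; file names are illustrative):

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)  # explicit batch (the default in TensorRT 10)
    parser = trt.OnnxParser(network, logger)

    with open("vit_base_patch16_224.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)  # the Q/DQ nodes in the graph drive quantization

    engine_bytes = builder.build_serialized_network(network, config)
    with open("vit_base_patch16_224.engine", "wb") as f:
        f.write(engine_bytes)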

Who can help?

@ajrasane @cjluo-nv Could you please help me solve this problem?

System information

  • Container used (if applicable): No
  • OS: Windows 11
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): NVIDIA GeForce RTX 3060 Ti
  • GPU memory size: 8.0 GB
  • Number of GPUs: 1
  • Library versions (if applicable):
    • Python: 3.10.18
    • ModelOpt version or commit hash: 0.40.0
    • CUDA: 11.8
    • PyTorch: 2.7.1+cu118
    • Transformers: 4.57.3
    • TensorRT-LLM: ?
    • ONNXRuntime: 1.23.0
    • TensorRT: 10.8.1
  • Any other details that may help: ?

Metadata

Labels: bug (Something isn't working)