Useful sample code for building TensorRT models from ONNX
- RTX3060 (notebook)
- WSL
- Ubuntu 22.04.5 LTS
- CUDA 12.8
conda deactivate
conda env remove -n trte -y
conda create -n trte python=3.11 --yes
conda activate trte
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
pip install cuda-python==12.9.2
pip install tensorrt-cu12
pip install onnx
pip install opencv-python
pip install timm
pip install matplotlib
pip install -U "nvidia-modelopt[all]"
# Check installation
python -c "import modelopt; print(modelopt.__version__)"
python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"
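Beyond the two `modelopt` checks above, it can help to confirm that every package from the install commands actually landed in the `trte` environment. The following is a minimal sketch, not part of the original samples: it assumes the distribution names match the `pip install` lines above, and `PACKAGES` and `installed_versions` are hypothetical names introduced only for illustration.

```python
from importlib import metadata

# Distribution names as written in the pip commands above (assumed to match
# the names pip records; adjust if your environment reports them differently).
PACKAGES = [
    "torch", "torchvision", "torchaudio", "cuda-python", "tensorrt-cu12",
    "onnx", "opencv-python", "timm", "matplotlib", "nvidia-modelopt",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:16s} {version or 'NOT INSTALLED'}")
```

This only reads package metadata, so it does not import CUDA libraries or touch the GPU; a package can be listed here and still fail at import time if the driver or CUDA runtime is mismatched.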
- Generating a TensorRT Model from ONNX
1.1 TensorRT C++ API
1.2 TensorRT Python API
1.3 Polygraphy
- Dynamic shapes for TensorRT
2.1 Dynamic batch
2.2 Dynamic input size
- Custom Plugin
3.1 Adding a pre-processing layer with CUDA
- Modifying an ONNX graph with ONNX GraphSurgeon
4.1 Extracting a feature map of the last Conv for Grad-CAM
4.2 Generating a TensorRT model with a custom plugin and ONNX
- TensorRT Model Optimizer
5.0 Train Base Model (resnet18)
5.1 Base TensorRT (fp16)
5.2 Explicit Quantization (PTQ)
5.3 Explicit Quantization (QAT)
5.4 Explicit Quantization (ONNX PTQ)
5.5 Implicit Quantization (TensorRT PTQ)
5.6 Sparsity (2:4 sparsity)
5.7 Pruning
5.8 NAS (Neural Architecture Search)
5.9 Multiple Optimization Techniques
5.9.1 (Pruning + Sparsity)
5.9.2 (Pruning + Sparsity + Quantization (QAT))
5.9.3 (NAS + Sparsity)
5.9.4 (NAS + Sparsity + Quantization (QAT))
Framework | PyTorch | TensorRT | TensorRT | TensorRT | TensorRT | TensorRT | TensorRT |
---|---|---|---|---|---|---|---|
Optimization Technique | - | - | onnx ptq | tmo ptq | tmo qat | tmo sparsity | tmo pruning (flops 80%) |
Precision | fp16 | fp16 | int8 | int8 | int8 | fp16 | fp16 |
Top-1 Acc [%] | 84.58 | 84.54 | 84.5 | 84.2 | 84.42 | 83.28 | 82.76 |
Top-5 Acc [%] | 97.2 | 97.2 | 97 | 97.06 | 97.1 | 96.72 | 96.42 |
FPS [frames/s] | 406.27 | 1463.45 | 1897.46 | 1542.34 | 1572.81 | 1483.85 | 1573.2 |
Avg Latency [ms] | 2.46 | 0.68 | 0.53 | 0.65 | 0.64 | 0.67 | 0.64 |
GPU Mem [MB] | 286 | 138 | 124 | 124 | 138 | 138 | 130 |
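
To compare the columns above at a glance, the FPS row can be turned into speedups over the PyTorch fp16 baseline. This is a small sketch using only the numbers in the table; the short column labels and the helper names `speedups` and `latency_ms` are made up for illustration. The `1000 / FPS` conversion assumes one frame is processed at a time, which the table does not state, although it appears to agree with the Avg Latency row.

```python
# FPS values copied from the table above; the short labels are ad hoc.
FPS = {
    "pytorch fp16": 406.27,
    "tensorrt fp16": 1463.45,
    "onnx ptq int8": 1897.46,
    "tmo ptq int8": 1542.34,
    "tmo qat int8": 1572.81,
    "tmo sparsity fp16": 1483.85,
    "tmo pruning fp16": 1573.2,
}


def speedups(fps, baseline="pytorch fp16"):
    """Throughput of each configuration relative to the baseline."""
    base = fps[baseline]
    return {name: value / base for name, value in fps.items()}


def latency_ms(fps_value):
    """Per-frame latency in ms implied by a throughput in frames/s
    (assumes frames are processed one at a time)."""
    return 1000.0 / fps_value


if __name__ == "__main__":
    for name, ratio in speedups(FPS).items():
        print(f"{name:18s} {ratio:5.2f}x  {latency_ms(FPS[name]):.2f} ms")
```

Read this way, the fp16 TensorRT engine runs at roughly 3.6x the PyTorch throughput, and the ONNX PTQ int8 engine at roughly 4.7x, at the accuracy cost shown in the Top-1 and Top-5 rows.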
- Super Resolution
6.1 Real-ESRGAN
- Object Detection
7.1 yolo11
- Instance Segmentation
- Semantic Segmentation
- Depth Estimation
10.1 Depth Pro