yester31/TensorRT_Examples

Examples of TensorRT models using ONNX

A collection of sample code for building and running TensorRT models from ONNX

0. Development Environment

  • RTX3060 (notebook)
  • WSL
  • Ubuntu 22.04.5 LTS
  • cuda 12.8

conda deactivate
conda env remove -n trte -y

conda create -n trte python=3.11 --yes 
conda activate trte

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
pip install cuda-python==12.9.2
pip install tensorrt-cu12
pip install onnx
pip install opencv-python
pip install timm
pip install matplotlib

pip install -U "nvidia-modelopt[all]"

# Check installation 
python -c "import modelopt; print(modelopt.__version__)"
python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"

1. Basic step

  1. Generating a TensorRT model from ONNX
    1.1 TensorRT C++ API
    1.2 TensorRT Python API
    1.3 Polygraphy

  2. Dynamic shapes for TensorRT
    2.1 Dynamic batch
    2.2 Dynamic input size
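
Items 1.2 and 2.1 can be sketched together with the TensorRT Python API: parse an ONNX file, optionally register an optimization profile for a dynamic batch dimension, and serialize the engine. This is a minimal sketch under stated assumptions, not the repository's own code. It assumes TensorRT 10 (where networks are always explicit-batch), an ONNX model exported with a dynamic batch axis when `dynamic_batch=True`, and a single image input; the input name `input`, the 3x224x224 shape, the batch range, and the helper names `batch_profile_shapes` and `build_engine` are hypothetical.

```python
def batch_profile_shapes(chw, min_batch=1, opt_batch=8, max_batch=16):
    """Return (min, opt, max) NCHW shapes for a dynamic batch dimension."""
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")
    return tuple((b, *chw) for b in (min_batch, opt_batch, max_batch))


def build_engine(onnx_path, engine_path, input_name="input",
                 chw=(3, 224, 224), dynamic_batch=False, fp16=True):
    """Parse an ONNX file and write a serialized TensorRT engine to disk."""
    # Imported lazily so the shape helper above works without TensorRT installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)  # assumption: TensorRT 10, explicit batch
    parser = trt.OnnxParser(network, logger)
    if not parser.parse_from_file(str(onnx_path)):
        errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
        raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)
    if dynamic_batch:
        # One profile covering the assumed batch range for the assumed input name.
        profile = builder.create_optimization_profile()
        profile.set_shape(input_name, *batch_profile_shapes(chw))
        config.add_optimization_profile(profile)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
```

A call such as `build_engine("resnet18.onnx", "resnet18.engine", dynamic_batch=True)` would be the typical usage; both file names are placeholders. The shape helper is kept separate so the min/opt/max choice can be checked without a GPU.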

2. Intermediate step

  3. Custom Plugin
    3.1 Adding a pre-processing layer with CUDA

  4. Modifying an ONNX graph with ONNX GraphSurgeon
    4.1 Extracting a feature map of the last Conv for Grad-CAM
    4.2 Generating a TensorRT model with a custom plugin and ONNX

  5. TensorRT Model Optimizer
    5.0 Train Base Model (resnet18)
    5.1 Base TensorRT (fp16)
    5.2 Explicit Quantization (PTQ)
    5.3 Explicit Quantization (QAT)
    5.4 Explicit Quantization (ONNX PTQ)
    5.5 Implicit Quantization (TensorRT PTQ)
    5.6 Sparsity (2:4 sparsity)
    5.7 Pruning
    5.8 NAS (Neural Architecture Search)
    5.9 Multiple Optimization Techniques
      5.9.1 (Pruning + Sparsity)
      5.9.2 (Pruning + Sparsity + Quantization (QAT))
      5.9.3 (NAS + Sparsity)
      5.9.4 (NAS + Sparsity + Quantization (QAT))
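
The 2:4 sparsity pattern named in item 5.6 means that in every group of four consecutive weights, at most two are nonzero. The sketch below illustrates only that pattern, using a simple magnitude rule (keep the two largest absolute values per group). It is not the Model Optimizer implementation, which may select weights differently; the function name `mask_2_to_4` and the sample weights are hypothetical.

```python
def mask_2_to_4(row):
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; the sort is stable,
        # so ties keep the earlier index.
        keep = sorted(range(4), key=lambda i: -abs(group[i]))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]
print(mask_2_to_4(weights))
# [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because exactly half of each group is zeroed, the pruned row keeps 50% of its weights regardless of their distribution.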

| Framework | PyTorch | TensorRT | TensorRT | TensorRT | TensorRT | TensorRT | TensorRT |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Optimization technique | - | - | onnx ptq | tmo ptq | tmo qat | tmo sparsity | tmo pruning (flops 80%) |
| Precision | fp16 | fp16 | int8 | int8 | int8 | fp16 | fp16 |
| Top-1 Acc [%] | 84.58 | 84.54 | 84.5 | 84.2 | 84.42 | 83.28 | 82.76 |
| Top-5 Acc [%] | 97.2 | 97.2 | 97 | 97.06 | 97.1 | 96.72 | 96.42 |
| FPS [frames/sec] | 406.27 | 1463.45 | 1897.46 | 1542.34 | 1572.81 | 1483.85 | 1573.2 |
| Avg Latency [ms] | 2.46 | 0.68 | 0.53 | 0.65 | 0.64 | 0.67 | 0.64 |
| GPU Mem [MB] | 286 | 138 | 124 | 124 | 138 | 138 | 130 |
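
The FPS and latency rows above can be cross-checked with a little arithmetic. The sketch below copies the reported figures, recomputes latency as 1000 / FPS, and derives the throughput speedup over the PyTorch baseline. The dictionary keys are labels chosen here for illustration. That latency equals 1000 / FPS is an observation about these numbers, which suggests (but the table does not state) one image per inference call.

```python
# label -> (FPS, reported average latency in ms), copied from the table
results = {
    "PyTorch fp16": (406.27, 2.46),
    "TensorRT fp16": (1463.45, 0.68),
    "TensorRT onnx ptq int8": (1897.46, 0.53),
    "TensorRT tmo ptq int8": (1542.34, 0.65),
    "TensorRT tmo qat int8": (1572.81, 0.64),
    "TensorRT tmo sparsity fp16": (1483.85, 0.67),
    "TensorRT tmo pruning fp16": (1573.2, 0.64),
}

baseline_fps = results["PyTorch fp16"][0]
summary = {}
for name, (fps, reported_ms) in results.items():
    summary[name] = {
        "latency_from_fps_ms": round(1000.0 / fps, 2),  # ms per frame
        "reported_latency_ms": reported_ms,
        "speedup_vs_pytorch": round(fps / baseline_fps, 2),
    }

for name, row in summary.items():
    print(f"{name:28s} {row['latency_from_fps_ms']:5.2f} ms "
          f"(reported {row['reported_latency_ms']:.2f} ms), "
          f"{row['speedup_vs_pytorch']:.2f}x")
```

On these figures the recomputed latency matches the reported value in every column, the plain fp16 TensorRT engine runs at roughly 3.6x the PyTorch throughput, and the onnx ptq int8 engine at roughly 4.7x.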

3. Advanced step

  6. Super Resolution
    6.1 Real-ESRGAN
  7. Object Detection
    7.1 yolo11
  8. Instance Segmentation
  9. Semantic Segmentation
  10. Depth Estimation
    10.1 Depth Pro

4. Reference