These scripts were last tested using the NGC TensorRT Container Version 20.06-py3. You can see the corresponding framework versions for this container here.
NOTE: This INT8 example currently works only with fixed-shape ONNX models.
TensorRT now supports INT8 calibration on dynamic-shape models, but this example has not yet been updated to reflect that. For more details on INT8 calibration for dynamic-shape models, please see the documentation.
See ./onnx_to_tensorrt.py -h for the full list of command-line arguments.
./onnx_to_tensorrt.py --explicit-batch \
--onnx model/yolov6.onnx \
--fp16 \
--int8 \
--calibration-cache="caches/yolov6.cache" \
-o yolov6.int8.engine
See the INT8 Calibration section below for details on calibration using your own model or different data, where you don't have an existing calibration cache or want to create a new one.
See ImagenetCalibrator.py for a reference implementation of TensorRT's IInt8EntropyCalibrator2.
This class can be tweaked to work for other kinds of models, inputs, etc.
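As a rough illustration of the cache handling such a calibrator usually provides, here is a minimal sketch that runs without TensorRT installed. The class name CalibrationCache is hypothetical and this is not the contents of ImagenetCalibrator.py; in a real calibrator, these two methods would sit on a subclass of trt.IInt8EntropyCalibrator2 next to get_batch_size and get_batch.

```python
import os


class CalibrationCache:
    """Hypothetical stand-in showing only the cache logic of a calibrator."""

    def __init__(self, cache_file):
        self.cache_file = cache_file

    def read_calibration_cache(self):
        # Returning None signals that no cache exists, so calibration
        # has to run on real data instead.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        # Called once calibration has finished, with the serialized scales.
        cache_dir = os.path.dirname(self.cache_file)
        if cache_dir:
            os.makedirs(cache_dir, exist_ok=True)
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

With this shape, an existing cache file short-circuits calibration, and a missing one is created on the first run.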
In the Quickstart section above, we made use of a pre-existing cache, caches/yolov6.cache, to save time for the sake of an example.
However, to calibrate using different data or a different model, you can do so with the --calibration-data argument.
- This requires that you've mounted a dataset, such as Imagenet, to use for calibration.
  - Add something like -v /imagenet:/imagenet to your Docker command in Step (1) to mount a dataset found locally at /imagenet.
- You can specify your own preprocess_func by defining it inside of ImagenetCalibrator.py
# Path to dataset to use for calibration.
# **Not necessary if you already have a calibration cache from a previous run.
CALIBRATION_DATA="/imagenet"
# Truncate calibration images to a random sample of this amount if more are found.
# **Not necessary if you already have a calibration cache from a previous run.
MAX_CALIBRATION_SIZE=512
# Calibration cache to be used instead of calibration data if it already exists,
# or the cache will be created from the calibration data if it doesn't exist.
CACHE_FILENAME="caches/yolov6.cache"
# Path to ONNX model
ONNX_MODEL="model/yolov6.onnx"
# Path to write TensorRT engine to
OUTPUT="yolov6.int8.engine"
# Name of the preprocessing function to apply to the calibration data.
# NOTE: placeholder value; replace it with the name of your own function.
PREPROCESS_FUNC="my_preprocess_func"
# Creates an int8 engine from your ONNX model, creating ${CACHE_FILENAME} based
# on your ${CALIBRATION_DATA}. If ${CACHE_FILENAME} already exists,
# it will simply use that instead.
python3 onnx_to_tensorrt.py --fp16 --int8 -v \
--max_calibration_size=${MAX_CALIBRATION_SIZE} \
--calibration-data=${CALIBRATION_DATA} \
--calibration-cache=${CACHE_FILENAME} \
--preprocess_func=${PREPROCESS_FUNC} \
--explicit-batch \
--onnx ${ONNX_MODEL} -o ${OUTPUT}
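The effect of MAX_CALIBRATION_SIZE, truncating the calibration images to a random sample when more are found, can be sketched as follows. This is an illustration under assumptions, not the script's actual code: the function name sample_calibration_files, the list of file extensions, and the seed parameter are all hypothetical.

```python
import os
import random


def sample_calibration_files(data_dir, max_size,
                             extensions=(".jpg", ".jpeg", ".png"), seed=None):
    """Collect image paths under data_dir and keep at most max_size of them."""
    files = []
    for root, _dirs, names in os.walk(data_dir):
        for name in names:
            # Assumed extension filter; adjust to match your dataset.
            if name.lower().endswith(extensions):
                files.append(os.path.join(root, name))
    files.sort()  # stable order, so a fixed seed gives a repeatable sample
    if len(files) > max_size:
        files = random.Random(seed).sample(files, max_size)
    return files
```

For example, sample_calibration_files("/imagenet", 512) would return every image if 512 or fewer are found, and a random subset of 512 otherwise.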
In order to calibrate your model correctly, you should pre-process your data the same way that you would during inference.
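One simple way to guarantee this is to route both paths through a single function. The sketch below uses plain Python lists to stay dependency-free (real code would more likely use NumPy arrays), and all names in it are hypothetical. The mean and standard deviation are the commonly used ImageNet statistics, assumed here only for illustration; use whatever your model was trained with.

```python
# Commonly used ImageNet statistics; an assumption, not taken from this repo.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def preprocess(pixels):
    """Scale 8-bit RGB pixels to [0, 1], then normalise each channel."""
    return [
        tuple((value / 255.0 - m) / s for value, m, s in zip(pixel, MEAN, STD))
        for pixel in pixels
    ]


def make_calibration_batch(images):
    # Calibration path: every image goes through preprocess().
    return [preprocess(image) for image in images]


def make_inference_input(image):
    # Inference path: the same function, so the value ranges seen during
    # calibration match those seen at inference time.
    return preprocess(image)
```

If the two paths drift apart (for instance, one normalises and the other does not), the dynamic ranges recorded during calibration will not describe the inference inputs, and INT8 accuracy is likely to suffer.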