
Conversation

@andompesta
Contributor

This is the main PR used to refactor the TRT inference code to:

  • remove the ONNX tracing dependencies
  • support multiple trt-configurations for the same model
  • run the trt-build through a polygraphy subprocess for easier understanding
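For illustration, a minimal sketch of what a subprocess-based polygraphy build can look like; the model path, input name, shapes and precision flag below are assumptions for the example, not the actual values used in this repo:

```python
import subprocess

# Illustrative engine build via the polygraphy CLI instead of its Python API.
# All paths, the input name and the shapes are placeholder assumptions.
cmd = [
    "polygraphy", "convert", "model.onnx",
    "--convert-to", "trt",
    "--fp16",                                  # example precision flag
    "--trt-min-shapes", "latent:[1,4,64,64]",  # min/opt/max profile for a dynamic input
    "--trt-opt-shapes", "latent:[1,4,64,64]",
    "--trt-max-shapes", "latent:[1,4,64,64]",
    "-o", "model.plan",
]
subprocess.run(cmd, check=True)
```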

andompesta and others added 19 commits February 21, 2025 17:45
@andompesta
Contributor Author

andompesta commented Feb 21, 2025

merge 8 description

This PR is in charge of removing the onnx-exporter class, as it is not needed.
The ONNX model comes from the HF repo.
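As an illustration only (the repo id and file name below are placeholders), pulling a pre-exported ONNX file from the HF hub can look like this:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id / file name: download a pre-exported ONNX model instead of
# tracing and exporting it locally.
onnx_path = hf_hub_download(
    repo_id="some-org/some-model-onnx",
    filename="model.onnx",
)
print(onnx_path)  # local cache path of the downloaded file
```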

@andompesta
Contributor Author

andompesta commented Feb 21, 2025

merge 9 description

This PR is in charge of adding a trt-config class.
This class is responsible for building the trt engine given an ONNX path and various trt-flags.

As the same model might need different trt-configurations depending on which precision is used, a registry is used to collect all the model configurations.
Based on the provided key, the get_config method returns the appropriate model configuration to use.

@timudk here are the main changes.
Each configuration is a dataclass containing (see the sketch below):

  • the needed trt flags
  • a from_model factory method to feed all needed parameters to the config class
  • a get_input_profile method that returns the min and max inputs supported by the built engine
  • the build process is still based on polygraphy, but instead of using the Python API, a subprocess CLI approach is used for clarity

Engine classes are changed to:

  • use the trt-config class instead of mixin and exporter classes
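A minimal sketch of the registry/dataclass pattern described above; all class, key and field names here are illustrative and not the ones used in the PR:

```python
from dataclasses import dataclass

_CONFIG_REGISTRY: dict[str, type] = {}

def register_config(key: str):
    # Collect config classes under a precision/model key.
    def wrapper(cls):
        _CONFIG_REGISTRY[key] = cls
        return cls
    return wrapper

def get_config(key: str) -> type:
    # Return the config class registered for the given key.
    return _CONFIG_REGISTRY[key]

@dataclass
class TRTConfig:
    onnx_path: str
    max_batch_size: int = 1
    fp16: bool = False  # example trt flag

    @classmethod
    def from_model(cls, model, onnx_path: str) -> "TRTConfig":
        # Feed whatever parameters the build needs from the model object.
        return cls(onnx_path=onnx_path)

    def get_input_profile(self) -> dict[str, tuple]:
        # (min, opt, max) shapes per input, consumed by the engine build.
        return {"latent": ((1, 4, 64, 64), (1, 4, 64, 64), (1, 4, 64, 64))}

@register_config("bf16")
@dataclass
class BF16Config(TRTConfig):
    bf16: bool = True

# Example lookup by key, then construction from a (here absent) model object.
config = get_config("bf16").from_model(model=None, onnx_path="model.onnx")
```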

@andompesta
Contributor Author

andompesta commented Feb 21, 2025

merge 10 description

This PR implements the changes on the trt-manager to use trt-config classes instead of exporters and mixins.

  1. for each provided model, a trt-config is returned by the _get_trt_configs method
  2. for each trt-config, a trt-engine is built; the trt-build is fully based on the trt-config classes, and there is no more exporter/model_config split
  3. after the trt-engines are built, a trt-runtime is initialized
  4. finally, the engine classes are instantiated with a valid cuda-stream

@timudk this PR greatly simplifies this class, cutting the number of code lines by half.
I'm keeping the trt-config and engine classes separate for clarity. They could be merged into a single, really big class, but I believe that would not provide much benefit.
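Roughly, the flow above could be sketched like this; it reuses the hypothetical get_config/TRTConfig names from the previous comment, and build_engine and Engine are stand-ins for the real build step and engine wrapper:

```python
# Hypothetical stand-ins for the build step and the engine wrapper.
def build_engine(config, engine_path: str) -> None:
    ...  # e.g. the polygraphy subprocess call shown earlier

class Engine:
    def __init__(self, engine_path: str, stream=None):
        self.engine_path = engine_path
        self.stream = stream

def load_engines(models: dict, onnx_dir: str, engine_dir: str, precision: str, stream=None) -> dict:
    engines = {}
    for name, model in models.items():
        # 1. one trt-config per provided model, looked up by precision key
        config = get_config(precision).from_model(model, onnx_path=f"{onnx_dir}/{name}.onnx")
        # 2. build the trt-engine purely from the trt-config
        engine_path = f"{engine_dir}/{name}.plan"
        build_engine(config, engine_path)
        # 3./4. initialize the runtime and instantiate the engine class with a cuda stream
        engines[name] = Engine(engine_path, stream=stream)
    return engines
```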

@andompesta
Contributor Author

andompesta commented Feb 21, 2025

merge 11 description

This PR changes the CLI script to support the new TRT interface.

  1. A try ... except approach is used to guard the tensorrt import: people who do not install the trt dependencies would otherwise hit an import error.
  2. Instead of using
trt: bool = False,
trt_transformer_precision: str = "bf16",

a different set of inputs is used:

  • trt_onnx_dir is used to specify the folder containing the onnx models
  • trt_engine_dir specifies the location where trt engines will be loaded from or built
  • trt_precision specifies the precision of the pipeline. For now the supported precisions are: bf16, fp8 and fp4
    All 3 of these args are needed to run trt engines. I found this approach easier than using a mix of input args and env-variables. @timudk please advise on which approach would be preferred on your side.

The following additional input args can be provided:

  • trt_batch_size is the batch size used to optimize the engine, by default 1
  • trt_static_batch controls whether the engine is built with a static batch size, by default yes
  • trt_static_shape controls whether the engine is built to support a static image shape, by default yes

Note that:

  • Based on the provided trt_precision, a set of input values is generated for the trt-manager class.
  • if TRT is not available but trt input arguments are provided, an error is raised
  • all trt-related input arguments start with a trt prefix
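A rough sketch of the CLI surface described above; the import guard and the argument names match the description, while the error type and the function body are assumptions:

```python
try:
    import tensorrt  # optional dependency; non-TRT installs should still work
    TRT_AVAILABLE = True
except ImportError:
    TRT_AVAILABLE = False

def main(
    # ... existing pipeline arguments ...
    trt_onnx_dir: str | None = None,    # folder containing the onnx models
    trt_engine_dir: str | None = None,  # where engines are loaded from or built to
    trt_precision: str | None = None,   # one of: bf16, fp8, fp4
    trt_batch_size: int = 1,            # batch size the engine is optimized for
    trt_static_batch: bool = True,      # build with a static batch size
    trt_static_shape: bool = True,      # build for a static image shape
):
    trt_requested = any([trt_onnx_dir, trt_engine_dir, trt_precision])
    if trt_requested and not TRT_AVAILABLE:
        raise RuntimeError("trt arguments were provided but tensorrt is not installed")
    # based on trt_precision, derive the input values for the trt-manager here ...
```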

andompesta marked this pull request as ready for review February 21, 2025 18:50