
Inputs are reordered by TensorRT provider #22729

Open
BengtGustafsson opened this issue Nov 5, 2024 · 11 comments
Labels: ep:TensorRT (issues related to TensorRT execution provider)

@BengtGustafsson
Contributor

Describe the issue

When loading the onnx file inside in2out3.zip there should be two inputs, and the one at index 0 should be named input1 and the one at index 1 input2.

With the CPU or DML provider enabled this works, but with the TensorRT provider they come out in the opposite order.

We have other onnx files with more than one input and/or output, and there the reordering does not happen.

A guess is that the onnx parser used by TensorRT uses a hash map or something else that does not preserve the index-to-input mapping.

I'm not sure what the actual guarantees of your API are, but I'm sure more users than us are relying on the index-based ports being in the onnx data order.

To reproduce

Load the onnx into an Ort::Session set up for TensorRT processing. Get the input names using code like:

const int modelInputs = int(session->GetInputCount());
Ort::AllocatorWithDefaultOptions allocator;
for (int i = 0; i < modelInputs; i++) {
    std::cout << session->GetInputNameAllocated(i, allocator).get() << std::endl;
}

It will output:
input2
input1

for the TensorRT provider and the reverse (correct) order for other providers.
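
For completeness, here is a minimal sketch of the kind of session setup assumed above (a sketch only; the model path and default provider options are placeholders, not our exact application code):

#include "onnxruntime_cxx_api.h"
#include <iostream>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "repro");
  Ort::SessionOptions session_options;

  // Append the TensorRT EP with default V2 provider options (assumed defaults).
  const OrtApi& api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2* trt_options = nullptr;
  Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));
  Ort::ThrowOnError(api.SessionOptionsAppendExecutionProvider_TensorRT_V2(session_options, trt_options));
  api.ReleaseTensorRTProviderOptions(trt_options);

  // Placeholder path; on Windows the session constructor takes a wide-character path.
  Ort::Session session(env, L"in2out3.onnx", session_options);

  Ort::AllocatorWithDefaultOptions allocator;
  for (size_t i = 0; i < session.GetInputCount(); ++i)
    std::cout << session.GetInputNameAllocated(i, allocator).get() << std::endl;
}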

Urgency

No response

Platform

Windows

OS Version

Windows11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.19.2

ONNX Runtime API

C++

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

Trt 10.4.0.26

@github-actions github-actions bot added ep:TensorRT issues related to TensorRT execution provider ep:DML issues related to the DirectML execution provider labels Nov 5, 2024
@jywu-msft jywu-msft assigned chilo-ms and unassigned chilo-ms Nov 5, 2024
@jywu-msft
Member

Typically people feed inputs by name rather than relying on position.
However, what you describe seems unexpected, since using different EPs shouldn't affect how model inputs are processed.
I tested your model with the onnxruntime-gpu 1.19.2 Python package (same underlying implementation for retrieving input names):

>>> import onnxruntime as ort
>>> sess = ort.InferenceSession('in2out3.onnx', providers=['TensorrtExecutionProvider'])
>>> sess.get_inputs()[0].name
'input1'
>>> sess.get_inputs()[1].name
'input2'

@fdwr fdwr removed the ep:DML issues related to the DirectML execution provider label Nov 5, 2024
@BengtGustafsson
Contributor Author

That's strange. Do the Python bindings use a C function that returns the complete input array instead of building it with the GetInputNameAllocated function that we use? Are you sure you used the same TrT version? As the error only appears with the TrT provider, I presume onnxruntime reads back the input list after letting TrT do the parsing. I think there are also two different onnx parsers you can use for TrT; I never understood the difference or which one to prefer, and we just build onnxruntime without trying to modify the parsing. Previously we had a build based on TrT 8.x and ORT 1.17 which did not show this error.
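
For reference, the C++ GetInputNameAllocated() helper appears to be a thin wrapper over the C API's SessionGetInputName, which is the same underlying implementation the Python bindings report from. A sketch of that C-API path (assuming an already-created OrtSession*; error handling omitted):

#include "onnxruntime_c_api.h"
#include <cstdio>

void print_input_names(const OrtApi* api, OrtSession* session) {
  OrtAllocator* allocator = nullptr;
  api->GetAllocatorWithDefaultOptions(&allocator);

  size_t count = 0;
  api->SessionGetInputCount(session, &count);

  for (size_t i = 0; i < count; ++i) {
    char* name = nullptr;
    // Same call that GetInputNameAllocated wraps in the C++ API.
    api->SessionGetInputName(session, i, allocator, &name);
    std::printf("%zu: %s\n", i, name);
    allocator->Free(allocator, name);
  }
}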

@jywu-msft
Member

+@chilo-ms can you help take a quick look?

@jywu-msft
Member

btw, can you elaborate on "there are two different onnx parsers you can use for TRT"?

@jywu-msft
Member

It's still better to rely on input names rather than positions. When you call InferenceSession.Run() you are feeding the values by input names anyway.
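
For example, with the C++ API the values are bound to names explicitly in Run(), so the reported index order never has to matter. A sketch (the tensor shapes and the output names of in2out3.onnx are placeholder assumptions):

// Sketch: Run() takes parallel arrays of names and values, bound by name, not by index.
Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
std::vector<float> data1(4, 0.f), data2(4, 0.f);   // placeholder input data
std::vector<int64_t> shape{1, 4};                  // placeholder shape

Ort::Value input_values[] = {
    Ort::Value::CreateTensor<float>(mem_info, data1.data(), data1.size(), shape.data(), shape.size()),
    Ort::Value::CreateTensor<float>(mem_info, data2.data(), data2.size(), shape.data(), shape.size())};

const char* input_names[]  = {"input1", "input2"};
const char* output_names[] = {"output1", "output2", "output3"};  // placeholder output names

std::vector<Ort::Value> outputs = session.Run(Ort::RunOptions{nullptr},
                                              input_names, input_values, 2,
                                              output_names, 3);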

@chilo-ms
Contributor

chilo-ms commented Nov 8, 2024

That's strange and I can't repro it from my side.

On Linux, I tested the following code:

#include "onnxruntime_c_api.h"
#include "onnxruntime_cxx_api.h"

#include <iostream>

int main(int argc, char *argv[]) {
  Ort::Env env = Ort::Env(ORT_LOGGING_LEVEL_VERBOSE, "Default");
  Ort::SessionOptions session_options;

  const auto& api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2* trt_options;
  api.CreateTensorRTProviderOptions(&trt_options);

  std::unique_ptr<OrtTensorRTProviderOptionsV2, decltype(api.ReleaseTensorRTProviderOptions)> rel_trt_options(trt_options, api.ReleaseTensorRTProviderOptions);
  api.SessionOptionsAppendExecutionProvider_TensorRT_V2(static_cast<OrtSessionOptions*>(session_options), rel_trt_options.get());

  Ort::Session session(env, "/home/lochi/repro/in2out3/in2out3.onnx", session_options);

  const int modelInputs = int(session.GetInputCount());
  Ort::AllocatorWithDefaultOptions allocator;
  for (int i = 0; i < modelInputs; i++) {
     std::cout << session.GetInputNameAllocated(i, allocator).get() << std::endl;
  }
}

and the output is:
input1
input2

On Windows, I had an issue linking my test app against onnxruntime.dll, so I turned to onnxruntime_perf_test and added the session.GetInputNameAllocated(i, allocator).get() call to it.
The output is also:
input1
input2

@chilo-ms
Contributor

chilo-ms commented Nov 8, 2024

Also, I don't think it has anything to do with the TensorRT parser. The model inputs kept inside the ORT session won't be affected by EPs,
and the TensorRT EP and the TRT parser use the inputs from the session, which is read-only.

Re: "I presume onnxruntime reads back the input list after letting TrT do the parsing"
As far as I know, ORT doesn't do this.

This needs more investigation; could you share the whole code to help us repro?

@BengtGustafsson
Contributor Author

BengtGustafsson commented Nov 15, 2024

I did some research into this. I think it is a bug in the "fake" onnx file that points to the engine file after optimizing with trt_dump_ep_context_model = true;

It seems that when loading this file back on the next run, the inputs are reordered.

This is why your little test program, which does not do caching, works.

Note that we have another test network with two inputs, but only one output, which doesn't show this erroneous behaviour.
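
For reference, a sketch of one way that option is enabled through the V2 provider options (not necessarily how our application sets it; error handling trimmed):

// Sketch: ask the TensorRT EP to dump the EP-context ("fake") _ctx.onnx that
// points to the compiled engine; the next run loads that file back instead.
const OrtApi& api = Ort::GetApi();
OrtTensorRTProviderOptionsV2* trt_options = nullptr;
Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));

const char* keys[]   = {"trt_dump_ep_context_model"};
const char* values[] = {"1"};
Ort::ThrowOnError(api.UpdateTensorRTProviderOptions(trt_options, keys, values, 1));

Ort::ThrowOnError(api.SessionOptionsAppendExecutionProvider_TensorRT_V2(session_options, trt_options));
api.ReleaseTensorRTProviderOptions(trt_options);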

@BengtGustafsson
Contributor Author

This is the resulting file that points to the engine file, if I understand your system correctly.

_ctx.zip

@BengtGustafsson
Contributor Author

This was confirmed by listing its contents using our Julia integration of onnx:

julia> using ONNXLowLevel
julia> onnx = load("/home/gunnar/Downloads/_ctx.onnx")
ModelProto
ir_version: 10 (IR_VERSION)
opset_import: (com.microsoft.nchwc, 1), 12, (com.ms.internal.nhwc, 12), (ai.onnx.ml, 5), (trt.plugins, 1000), (ai.onnx.training, 1), (ai.onnx.preview.training, 1), (com.microsoft.experimental, 1), (com.microsoft, 1), (org.pytorch.aten, 1), (com.microsoft.dml, 1)
model_version: 0
graph: TRTKernel_graph_torch-jit-export_13403233791205608753_0
julia> onnx.graph.input
2-element Vector{ValueInfoProto}:
input2
input1

@chilo-ms
Contributor

I can repro it now; thanks for pointing out the issue.
The PR should fix the wrong input order.
