[Question] exported ONNX model does not result in same output as the original pytorch model #394

VineetTambe · 2023-07-31T18:56:56Z

❓ Question

I am trying to export the trained pytorch model to onnx so that I can deploy it.
But I am facing some issues where the output of the exported model is not the same as the pytorch model when I run a episode.
I have made sure that I set the model to eval mode before exporting.
I heavily modified the enjoy.py script to export and run the models.

Exporting to ONNX:

    torch_model = ALGOS[algo].load(
        model_path, custom_objects=custom_objects, device=args.device, **kwargs
    )
    torch_model.policy.eval()

    obs = env.reset()

    obs_tensor = torch_model.policy.obs_to_tensor(obs)[0]
      # Export the model
      torch.onnx.export(
          torch_model.policy,  # model being run
          obs_tensor,  # model input (or a tuple for multiple inputs)
          output_model_name,  # where to save the model (can be a file or file-like object)
          export_params=True,  # store the trained parameter weights inside the model file
          opset_version=10,  # the ONNX version to export the model to
          do_constant_folding=True,  # whether to execute constant folding for optimization
          input_names=["input"],  # the model's input names
          output_names=["output"],  # the model's output names
          dynamic_axes={
              "input": {0: "batch_size"},  # variable length axes
              "output": {0: "batch_size"},
          },
      )

Running inference using the onnx model:

    ort_session = onnxruntime.InferenceSession(onnx_model_path)
    ort_inputs = {ort_session.get_inputs()[0].name: obs}
    action = ort_session.run(None, ort_inputs)[0]
    obs, reward, done, infos = env.step(action)

The above are the only modifications done to enjoy.py in order to export and run the model. However, the results of the trained agent is not same.
Am I missing something obvious here? Any help would be greatly appriciated!

Checklist

I have checked that there is no similar issue in the repo
I have read the SB3 documentation
I have read the RL Zoo documentation
If code there is, it is minimal and working
If code there is, it is formatted using the markdown code blocks for both code and stack traces.

araffin · 2023-07-31T21:06:07Z

Hello,
could you be more specific on which algo/env you are using ?

VineetTambe · 2023-07-31T21:11:42Z

Hey,

I am using the qrdqn algo and a custom environment based on top of the minigrid env

araffin · 2023-08-01T11:56:16Z

I am using the qrdqn algo and a custom environment based on top of the minigrid env

Could you share the observation and action spaces?

You are probably missing pre-processing, see DLR-RM/stable-baselines3#1349 (comment)
(we welcome a PR that updates our doc).

VineetTambe · 2023-08-01T19:21:50Z

Could you share the observation and action spaces?

Observation space: Box(0, 255, (50,), uint8)
Action Space:          Discrete(4)

You are probably missing pre-processing
I tried doing what is done in the comment linked - which is create a new pytorch model class which has the the policy preprocessing step in the forward pass (please correct me if I am wrong here)

What exactly does the pre-processing entail? Is there anything more to it?
Because even after doing the above step I get the same incorrect results.
Is there any postprocessing step that I might be missing?

araffin · 2023-08-03T12:36:31Z

You are probably either missing image pre-processing (dividing by 255 before feeding to the network) or are not comparing to the greedy policy.

The following works and was tested comparing the quantiles returned:

import numpy as np
import torch as th
from sb3_contrib import QRDQN


model = QRDQN("MlpPolicy", "LunarLander-v2")
model.policy.to("cpu")
# Note: by default model.policy.quantile_net.forward() returns quantiles
onnxable_model = model.policy
observation_size = model.observation_space.shape[0]

dummy_input = th.randn(1, observation_size)
onnx_path = "qrdqn_model.onnx"
th.onnx.export(
    onnxable_model,
    dummy_input,
    onnx_path,
    opset_version=17,
    input_names=["input"],
)

##### Load and test with onnx

import numpy as np
import onnx
import onnxruntime as ort

onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)

# observation = np.zeros((1, observation_size)).astype(np.float32)
observation = dummy_input.cpu().numpy()
ort_sess = ort.InferenceSession(onnx_path)
action = ort_sess.run(None, {"input": observation})[0]

print(action)
print(model.predict(observation, deterministic=True)[0])

VineetTambe added the question Further information is requested label Jul 31, 2023

araffin added the more information needed Please fill the issue template completely label Jul 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question] exported ONNX model does not result in same output as the original pytorch model #394

[Question] exported ONNX model does not result in same output as the original pytorch model #394

VineetTambe commented Jul 31, 2023 •

edited by araffin

Loading

araffin commented Jul 31, 2023

VineetTambe commented Jul 31, 2023

araffin commented Aug 1, 2023

VineetTambe commented Aug 1, 2023 •

edited

Loading

araffin commented Aug 3, 2023

[Question] exported ONNX model does not result in same output as the original pytorch model #394

[Question] exported ONNX model does not result in same output as the original pytorch model #394

Comments

VineetTambe commented Jul 31, 2023 • edited by araffin Loading

❓ Question

Checklist

araffin commented Jul 31, 2023

VineetTambe commented Jul 31, 2023

araffin commented Aug 1, 2023

VineetTambe commented Aug 1, 2023 • edited Loading

araffin commented Aug 3, 2023

VineetTambe commented Jul 31, 2023 •

edited by araffin

Loading

VineetTambe commented Aug 1, 2023 •

edited

Loading