Blank output issue with CUDAExecutionProvider - Onnx Model Converted to fp16 #23797

abhishetty7191 opened this issue Feb 24, 2025 · 0 comments

abhishetty7191 commented Feb 24, 2025

Describe the issue

Issue description:
When a UNet segmentation model converted to ONNX with FP16 precision is run with the CUDAExecutionProvider, the output is blank. The issue occurs at inference time: the model should produce segmentation masks for the input images, but instead it returns empty (blank) results.

Expected Behavior:
The output with CUDAExecutionProvider should match the PyTorch output and the outputs of the other EPs (CPUExecutionProvider and TensorrtExecutionProvider).

Results:

[image: results]

To reproduce

Comparison of outputs from PyTorch and from ONNX Runtime with different Execution Providers (EPs): CPUExecutionProvider, CUDAExecutionProvider, and TensorrtExecutionProvider.

from segmentation_models_pytorch import Unet
import torch

torch.manual_seed(0)

import torchvision.transforms as transforms
import cv2
import numpy as np
import onnxruntime as ort
import matplotlib.pyplot as plt

print(f"Torch version: {torch.__version__}")
print(f"Onnxruntime version: {ort.__version__}")

Torch version: 2.2.2+cu121
Onnxruntime version: 1.20.1

Preprocessing Transformations for Input Images

preprocess = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

Load and preprocess the input image using OpenCV

Image Reference

image_path = "n01644373_tree_frog.JPEG"  # Replace with your image path
input_image = cv2.imread(image_path)
input_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB)
input_image_resized = cv2.resize(input_image, (512, 512))
plt.imshow(input_image)
plt.axis("off")
plt.show()
input_data = preprocess(input_image_resized)
input_data = input_data.unsqueeze(0)
input_data = input_data.numpy()
# Check if a GPU is available and use it if possible
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

[image: input image]

Define the Unet model

def load_pytorch_model():
    model = Unet(
        encoder_name="timm-efficientnet-b2",
        in_channels=3,
        classes=5,
        encoder_weights="imagenet",
    )
    model.eval()  # Set the model to evaluation mode
    model.to(device)
    return model

PyTorch

Define PyTorch inference and post-processing

def pytorch_inference(model, input):

    input_batch = torch.Tensor(input).to(device)

    # Perform inference
    with torch.no_grad():
        output = model(input_batch)

    # Postprocess the output
    output = output.cpu().numpy()
    return output.argmax(axis=1)[0]

Final output - resize to original image size

def resize_output(input):
    return cv2.resize(input, (input_image.shape[1], input_image.shape[0]))

PyTorch output

pytorch_model = load_pytorch_model()
pytorch_output = pytorch_inference(pytorch_model, input_data)
plt.imshow(resize_output(pytorch_output.astype(np.uint8) * 50), cmap="gray")
plt.axis("off")

[image: PyTorch output mask]

ONNX Runtime EPs

Define model conversion to ONNX

The model is converted to FP16 while its input and output are kept as float32 (keep_io_types=True); a sanity check on the resulting IO types follows the conversion below.

def onnx_conversion(pytorch_model, output_path, enable_fp16=False):
    import onnx
    from onnxconverter_common import float16

    # FP32CastedModel class:
    class FP32CastedModel(torch.nn.Module):
        def __init__(self, model):
            super().__init__()
            self.model = model

        def __call__(self, input):
            with torch.no_grad():
                output = self.model(input)
            return output.to(torch.float32)

    random_tensor = torch.randn(
        size=(1, 3, 512, 512), requires_grad=True, dtype=torch.float32
    ).cuda()

    with torch.no_grad():
        torch.onnx.export(
            model=FP32CastedModel(pytorch_model),
            f=output_path,
            args=random_tensor,
            export_params=True,
            input_names=["input"],
            output_names=["output"],
            do_constant_folding=True,
            opset_version=17,
            dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
        )

    model = onnx.load(output_path)
    if enable_fp16:
        model = float16.convert_float_to_float16(model, keep_io_types=True)
        onnx.save(model, output_path)
    return model

Use the PyTorch model to generate the ONNX model

This step produces warnings such as:

UserWarning: the float32 number -3.94251777890986e-08 will be truncated to -1e-07 warnings.warn("the float32 number {} will be truncated to {}".format(neg_max, -min_positive_val))
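The warning comes from onnxconverter_common clamping float32 values whose magnitude falls outside what the converter considers safe for FP16 (by default, magnitudes below 1e-7 or above 1e4). Not part of the original repro, but as a hedged sketch of the knobs that float16.convert_float_to_float16 exposes for this (the file names and the op list below are examples only, not recommendations):

import onnx
from onnxconverter_common import float16

# Example only: adjust the clamping thresholds and/or keep selected op types
# in float32 during conversion. The paths are hypothetical.
fp32_model = onnx.load("unet_model_fp32.onnx")
fp16_model = float16.convert_float_to_float16(
    fp32_model,
    keep_io_types=True,
    min_positive_val=1e-7,   # magnitudes below this are clamped (source of the warning above)
    max_finite_val=1e4,      # magnitudes above this are clamped
    op_block_list=["Resize", "Softmax"],  # example op types left in float32
)
onnx.save(fp16_model, "unet_model_fp16_blocked.onnx")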

enable_fp16 = True
onnx_model_name = "unet_model_fp16.onnx" if enable_fp16 else "unet_model.onnx"
onnx_model = onnx_conversion(pytorch_model, onnx_model_name, enable_fp16)
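Not part of the original repro: a quick sanity check on the onnx_model object returned above, confirming that the graph inputs/outputs stayed float32 (because keep_io_types=True) while the weight initializers became float16.

from onnx import TensorProto

# Graph inputs/outputs should still report FLOAT (float32)
for io in list(onnx_model.graph.input) + list(onnx_model.graph.output):
    print(io.name, TensorProto.DataType.Name(io.type.tensor_type.elem_type))

# Most initializers (weights) should now be FLOAT16
dtypes = [init.data_type for init in onnx_model.graph.initializer]
print("float16 initializers:", sum(d == TensorProto.FLOAT16 for d in dtypes), "of", len(dtypes))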

Define loading the ONNX model separately for each EP

For CUDAExecutionProvider, the flag cudnn_conv_use_max_workspace is set to 1.

For TensorrtExecutionProvider, the flag trt_fp16_enable is set to True for the fp16 model.

def load_onnx_model_cpu(model_path):
    """onnx CPU inference"""
    providers = [
        (
            "CPUExecutionProvider",
            {},
        )
    ]
    sess_options = ort.SessionOptions()
    # sess_options.log_severity_level = 0  # Verbose
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

    return ort.InferenceSession(
        model_path, sess_options=sess_options, providers=providers
    )

def load_onnx_model_cuda(model_path):
    """onnx CUDA inference"""
    providers = [
        (
            "CUDAExecutionProvider",
            {
                "device_id": 0,
                "arena_extend_strategy": "kNextPowerOfTwo",
                "gpu_mem_limit": 10 * 1024 * 1024 * 1024,
                "cudnn_conv_algo_search": "HEURISTIC",
                "do_copy_in_default_stream": True,
                "cudnn_conv_use_max_workspace": 1,
            },
        )
    ]
    sess_options = ort.SessionOptions()
    # sess_options.log_severity_level = 0  # Verbose
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

    return ort.InferenceSession(
        model_path, sess_options=sess_options, providers=providers
    )

def load_onnx_model_trt(model_path, enable_fp16=False):
    """onnx TRT inference"""
    providers = [
        (
            "TensorrtExecutionProvider",
            {
                "device_id": 0,
                "trt_fp16_enable": enable_fp16,
                "trt_builder_optimization_level": 5,
                "trt_max_workspace_size": 10 * 1024 * 1024 * 1024,
                "trt_engine_cache_enable": True,
                "trt_timing_cache_enable": True,
                "trt_engine_cache_path": "ModelCache/",
                "trt_timing_cache_path": "ModelCache/",
            },
        )
    ]
    sess_options = ort.SessionOptions()
    # sess_options.log_severity_level = 0  # Verbose
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

    return ort.InferenceSession(
        model_path, sess_options=sess_options, providers=providers
    )

Load the model with each EP separately; a quick check of the registered providers follows the loading code below.

onnx_cpu = load_onnx_model_cpu(onnx_model_name)
onnx_cuda = load_onnx_model_cuda(onnx_model_name)
onnx_trt = load_onnx_model_trt(onnx_model_name, enable_fp16)
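Not in the original repro, but a quick check on the sessions just created that each one registered the intended EP instead of silently falling back to CPUExecutionProvider:

# Providers actually registered for each session (fallback providers appear at the end)
print("cpu :", onnx_cpu.get_providers())
print("cuda:", onnx_cuda.get_providers())
print("trt :", onnx_trt.get_providers())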

Run inference with each Execution Provider (EP) separately

def ort_inference(model, input):
    # Use the session passed in (the original used the global onnx_cuda session here)
    input_name = model.get_inputs()[0].name
    output = model.run(None, {input_name: input})
    return output[0].astype(np.float32).argmax(axis=1)[0]

# cpu inference
ort_output_cpu = ort_inference(onnx_cpu, input_data)
# cuda inference
ort_output_cuda = ort_inference(onnx_cuda, input_data)
# trt inference
ort_output_trt = ort_inference(onnx_trt, input_data)
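To quantify the mismatch rather than only comparing the argmax masks visually, a minimal check on the raw logits (reusing the sessions and input_data above; the input name "input" comes from the export call):

# Compare raw model outputs (before argmax) across EPs
raw_cpu = onnx_cpu.run(None, {"input": input_data})[0].astype(np.float32)
raw_cuda = onnx_cuda.run(None, {"input": input_data})[0].astype(np.float32)
print("max |cpu - cuda|:", np.max(np.abs(raw_cpu - raw_cuda)))
print("cuda output all zeros:", bool(not np.any(raw_cuda)))
print("cuda output contains NaN:", bool(np.isnan(raw_cuda).any()))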

Final Results - input image and predicted outputs

fig, axs = plt.subplots(2, 3)
axs[0][0].axis("off")
axs[0][1].axis("off")
axs[0][2].axis("off")
axs[1][0].axis("off")
axs[1][1].axis("off")
axs[1][2].axis("off")
axs[0][0].imshow(input_image)
axs[0][0].title.set_text("Input")
axs[0][1].imshow(resize_output(pytorch_output.astype(np.uint8) * 50), cmap="gray")
axs[0][1].title.set_text("Pytorch output")
axs[1][0].imshow(resize_output(ort_output_cpu.astype(np.uint8) * 50), cmap="gray")
axs[1][0].title.set_text("ort_cpu output")
axs[1][1].imshow(resize_output(ort_output_cuda.astype(np.uint8) * 50), cmap="gray")
axs[1][1].title.set_text("ort_cuda output")
axs[1][2].imshow(resize_output(ort_output_trt.astype(np.uint8) * 50), cmap="gray")
axs[1][2].title.set_text("ort_trt output")

[image: input image and predicted outputs for PyTorch, ORT CPU, ORT CUDA, and ORT TRT]

Urgency

High; using TensorrtExecutionProvider as a workaround for now.

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

12.6

cuDNN Library Version

9.5.1
