CUDA memory increasing and process freeze [Performance] #22872
Labels
ep:CUDA, performance
Describe the issue
In production I run a long-t5 model for data processing and tried upgrading to onnxruntime-gpu 1.19.0. I run 3 processes on the same instance, sharing the GPU, but after a gradual increase in GPU memory all processes effectively freeze. In nvidia-smi I can still see the processes holding some GPU memory (not all of it), but the application logs just stop. Rolling back to onnxruntime 1.18.0 works fine, and current dependencies do not allow upgrading to 1.20.0. I know that sharing a GPU between processes may not be best practice, but it is cost efficient and worked until now.
Any ideas what could be eating up the memory?
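For reference, each process creates its session roughly like this (a simplified sketch, not the exact production code; the model path is a placeholder, and I assume the memory-related CUDA provider options are left at their defaults):

```python
import onnxruntime as ort

# Simplified sketch of the per-process session setup on the shared GPU.
providers = [
    (
        "CUDAExecutionProvider",
        {
            "device_id": 0,
            # Memory-related knobs, assumed to be at their defaults:
            # "gpu_mem_limit": 8 * 1024 * 1024 * 1024,
            # "arena_extend_strategy": "kSameAsRequested",
        },
    ),
    "CPUExecutionProvider",
]

session = ort.InferenceSession("model.onnx", providers=providers)
```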
To reproduce
The model I use:
https://huggingface.co/agemagician/mlong-t5-tglobal-large
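A minimal loop along these lines reproduces the setup (a sketch only: it assumes the model is exported through Optimum, and the real production pipeline differs):

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "agemagician/mlong-t5-tglobal-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Export the checkpoint to ONNX and run it on the CUDA execution provider.
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_id,
    export=True,
    provider="CUDAExecutionProvider",
)

text = "summarize: " + "some long input document " * 200
inputs = tokenizer(text, return_tensors="pt").to("cuda")

# Run this loop in three processes on the same GPU; memory grows gradually
# until the processes stop logging.
for _ in range(10_000):
    model.generate(**inputs, max_new_tokens=128)
```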
Urgency
No response
Platform
Linux
OS Version
Amazon Linux AMI 2.0.20230606 x86_64
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8
Model File
No response
Is this a quantized model?
No