Very slow inference on 3080 #64

Open · jbrownkramer opened this issue Nov 4, 2024 · 11 comments
@jbrownkramer

I would expect the 3080 to be, say, 30 to 50% slower than a V100. But it takes 45 seconds to run inference. The GPU utilization is very low (10% or so). Any idea how this could be fixed?

@jbrownkramer (Author)

I did model = model.cuda() and image = image.cuda(), and I got higher GPU usage and more like 5 to 8 seconds of inference time.

I then ran it again (without reloading the model or the image): massive RAM usage and a much longer inference time.
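For anyone reading along later, here is a minimal sketch of that manual approach. The torch.no_grad() guard is my assumption about the RAM growth on the second run (accumulating autograd state); model.infer is the entry point from the repo's README.

import torch

# Assumes `model` and `image` were created as in the README snippet.
model = model.cuda()
image = image.cuda()

# Guard against accumulating autograd state across repeated runs;
# this is a guess at the cause of the memory growth described above.
with torch.no_grad():
    prediction = model.infer(image)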

@jbrownkramer (Author)

Setting precision=torch.float16 seems to fix all of this. The initial inference takes about 1.5s, and subsequent inferences take about 0.6s.

You could consider updating the model loading part in the python code snippet to:

import torch
import depth_pro

# Load model and preprocessing transform on the GPU at half precision
model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda"), precision=torch.float16
)
model.eval()
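For context, the rest of the flow stays the same; a sketch, where image_path is a placeholder and load_rgb/model.infer are the helpers the repo's README uses:

# Load and preprocess an image.
image, _, f_px = depth_pro.load_rgb(image_path)
image = transform(image)

# Run inference; prediction["depth"] is metric depth in meters.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]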

@Itachi-6 commented Nov 7, 2024

@jbrownkramer I also have the same problem of long inference times for a single image. It takes around 30-40 seconds to process one image. I used the settings you gave, but it gives me an OutOfMemory error. Could you please help me figure out what to do now?

I have an Nvidia GTX 1650 GPU.

@jbrownkramer (Author)

I don't have a GTX-1650 at my disposal. You could try setting the precision even lower.
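One thing worth checking first is whether the weights even fit on the card; a rough diagnostic sketch, assuming `model` is already constructed:

import torch

# Compare parameter memory at float16 against total GPU memory.
total_params = sum(p.numel() for p in model.parameters())
print(f"params: {total_params / 1e6:.0f}M, ~{total_params * 2 / 1e9:.1f} GB at float16")
print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")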

@csyhping commented Nov 14, 2024

Hi @jbrownkramer , I tried your suggestion on a 3090.

With float16 it takes ~17s; with the default it takes ~18s.

And it seems they already set it to Half? #17

Do you have any idea about this? Thanks!

@jbrownkramer (Author)

> Hi @jbrownkramer , I tried your suggestion on a 3090.
>
> With float16 it takes ~17s; with the default it takes ~18s.
>
> And it seems they already set it to Half? #17
>
> Do you have any idea about this? Thanks!

@csyhping in issue #17 the user is running through run.py, which calls create_model_and_transforms with the GPU (if you have one) and half precision. However, if you look at the definition of create_model_and_transforms:

def create_model_and_transforms(
    config: DepthProConfig = DEFAULT_MONODEPTH_CONFIG_DICT,
    device: torch.device = torch.device("cpu"),
    precision: torch.dtype = torch.float32,
) -> Tuple[DepthPro, Compose]:

you can see that the default is float32 on the CPU. So I'd make sure you're doing:

model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda"), precision=torch.float16
)
model.eval()

And verify that your GPU is being well utilized when you run inference.
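A quick sanity check (my addition, plain PyTorch) that the flags took effect:

# Expect: cuda:0 torch.float16
p = next(model.parameters())
print(p.device, p.dtype)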

@csyhping
@jbrownkramer , thanks for your reply. I rechecked my code and it worked.

With the default, it takes ~1.5s;
with torch.float16, it takes ~0.15s.

BTW: loading the model takes ~16s.

Thanks for your help.

@xulsup commented Nov 18, 2024

@jbrownkramer

Thanks for your help.

model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda:0"), precision=torch.float16
)
model.eval()

On my laptop with a 4070 graphics card, inference is about twice as fast as on the CPU: the CPU takes about 30 seconds, the 4070 about 15 seconds.

@jbrownkramer (Author)

One point to make for people reading this issue: Load time and first inference may be significantly longer than subsequent inferences. If your workflow is to launch a script, load the model, run inference on one image, then exit, your inference might be 15s per frame. If, on the other hand, you load the model once, and feed it lots of images, individual inference times could be much shorter.
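A sketch of the load-once pattern, where image_paths is a placeholder list and load_rgb/model.infer follow the repo's README:

import time
import torch
import depth_pro

# Pay the load cost once...
model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda"), precision=torch.float16
)
model.eval()

# ...then amortize it over many images.
for image_path in image_paths:
    image, _, f_px = depth_pro.load_rgb(image_path)
    image = transform(image)
    t0 = time.perf_counter()
    prediction = model.infer(image, f_px=f_px)
    torch.cuda.synchronize()  # wait for async CUDA work before timing
    print(f"{image_path}: {time.perf_counter() - t0:.2f}s")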

@Itachi-6

@jbrownkramer When you say "if, on the other hand, you load the model once, and feed it lots of images", I'm not sure I understand it properly. I can read it in two ways:

  1. Load the model globally in the script instead of inside a function, because every time the function is called the model would load again. With the model loaded globally, it is loaded once and the function just feeds it images.

  2. Load the model in the script, eval() it, and save it locally in the project folder. Then remove the previous code, load the model from the local path, and feed images to it from there.

I'd really appreciate it if you could answer this.

@xulsup commented Jan 10, 2025

@Itachi-6
What he meant is to first run inference once on a simple warm-up example, and then feed the model the samples you actually need to process. After that first pass, subsequent inferences will be faster.
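In code, a warm-up could look like the sketch below; the 1536x1536 input size is an assumption about the network resolution, and model.infer follows the repo's README:

import torch

# One throwaway inference so the CUDA context, kernels, and caches are warm.
dummy = torch.zeros(3, 1536, 1536, device="cuda", dtype=torch.float16)
_ = model.infer(dummy)
torch.cuda.synchronize()

# Real images fed after this point should run at steady-state speed.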
