Very slow inference on 3080 #64

Open · jbrownkramer opened this issue Nov 4, 2024 · 11 comments
@jbrownkramer

I would expect the 3080 to be, say, 30 to 50% slower than a V100. But it takes 45 seconds to run inference. The GPU utilization is very low (10% or so). Any idea how this could be fixed?

@jbrownkramer (Author)

I did model = model.cuda() and image = image.cuda(), and I got higher GPU usage and more like 5 to 8 seconds of inference time.

I then ran it again (without reloading the model or the image): massive RAM usage and a much longer inference time.
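For anyone reading along later, here is a minimal sketch of that manual approach. The torch.no_grad() guard is my assumption about the RAM growth on the second run (accumulating autograd state); model.infer is the entry point from the repo's README.

import torch

# Assumes `model` and `image` were created as in the README snippet.
model = model.cuda()
image = image.cuda()

# Guard against accumulating autograd state across repeated runs;
# this is a guess at the cause of the memory growth described above.
with torch.no_grad():
    prediction = model.infer(image)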

@jbrownkramer (Author)

Setting precision=torch.float16 seems to fix all of this. The initial inference takes about 1.5s, and subsequent inferences take about 0.6s.

You could consider updating the model loading part in the python code snippet to:

import torch
import depth_pro

# Load model and preprocessing transform on the GPU at half precision
model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda"), precision=torch.float16
)
model.eval()
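For context, the rest of the flow stays the same; a sketch, where image_path is a placeholder and load_rgb/model.infer are the helpers the repo's README uses:

# Load and preprocess an image.
image, _, f_px = depth_pro.load_rgb(image_path)
image = transform(image)

# Run inference; prediction["depth"] is metric depth in meters.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]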

@Itachi-6 commented Nov 7, 2024

@jbrownkramer I also have the same problem of long inference times for a single image. It takes around 30-40 seconds to process one image. I used the settings you gave, but it gives me an OutOfMemory error. Could you please help me figure out what to do now?

I have an Nvidia GTX 1650 GPU.

@jbrownkramer (Author)

I don't have a GTX-1650 at my disposal. You could try setting the precision even lower.
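One thing worth checking first is whether the weights even fit on the card; a rough diagnostic sketch, assuming `model` is already constructed:

import torch

# Compare parameter memory at float16 against total GPU memory.
total_params = sum(p.numel() for p in model.parameters())
print(f"params: {total_params / 1e6:.0f}M, ~{total_params * 2 / 1e9:.1f} GB at float16")
print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")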

@csyhping commented Nov 14, 2024

Hi @jbrownkramer , I tried your suggestion on a 3090.

With float16 it takes ~17s; with the default it takes ~18s.

And it seems they already set it to Half? #17

Do you have any idea about this? Thanks!

@jbrownkramer (Author)

> Hi @jbrownkramer , I tried your suggestion on a 3090.
>
> With float16 it takes ~17s; with the default it takes ~18s.
>
> And it seems they already set it to Half? #17
>
> Do you have any idea about this? Thanks!

@csyhping in issue #17 the user is running through run.py, which calls create_model_and_transforms with the GPU (if you have one) and half precision. However, if you look at the definition of create_model_and_transforms:

def create_model_and_transforms(
    config: DepthProConfig = DEFAULT_MONODEPTH_CONFIG_DICT,
    device: torch.device = torch.device("cpu"),
    precision: torch.dtype = torch.float32,
) -> Tuple[DepthPro, Compose]:

you can see that the default is float32 on the CPU. So I'd make sure you're doing:

model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda"), precision=torch.float16
)
model.eval()

And verify that your GPU is being well utilized when you run inference.
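A quick sanity check (my addition, plain PyTorch) that the flags took effect:

# Expect: cuda:0 torch.float16
p = next(model.parameters())
print(p.device, p.dtype)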

@csyhping
@jbrownkramer , thanks for your reply. I rechecked my code and it worked.

With the default, it takes ~1.5s;
with torch.float16, it takes ~0.15s.

BTW: loading the model takes ~16s.

Thanks for your help.

@xulsup commented Nov 18, 2024

@jbrownkramer

Thanks for your help.

model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda:0"), precision=torch.float16
)
model.eval()

On my laptop with a 4070 graphics card, inference is about twice as fast as on the CPU: the CPU takes about 30 seconds, the 4070 about 15 seconds.

@jbrownkramer (Author)

One point to make for people reading this issue: Load time and first inference may be significantly longer than subsequent inferences. If your workflow is to launch a script, load the model, run inference on one image, then exit, your inference might be 15s per frame. If, on the other hand, you load the model once, and feed it lots of images, individual inference times could be much shorter.
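A sketch of the load-once pattern, where image_paths is a placeholder list and load_rgb/model.infer follow the repo's README:

import time
import torch
import depth_pro

# Pay the load cost once...
model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda"), precision=torch.float16
)
model.eval()

# ...then amortize it over many images.
for image_path in image_paths:
    image, _, f_px = depth_pro.load_rgb(image_path)
    image = transform(image)
    t0 = time.perf_counter()
    prediction = model.infer(image, f_px=f_px)
    torch.cuda.synchronize()  # wait for async CUDA work before timing
    print(f"{image_path}: {time.perf_counter() - t0:.2f}s")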

@Itachi-6

@jbrownkramer When you say "if, on the other hand, you load the model once, and feed it lots of images", I'm not sure I understand it properly. I can read it in two ways:

  1. Load the model globally in the script instead of inside a function, because every time the function is called the model would load again. With the model loaded globally, it is loaded once and the function just feeds it images.

  2. Load the model in the script, eval() it, and save it locally in the project folder. Then remove the previous code, load the model from the local path, and feed images to it from there.

I'd really appreciate it if you could answer this.

@xulsup commented Jan 10, 2025

@Itachi-6
What he meant is to first run inference once on a simple warm-up example, and then feed the model the samples you actually need to process. After that first pass, subsequent inferences will be faster.
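In code, a warm-up could look like the sketch below; the 1536x1536 input size is an assumption about the network resolution, and model.infer follows the repo's README:

import torch

# One throwaway inference so the CUDA context, kernels, and caches are warm.
dummy = torch.zeros(3, 1536, 1536, device="cuda", dtype=torch.float16)
_ = model.infer(dummy)
torch.cuda.synchronize()

# Real images fed after this point should run at steady-state speed.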
