Very slow inference on 3080 #64
Comments
I did `model = model.cuda()` and `image = image.cuda()` and got higher GPU usage and inference times closer to 5 to 8 seconds. I ran it again (without reloading the model or the image) and saw massive RAM usage and a much longer inference time. |
Setting `precision=torch.float16` seems to fix all of this. The initial inference takes about 1.5s, and subsequent inferences take about 0.6s. You could consider updating the model loading part in the Python code snippet to pass this precision.
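The exact snippet attached to that comment is not preserved in this thread. A minimal sketch, assuming the repo's `create_model_and_transforms` loader accepts `device` and `precision` keyword arguments as discussed below, might look like this:

```python
import torch

# Hedged sketch: the import path for create_model_and_transforms depends on
# the repository's package layout and is not shown in this thread.
# from <package> import create_model_and_transforms

# Load the model directly on the GPU in half precision, as suggested above.
# The loader is assumed to return (model, transform) based on its name.
model, transform = create_model_and_transforms(
    device=torch.device("cuda"),
    precision=torch.float16,
)
model.eval()
```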
@jbrownkramer I also have the same problem of long inference times for a single image: it takes around 30-40 seconds to process one image. I used the settings you gave, but I am getting an OutOfMemory error. Could you please help me figure out what to do now? I have an Nvidia GTX-1650 GPU. |
I don't have a GTX-1650 at my disposal. You could try setting the precision even lower. |
Hi @jbrownkramer, I tried your suggestion on a 3090 but it did not seem to help. And it seems they already set it to half precision in #17? Do you have any idea about this? Thanks! |
@csyhping in issue #17 the user is running through `run.py`, which calls `create_model_and_transforms` on the GPU if you have one, with half precision. However, if you look at the definition of `create_model_and_transforms`, you see that by default it is float32 and running on the CPU. I guess make sure you're explicitly passing the GPU device and `torch.float16` precision when you load the model, and verify that your GPU is being well utilized when you run inference. |
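A quick, hedged way to check that the model actually ended up on the GPU in the intended dtype (this snippet is not part of the original comment and assumes `model` was loaded as above):

```python
import torch

# Illustrative sanity checks on an already-loaded torch model.
param = next(model.parameters())
print("device:", param.device)  # should report cuda:0, not cpu
print("dtype:", param.dtype)    # should report torch.float16 if precision was lowered

# While inference is running, `nvidia-smi` should show sustained GPU utilization;
# a reading stuck around 10% suggests the model (or the input) is still on the CPU.
```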
@jbrownkramer, thanks for your reply. I rechecked my code and it worked. BTW: loading the model takes ~16s. Thanks for your help. |
On my laptop with a 4070 graphics card, GPU inference is about twice as fast as the CPU: the CPU needs about 30 seconds, and the 4070 about 15 seconds. |
One point to make for people reading this issue: load time and the first inference may be significantly longer than subsequent inferences. If your workflow is to launch a script, load the model, run inference on one image, then exit, your inference might be 15s per frame. If, on the other hand, you load the model once and feed it lots of images, individual inference times could be much shorter. |
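For illustration, a hedged sketch of that "load once, run many" pattern; `image_paths`, `load_image`, and the `model(...)` call are placeholders for whatever the repo actually uses:

```python
import time
import torch

# Load the model a single time; this (plus the first inference) pays the
# one-off cost of weight loading, CUDA context creation, and kernel warm-up.
model, transform = create_model_and_transforms(
    device=torch.device("cuda"),
    precision=torch.float16,
)
model.eval()

with torch.no_grad():
    for i, path in enumerate(image_paths):          # placeholder list of image files
        image = transform(load_image(path)).cuda()  # load_image is a placeholder
        start = time.time()
        output = model(image)                       # substitute the repo's inference call
        torch.cuda.synchronize()                    # make the wall-clock timing meaningful
        print(f"frame {i}: {time.time() - start:.2f}s")  # first frame is slower than the rest
```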
@jbrownkramer When you say "If, on the other hand, you load the model once, and feed it lots of images", I am not sure I understand it properly. Here is what I am thinking: when you said that, I understood it in two ways.
I would really appreciate it if you could answer this. |
@Itachi-6
I would expect the 3080 to be, say, 30 to 50% slower than a V100. But it takes 45 seconds to run inference. The GPU utilization is very low (10% or so). Any idea how this could be fixed?