Feature request
When creating a `GroundingDinoProcessor` object, it is currently not possible to pass a size to which the image processor should resize images before passing them on. Since `GroundingDinoProcessor` hands the images to `GroundingDinoImageProcessor`, which itself accepts `do_resize` and `size` arguments, allowing custom resizing would be a simple change.
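For reference, here is a minimal sketch of the kind of workaround I mean today: rebuilding the processor around an image processor whose `size` was overridden at load time. The checkpoint name and the 400/666 values are just illustrative examples.

```python
from transformers import AutoTokenizer, GroundingDinoImageProcessor, GroundingDinoProcessor

checkpoint = "IDEA-Research/grounding-dino-tiny"  # example checkpoint

# Override the resize target on the image processor itself
# (values here are illustrative, not the defaults).
image_processor = GroundingDinoImageProcessor.from_pretrained(
    checkpoint,
    size={"shortest_edge": 400, "longest_edge": 666},
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Rebuild the processor from the customized image processor.
processor = GroundingDinoProcessor(image_processor=image_processor, tokenizer=tokenizer)
```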
Motivation
This feature is useful when running inference on small images, since the default resizing scales the shortest edge to 800px (with the longest edge capped at 1333px), and inference speed depends strongly on image size. It also helps with GPU memory usage.
I saw about a 60% difference in inference speed on a single image when manually forcing sizes around 400px. I also went from GPU OOM errors with a batch of 2 ~400px pictures to no issues with batches of more than 20 pictures.
Your contribution
I'm willing to do the PR if the maintainers think this is a good change!
Hey! If I understood correctly, you want to pass image-processing-related arguments when calling processor(text, images). We are working on that and on standardizing kwargs across processors.
The GroundingDINO PR is here (#31964) if you want to keep track of the progress :)
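Once that lands, the idea is that resize options could be forwarded straight from the processor call to the image processor. A rough sketch of the intended call shape (the exact kwargs may change until the PR is merged):

```python
import requests
from PIL import Image
from transformers import GroundingDinoProcessor

processor = GroundingDinoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Hypothetical call once processor kwargs are standardized:
# image-processing kwargs are passed through to the image processor.
inputs = processor(
    images=image,
    text="a cat.",
    size={"shortest_edge": 400, "longest_edge": 666},
    return_tensors="pt",
)
```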