Optim deformable detr #33600
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Looks great!
Just two things before this is merge-ready:
- We'll need to run the slow tests for these models
- Remove the baseline file
To delete?
Looks good to me! Just some nits and questions :)
Can you please provide a bit more detail on what was changed and why it leads to better performance? As far as I can see, the general idea of this PR is to avoid .item() calls, or am I missing anything else?
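(A minimal sketch of the pattern in question, not taken from the PR itself: calling .item() on a GPU tensor forces a device-to-host sync and a graph break under torch.compile, whereas a check over plain Python ints is free at trace time.)

```python
import torch

# Illustrative only: the tensor-based check forces a GPU->CPU sync via
# .item() (and a graph break under torch.compile); the list-based check
# operates on plain Python ints, so no sync or break occurs.
def check_with_sync(spatial_shapes: torch.Tensor, sequence_length: int) -> bool:
    return (spatial_shapes[:, 0] * spatial_shapes[:, 1]).sum().item() != sequence_length

def check_without_sync(spatial_shapes_list, sequence_length: int) -> bool:
    # spatial_shapes_list: [(height, width), ...] as Python ints
    return sum(h * w for h, w in spatial_shapes_list) != sequence_length
```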
y_embed = pixel_mask.cumsum(1, dtype=torch.float16)
x_embed = pixel_mask.cumsum(2, dtype=torch.float16)
Just wondering, why should we use float16 instead of float32? Shouldn't it be pixel_values.dtype?
Oops absolutely, thanks for catching that.
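(Presumably the fixed lines look something like the sketch below, with the dtype taken from the model inputs rather than hardcoded; variable names follow the diff above, and the shapes are illustrative.)

```python
import torch

# Sketch of the dtype fix: derive the embedding dtype from pixel_values so
# the position embeddings match the model's precision (fp32/fp16/bf16)
# instead of being hardcoded to float16.
pixel_values = torch.randn(2, 3, 64, 64)              # (batch, channels, H, W)
pixel_mask = torch.ones(2, 64, 64, dtype=torch.long)  # 1 = valid pixel

y_embed = pixel_mask.cumsum(1, dtype=pixel_values.dtype)
x_embed = pixel_mask.cumsum(2, dtype=pixel_values.dtype)
```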
@@ -704,7 +708,8 @@ def forward(
  batch_size, num_queries, _ = hidden_states.shape
  batch_size, sequence_length, _ = encoder_hidden_states.shape
- if (spatial_shapes[:, 0] * spatial_shapes[:, 1]).sum() != sequence_length:
+ total_elements = sum(shape[0] * shape[1] for shape in spatial_shapes_list)
In case it holds here :)
Suggested change:
- total_elements = sum(shape[0] * shape[1] for shape in spatial_shapes_list)
+ total_elements = sum(height * width for height, width in spatial_shapes_list)
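(A self-contained sketch of the resulting check, assuming spatial_shapes_list holds one (height, width) tuple of Python ints per feature level; the values and the error message are illustrative.)

```python
# One (height, width) tuple per feature-map level, kept as Python ints so
# the sum involves no tensor ops and therefore no device sync.
spatial_shapes_list = [(64, 64), (32, 32), (16, 16), (8, 8)]
sequence_length = 4096 + 1024 + 256 + 64  # length of the flattened encoder sequence

total_elements = sum(height * width for height, width in spatial_shapes_list)
if total_elements != sequence_length:
    raise ValueError("spatial shapes do not match the encoder sequence length")
```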
Hey! 🤗 Thanks for your contribution to the transformers library! Before merging this pull request, the slow tests CI should be triggered. To enable this:
(For maintainers) The documentation for slow tests CI on PRs is here.
Thanks for adding and iterating!
* optimize deformable detr
* fix copies
* remove deformable_detr_basline
* fix hardcoded float16 and .float()
* [run slow] deformable-detr,grounding-dino,mask2former,oneformer,rt-detr
* [run slow] deformable_detr,grounding_dino,mask2former,oneformer,rt_detr
Hi, I'm coming to this PR a month late, but I noticed there are some changes here that I think would be great to transfer over to other models downstream of DeformableDetr, specifically Mask2Former. I would love to contribute these changes, but to do this properly, I'd like to get more context on a few things that I couldn't discover from this PR, because the Notion doc is not public :)
For context, I'm trying to use the new-ish torch.export() non-strict path to export Mask2Former, and I noticed that the changes in this PR include some of the same changes I would need to make to the Mask2Former modeling code. I would love to contribute these changes and have them align with the optimization objective you have in this PR!
Hi @philkuz! As for the motivation: in general, I think ensuring Transformers models are fully optimized for compilation sets a nice standard. There have been more and more issues and feature requests related to torch compilation, so there seems to be growing interest in using it in production. With torch.compile and torch.export gaining traction, especially for performance-critical applications like real-time vision, I can see these optimizations becoming increasingly valuable.
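(As a rough sketch of the kind of check these optimizations enable: fullgraph=True makes torch.compile raise on any remaining graph break, such as a stray .item() call, instead of silently falling back to eager mode. The snippet is illustrative, not from the PR, and may need extra config, e.g. disabling the custom CUDA kernel.)

```python
import torch
from transformers import DeformableDetrForObjectDetection

# Compile with fullgraph=True so any remaining graph break raises loudly.
model = DeformableDetrForObjectDetection.from_pretrained(
    "SenseTime/deformable-detr"
).eval()
compiled_model = torch.compile(model, fullgraph=True)

pixel_values = torch.randn(1, 3, 800, 800)
with torch.no_grad():
    outputs = compiled_model(pixel_values=pixel_values)
```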
What does this PR do?
This PR is part of an ongoing effort to optimize the inference speed of Transformers' vision models (along with #33412 for now).
For more info on the specific issues these optimizations target and how they help improve the inference speed of compiled models, you can check out this Notion page.
The following metrics are for model inference only! Pre/post-processing times are not measured here. Currently, Transformers image processors seem to be a big bottleneck for inference speed, so end-to-end inference will sadly be much slower than this.
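(For reference, model-only latency is typically measured along these lines; this is a generic sketch with CUDA events, not the benchmark script behind the numbers above. model and pixel_values are assumed to already be on the GPU.)

```python
import torch

def time_forward(model, pixel_values, warmup=3, iters=10):
    """Average forward-pass latency in milliseconds, model only."""
    with torch.no_grad():
        for _ in range(warmup):  # warm up kernels / autotuning
            model(pixel_values=pixel_values)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(pixel_values=pixel_values)
        end.record()
    torch.cuda.synchronize()  # wait for all queued GPU work to finish
    return start.elapsed_time(end) / iters
```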
A few things to note:
- You need ninja-build installed for it to work.

Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@qubvel @molbap @amyeroberts @NielsRogge