Fix swapped indices and names in 2D sincos pos_embed #10877
Draft
What does this PR do?
CURRENTLY STILL A DRAFT, I'M NOT SURE ALL NAMES ARE FIXED!
In `get_2d_sincos_pos_embed` and `get_2d_sincos_pos_embed_from_grid`, the conventional embedding scheme inherited from the MAE models appears to swap the order of `h` and `w` when calling `meshgrid`. As a result, `emb_h` actually encodes width information and `emb_w` actually encodes height information. Given the permutation equivariance of the attention mechanism, this should not make any practical difference in training or inference, but it still confuses anyone reading the code to understand the model's internal structure. This commit fixes the swapped names in these functions while remaining compatible with pre-trained models that use the old code.
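A minimal NumPy sketch of the naming issue, assuming MAE-style helpers (the names `sincos_1d`, `legacy_2d` and `fixed_2d` are hypothetical, not the diffusers API, and a rectangular `h x w` grid is used only to make the two axes distinguishable). With NumPy's default `indexing="xy"`, `np.meshgrid(grid_w, grid_h)` returns the column (width) coordinate first, so the half labelled `emb_h` is built from width. Renaming the variables while keeping the `[width, height]` concatenation order leaves the output unchanged, which is what preserves compatibility with existing checkpoints.

```python
import numpy as np


def sincos_1d(embed_dim, pos):
    """MAE-style 1D sin/cos embedding of flattened positions -> shape (M, embed_dim)."""
    assert embed_dim % 2 == 0
    omega = np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2.0)
    omega = 1.0 / 10000**omega  # (D/2,) frequencies
    out = np.outer(pos.reshape(-1), omega)  # (M, D/2), one row per grid position
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)


def legacy_2d(embed_dim, h, w):
    # MAE-style call: width first, default indexing="xy" -> both outputs have shape (h, w)
    grid = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    emb_h = sincos_1d(embed_dim // 2, grid[0])  # misnamed: grid[0] is the column (width) index
    emb_w = sincos_1d(embed_dim // 2, grid[1])  # misnamed: grid[1] is the row (height) index
    return np.concatenate([emb_h, emb_w], axis=1)


def fixed_2d(embed_dim, h, w):
    # Explicit "ij" indexing: outputs have shape (h, w) and the names match their content
    grid_h, grid_w = np.meshgrid(
        np.arange(h, dtype=np.float64), np.arange(w, dtype=np.float64), indexing="ij"
    )
    emb_h = sincos_1d(embed_dim // 2, grid_h)  # encodes the row (height) index
    emb_w = sincos_1d(embed_dim // 2, grid_w)  # encodes the column (width) index
    # Keep the legacy [width, height] channel order so pre-trained weights still match
    return np.concatenate([emb_w, emb_h], axis=1)


emb_old = legacy_2d(8, 3, 4)  # 3 rows (height), 4 columns (width)
emb_new = fixed_2d(8, 3, 4)
print(emb_old.shape, np.array_equal(emb_old, emb_new))  # (12, 8) True
```

Tokens 0 and 4 lie in the same column but different rows, so in the legacy output their first half (the one named `emb_h`) is identical, which confirms that it encodes width.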
This commit also removes the unnecessary stacking and reshaping of the grid tensors.
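A hedged sketch of why the stacking and reshaping can go, assuming an MAE-style 1D helper that flattens its input with `reshape(-1)` (the helper names below are hypothetical). Because the flattening happens anyway, passing the two `meshgrid` outputs directly gives the same result as stacking them into a `[2, 1, H, W]` array and indexing back out. This variant keeps the default `"xy"` indexing and simply unpacks the outputs as `grid_w, grid_h`.

```python
import numpy as np


def sincos_1d(embed_dim, pos):
    """MAE-style 1D sin/cos embedding of flattened positions -> shape (M, embed_dim)."""
    omega = 1.0 / 10000 ** (np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2.0))
    out = np.outer(pos.reshape(-1), omega)  # reshape(-1) flattens any grid shape
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)


def with_stack(embed_dim, h, w):
    grid = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    grid = np.stack(grid, axis=0).reshape([2, 1, h, w])  # extra copy plus reshape
    return np.concatenate(
        [sincos_1d(embed_dim // 2, grid[0]), sincos_1d(embed_dim // 2, grid[1])], axis=1
    )


def without_stack(embed_dim, h, w):
    # Default "xy" indexing returns the column (width) grid first, then the row (height) grid
    grid_w, grid_h = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    return np.concatenate(
        [sincos_1d(embed_dim // 2, grid_w), sincos_1d(embed_dim // 2, grid_h)], axis=1
    )


print(np.array_equal(with_stack(16, 5, 7), without_stack(16, 5, 7)))  # True
```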
For the behavior of the `meshgrid` function, see also: https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html and https://pytorch.org/docs/stable/generated/torch.meshgrid.html

This also references the original PR that reimplemented sincos pos_embed in PyTorch, #10156, kindly written by @hlky.
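A small NumPy illustration of the two `meshgrid` indexing conventions documented at the links above. NumPy defaults to `indexing="xy"`, while `torch.meshgrid` defaults to `"ij"` and accepts `indexing="xy"` to mirror NumPy, so a port between the two libraries has to pass the mode explicitly to keep the same grids. The variable names are illustrative only.

```python
import numpy as np

h_idx = np.arange(2)  # 2 rows (height)
w_idx = np.arange(3)  # 3 columns (width)

# Default indexing="xy": the FIRST input varies along columns; output shape is (len(second), len(first))
X, Y = np.meshgrid(w_idx, h_idx)
# indexing="ij": the FIRST input varies along rows; output shape is (len(first), len(second))
I, J = np.meshgrid(h_idx, w_idx, indexing="ij")

# meshgrid(w, h) with "xy" yields the same arrays as meshgrid(h, w) with "ij", in swapped order
print(X.shape, I.shape)  # (2, 3) (2, 3)
print(X[0].tolist(), Y[:, 0].tolist())  # [0, 1, 2] [0, 1]
```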
Before submitting
Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed.
@yiyixuxu