Skip to content

Replace VideoMAE sinusoid encoding helper with PyTorch implementation#46924

Draft
praful-srinivasan-027 wants to merge 1 commit into
huggingface:mainfrom
praful-srinivasan-027:refactor/videomae-sinusoid-torch
Draft

Replace VideoMAE sinusoid encoding helper with PyTorch implementation#46924
praful-srinivasan-027 wants to merge 1 commit into
huggingface:mainfrom
praful-srinivasan-027:refactor/videomae-sinusoid-torch

Conversation

@praful-srinivasan-027

Copy link
Copy Markdown
Contributor

What does this PR do?

This PR replaces the NumPy implementation of get_sinusoid_encoding_table in modeling_videomae.py with an equivalent vectorized PyTorch implementation.

As part of this refactor, fixed sinusoidal positional embeddings are registered as non-persistent buffers and reinitialized in _init_weights, making the model compatible with meta device initialization while preserving the existing sinusoidal encoding behavior.

This also resolves the existing TODO to implement the helper using PyTorch.

Current status: This PR is opened as a draft while investigating a model-parallel regression introduced by the buffer registration changes. The meta initialization tests now pass, but test_model_parallelism still requires investigation.

@github-actions

Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: videomae

@praful-srinivasan-027

Copy link
Copy Markdown
Contributor Author

Hi @yonigozlan! I opened this as a draft because I'd appreciate some guidance.

While replacing the NumPy implementation of get_sinusoid_encoding_table with a PyTorch version, I updated the fixed positional embeddings to be registered as non-persistent buffers and reinitialized them in _init_weights so that the model passes the meta initialization tests.

However, this introduces a failure in test_model_parallelism due to a device mismatch between sequence_output and the positional embeddings after model sharding.

Before I continue, I wanted to check whether this is the expected approach for making these fixed positional embeddings compatible with meta initialization. If so, is there a recommended pattern for handling the model-parallel case as well? If not, I'd appreciate any guidance on the preferred approach.

Thanks!

@github-actions

Copy link
Copy Markdown
Contributor

CI Dashboard: View test results in Grafana

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant