Enable sequence_parallel and tp_comm_overlap in ESM2 #305

Open
jstjohn opened this issue Oct 11, 2024 · 0 comments
jstjohn commented Oct 11, 2024

Enabling sequence_parallel and tp_comm_overlap should reduce memory consumption, and these settings should be used whenever tensor_model_parallelism > 1.

Unfortunately, these changes also alter the output behavior of self.word_embeddings: with sequence_parallel enabled, the output layout becomes [sequence, batch, hidden] rather than the standard [batch, sequence, hidden]. In that case we need to handle weight sharing for ESM2Embedding, as well as the downstream steps in forward that follow the embedding call.
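A minimal sketch of the layout handling, not the actual BioNeMo/Megatron implementation: `ToyParallelConfig`, `ToyESM2Embedding`, the `attention_mask` argument, and the use of `nn.Embedding` in place of the real parallel word embedding are all illustrative assumptions. It only shows how downstream steps in forward would have to adapt when the embedding output switches to [sequence, batch, hidden].

```python
# Illustrative sketch only: shows layout-aware handling after word_embeddings
# when sequence_parallel flips the output from [b, s, h] to [s, b, h].
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class ToyParallelConfig:
    sequence_parallel: bool = False  # assumed flag mirroring the Megatron-style setting
    tp_comm_overlap: bool = False    # not exercised in this sketch


class ToyESM2Embedding(nn.Module):
    """Toy stand-in for ESM2Embedding; demonstrates layout-aware masking only."""

    def __init__(self, vocab_size: int, hidden_size: int, config: ToyParallelConfig):
        super().__init__()
        self.config = config
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # input_ids, attention_mask: [batch, sequence]
        embeddings = self.word_embeddings(input_ids)  # nn.Embedding gives [b, s, h]

        if self.config.sequence_parallel:
            # Emulate the parallel code path, where word embeddings come back
            # as [sequence, batch, hidden] instead of [batch, sequence, hidden].
            embeddings = embeddings.transpose(0, 1).contiguous()  # [s, b, h]
            # Downstream steps written for batch-first layouts must be adapted;
            # here the [b, s] padding mask is reshaped to [s, b, 1] before broadcasting.
            mask = attention_mask.transpose(0, 1).unsqueeze(-1)
        else:
            mask = attention_mask.unsqueeze(-1)  # [b, s, 1]

        return embeddings * mask.to(embeddings.dtype)
```

With sequence_parallel=True this toy forward returns a [sequence, batch, hidden] tensor, so every step after word_embeddings in ESM2Embedding (token dropout, masking, and so on) would either need to accept that layout or transpose back to batch-first, which is the work this issue is tracking.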
