These changes should reduce memory consumption and should be used whenever `tensor_model_parallelism > 1`. Unfortunately, they change the output behavior of `self.word_embeddings`: it now outputs `[sequence, batch, hidden]` rather than the standard `[batch, sequence, hidden]`. In that case, `ESM2Embedding` needs to handle weight sharing, as well as the downstream steps that follow this step in `forward`.
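As a minimal sketch of the layout issue, the snippet below shows a hypothetical helper (`to_batch_first` is not part of the codebase, just an illustration) that transposes the sequence-first output produced under tensor model parallelism back to the batch-first layout the downstream steps expect:

```python
import torch

def to_batch_first(embeddings: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: converts a [sequence, batch, hidden] tensor
    # (the layout produced when tensor_model_parallelism > 1) back to
    # the standard [batch, sequence, hidden] layout.
    return embeddings.transpose(0, 1).contiguous()

# Example: a [seq=4, batch=2, hidden=8] tensor becomes [2, 4, 8].
x = torch.randn(4, 2, 8)
y = to_batch_first(x)
print(tuple(y.shape))  # (2, 4, 8)
```

Whether the transpose should live inside `ESM2Embedding.forward` or in the consumers of its output is exactly the design question this issue raises, since weight sharing also has to be handled in the parallel case.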