Enable sequence_parallel and tp_comm_overlap in ESM2 #305

Open
jstjohn opened this issue Oct 11, 2024 · 0 comments
jstjohn commented Oct 11, 2024

Enabling sequence_parallel and tp_comm_overlap should reduce memory consumption, and these settings should be used whenever tensor_model_parallelism > 1.

Unfortunately, these changes also alter the output behavior of self.word_embeddings: with sequence_parallel enabled, the output layout becomes [sequence, batch, hidden] rather than the standard [batch, sequence, hidden]. In that case we need to handle weight sharing for ESM2Embedding, as well as the downstream steps in forward that follow the embedding call.
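A minimal sketch of the layout handling, not the actual BioNeMo/Megatron implementation: `ToyParallelConfig`, `ToyESM2Embedding`, the `attention_mask` argument, and the use of `nn.Embedding` in place of the real parallel word embedding are all illustrative assumptions. It only shows how downstream steps in forward would have to adapt when the embedding output switches to [sequence, batch, hidden].

```python
# Illustrative sketch only: shows layout-aware handling after word_embeddings
# when sequence_parallel flips the output from [b, s, h] to [s, b, h].
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class ToyParallelConfig:
    sequence_parallel: bool = False  # assumed flag mirroring the Megatron-style setting
    tp_comm_overlap: bool = False    # not exercised in this sketch


class ToyESM2Embedding(nn.Module):
    """Toy stand-in for ESM2Embedding; demonstrates layout-aware masking only."""

    def __init__(self, vocab_size: int, hidden_size: int, config: ToyParallelConfig):
        super().__init__()
        self.config = config
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # input_ids, attention_mask: [batch, sequence]
        embeddings = self.word_embeddings(input_ids)  # nn.Embedding gives [b, s, h]

        if self.config.sequence_parallel:
            # Emulate the parallel code path, where word embeddings come back
            # as [sequence, batch, hidden] instead of [batch, sequence, hidden].
            embeddings = embeddings.transpose(0, 1).contiguous()  # [s, b, h]
            # Downstream steps written for batch-first layouts must be adapted;
            # here the [b, s] padding mask is reshaped to [s, b, 1] before broadcasting.
            mask = attention_mask.transpose(0, 1).unsqueeze(-1)
        else:
            mask = attention_mask.unsqueeze(-1)  # [b, s, 1]

        return embeddings * mask.to(embeddings.dtype)
```

With sequence_parallel=True this toy forward returns a [sequence, batch, hidden] tensor, so every step after word_embeddings in ESM2Embedding (token dropout, masking, and so on) would either need to accept that layout or transpose back to batch-first, which is the work this issue is tracking.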
