`Linear` is also used by `TEColumnParallelLinear` in Megatron-LM, so supporting TP comm overlap there is highly important.
The alternative of using `LayerNormLinear` was considered, but it does not allow skipping the normalization either.
All output modes in the `Linear` class only support row-parallel outputs (https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/module/linear.py#L358).
The `Linear` forward checks for `ub_overlap_rs` and changes the expected output dim size to `(s/tp, h)`, instead of also supporting column-parallel mode, which would have a different dim size, for example `(s, 3*h/tp)` for a fused QKV projection.
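To make the shape mismatch concrete, here is a minimal sketch of the per-rank output shapes in the two modes. The helper name and signature are hypothetical, for illustration only; they are not part of the Transformer Engine API.

```python
# Hypothetical helper (not TE API): per-rank output shape of a
# tensor-parallel Linear with sequence length s, hidden size h, tp ranks.
def expected_output_shape(s: int, h: int, tp: int, mode: str,
                          fused_multiple: int = 1) -> tuple:
    """Return the per-rank output shape (tokens, features).

    mode="row":    input features are sharded; the output is
                   reduce-scattered over the sequence dimension,
                   giving (s/tp, h) per rank. This is the only shape
                   the ub_overlap_rs path in Linear expects.
    mode="column": output features are sharded, e.g. a fused QKV
                   projection producing (s, fused_multiple*h/tp).
    """
    if mode == "row":
        return (s // tp, h)
    elif mode == "column":
        return (s, fused_multiple * h // tp)
    raise ValueError(f"unknown mode: {mode}")

# Row-parallel with reduce-scatter overlap: (s/tp, h)
print(expected_output_shape(4096, 1024, 8, "row"))        # (512, 1024)
# Column-parallel fused QKV: (s, 3*h/tp)
print(expected_output_shape(4096, 1024, 8, "column", 3))  # (4096, 384)
```

Since the `ub_overlap_rs` branch hard-codes the row-parallel shape, a column-parallel caller (such as `TEColumnParallelLinear`) cannot enable the overlap without tripping the dimension check.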
- https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/module/linear.py#L268