
Linear does not support TP comm overlap for Column Parallel mode #1312

Open
parthmannan opened this issue Nov 5, 2024 · 0 comments · May be fixed by #1343

Comments

@parthmannan

All output modes in the Linear class support only row-parallel outputs (https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/module/linear.py#L358).

The Linear forward checks for ub_overlap_rs and changes the expected output dim size to (s/tp, h). It does not also support column-parallel mode, which would have a different dim size, e.g. (s, 3*h/tp) for a fused QKV projection: https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/module/linear.py#L268
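A minimal sketch of the shape mismatch described above, using illustrative numbers (s, h, and tp are assumptions, not values from the issue):

```python
# Illustrative shape arithmetic for the two TP modes (numbers are assumptions):
# sequence length s, hidden size h, tensor-parallel group size tp.
s, h, tp = 4096, 8192, 8

# Row-parallel Linear with reduce-scatter overlap (ub_overlap_rs):
# each rank ends up with a sequence shard of the full-hidden output.
row_parallel_out = (s // tp, h)

# Column-parallel Linear (e.g. a fused QKV projection of width 3*h):
# each rank holds the full sequence but only its shard of the output
# features, so the expected dims differ from the ub_overlap_rs path.
col_parallel_out = (s, 3 * h // tp)

print(row_parallel_out)  # (512, 8192)
print(col_parallel_out)  # (4096, 3072)
```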

Linear is also used by TEColumnParallelLinear in Megatron-LM, so supporting TP comm overlap there is highly important.
Using LayerNormLinear instead was considered, but it does not allow skipping the normalization.
