`Linear` is also used by `TEColumnParallelLinear` in Megatron-LM, so supporting TP comm overlap there is highly important.
The alternative of using `LayerNormLinear` was considered, but it does not allow skipping the normalization either.
All output modes in the `Linear` class only support row-parallel outputs (https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/module/linear.py#L358).
The `Linear` forward checks for `ub_overlap_rs` and changes the expected output dim size to `(s/tp, h)`, instead of also supporting column-parallel mode, which would have a different dim size, for example `(s, 3*h/tp)` for a fused QKV projection.
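To make the shape mismatch concrete, here is a minimal sketch of the per-rank output shapes in the two modes. The helper name and signature are hypothetical, for illustration only; they are not part of the Transformer Engine API.

```python
# Hypothetical helper (not TE API): per-rank output shape of a
# tensor-parallel Linear with sequence length s, hidden size h, tp ranks.
def expected_output_shape(s: int, h: int, tp: int, mode: str,
                          fused_multiple: int = 1) -> tuple:
    """Return the per-rank output shape (tokens, features).

    mode="row":    input features are sharded; the output is
                   reduce-scattered over the sequence dimension,
                   giving (s/tp, h) per rank. This is the only shape
                   the ub_overlap_rs path in Linear expects.
    mode="column": output features are sharded, e.g. a fused QKV
                   projection producing (s, fused_multiple*h/tp).
    """
    if mode == "row":
        return (s // tp, h)
    elif mode == "column":
        return (s, fused_multiple * h // tp)
    raise ValueError(f"unknown mode: {mode}")

# Row-parallel with reduce-scatter overlap: (s/tp, h)
print(expected_output_shape(4096, 1024, 8, "row"))        # (512, 1024)
# Column-parallel fused QKV: (s, 3*h/tp)
print(expected_output_shape(4096, 1024, 8, "column", 3))  # (4096, 384)
```

Since the `ub_overlap_rs` branch hard-codes the row-parallel shape, a column-parallel caller (such as `TEColumnParallelLinear`) cannot enable the overlap without tripping the dimension check.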
- https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/module/linear.py#L268