
Fix incorrect dtype in LayerNormLinear #483

Merged — 2 commits merged into NVIDIA:main on Oct 20, 2023
Conversation

timmoon10 (Collaborator)

We've encountered a runtime error in LayerNormLinear when training LLaMA: RMSNorm writes its output to a buffer with an invalid dtype. In particular, the case where the RMSNorm output is returned in bf16 is not handled properly.

Note that LayerNormMLP handles this correctly:

ln_out_dtype = torch.uint8 if (fp8 and not return_layernorm_output) else inputmat.dtype
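The rule in that one-liner can be sketched as a standalone helper (a minimal sketch with string stand-ins for torch dtypes so it runs without PyTorch; the function name is illustrative, not part of Transformer Engine's API):

```python
def select_ln_out_dtype(fp8: bool, return_layernorm_output: bool,
                        input_dtype: str) -> str:
    """Pick the buffer dtype for the normalization output."""
    # FP8 path, output consumed only internally: the buffer holds
    # quantized values, so raw bytes (uint8) are fine.
    if fp8 and not return_layernorm_output:
        return "uint8"
    # Otherwise the output is handed back to the caller and must match
    # the input dtype (e.g. "bfloat16" when training in bf16).
    return input_dtype

# The bf16 case from the bug report: the output is returned to the
# caller, so the buffer must stay bfloat16 rather than uint8.
print(select_ln_out_dtype(fp8=True, return_layernorm_output=True,
                          input_dtype="bfloat16"))
```

The key point is the second branch: even with FP8 enabled, asking for the layernorm output back forces the buffer to the input dtype, which is the case LayerNormLinear was mishandling.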

@timmoon10 timmoon10 added the bug Something isn't working label Oct 19, 2023
@timmoon10 timmoon10 requested review from ptrendx and ksivaman October 19, 2023 18:04
timmoon10 (Collaborator, Author)

/te-ci pytorch

ksivaman (Member) left a comment


LGTM

timmoon10 added a commit to timmoon10/TransformerEngine that referenced this pull request Oct 19, 2023
ptrendx (Member) commented Oct 19, 2023

I opened #485 to fix the errors in the fused attention test.

ksivaman (Member)

/te-ci pytorch

@ksivaman ksivaman merged commit 1afb625 into NVIDIA:main Oct 20, 2023
denera pushed a commit to denera/TransformerEngine that referenced this pull request Oct 23, 2023
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
ptrendx pushed a commit that referenced this pull request Oct 23, 2023
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
mingxu1067 pushed a commit to mingxu1067/TransformerEngine that referenced this pull request Nov 3, 2023
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Ming Huang <[email protected]>
cyanguwa pushed a commit to cyanguwa/TransformerEngine that referenced this pull request Nov 13, 2023
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>
@timmoon10 timmoon10 deleted the debug-llama branch November 15, 2023 21:25
Labels: bug (Something isn't working)
3 participants