[BUG or ENHANCEMENT] Update qk_layernorm. #210

ftgreat · 2024-09-10T04:10:13Z

With the current qk_layernorm implementation, training did not converge.
Currently a single qk_layernorm, sized for one attention head, is shared and applied to every head separately; however, qk_layernorm should act across all heads.
Simply enlarging the shape of the qk_layernorm weights to cover all heads makes training converge as expected.
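
As a rough illustration of the shape change, the sketch below compares the LayerNorm weight length before and after. The head count and per-head size are hypothetical values chosen only for the example, and it assumes that `query_projection_size` equals the per-head size times the number of attention heads.

```python
# Hypothetical configuration values, for illustration only.
num_attention_heads = 32
kv_channels = 128  # channels per attention head

# Assumption: the query projection covers all heads.
hidden_size_per_attention_head = kv_channels
query_projection_size = kv_channels * num_attention_heads

# Length of the LayerNorm weight vector before and after this change.
old_weight_len = hidden_size_per_attention_head
new_weight_len = query_projection_size

print(old_weight_len, new_weight_len)  # 128 4096
```

With these example numbers, each head gets its own slice of the enlarged weight vector instead of reusing the same 128 values.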

Some models that use qk_layernorm, for reference:

Signed-off-by: ldwang <[email protected]>

heavyrain-lzy · 2024-09-10T08:32:17Z

megatron/megatron/core/transformer/attention.py

@@ -360,7 +360,7 @@ def __init__(
        if submodules.q_layernorm is not None:
            self.q_layernorm = build_module(
                submodules.q_layernorm,
-                hidden_size=self.hidden_size_per_attention_head,
+                hidden_size=self.query_projection_size,


Thanks for your contribution. As I understand it, the difference between the original code and your code is as follows:

original code: normalizes every head individually

new code: normalizes all the heads together
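
The two behaviours can be sketched in plain Python. This is a toy illustration only, not the Megatron-LM implementation: the `layer_norm` helper is hypothetical, it omits the learned weight and bias, and the input values are made up.

```python
import math


def layer_norm(xs, eps=1e-5):
    """Normalize a list of floats to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


# Toy query vector for one token: 2 heads, 2 channels per head (made-up values).
heads = [[1.0, 3.0], [10.0, 30.0]]

# Original behaviour: each head is normalized on its own statistics.
per_head = [layer_norm(h) for h in heads]

# New behaviour: the heads are concatenated and normalized with shared statistics.
flat = [x for h in heads for x in h]
joint_flat = layer_norm(flat)
joint = [joint_flat[0:2], joint_flat[2:4]]

print(per_head)
print(joint)
```

In this toy input, per-head normalization maps both heads to nearly the same values, so the difference in scale between the heads disappears, whereas joint normalization keeps the heads distinct.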

Judging by the input_layernorm (https://github.com/NVIDIA/Megatron-LM/blob/bbecd0812ffc9a90eef472fda91c60eead3f417f/megatron/core/transformer/transformer_layer.py#L111), we should normalize all the heads together. Maybe you can open an issue in Megatron-LM to have it rechecked.
I think you are right.
LGTM.

Update qk_layernorm.

34c1951

Signed-off-by: ldwang <[email protected]>

heavyrain-lzy reviewed Sep 10, 2024
