Add `intermediate_size` to GPT-NeoX models #1212
base: main
Conversation
This looks good. Need to make it consistent with mamba and RWKV. We also need some TODO statements about revisiting this once we add swiglu. @jahatef is on it

Added support for mamba and RWKV, and added TODOs.

With these changes, is there a point to having separate Linear and LLaMA Linear definitions? At a glance, all the differences are configurable; the only distinction left is which default behavior is assumed when a setting is unspecified.

Refactored the activations and MLP layer to get rid of our redundant llama mlp class, and added some activation functions from https://arxiv.org/pdf/2002.05202.
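For context, the activation functions in that paper (GLU variants: ReGLU, GEGLU, SwiGLU, etc.) are all instances of the same gated two-projection pattern. A minimal sketch of the idea; the class and argument names here are illustrative, not the ones added in this PR:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLUVariantMLP(nn.Module):
    """Gated MLP in the style of Shazeer (2020), arXiv:2002.05202. Sketch only."""

    def __init__(self, d_model: int, d_ff: int, activation=F.silu):
        super().__init__()
        # Two parallel input projections: one gated by the activation,
        # one passed through linearly, combined by elementwise product.
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)
        # F.silu -> SwiGLU, F.gelu -> GEGLU, F.relu -> ReGLU
        self.activation = activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(self.activation(self.w_gate(x)) * self.w_up(x))
```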
```diff
@@ -3,7 +3,7 @@
   # parallelism settings ( you will want to change these based on your cluster setup, ideally scheduling pipeline stages
   # across the node boundaries )
   "pipe_parallel_size": 1,
-  "model_parallel_size": 1,
+  "model_parallel_size": 2,
```
Let's not change the existing configs unless necessary
The llama config changes are fine, to be clear
```python
# set variables, mostly following mamba defaults
self.d_model = neox_args.hidden_size
self.d_state = 16  # state dimensions per channel
self.d_conv = 4  # convolution width
self.expand = 2  # linear projection expansion factors
self.d_inner = int(self.expand * self.d_model)
```
Removing this will cause a failure at https://github.com/EleutherAI/gpt-neox/pull/1212/files#diff-cf396efcb6001846b18513bfb48e1e2681f3d240e589bfef9ea17cbdfe6b1218R80
```python
    neox_args.d_inner = neox_args.intermediate_size
if neox_args.expansion_factor:
    self.expexpand = neox_args.expansion_factor
neox_args.d_inner = neox_args.intermediate_size
```
`neox_args.d_inner` is set both in the if-statement above, and also here?
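One way to resolve the duplication, as a sketch of the precedence the surrounding code seems to intend (`intermediate_size` wins, then `expansion_factor`, then the mamba default); this is not the PR's actual fix:

```python
# Sketch only: resolve d_inner through a single chain of fallbacks instead
# of assigning it twice. Attribute names follow the diff above.
if neox_args.intermediate_size:
    neox_args.d_inner = neox_args.intermediate_size
else:
    self.expand = neox_args.expansion_factor or 2  # 2 = mamba default expansion
    neox_args.d_inner = int(self.expand * neox_args.hidden_size)
```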
```python
@@ -247,11 +247,11 @@ def __init__(self, neox_args, layer_number):
    self.time_maa_k = nn.Parameter(1.0 - torch.pow(ddd, ratio_1_to_almost0))
    self.time_maa_r = nn.Parameter(1.0 - torch.pow(ddd, ratio_1_to_almost0))

    self.key = nn.Linear(neox_args.hidden_size, neox_args.dim_ffn, bias=False)
```
I prefer `ffn` to `ff`. Also, why are we swapping the ordering of `dim` and `ffn` in the first place?
```python
ff_dim = int(2 * neox_args.hidden_size * 4 / 3)
ff_dim = self.multiple_of * ((ff_dim + multiple_of - 1) // multiple_of)

self.w1 = mpu.ColumnParallelLinear(
```
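As a worked example of the LLaMA-style rounding above (values assumed for illustration, matching LLaMA-7B rather than anything in this PR):

```python
hidden_size, multiple_of = 4096, 256           # assumed example values
ff_dim = int(2 * hidden_size * 4 / 3)          # 8/3 * hidden_size = 10922
ff_dim = multiple_of * ((ff_dim + multiple_of - 1) // multiple_of)  # round up
print(ff_dim)                                  # 11008, LLaMA-7B's FFN width
```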
need to test that tensor parallelism and MoE are maintained
```diff
-    or self.num_experts > 1
-    and self.moe_type == "deepspeed"
-):
+if self.num_experts > 1 and self.moe_type == "deepspeed":
```
shouldn't we still be testing if this mlp is a swiglu?
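A hedged sketch of what keeping that check might look like; the `is_gated` attribute is hypothetical, and the real flag would depend on how the refactored MLP exposes its activation:

```python
# Hypothetical: keep handling the gated (SwiGLU-style) weight layout even
# when this is not a DeepSpeed MoE layer.
if self.is_gated or (self.num_experts > 1 and self.moe_type == "deepspeed"):
    ...
```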
The current implementation only allows setting `intermediate_size` for Llama models, but I would like to be able to change `intermediate_size` in GPT-NeoX models as well. I have tested this implementation with a quick training, inference, and conversion run, and it doesn't appear to introduce any bugs. I hope it helps!