Updated Megatron version #85

Draft · wants to merge 427 commits into base: nvidia_main

Conversation

jlamypoirier (Collaborator)

No description provided.

mikolajblaz and others added 30 commits January 15, 2024 16:42
Sliding window attention

See merge request ADLR/megatron-lm!1025
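
For reference, sliding window attention restricts each query to a fixed-size band of preceding keys instead of the full causal prefix. A minimal sketch of the banded causal mask, with illustrative names (not Megatron-LM's actual implementation):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """True marks positions a query may attend to: itself and the
    previous `window - 1` tokens (causal + banded)."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (L, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, L)
    return (j <= i) & (j > i - window)

mask = sliding_window_causal_mask(seq_len=8, window=3)
# Applied as: scores.masked_fill_(~mask, float("-inf")) before softmax.
```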
Refactor DistributedOptimizer for MoE model support

See merge request ADLR/megatron-lm!986
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: jiemingz <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
add is_first_microbatch for TE

See merge request ADLR/megatron-lm!1033
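
For context, Transformer Engine layers accept an `is_first_microbatch` hint in `forward()` so per-step work such as FP8 weight casting can happen once per optimizer step and be reused across the gradient-accumulation loop. A self-contained sketch of the calling pattern, using a stand-in layer rather than a real TE module:

```python
import torch

# Hypothetical stand-in for a Transformer Engine layer; a real TE
# module would (re)cast its weights only when the hint is True.
class FakeTELinear(torch.nn.Linear):
    def forward(self, x, is_first_microbatch=True):
        return super().forward(x)

model = FakeTELinear(16, 16)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
microbatches = torch.randn(4, 8, 16).unbind(0)  # 4 microbatches

for mb_idx, mb in enumerate(microbatches):
    loss = model(mb, is_first_microbatch=(mb_idx == 0)).pow(2).mean()
    loss.backward()  # gradients accumulate across microbatches
opt.step()
opt.zero_grad()
```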
Need a switch at NeMo level to enable Atomic GEMM

See merge request ADLR/megatron-lm!1017
Add distributed checkpoint support to non-TE based models

See merge request ADLR/megatron-lm!1005
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Support for activation offloading to CPU in M-LM

See merge request ADLR/megatron-lm!1016
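
Conceptually this resembles PyTorch's stock saved-tensor offloading. A minimal illustration using `torch.autograd.graph.save_on_cpu` (M-LM's mechanism is its own implementation, shown here only to convey the idea):

```python
import torch
from torch.autograd.graph import save_on_cpu

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.GELU(), torch.nn.Linear(256, 256)
).to(device)
x = torch.randn(32, 256, device=device)

# Tensors saved for backward are moved to (pinned) host memory inside
# this context and copied back to the device on demand during backward.
with save_on_cpu(pin_memory=torch.cuda.is_available()):
    y = model(x)
y.sum().backward()
```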
add rope and swiglu fusion

See merge request ADLR/megatron-lm!946
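
For reference, the SwiGLU MLP computes silu(x·W_gate) ⊙ (x·W_up) before the down projection; the fusion collapses the silu-and-multiply into a single kernel (and likewise fuses the RoPE rotation) without changing the math. An unfused reference sketch, with illustrative weight names:

```python
import torch
import torch.nn.functional as F

# Unfused SwiGLU reference; the elementwise silu-and-multiply is the
# part a fused kernel targets.
def swiglu_mlp(x, w_gate, w_up, w_down):
    gated = F.silu(x @ w_gate) * (x @ w_up)
    return gated @ w_down

h, ffn = 16, 64
x = torch.randn(2, 8, h)
out = swiglu_mlp(x, torch.randn(h, ffn), torch.randn(h, ffn), torch.randn(ffn, h))
```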
jaredcasper and others added 30 commits February 21, 2024 21:00
Fixed atomic gemm defaults/fixed the offloading check

See merge request ADLR/megatron-lm!1125
Mcore CLIP ViT model

See merge request ADLR/megatron-lm!1127
Bugfix: Make sure data_end_index is padded when creating new buckets

See merge request ADLR/megatron-lm!1140
Unify resume and correctness functional tests

See merge request ADLR/megatron-lm!1070
Mcore mock multimodal dataset

See merge request ADLR/megatron-lm!1147
…ommunication

Compute norm once per batch (instead of once per microbatch) and once per bucket (instead of once per param)
Fix NaN checking in grads: should be performed before data-parallel all-reduce

See merge request ADLR/megatron-lm!989
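
The ordering matters because the data-parallel all-reduce sums gradients across ranks: once it has run, a NaN produced on any single rank has contaminated every rank's gradient and the local origin is lost. A sketch of the check-then-reduce pattern (not Megatron-LM's actual grad-buffer code):

```python
import torch
import torch.distributed as dist

def check_and_allreduce(grad: torch.Tensor, group=None):
    # Check locally *before* the all-reduce; checking afterwards would
    # flag every rank and hide which rank produced the bad gradient.
    if not torch.isfinite(grad).all():
        raise ValueError(
            f"NaN/Inf in local gradient on rank {dist.get_rank(group)}"
        )
    dist.all_reduce(grad, op=dist.ReduceOp.SUM, group=group)
```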
Move to Draco OCI

See merge request ADLR/megatron-lm!1137
Print number of transformer and embedding parameters separately

See merge request ADLR/megatron-lm!1159
Mcore LLaVA model

See merge request ADLR/megatron-lm!1151
[OMNIML-614] AMMO ptq + TensorRT-LLM export examples for megatron-lm

See merge request ADLR/megatron-lm!1013
Make throughput and memory footprint formulae compatible with arbitrary ffn_hidden_size

See merge request ADLR/megatron-lm!1169
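
For context, the widely used per-layer FLOPs estimate hard-codes ffn_hidden_size = 4h; generalizing it just keeps the MLP term parameterized. A rough back-of-envelope count (generic matmul accounting, not the exact formula this MR lands):

```python
# Forward FLOPs for one transformer layer on one sequence of length s
# with hidden size h; each (s x h)·(h x k) matmul costs 2*s*h*k FLOPs.
def layer_forward_flops(s, h, ffn_hidden_size, gated=False):
    attn = 2 * s * h * (3 * h)   # QKV projections
    attn += 2 * s * s * h * 2    # QK^T scores and attention-weighted values
    attn += 2 * s * h * h        # output projection
    n_up = 2 if gated else 1     # SwiGLU has both gate and up projections
    mlp = 2 * s * h * ffn_hidden_size * n_up + 2 * s * ffn_hidden_size * h
    return attn + mlp

# Training cost is roughly 3x forward (backward ~ 2x forward).
print(layer_forward_flops(s=4096, h=4096, ffn_hidden_size=4 * 4096))
```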
Experimental YAML configs

See merge request ADLR/megatron-lm!1134