From NVIDIA Megatron-LM for visibility #18

RaymondLi0 · 2023-01-24T20:01:13Z

No description provided.

…attn for dynamic batching. Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: Vijay Korthikanti <[email protected]> Co-authored-by: Mcore Bot <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]>

RaymondLi0 changed the base branch from multi-query-attention to before-merge June 20, 2023 20:12

RaymondLi0 changed the base branch from before-merge to multi-query-attention June 20, 2023 20:12

ko3n1g and others added 28 commits April 15, 2025 18:18

Merge branch 'fix_mo_spec_test' into 'main'

d46f999

Fix `post_training/test_get_gpt_modelopt_spec_interface` See merge request ADLR/megatron-lm!3118

ADLR/megatron-lm!3023 - Remove legacy bert tests

671f254

Co-authored-by: Shanmugam Ramasamy <[email protected]>

Merge branch 'remove-legacy-bert-test' into 'main'

8579a5d

Remove legacy bert tests See merge request ADLR/megatron-lm!3023

ADLR/megatron-lm!2601 - Alit/config mamba head

f5a57fe

Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: Mcore Bot <[email protected]>

Merge branch 'alit/config_mamba_head' into 'main'

ecf8a10

Alit/config mamba head See merge request ADLR/megatron-lm!2601

ADLR/megatron-lm!3125 - Update CODEOWNERS to make modelopt review only for QAT.

cbbbacb

Merge branch 'shanmugamr-main-patch-70610' into 'main'

4597aaa

Update CODEOWNERS to make modelopt review only for QAT. See merge request ADLR/megatron-lm!3125

ADLR/megatron-lm!3119 - Run nemo2 tests instead of nemo1

f26cc41

Merge branch 'chtruong/update-functional-for-nemo2' into 'main'

0f52851

Run nemo2 tests instead of nemo1 See merge request ADLR/megatron-lm!3119

Merge branch 'vijay/unify_static_dynamic' into 'main'

d2e3ffc

Integrating paged attention feature of flash_attn for dynamic batching. See merge request ADLR/megatron-lm!2955

ADLR/megatron-lm!2960 - add l2 norm in torch_norm.py for LLAMA-4 support

d0534e9

Co-authored-by: Mcore Bot <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Chenhan Yu <[email protected]>

Merge branch 'yuya/add_l2_norm' into 'main'

e6bd64c

add l2 norm in torch_norm.py for LLAMA-4 support See merge request ADLR/megatron-lm!2960

ADLR/megatron-lm!3126 - fix: Improvements to the auto-reminder bot

202ad22

Merge branch 'ko3n1g/fix/reminder-bot-final-review-date' into 'main'

8e0215c

fix: Improvements to the auto-reminder bot See merge request ADLR/megatron-lm!3126

ADLR/megatron-lm!2475 - Fix Gemma TRTLLM export

966bb9a

Merge branch 'bobchen/fix_nemo2' into 'main'

9db6e55

Fix Gemma TRTLLM export See merge request ADLR/megatron-lm!2475

ADLR/megatron-lm!2691 - Fix MLA THD format support

c0b5c91

Co-authored-by: Yuzhong Wang <[email protected]> Co-authored-by: Shunkang <[email protected]>

Merge branch 'mla_PackedSeqParams' into 'main'

0bbb642

Fix MLA THD format support See merge request ADLR/megatron-lm!2691

ADLR/megatron-lm!2914 - Dynamic inference example | Control checkpoint load strictness.

02c6d64

Merge branch 'lmcafee/ifb-broken-example-25.02' into 'main'

370508c

Dynamic inference example | Control checkpoint load strictness. See merge request ADLR/megatron-lm!2914

ADLR/megatron-lm!3057 - patch for fp8 primary weight custom fsdp support

b799e3f

Co-authored-by: jianbinc <[email protected]>

Merge branch 'fp8_patch_for_cfsdp' into 'main'

76d6bcf

patch for fp8 primary weight custom fsdp support See merge request ADLR/megatron-lm!3057

ADLR/megatron-lm!3129 - ci: Track info about MR

35fd148

Merge branch 'ko3n1g/feat/track-info-about-merge-request' into 'main'

7935dcf

ci: Track info about MR See merge request ADLR/megatron-lm!3129

ADLR/megatron-lm!3105 - ci: Handle nargs

04a5957

Merge branch 'ko3n1g/ci/handle-nargs' into 'main'

55c968d

ci: Handle nargs See merge request ADLR/megatron-lm!3105

ADLR/megatron-lm!2871 - Fix optimizer cpu offload load checkpoint with --no-optim-load

2922bb6

Co-authored-by: jianbinc <[email protected]> Co-authored-by: 胡凯文 <[email protected]>

ko3n1g and others added 30 commits May 13, 2025 12:00

Revert "ADLR/megatron-lm!2711 - Add in-process restart"

e41dde6

This reverts commit d87ba91.

ADLR/megatron-lm!3292 - ci: Run on multiple clusters

f61b17c

Merge branch 'ko3n1g/ci/multi-cluster' into 'main'

c552e21

ci: Run on multiple clusters See merge request ADLR/megatron-lm!3292

ADLR/megatron-lm!3302 - ci: Allow specific TE-ref

55343df

Merge branch 'ko3n1g/ci/te-nightly' into 'main'

d50e830

ci: Allow specific TE-ref See merge request ADLR/megatron-lm!3302

ADLR/megatron-lm!3299 - ci(fix): Write logs to log_dir

8c4875f

Merge branch 'ko3n1g/ci/unit-tests-locally' into 'main'

d6eb60b

ci(fix): Write logs to log_dir See merge request ADLR/megatron-lm!3299

ADLR/megatron-lm!3253 - Address dist checkpointing PyT 24.08 failure

c58e57f

Merge branch 'dist-ckpt-2408' into 'main'

4a114e6

Address dist checkpointing PyT 24.08 failure See merge request ADLR/megatron-lm!3253

ADLR/megatron-lm!3307 - ci(hotfix): Downstream pipeline

d2cbe5a

Merge branch 'ko3n1g/ci/fix-downstream-pipeline' into 'main'

53d55fb

ci(hotfix): Downstream pipeline See merge request ADLR/megatron-lm!3307

ADLR/megatron-lm!3308 - MR feedback: added units for arguments, optional argparse flag to clear GPU...

9c586bf

Co-authored-by: Szymon Migacz <[email protected]>

Merge branch 'inprocess_mr' into 'main'

8416bff

MR feedback: added units for arguments, optional argparse flag to clear GPU... See merge request ADLR/megatron-lm!3308

ADLR/megatron-lm!2966 - Allow process group as optional argument for mamba class constructor

07b1992

Co-authored-by: Zhiyu Li <[email protected]>

Merge branch 'zhiyul/orthotope/ssm' into 'main'

175497e

Allow process group as optional argument for mamba class constructor See merge request ADLR/megatron-lm!2966

ADLR/megatron-lm!2588 - Add NVTX ranges to categorize execution

7f9f2bf

Merge branch 'llama31_automated_breakdown' into 'main'

8a9e864

Add NVTX ranges to categorize execution See merge request ADLR/megatron-lm!2588

ADLR/megatron-lm!3116 - Move fsdp 2 import from _composable to public

1ff5a37

Merge branch 'boxiangw/public_fsdp_import' into 'main'

ed0d528

Move fsdp 2 import from _composable to public See merge request ADLR/megatron-lm!3116

ADLR/megatron-lm!3321 - ci: Add nemo-image to `ci-rebuild-mcore-nemo-image`

d70e2e4

Merge branch 'ko3n1g/ci/fix-rebuild-job' into 'main'

054fad5

ci: Add nemo-image to `ci-rebuild-mcore-nemo-image` See merge request ADLR/megatron-lm!3321

ADLR/megatron-lm!3197 - ci: Re-enable tests that failed on memory

e494219

Merge branch 'ko3n1g/ci/re-enable-broken-tests' into 'main'

bfc751a

ci: Re-enable tests that failed on memory See merge request ADLR/megatron-lm!3197

tests: Disable flaky test

a73b4d2

Signed-off-by: oliver könig <[email protected]>

ADLR/megatron-lm!3254 - Engine updates

407e504

Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]>

Merge branch 'engine_updates' into 'main'

7fe8f69

Engine updates See merge request ADLR/megatron-lm!3254

ADLR/megatron-lm!3312 - ci: Onboard mr-slim to h100

ee1d765

Co-authored-by: Mcore Bot <[email protected]>

Merge branch 'ko3n1g/ci/dev-on-h100' into 'main'

861a8fa

ci: Onboard mr-slim to h100 See merge request ADLR/megatron-lm!3312

ADLR/megatron-lm!3334 - chore: Deprecate T5 tests

cf03fb2

Merge branch 'ko3n1g/chore/remove-t5-from-lts' into 'main'

8e1c3df

chore: Deprecate T5 tests See merge request ADLR/megatron-lm!3334
