
From NVIDIA Megatron-LM for visibility #18

Open
wants to merge 3,190 commits into multi-query-attention

Conversation

RaymondLi0 (Collaborator)

No description provided.

@RaymondLi0 RaymondLi0 changed the base branch from multi-query-attention to before-merge June 20, 2023 20:12
@RaymondLi0 RaymondLi0 changed the base branch from before-merge to multi-query-attention June 20, 2023 20:12
ko3n1g and others added 28 commits August 30, 2024 11:07
tests: Disable broken nightly

See merge request ADLR/megatron-lm!2011
ci: Improve alerting message

See merge request ADLR/megatron-lm!2012
Updating T5's sharded_state_dict to use parent's method

See merge request ADLR/megatron-lm!1991
Add option to skip segment detokenization

See merge request ADLR/megatron-lm!1976
Integrate lr scheduler into megatron.core

See merge request ADLR/megatron-lm!1385
chore: Add golden values for convergence tests

See merge request ADLR/megatron-lm!2014
tests: Disable test_capacity_padding_forward_backward

See merge request ADLR/megatron-lm!2016
ci: Better image caching

See merge request ADLR/megatron-lm!2018
ci: Create CI branches

See merge request ADLR/megatron-lm!2019
ci: H100 for non-MR

See merge request ADLR/megatron-lm!2020
tests: Stop convergence training

See merge request ADLR/megatron-lm!2022
ci: CI on CI-branches only on schedule

See merge request ADLR/megatron-lm!2023
ci: Clean nodes

See merge request ADLR/megatron-lm!2024
ci: Nicer formatting of notifier

See merge request ADLR/megatron-lm!2025
xxuwenc and others added 30 commits September 26, 2024 07:20
Resolve release test failure caused by GroupedMLP distributed checkpointing

See merge request ADLR/megatron-lm!2155
tests: Set better name for Wandb logging

See merge request ADLR/megatron-lm!2156
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Deepak Narayanan <[email protected]>
Remove pkg_resources package

See merge request ADLR/megatron-lm!1950
ci: Onboard CW

See merge request ADLR/megatron-lm!2142
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Co-authored-by: Shanmugam Ramasamy <[email protected]>
Small changes to export

See merge request ADLR/megatron-lm!2158
Fix rope backward compatibility

See merge request ADLR/megatron-lm!2152
[Bug fix] Don't trace graphs during inference

See merge request ADLR/megatron-lm!2140
Co-authored-by: Huy Vu2 <[email protected]>
Adding more MR tests for T5 (e.g., transformer_engine, distributed_checkpoint)

See merge request ADLR/megatron-lm!2109
ci: Download artifacts

See merge request ADLR/megatron-lm!2164
ci: Bump version

See merge request ADLR/megatron-lm!2165
Add the interface to set TP communication bootstrap backend

See merge request ADLR/megatron-lm!2153
Add support for SigLIP vision encoder to multimodal mcore

See merge request ADLR/megatron-lm!2095
adding cu_seqlens_padded support in MCore

See merge request ADLR/megatron-lm!2175
Fixing attention mask dimensions to support TE versions > 1.9

See merge request ADLR/megatron-lm!2181
rotary_scaling fix for llama3.1 and 3.2

See merge request ADLR/megatron-lm!2180