Pull requests: NVIDIA/Megatron-LM
Pull requests list
Fix: prevent double accumulation of load balancing loss and z-loss wi…
#1331
opened Dec 20, 2024 by
thuwzt
fix args.mock_data bug caused by func get_blend_and_blend_per_split
#1306
opened Nov 29, 2024 by
1195343015
Fix: Resolve multimodal model errors and update README usage instructions
#1286
opened Nov 13, 2024 by
singleheart
Fix a bug in optimizer's min_lr/max_lr when args.override_opt_param_scheduler==True
#1284
opened Nov 12, 2024 by
lyuwen
fix: remove unnecessary trailing comma in statement
#1265
opened Oct 29, 2024 by
singleheart
Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining
#1262
opened Oct 28, 2024 by
dhia680
[ENHANCEMENT] Add support for Apex RMSNorm for use in qk-norm
#1261
opened Oct 28, 2024 by
wdevazelhes
support qwen2 and siglip weight conversion script to enable training …
stale
No activity in 60 days on issue or PR
#1221
opened Oct 16, 2024 by
tao-githup
[Functions] Support Packed_seq_params in Megatron-LM
stale
#1215
opened Oct 12, 2024 by
Baibaifan
Embedding
stale
#1209
opened Oct 10, 2024 by
rachitgarg91
fix bugs for multi_latent_attention
stale
#1203
opened Oct 9, 2024 by
xqiangx1991