NVIDIA / Megatron-LM Public

Pull requests: NVIDIA/Megatron-LM

152 Open 250 Closed

Pull requests list

Create python-package.yml

#1332 opened Dec 21, 2024 by invisiblepancake

Fix: prevent double accumulation of load balancing loss and z-loss wi…

#1331 opened Dec 20, 2024 by thuwzt

Add Mamba TRTLLM support

#1320 opened Dec 12, 2024 by meatybobby

update network interface env

#1319 opened Dec 12, 2024 by lizamd

fix args.mock_data bug caused by func get_blend_and_blend_per_split

#1306 opened Nov 29, 2024 by 1195343015

[Update] Print training log in rank0

#1296 opened Nov 21, 2024 by shijungg

support qwen2 hf<->mcore ckpt converter

#1290 opened Nov 19, 2024 by wenyujin333

Fix: Resolve multimodal model errors and update README usage instructions

#1286 opened Nov 13, 2024 by singleheart

Set torch.multiprocessing start method as 'spawn'

#1285 opened Nov 12, 2024 by hxdtest

Fix a bug in optimizer's mix_lr/max_lr when args.override_opt_param_scheduler==True

#1284 opened Nov 12, 2024 by lyuwen

Huvu/update t5 attentionmasktype

#1273 opened Nov 4, 2024 by huvunvidia

Update t5_model.py

#1271 opened Nov 2, 2024 by huvunvidia

Enable huggingface tokenizer

#1268 opened Oct 30, 2024 by msiddaiah

fix: remove unnecessary trailing comma in statement

#1265 opened Oct 29, 2024 by singleheart

Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining

#1262 opened Oct 28, 2024 by dhia680

[ENHANCEMENT] Add support for Apex RMSNorm for use in qk-norm

#1261 opened Oct 28, 2024 by wdevazelhes

Add support to process gzip files

#1260 opened Oct 28, 2024 by puneeshkhanna

[Wrong spelling] Update training.py

#1229 opened Oct 21, 2024 by zyqhnu

Typo fix in readme

#1223 opened Oct 17, 2024 by alexchen4ai

support qwen2 and siglip weight conversion script to enable training … [label: stale, no activity in 60 days on issue or PR]

#1221 opened Oct 16, 2024 by tao-githup

readme spelling correction

#1216 opened Oct 13, 2024 by jonassteinberg1

[Functions] Support Packed_seq_params in Megatron-LM [label: stale, no activity in 60 days on issue or PR]

#1215 opened Oct 12, 2024 by Baibaifan

Embedding [label: stale, no activity in 60 days on issue or PR]

#1209 opened Oct 10, 2024 by rachitgarg91

Dev/optimizer offloading

#1205 opened Oct 10, 2024 by lostkevin

fix bugs for multi_latent_attention [label: stale, no activity in 60 days on issue or PR]

#1203 opened Oct 9, 2024 by xqiangx1991
