NVIDIA/Megatron-LM

#3711

· yaox12 opened

on Mar 5, 2026

[QUESTION] GPT-OSS example configs: should --window-size be 127,0 to match sliding_window=128?

community-request

question

#3690

· returnL opened

on Mar 4, 2026

Global auxiliary loss gradient incorrectly scaled due to averaging over global token count

bug

community-request

module: moe

#3672

· zyeric opened

on Mar 3, 2026

[QUESTION] NVFP4 Post SFT Training & Model Accuracy

community-request

question

#3671

· deepak-vij opened

on Mar 3, 2026

Gradient synchronization incorrect when <code>--overlap-grad-reduce</code> and <code>--num-distributed-optimizer-instances > 1</code> due to autograd hook stream affinity

bug

community-request

#3670

· zyeric opened

on Mar 3, 2026

Enable chunked MLP during training

community-request

enhancement

#3644

· pengdurice opened

on Feb 28, 2026

Using MTP + packing + full recompute causing exception in save_for_backward

bug

community-request

#3643

· arvyanh opened

on Feb 28, 2026

[QUESTION] How to enable fp8 dispatch while training MOE models?

community-request

module: moe

#3578

· new-TonyWang opened

on Feb 25, 2026

Establish review process for training code

enhancement

Task

#3572

· maanug-nv opened

on Feb 24, 2026

API Support Tracker

Initiative

#3571

· maanug-nv opened

on Feb 24, 2026

Move argument validation into dataclass post-init

enhancement

Task

#3568

· maanug-nv opened

on Feb 24, 2026

Migrate Pretraining ConfigContainer from Megatron Bridge

enhancement

Task

#3557

· maanug-nv opened

on Feb 24, 2026

Remove the <code>nv-grouped-gemm</code> dependency
