
Conversation

@ahmadki ahmadki (Member) commented Nov 17, 2025

What does this PR do ?

Add qwen3-next support, recipes and tests to mcore path.

Issues

#1431

Usage

SFT:

uv run --extra mcore --extra vllm --extra automodel python examples/run_sft.py \
    --config examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml

GRPO:

uv run --extra mcore --extra vllm python examples/run_grpo_math.py \
    --config examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you read and followed the Contributor guidelines
  • [x] Did you write any new necessary tests?
  • [x] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build, and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added training recipes for Qwen3-Next-80B model with GRPO and SFT configurations.
  • Bug Fixes

    • Fixed YAML syntax errors in GRPO configuration files.
  • Configuration Enhancements

    • Updated distributed data parallel settings across training recipes.
    • Updated dependency versions, including the Transformers library, and added new runtime dependencies.
  • Tests

    • Added automated test suites for Qwen3-Next-80B model training pipelines.

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: ae5d7ac (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@coderabbitai coderabbitai bot commented Nov 17, 2025

📝 Walkthrough

This PR updates NeMo-RL to support Qwen3-Next-80B with Megatron-LM: it adds new GRPO and SFT test recipes and configurations, refactors policy configuration type signatures, updates submodule pointers and dependencies, and removes some validation logic from MegatronPolicyWorker.

Changes

  • Version control configuration (.gitignore, .gitmodules): Added .env and .envrc to gitignore; updated the Megatron-LM submodule URL and branch from terrykong/Megatron-LM (yuya/nemo-rl-use-dev) to ahmadki/Megatron-LM (ahmadki/rl-qwen3-from-nemo-rl-submodule)
  • Submodule pointers (3rdparty/Automodel-workspace/Automodel, 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge, 3rdparty/Megatron-LM-workspace/Megatron-LM): Updated submodule commit references to the latest versions
  • Dependency management (pyproject.toml, 3rdparty/Megatron-Bridge-workspace/setup.py): Constrained transformers to >=4.57.1; relaxed the mlflow constraint; updated megatron-core to <0.17.0; added rwkv-fla>=0.7.202508221413; adjusted the ruff linting configuration
  • GRPO and general policy configs (examples/configs/grpo_math_1B.yaml, examples/configs/sft.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml): Added the average_in_collective flag; removed KL-divergence-related loss configuration; changed defer_fp32_logits to null; renamed metrics from accuracy to reward
  • GRPO recipe configs (examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml, examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml, examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml): Fixed YAML list syntax; added a comprehensive Qwen3-Next-80B configuration with tensor/pipeline/expert parallelism settings, generation, logging, and cluster topology
  • SFT recipe config (examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml): Added a new supervised fine-tuning configuration for Qwen3-Next-80B with Megatron parallelism, batch sizing, and multi-node deployment settings
  • Policy type definitions (nemo_rl/models/policy/__init__.py): Refactored TypedDict signatures: changed enabled fields from Literal[True] to bool; added average_in_collective to MegatronDDPConfig; converted many fields to NotRequired; removed Disabled config variants; added converter_type to MegatronConfig; adjusted type constraints on optimizer/scheduler fields
  • Megatron policy worker (nemo_rl/models/policy/megatron_policy_worker.py): Removed the assertion coupling logprob_chunk_size to defer_fp32_logits; removed per-token loss forcing and the MoE sanity check; sourced average_in_collective from config; added grad_norm to training metrics; removed the public method check_tensor_parallel_attributes
  • Policy utilities (nemo_rl/models/policy/utils.py): Relaxed a runtime assertion in sliding_window_overwrite; commented out the precondition check for hf_config.sliding_window
  • Test scripts and configuration (tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh, tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh, tests/test_suites/nightly.txt): Added two new test shell scripts for GRPO and SFT Qwen3-Next experiments with logging, metrics validation, and TensorBoard conversion; registered the tests in the nightly suite

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Runner
    participant Experiment as Training Experiment
    participant Logging as Log Handler
    participant Metrics as Metrics Validator
    
    Test->>Test: Initialize config (NUM_NODES, MAX_STEPS)
    Test->>Test: Check if max steps reached
    alt Max steps not reached
        Test->>Experiment: Run training (uv run)
        Experiment-->>Experiment: Training loop
        Experiment-->>Logging: Write tensorboard logs
        Test->>Logging: Convert TB logs to JSON
        Test->>Metrics: Parse JSON metrics
        Metrics->>Metrics: Validate conditions<br/>(loss, reward/val_loss, timing)
        alt All metrics pass
            Metrics-->>Test: Success
        else Metrics fail
            Metrics-->>Test: Failure
        end
    else Max steps reached
        Test->>Test: Exit early
    end
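
The validation step in this diagram corresponds to the jq gate and tests/check_metrics.py expressions shown in the review comments below. A rough Python sketch of that flow, assuming the TB-to-JSON conversion emits {tag: {str(step): value}}; the helper names here are illustrative, not the repo's actual code:

```python
import json
from statistics import mean


def validate(json_metrics_path: str, max_steps: int) -> bool:
    with open(json_metrics_path) as f:
        data = json.load(f)
    # Gate: only run the checks once the target step has been reached.
    if max(int(s) for s in data["train/loss"]) < max_steps:
        return True  # nothing to validate yet
    steps = sorted(data["timing/train/total_step_time"], key=int)
    recent = [data["timing/train/total_step_time"][s] for s in steps[-6:-1]]
    return (
        data["train/token_mult_prob_error"]["30"] < 1.1
        and data["train/reward"]["30"] > 0.43
        and mean(recent) < 305  # mean over the [-6:-1] slice, as in the script
    )
```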

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Type signature refactoring (nemo_rl/models/policy/__init__.py): Extensive changes to TypedDict definitions (removal of Disabled variants, NotRequired adjustments, type narrowing/widening) require careful validation of backward compatibility and config parsing logic; a minimal sketch of the new pattern follows this list
  • MegatronPolicyWorker logic changes (nemo_rl/models/policy/megatron_policy_worker.py): Removal of assertions (defer_fp32_logits coupling, per-token loss forcing) and public method deletion need verification that downstream code handles these changes; grad_norm metric collection logic should be reviewed for correctness
  • Submodule pointer updates: Verify that the new Megatron-LM and Megatron-Bridge commits are stable and compatible with the updated configurations
  • Dependency constraint changes (pyproject.toml): Transformer pinning to 4.57.1 and mlflow relaxation should be tested against existing recipes to ensure no regressions
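
As a reference for the first bullet, a minimal sketch of the new typing pattern. This is illustrative only; the exact field sets live in nemo_rl/models/policy/__init__.py, and treating converter_type as optional is an assumption here:

```python
from typing import TypedDict

from typing_extensions import NotRequired  # in stdlib typing on 3.11+; this keeps 3.10 working


class MegatronDDPConfig(TypedDict):
    average_in_collective: bool  # new flag surfaced in this PR
    # ...other DDP fields omitted


class MegatronConfig(TypedDict):
    enabled: bool  # was Literal[True]; a minimal {enabled: false} stub now type-checks
    converter_type: NotRequired[str]  # e.g. "Qwen3NextForCausalLM"
    distributed_data_parallel_config: NotRequired[MegatronDDPConfig]
```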

Possibly related PRs

  • PR #1389: Modifies average_in_collective field in MegatronDDPConfig and changes MegatronPolicyWorker method signatures, directly overlapping with type and implementation changes in this PR
  • PR #1398: Updates Megatron integration code and config typing in the same policy modules, affecting how Megatron configurations are parsed and applied
  • PR #1334: Modifies nemo_rl/models/policy/init.py TypedDict definitions and examples/configs/sft.yaml recipes with overlapping configuration field changes

Suggested labels

CI:L1, Run CICD

Suggested reviewers

  • terrykong
  • guyueh1
  • parthchadha

Pre-merge checks

❌ Failed checks (2 warnings)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Test Results For Major Changes: ⚠️ Warning. The PR claims tests were run locally but provides no test results, metrics, or evidence, and review comments document unresolved blockers (invalid submodule commit hashes, runtime errors, and a syntax error), indicating the code was not actually tested. Resolution: provide actual test execution logs with final metrics for the GRPO and SFT recipes; fix the invalid Megatron-Bridge commit, the KeyError/IndexError in megatron_policy_worker.py, and the test script syntax error; and document baseline metrics for the new test thresholds.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly identifies the main feature being added: qwen3-next support on the mcore (Megatron-core) path, which aligns with the PR's primary objective of adding qwen3-next support and recipes.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

304-310: Move per-file-ignores back to [tool.ruff.lint.per-file-ignores] to align with current ruff configuration standards.

The recommended placement for per-file-ignores in pyproject.toml is under [tool.ruff.lint.per-file-ignores], but the code uses the top-level [tool.ruff.per-file-ignores] placement, which appears in older examples. While ruff still recognizes both formats, using the nested structure maintains consistency with current ruff documentation and best practices.

🧹 Nitpick comments (4)
nemo_rl/models/policy/__init__.py (3)

15-15: Minor: consider unifying union style

Mixing Union[...] (for max_grad_norm) with | unions elsewhere is legal but a bit inconsistent. You may want to standardize on one style for readability, preferably | now that you’re on Python 3.10+.
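
For concreteness, the two spellings being mixed; a minimal illustration, not the actual file contents:

```python
from typing import Union

max_grad_norm: Union[float, int]  # explicit Union style
max_grad_norm_alt: float | int    # PEP 604 style, available since Python 3.10
```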


33-39: SequencePackingConfig now requires algorithm

Making algorithm: str required means every config that turns sequence packing on must supply an algorithm name. That’s fine, but it does tighten the config surface; please double‑check all existing sequence_packing uses (e.g., seqpack SFT/GRPO recipes) specify algorithm, or consider NotRequired[str] if you still want a default.


71-79: Scheduler typings align with milestones pattern but could be clearer

  • lr_decay_iters: int in MegatronSchedulerConfig and milestones: NotRequired[list[int]] in SinglePytorchSchedulerConfig match the “top‑level milestones alongside scheduler config” pattern used in GRPO/DPO/distillation configs.
  • Allowing PolicyConfig.scheduler: list[SinglePytorchSchedulerConfig] | SchedulerMilestones captures both the standard list form and the milestones‑only dict form, but the union is a bit opaque for readers and tools.

If this proves confusing, a follow‑up refactor to introduce a dedicated TypedDict for the “SequentialLR + milestones” shape (instead of raw dict[str, list[int]]) would make the config schema more self‑documenting. Based on learnings.

Also applies to: 133-137, 139-140, 178-179
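
One possible shape for the follow-up refactor suggested above; this is a sketch under the comment's own assumptions, not the repo's current schema:

```python
from typing import Any, TypedDict

from typing_extensions import NotRequired


class SinglePytorchSchedulerConfig(TypedDict):
    name: str
    kwargs: NotRequired[dict[str, Any]]


class SequentialSchedulerConfig(TypedDict):
    # Self-documenting replacement for the raw dict[str, list[int]] form.
    schedulers: list[SinglePytorchSchedulerConfig]
    milestones: list[int]  # transition steps between consecutive schedulers
```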

examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml (1)

1-7: Clarify policy.optimizer: null semantics

You’re inheriting from sft.yaml and explicitly setting policy.optimizer: null while relying on policy.megatron_cfg.optimizer for Megatron. That’s reasonable, but only if downstream code treats optimizer as optional / nullable rather than using 'optimizer' in policy_cfg or assuming it’s always a mapping.

If the intent is “no PyTorch optimizer, only Megatron optimizer”, please confirm the loader logic handles None correctly; otherwise you might prefer removing the key entirely at the base config or gating on megatron_cfg.enabled instead.

Also applies to: 8-19
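
A hedged sketch of loader logic that tolerates policy.optimizer: null; the real NeMo-RL loader may differ, but the key point is to check for None rather than mere key presence:

```python
from typing import Any, Optional


def select_optimizer_cfg(policy_cfg: dict[str, Any]) -> Optional[dict[str, Any]]:
    opt = policy_cfg.get("optimizer")  # YAML null parses to Python None
    if opt is None and policy_cfg.get("megatron_cfg", {}).get("enabled"):
        # No PyTorch optimizer: fall back to Megatron's own optimizer config.
        return policy_cfg["megatron_cfg"].get("optimizer")
    return opt
```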

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fc917f and ae5d7ac.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (21)
  • .gitignore (1 hunks)
  • .gitmodules (1 hunks)
  • 3rdparty/Automodel-workspace/Automodel (1 hunks)
  • 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge (1 hunks)
  • 3rdparty/Megatron-Bridge-workspace/setup.py (1 hunks)
  • 3rdparty/Megatron-LM-workspace/Megatron-LM (1 hunks)
  • examples/configs/grpo_math_1B.yaml (1 hunks)
  • examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml (1 hunks)
  • examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1 hunks)
  • examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml (1 hunks)
  • examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml (1 hunks)
  • examples/configs/sft.yaml (3 hunks)
  • examples/configs/vlm_grpo_3B.yaml (3 hunks)
  • examples/configs/vlm_grpo_3B_megatron.yaml (4 hunks)
  • nemo_rl/models/policy/__init__.py (6 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
  • nemo_rl/models/policy/utils.py (1 hunks)
  • pyproject.toml (4 hunks)
  • tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh (1 hunks)
  • tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh (1 hunks)
  • tests/test_suites/nightly.txt (2 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.

Applied to files:

  • examples/configs/grpo_math_1B.yaml
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env. (The NUM_RUNS arithmetic is ceiling division; see the sketch after this learning.)

Applied to files:

  • tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh
  • tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh
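
The NUM_RUNS formula in the learning above is integer ceiling division; the same arithmetic in Python, with example values assumed:

```python
max_steps, steps_per_run = 200, 50  # example values
num_runs = (max_steps + steps_per_run - 1) // steps_per_run  # ceil(200/50) == 4
assert num_runs == -(-max_steps // steps_per_run)  # equivalent ceiling idiom
```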
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh
📚 Learning: 2025-09-18T13:26:43.307Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:19-26
Timestamp: 2025-09-18T13:26:43.307Z
Learning: In on-policy distillation workflows, validation can use downstream task performance (like math problem solving) as RL-like reward metrics rather than traditional distillation metrics like KL divergence. In this case, "val_reward" with "higher_is_better: true" is the correct checkpoint monitoring configuration.

Applied to files:

  • examples/configs/vlm_grpo_3B.yaml
  • examples/configs/vlm_grpo_3B_megatron.yaml
  • examples/configs/sft.yaml
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.

Applied to files:

  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-19T02:44:38.451Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:73-84
Timestamp: 2025-09-19T02:44:38.451Z
Learning: The scheduler configuration format with a separate "milestones: [20]" entry (not wrapped under name/kwargs) is a valid and established pattern used across GRPO, DPO, and distillation configs in the NeMo RL codebase. This format specifies transition points between different schedulers (e.g., LinearLR for warmup steps, then ConstantLR).

Applied to files:

  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-24T18:36:06.287Z
Learnt from: terrykong
Repo: NVIDIA-NeMo/RL PR: 1024
File: examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp4.yaml:1-1
Timestamp: 2025-09-24T18:36:06.287Z
Learning: In the NVIDIA NeMo RL repository, when working with Hydra config defaults, the scalar string format (defaults: ../../dpo.yaml) is acceptable and preferred over the list format, even though Hydra typically expects defaults to be a list.

Applied to files:

  • examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml
🧬 Code graph analysis (1)
nemo_rl/models/policy/__init__.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • GenerationConfig (118-131)
🪛 GitHub Actions: Automodel Integration and Submodule Checks
3rdparty/Automodel-workspace/Automodel

[error] 1-1: Submodule Automodel has diverged from the main branch. TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/ CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/ This indicates parallel development; manual merge may be required.

🪛 Shellcheck (0.11.0)
tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: semantic-pull-request / semantic-pull-request
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (19)
nemo_rl/models/policy/utils.py (1)

295-295: No changes required; the commented assertion is an intentional workaround.

The commented-out assertion on line 295 is part of a documented temporary workaround for HuggingFace issue #38002 (referenced in the TODO on line 290). This is not dead code but rather a controlled relaxation of validation to support models like qwen3-next that may lack the sliding_window attribute while having use_sliding_window=False.

Backward compatibility is maintained: the function sets "sliding_window": None regardless of whether the attribute exists, and existing tests in tests/unit/models/policy/test_dtensor_worker.py and tests/unit/models/policy/test_dtensor_worker_v2.py extensively exercise this code path with qwen2 and qwen3 models, confirming that models both with and without the attribute continue to work.
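
A minimal sketch of the relaxed behavior described above, assuming the function's contract; the real implementation lives in nemo_rl/models/policy/utils.py and may differ:

```python
from typing import Any


def sliding_window_overwrite(hf_config: Any) -> dict[str, Any]:
    # The hasattr(hf_config, "sliding_window") precondition is now commented
    # out, so models like qwen3-next that set use_sliding_window=False without
    # defining sliding_window pass through (HF issue #38002 workaround).
    if getattr(hf_config, "use_sliding_window", None) is False:
        return {"sliding_window": None}
    return {}
```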

pyproject.toml (3)

54-54: No critical issues found, but verify build-time dependency duplication.

Verification results:

  • Torch 2.8.0 compatibility: transformers ≥4.45.0 (required by rwkv-fla) is compatible with torch 2.8.0
  • Version string intentional: 0.7.202508221413 is a specific date-based build identifier (2025-08-22 14:13)
  • Duplication context: rwkv-fla>=0.7.202508221413 appears in both main dependencies (line 54) and mcore optional-dependencies (line 104) as separate dependency groups. This pattern may be intentional for mcore-specific builds or could be redundant; the codebase doesn't provide clear indication either way.

Recommend confirming with the team whether both main and mcore sections need rwkv-fla, or if mcore inherits from main dependencies.


28-28: Pin is correct and necessary — Qwen3-Next support justifies the specific version.

Qwen3-Next was added to Transformers in v4.57.0 (September 2025), making v4.57.1 the stable patch release required for Qwen3-Next support. Earlier versions are incompatible, so the specific pinning prevents accidental downgrades to unsupported versions. v4.57.1 is a small patch release with bug fixes and no breaking API changes, confirming compatibility with the rest of the dependency stack.
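
If useful, a quick environment check (not part of the repo) to confirm the installed transformers meets the Qwen3-Next floor:

```python
from importlib.metadata import version

from packaging.version import Version

installed = Version(version("transformers"))
assert installed >= Version("4.57.1"), (
    f"transformers {installed} found; Qwen3-Next needs >=4.57.1 (added in 4.57.0)"
)
```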


50-50: Remove the mlflow constraint concern—the original premise is invalid.

MLflow 3.6.0 was released Nov 7, 2025, and MLflow 4.0 has not been released. The constraint relaxation from >=3.5.0,<3.6.0 to bare mlflow simply allows newer 3.x releases without re-pinning, which is reasonable during development. There is no exposure to 4.x breaking changes.

.gitignore (1)

50-52: Good practice to exclude local environment variable files.

Adding .env and .envrc to the gitignore prevents accidental commits of local development configurations and secrets. This aligns with standard development practices.

examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1)

1-3: Minor formatting adjustment to defaults list.

The reordering of the defaults entries has no semantic impact on configuration inheritance; the two entries remain functionally equivalent.

examples/configs/grpo_math_1B.yaml (1)

154-160: Parameter verified as valid—approve changes.

The average_in_collective parameter is confirmed as a valid boolean in Megatron-LM's DistributedDataParallelConfig, where it controls gradient averaging inside the collective operation. The configuration change is appropriate and consistent with Megatron-LM's API.

examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml (2)

1-3: YAML list syntax correction is correct.

The fix properly restores valid YAML list syntax by removing extraneous dashes from each list item. This enables Hydra to correctly parse the defaults configuration. With multiple defaults items, a proper YAML list structure (single dash per item) is required.


2-3: Configuration references verified and are compatible.

Both config files exist, contain valid YAML syntax, and are compatible:

  • examples/configs/grpo_math_1B.yaml
  • examples/configs/recipes/llm/grpo-deepscaler-1.5b-8K.yaml

The second file correctly uses a defaults inheritance pattern with selective parameter overrides (e.g., adjusted training batch sizes, learning rate, and model specifics), which is a standard and compatible configuration composition approach.

3rdparty/Megatron-LM-workspace/Megatron-LM (1)

1-1: Submodule update is targeted for Qwen3-Next support; verify through test execution.

The Megatron-LM update is tied to git commit f9df826 ("qwen3-next mcore path recipes") on a custom fork branch specifically for RL+Qwen3 integration. Verification confirms:

  • Dependency constraints are in place (megatron-core>=0.15.0a0,<0.17.0) with consistency validation in setup.py
  • Integration is config-driven via Megatron-Bridge abstraction (YAML recipes), not direct code imports, reducing API coupling
  • Functional tests exist (grpo_megatron.sh, sft_megatron.sh, dpo_megatron.sh, distillation_megatron.sh)

Recommended: Run the provided Megatron functional tests locally to validate integration and ensure no regressions with the Qwen3-Next recipes.

.gitmodules (1)

3-4: Confirm the purpose and timeline for the personal fork and feature branch.

The verification confirms these are intentional changes to use your personal forks (ahmadki/Megatron-LM and ahmadki/Automodel) on matching feature branches (ahmadki/rl-qwen3-from-nemo-rl-submodule). This is part of active qwen3-next development, and the previous version also used a personal fork.

However, no documented merge plan or timeline was found in the commit history. To clear this concern:

  • Confirm whether these changes are intended to remain temporary until merged upstream to the official NVIDIA-NeMo repositories, or if they are intended as a longer-term pattern.
  • If temporary, document the expected merge timeline or add a note in the code or PR description for clarity.
3rdparty/Megatron-Bridge-workspace/setup.py (1)

40-40: I'll search for the official release notes and breaking changes for megatron-core 0.16.0.

The first search had limited details. Let me search for more specific information about megatron-core version history and the specific 0.16.0 release.

Let me search for more information about planned Megatron-Core releases and any announced breaking changes.

Verify that the version range actually targets intended megatron-core versions.

The latest released version on PyPI is megatron-core 0.14.0 (October 8, 2025). The dependency specification allows >=0.15.0a0,<0.17.0, which permits versions 0.15.x and 0.16.x that have not yet been released. Since these versions don't exist publicly, it's not possible to verify breaking changes against them at this time.

Clarify whether:

  • You intend to support released versions (currently up to 0.14.0) or wait for future releases
  • The lower bound 0.15.0a0 is correct, or if 0.14.0 is the intended minimum
  • You have specific knowledge of breaking changes planned for 0.15.x or 0.16.x from NVIDIA's roadmap
nemo_rl/models/policy/__init__.py (3)

20-31: DTensor, dynamic batching, and sequence packing gating look consistent

The pattern of required dtensor_cfg / dynamic_batching blocks with an enabled: bool flag (and other fields NotRequired[...]) looks coherent and matches how the new recipes use dtensor_cfg.enabled=false plus megatron_cfg.enabled=true. This should keep the config surface explicit while allowing minimal stubs like {enabled: false}.

Also applies to: 142-153, 169-173


81-88: Megatron config and DDP additions look correct

  • Adding average_in_collective: bool to MegatronDDPConfig matches how new configs set distributed_data_parallel_config.average_in_collective=true.
  • Moving several Megatron fields to NotRequired[...] and widening max_grad_norm to float | int matches real‑world usage and should reduce spurious type errors, while keeping enabled: bool required ensures explicit selection of the Megatron path.

No obvious inconsistencies with the new Megatron Qwen3‑next recipes.

Also applies to: 90-118, 176-176


121-126: Tokenizer chat_template_kwargs and logprob_chunk_size extensions

Typing chat_template_kwargs as NotRequired[dict[str, Any]] and adding optional logprob_chunk_size: NotRequired[int] on PolicyConfig gives enough flexibility for different tokenizers and logprob sharding strategies without over‑constraining the schema. This matches the looser generation config typing pattern shown in GenerationConfig.

Also applies to: 160-166
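
For illustration, how a chat_template_kwargs dict typically flows into tokenization. apply_chat_template is the real Hugging Face API, while the config plumbing and the kwargs shown are assumptions:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Next-80B-A3B-Instruct")
chat_template_kwargs = {"add_generation_prompt": True}  # e.g. sourced from the config
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hello"}],
    tokenize=False,
    **chat_template_kwargs,
)
```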

tests/test_suites/nightly.txt (1)

51-53: Nightly wiring for Qwen3-next GRPO/SFT looks good

The new Qwen3-next GRPO and SFT Megatron scripts are registered under the expected GRPO/SFT sections with consistent naming; this should integrate cleanly into the nightly suite.

Also applies to: 76-78

examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml (1)

8-18: Overall SFT Qwen3-next Megatron recipe looks coherent

  • dtensor is explicitly disabled while Megatron is enabled, aligning with the updated PolicyConfig typings.
  • make_sequence_length_divisible_by: ${policy.megatron_cfg.tensor_model_parallel_size} is consistent with existing recipes that tie divisibility to TP size.
  • Data, logging, and cluster settings are in line with other large-model SFT recipes.

No additional structural issues spotted.

Also applies to: 31-47

tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh (1)

5-13: GRPO Qwen3-next test harness matches existing patterns

The NUM_* config block, exit_if_max_steps_reached, uv run invocation, TB→JSON conversion, and jq/metric checks all mirror existing GRPO Megatron tests, including use of common.env variables and unquoted $@. No additional changes needed.

Also applies to: 16-33
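
The TB-to-JSON step these scripts rely on can be approximated as below; the output shape {tag: {str(step): value}} is an assumption based on how the check_metrics expressions index data, and the repo's actual converter may differ:

```python
import json

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator


def tb_to_json(logdir: str, out_path: str) -> None:
    acc = EventAccumulator(logdir)
    acc.Reload()  # parse event files on disk
    data: dict[str, dict[str, float]] = {}
    for tag in acc.Tags()["scalars"]:
        data[tag] = {str(ev.step): ev.value for ev in acc.Scalars(tag)}
    with open(out_path, "w") as f:
        json.dump(data, f, indent=2)
```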

tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh (1)

5-13: SFT Qwen3-next Megatron test script looks solid

The script follows the standard tests/common.env pattern (NUM_* vars, exit_if_max_steps_reached, uv run, TB→JSON, gated metric checks). Metrics keyed at step 200 align with MAX_STEPS=200, and the bash syntax is correct.

Also applies to: 16-40

@@ -1 +1 @@
-Subproject commit a2db048383cd54b3fafc928df4c30bf7bbf7c430
+Subproject commit 45466962771e9aac6f6f26eb05a0b287983a9b33

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Resolve submodule divergence flagged by pipeline.

The GitHub Actions pipeline has detected that the Automodel submodule has diverged from the main branch. The current PR points to commit 45466962771e9aac6f6f26eb05a0b287983a9b33, while main is at a2db048383cd54b3fafc928df4c30bf7bbf7c430. The pipeline indicates "manual merge may be required," which could block PR integration.

Please clarify:

  1. Is this submodule divergence intentional for qwen3-next support?
  2. What commits were added in the new branch and why weren't they in main?
  3. Has this been coordinated with the Automodel maintainers, or does main need to be updated first?

Run the following script to investigate the divergence:


🏁 Script executed:

#!/bin/bash
# Description: Investigate Automodel submodule divergence between branches

# Show commit details for both the current PR commit and main branch commit
echo "=== PR Commit (45466962771e9aac6f6f26eb05a0b287983a9b33) ==="
curl -s https://api.github.com/repos/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33 | jq '{sha: .sha, message: .commit.message, author: .commit.author.name, date: .commit.author.date}'

echo ""
echo "=== Main Branch Commit (a2db048383cd54b3fafc928df4c30bf7bbf7c430) ==="
curl -s https://api.github.com/repos/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430 | jq '{sha: .sha, message: .commit.message, author: .commit.author.name, date: .commit.author.date}'

echo ""
echo "=== Checking commit history to find common ancestor ==="
curl -s "https://api.github.com/repos/NVIDIA-NeMo/Automodel/compare/a2db048383cd54b3fafc928df4c30bf7bbf7c430...45466962771e9aac6f6f26eb05a0b287983a9b33" | jq '{ahead_by: .ahead_by, behind_by: .behind_by, diverged: (.ahead_by > 0 and .behind_by > 0)}'

Length of output: 1152


Resolve submodule divergence—PR targets outdated Automodel commit.

The Automodel submodule has confirmed divergence: the PR points to commit 45466962771e9aac6f6f26eb05a0b287983a9b33 (Sept 10, 2025), while main is at a2db048383cd54b3fafc928df4c30bf7bbf7c430 (Oct 10, 2025). Both branches have evolved independently (109 commits ahead, 3 commits behind).

More importantly, main contains critical updates: transformer-engine 2.5 → 2.8 and CUDA 12.8 → 12.9 (applied Oct 10). The PR's commit predates these by ~1 month.

Action required:

  1. Confirm whether qwen3-next support intentionally requires the Sept 10 commit despite missing the latest CUDA 12.9 and TE 2.8 updates.
  2. If the latest main is compatible with qwen3-next, rebase/update the submodule pointer to a2db048383cd54b3fafc928df4c30bf7bbf7c430.
  3. If the Sept 10 commit is necessary, document why the newer CUDA/TE updates cannot be included.
🧰 Tools
🪛 GitHub Actions: Automodel Integration and Submodule Checks

[error] 1-1: Submodule Automodel has diverged from the main branch. TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/ CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/ This indicates parallel development; manual merge may be required.

🤖 Prompt for AI Agents
In 3rdparty/Automodel-workspace/Automodel around lines 1-1 the PR points the
Automodel submodule at an outdated commit (4546696...) that is diverged from
main (a2db048...); you must either confirm the Sept 10 commit is intentionally
required or update the submodule to main — if qwen3-next does not need the older
commit, fetch the Automodel repo, checkout or fetch main at commit
a2db048383cd54b3fafc928df4c30bf7bbf7c430, update the superproject submodule
pointer to that commit (git submodule update --init; cd
3rdparty/Automodel-workspace/Automodel; git fetch && git checkout a2db048...; cd
../..; git add 3rdparty/Automodel-workspace/Automodel; git commit -m "bump
Automodel submodule to a2db048... (Oct 10)"), run CI/build and any
CUDA/transformer-engine compatibility tests, and push the submodule-pointer
commit; if the older Sept 10 commit is required, add a brief rationale in the PR
describing why the October updates (CUDA 12.9 / TE 2.8) cannot be used and
document any blockers and test results.

@@ -1 +1 @@
-Subproject commit f003cd8ca3e4876853b6097e816f0a94ea8fefc1
+Subproject commit b6728896fd2215da024981169f30baaee1e70e56

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify submodule commit and dependency compatibility with version bumps.

The submodule pointer is being updated alongside dependency constraint changes (megatron-core <0.16.0 → <0.17.0, new transformers>=4.57.1). Ensure the new commit in Megatron-Bridge is compatible with these changes and that no breaking changes exist in the upgraded versions.

Run the following checks to verify the changes:

Additionally, verify that the setup.py dependency constraint updates are intentional and documented:

  • Confirm megatron-core version bump from <0.16.0 to <0.17.0 aligns with Qwen3-Next support
  • Confirm transformers>=4.57.1 requirement is necessary for model tokenization or features

🏁 Script executed:

#!/bin/bash
# Description: Verify Megatron-Bridge submodule commit exists and validate dependency metadata

# Check if the commit hash is valid and retrieve commit info
git ls-remote --heads --tags https://github.com/NVIDIA/Megatron-Bridge.git b6728896fd2215da024981169f30baaee1e70e56 || \
  echo "Commit not found; checking if it exists in the repository..."

# Alternative: Clone and check directly
git clone --depth 1 https://github.com/NVIDIA/Megatron-Bridge.git /tmp/megatron-bridge-check 2>/dev/null && \
  cd /tmp/megatron-bridge-check && \
  git log --oneline -1 b6728896fd2215da024981169f30baaee1e70e56 2>/dev/null || \
  echo "Commit verification: unable to confirm commit exists"

# Verify PyPI availability of dependencies with version constraints
echo "=== Checking PyPI for megatron-core versions ===" 
pip index versions megatron-core 2>/dev/null | head -20

echo "=== Checking PyPI for transformers versions ==="
pip index versions transformers 2>/dev/null | head -20

Length of output: 2025


Let me verify the actual setup.py file changes to understand the dependency constraints better:


🏁 Script executed:

#!/bin/bash
# Find and display setup.py to verify dependency version constraints

find . -name "setup.py" -type f | head -5 | while read setup_file; do
  echo "=== File: $setup_file ==="
  grep -n "megatron-core\|transformers" "$setup_file" || echo "No megatron-core or transformers found"
  echo ""
done

# Also check setup.cfg and pyproject.toml if they exist
find . -name "setup.cfg" -o -name "pyproject.toml" | head -5 | while read config_file; do
  echo "=== File: $config_file ==="
  grep -n "megatron-core\|transformers" "$config_file" || echo "No megatron-core or transformers found"
  echo ""
done

Length of output: 769


Let me check the git history to see what the previous dependency constraint was:


🏁 Script executed:

#!/bin/bash
# Check git history for setup.py changes to Megatron-Bridge-workspace

cd ./git && git log --oneline -20 -- 3rdparty/Megatron-Bridge-workspace/setup.py 2>/dev/null || echo "No git history available"

echo "=== Current setup.py dependency constraints ==="
grep -A2 -B2 "megatron-core\|transformers" 3rdparty/Megatron-Bridge-workspace/setup.py | head -20

echo ""
echo "=== Checking if .gitmodules exists for submodule info ==="
if [ -f .gitmodules ]; then
  echo "=== .gitmodules content ==="
  cat .gitmodules
else
  echo ".gitmodules not found"
fi

Length of output: 1068


Let me verify the commit with the correct repository URL and check the git diff for setup.py changes:


🏁 Script executed:

#!/bin/bash
# Verify commit with CORRECT Megatron-Bridge URL from .gitmodules

echo "=== Verifying commit b6728896fd2215da024981169f30baaee1e70e56 in correct repo ==="
git ls-remote --heads --tags https://github.com/NVIDIA-NeMo/Megatron-Bridge.git b6728896fd2215da024981169f30baaee1e70e56 2>/dev/null || echo "Commit not found as branch/tag"

echo ""
echo "=== Checking if commit exists in repository history ==="
git ls-remote https://github.com/NVIDIA-NeMo/Megatron-Bridge.git | head -5

echo ""
echo "=== Get git diff for setup.py to see what changed ==="
git diff HEAD~1 HEAD -- 3rdparty/Megatron-Bridge-workspace/setup.py 2>/dev/null || echo "Git diff not available or no prior commit"

echo ""
echo "=== Current HEAD info ==="
git log -1 --oneline 2>/dev/null || echo "Git log not available"

Length of output: 557


Let me check the current state of .gitmodules and the submodule pointer to understand what's being changed:


🏁 Script executed:

#!/bin/bash
# Check current submodule state and what pointer is being set

echo "=== Current submodule commit pointers ==="
git config --file .gitmodules --get-regexp path | while read path_line; do
  path=$(echo "$path_line" | awk '{print $2}')
  echo "Path: $path"
  if [ -d "$path/.git" ]; then
    cd "$path" 2>/dev/null && git rev-parse HEAD 2>/dev/null && cd - > /dev/null
  else
    echo "  (not initialized)"
  fi
done

echo ""
echo "=== Check git ls-files for submodule information ==="
git ls-files --stage | grep 160000

echo ""
echo "=== Search for commit hash in codebase (may indicate intended change) ==="
grep -r "b6728896fd2215da024981169f30baaee1e70e56" . --exclude-dir=.git 2>/dev/null || echo "Commit hash not found in codebase"

Length of output: 698


Critical: Megatron-Bridge submodule commit does not exist in remote repository.

The submodule pointer b6728896fd2215da024981169f30baaee1e70e56 cannot be resolved in https://github.com/NVIDIA-NeMo/Megatron-Bridge.git. The current HEAD in that repository is 4ab7b62b40eb0f43f51bb5a323082e6af18173de. Users attempting to clone and initialize submodules will encounter a "commit not found" error, causing build failures.

Verify that:

  • The commit hash is correct
  • The commit has been pushed to the Megatron-Bridge repository main branch
  • Or provide the correct commit hash that exists in the repository

The dependency versions (megatron-core pre-release and transformers 4.57.1) are valid and available.

🤖 Prompt for AI Agents
In 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge around lines 1-1, the
submodule commit pointer b6728896fd2215da024981169f30baaee1e70e56 does not exist
in the remote https://github.com/NVIDIA-NeMo/Megatron-Bridge.git; update the
submodule reference to a valid pushed commit (e.g., the current remote HEAD
4ab7b62b40eb0f43f51bb5a323082e6af18173de) or replace it with the correct commit
hash that exists in the remote, then commit the updated .gitmodules/.git link
(or re-add the submodule) so users can clone and init submodules without a
"commit not found" error.

"transformer-engine[pytorch]>=2.9.0a0,<2.10.0",
"mamba-ssm",
"nvidia-resiliency-ext",
"transformers>=4.57.1",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify transformers version 4.57.1 availability and qwen3-next requirements.

The dependency on transformers>=4.57.1 has been added. Please confirm:

  • This version exists and is available on PyPI
  • This specific version is required for qwen3-next model support
  • There are no known compatibility issues with other dependencies

🌐 Web query:

What is the latest version of transformers library on PyPI?

💡 Result:

The latest version on PyPI is transformers 4.57.1 — uploaded Oct 14, 2025. [1]

Sources
[1] PyPI — transformers release files (transformers‑4.57.1).



---

🏁 Script executed:

```shell
#!/bin/bash
# Search for qwen3-next references and version rationale in codebase
rg -i 'qwen3-next' -A 3 -B 3
```

Length of output: 5094


🏁 Script executed:

#!/bin/bash
# Check full setup.py to see all dependencies and constraints
cat 3rdparty/Megatron-Bridge-workspace/setup.py

Length of output: 3891



---

🏁 Script executed:

```shell
#!/bin/bash
# Check if Megatron-Bridge submodule has pyproject.toml and its transformers spec
cat 3rdparty/Megatron-Bridge/pyproject.toml 2>/dev/null | grep -A 50 dependencies
```

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for any comments or documentation about transformers version in codebase
rg 'transformers' -i --type yaml --type py -B 2 -A 2 | grep -i 'version\|4\.57\|qwen' | head -30

Length of output: 2294



---

🏁 Script executed:

```shell
#!/bin/bash
# View full test_transformer_memory_regression.py to understand memory regression issue
cat tests/unit/test_transformer_memory_regression.py
```

Length of output: 3092


🏁 Script executed:

#!/bin/bash
# Search for skip decorators or conditionals around this test
rg 'test_transformers_version_memory_regression' -B 10 -A 15

Length of output: 2446


🏁 Script executed:

#!/bin/bash
# Check for any recent changes or comments about transformers version bump
git log --oneline --all -S 'transformers>=4.57.1' 2>/dev/null | head -5 || echo "Git log not available"

Length of output: 58


Remove test_transformer_memory_regression.py and reinstate disabled nightly test per the documented cleanup instructions.

The transformers>=4.57.1 constraint in setup.py (added for qwen3-next support) will immediately fail tests/unit/test_transformer_memory_regression.py, which enforces version [4.54, 4.56). This is actually good news—the memory regression is fixed in 4.57.1—but the cleanup steps documented in the test must be completed:

  1. Delete tests/unit/test_transformer_memory_regression.py
  2. Reinstate the disabled nightly test: tests/test_suites/llm/dpo-mistral-nemo-instruct-2407-1n8g-fsdp2tp8-actckpt-long.sh
  3. Update GitHub issue: Dtensor memory regression with TP and long sequence length present in transformers>=4.54,<4.56 #1343
🤖 Prompt for AI Agents
In 3rdparty/Megatron-Bridge-workspace/setup.py around line 45, the bump to
"transformers>=4.57.1" makes the existing unit test
test_transformer_memory_regression.py invalid; follow the documented cleanup:
delete tests/unit/test_transformer_memory_regression.py, re-enable the
previously disabled nightly test
tests/test_suites/llm/dpo-mistral-nemo-instruct-2407-1n8g-fsdp2tp8-actckpt-long.sh
(restore it to the test matrix), and update the referenced GitHub issue
https://github.com/NVIDIA-NeMo/RL/issues/1343 to reflect that the regression is
fixed and cleanup completed.

Comment on lines +19 to +30
megatron_cfg:
  enabled: true
  converter_type: Qwen3NextForCausalLM
  pipeline_model_parallel_size: 2
  expert_model_parallel_size: 8
  optimizer:
    lr: 2.0e-05
    min_lr: 1.99999e-05
    weight_decay: 0.01
    bf16: true
  scheduler:
    lr_warmup_init: 1.9999e-65

⚠️ Potential issue | 🟡 Minor

Double-check lr_warmup_init magnitude

lr: 2.0e-05, min_lr: 1.99999e-05, but scheduler.lr_warmup_init: 1.9999e-65 is extremely close to zero and looks like it might be a -05 vs -65 typo. That won’t break anything but will effectively start from ~0 lr during warmup.

If this wasn’t intentional (e.g., you meant 1.9999e-05 or similar), it’s worth correcting now to avoid confusing future readers.

🤖 Prompt for AI Agents
In examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml around
lines 19 to 30, the scheduler.lr_warmup_init value is set to 1.9999e-65 which is
extremely close to zero and likely a typo (should probably be 1.9999e-05);
update lr_warmup_init to the intended magnitude (e.g., 1.9999e-05 or another
value consistent with lr/min_lr) so warmup starts at the correct learning rate
and avoid confusing future readers.

Comment on lines +694 to +696
                average_in_collective=self.cfg["megatron_cfg"][
                    "distributed_data_parallel_config"
                ]["average_in_collective"],

⚠️ Potential issue | 🟠 Major

Restore fallback for average_in_collective.

Directly indexing ["average_in_collective"] will raise a KeyError for any existing config (or user override) that hasn’t been updated with the new flag. That regresses previously working setups. Please retain a default (e.g., False) when the key is absent.

-                average_in_collective=self.cfg["megatron_cfg"][
-                    "distributed_data_parallel_config"
-                ]["average_in_collective"],
+                average_in_collective=self.cfg["megatron_cfg"][
+                    "distributed_data_parallel_config"
+                ].get("average_in_collective", False),
🤖 Prompt for AI Agents
In nemo_rl/models/policy/megatron_policy_worker.py around lines 694 to 696, the
code directly indexes ["average_in_collective"] which will raise KeyError if the
key is missing; update the access to use a safe lookup with a default (e.g., use
.get on the parent dict and then .get("average_in_collective", False) or
equivalent) so that when the key is absent the value falls back to False instead
of raising an exception.

Comment on lines +1142 to +1144
"grad_norm": torch.tensor(
mb_metrics["grad_norm"][-1]
).cpu(), # TODO @sahilj: return an average or something later

⚠️ Potential issue | 🟠 Major

Guard against empty grad-norm history.

When num_global_batches evaluates to zero (e.g., final partial batch smaller than gbs), mb_metrics["grad_norm"] stays empty and the new [-1] access throws. Please fall back to 0.0 (or skip emitting the metric) when the list is empty.

-            "grad_norm": torch.tensor(
-                mb_metrics["grad_norm"][-1]
-            ).cpu(),  # TODO @sahilj: return an average or something later
+            "grad_norm": torch.tensor(
+                (mb_metrics.get("grad_norm") or [0.0])[-1]
+            ).cpu(),  # TODO @sahilj: return an average or something later
🤖 Prompt for AI Agents
In nemo_rl/models/policy/megatron_policy_worker.py around lines 1142 to 1144,
guard against an empty mb_metrics["grad_norm"] list before indexing [-1]; if the
list is non-empty use the last value, otherwise use 0.0 as the fallback (convert
to a tensor and .cpu() as before) so the code does not throw when
num_global_batches == 0.

Comment on lines 34 to 40
Only run metrics if the target step is reached
if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
    uv run tests/check_metrics.py $JSON_METRICS \
        'data["train/token_mult_prob_error"]["30"] < 1.1' \
        'data["train/reward"]["30"] > 0.43' \
        'mean(data["timing/train/total_step_time"], -6, -1) < 305'
fi

⚠️ Potential issue | 🔴 Critical

Fix missing comment marker before metrics gate

Line 34 (Only run metrics if the target step is reached) is missing # and will be executed as a command, causing the script to fail.

Recommend:

-Only run metrics if the target step is reached
+# Only run metrics if the target step is reached
🤖 Prompt for AI Agents
In tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh around lines 34
to 40 the explanatory line "Only run metrics if the target step is reached" is
missing a leading '#' so the shell will try to execute it; edit the file to
prefix that line with '#' to make it a comment (or remove the line entirely) so
the subsequent if-statement runs correctly and the script no longer attempts to
execute the text as a command.

Signed-off-by: Ahmad Kiswani <[email protected]>
@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 8a16bb5 (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@ahmadki ahmadki (Member, Author) commented Nov 19, 2025

With terrykong/Megatron-LM@0d40160 and after #1541 is merged, I'll rebase this PR, which should reduce it to just configs and tests.
