
Conversation

@ahmadki ahmadki (Member) commented Nov 17, 2025

What does this PR do ?

Add qwen3-next support, recipes and tests to mcore path.

Issues

#1431

Usage

SFT:

uv run --extra mcore --extra vllm --extra automodel python examples/run_sft.py \
    --config examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml

GRPO:

uv run --extra mcore --extra vllm python examples/run_grpo_math.py \
    --config examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you read and followed the Contributor guidelines
  • [x] Did you write any new necessary tests?
  • [x] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build, and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added training recipes for Qwen3-Next-80B model with GRPO and SFT configurations.
  • Bug Fixes

    • Fixed YAML syntax errors in GRPO configuration files.
  • Configuration Enhancements

    • Updated distributed data parallel settings across training recipes.
    • Updated dependency versions, including the Transformers library, and added new runtime dependencies.
  • Tests

    • Added automated test suites for Qwen3-Next-80B model training pipelines.

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: ae5d7ac (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@coderabbitai coderabbitai bot commented Nov 17, 2025

📝 Walkthrough

This PR updates NeMo-RL to support Qwen3-Next-80B with Megatron-LM: it adds new GRPO and SFT test recipes and configurations, refactors policy configuration type signatures, updates submodule pointers and dependencies, and removes some validation logic from MegatronPolicyWorker.

Changes

  • Version control configuration (.gitignore, .gitmodules): Added .env and .envrc to gitignore; updated the Megatron-LM submodule URL and branch from terrykong/Megatron-LM (yuya/nemo-rl-use-dev) to ahmadki/Megatron-LM (ahmadki/rl-qwen3-from-nemo-rl-submodule)
  • Submodule pointers (3rdparty/Automodel-workspace/Automodel, 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge, 3rdparty/Megatron-LM-workspace/Megatron-LM): Updated submodule commit references to the latest versions
  • Dependency management (pyproject.toml, 3rdparty/Megatron-Bridge-workspace/setup.py): Constrained transformers to >=4.57.1; relaxed the mlflow constraint; updated megatron-core to <0.17.0; added rwkv-fla>=0.7.202508221413; adjusted the ruff linting configuration
  • GRPO and general policy configs (examples/configs/grpo_math_1B.yaml, examples/configs/sft.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml): Added the average_in_collective flag; removed KL-divergence-related loss configuration; changed defer_fp32_logits to null; renamed metrics from accuracy to reward
  • GRPO recipe configs (examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml, examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml, examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml): Fixed YAML list syntax; added a comprehensive Qwen3-Next-80B configuration with tensor/pipeline/expert parallelism settings, generation, logging, and cluster topology
  • SFT recipe config (examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml): Added a new supervised fine-tuning configuration for Qwen3-Next-80B with Megatron parallelism, batch sizing, and multi-node deployment settings
  • Policy type definitions (nemo_rl/models/policy/__init__.py): Refactored TypedDict signatures: changed enabled fields from Literal[True] to bool; added average_in_collective to MegatronDDPConfig; converted many fields to NotRequired; removed Disabled config variants; added converter_type to MegatronConfig; adjusted type constraints on optimizer/scheduler fields
  • Megatron policy worker (nemo_rl/models/policy/megatron_policy_worker.py): Removed the assertion coupling logprob_chunk_size to defer_fp32_logits; removed per-token loss forcing and the MoE sanity check; sourced average_in_collective from config; added grad_norm to training metrics; removed the public method check_tensor_parallel_attributes
  • Policy utilities (nemo_rl/models/policy/utils.py): Relaxed a runtime assertion in sliding_window_overwrite; commented out the precondition check for hf_config.sliding_window
  • Test scripts and configuration (tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh, tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh, tests/test_suites/nightly.txt): Added two new test shell scripts for GRPO and SFT Qwen3-Next experiments with logging, metrics validation, and TensorBoard conversion; registered the tests in the nightly suite

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Runner
    participant Experiment as Training Experiment
    participant Logging as Log Handler
    participant Metrics as Metrics Validator
    
    Test->>Test: Initialize config (NUM_NODES, MAX_STEPS)
    Test->>Test: Check if max steps reached
    alt Max steps not reached
        Test->>Experiment: Run training (uv run)
        Experiment-->>Experiment: Training loop
        Experiment-->>Logging: Write tensorboard logs
        Test->>Logging: Convert TB logs to JSON
        Test->>Metrics: Parse JSON metrics
        Metrics->>Metrics: Validate conditions<br/>(loss, reward/val_loss, timing)
        alt All metrics pass
            Metrics-->>Test: Success
        else Metrics fail
            Metrics-->>Test: Failure
        end
    else Max steps reached
        Test->>Test: Exit early
    end
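
The validation step in this diagram corresponds to the jq gate and tests/check_metrics.py expressions shown in the review comments below. A rough Python sketch of that flow, assuming the TB-to-JSON conversion emits {tag: {str(step): value}}; the helper names here are illustrative, not the repo's actual code:

```python
import json
from statistics import mean


def validate(json_metrics_path: str, max_steps: int) -> bool:
    with open(json_metrics_path) as f:
        data = json.load(f)
    # Gate: only run the checks once the target step has been reached.
    if max(int(s) for s in data["train/loss"]) < max_steps:
        return True  # nothing to validate yet
    steps = sorted(data["timing/train/total_step_time"], key=int)
    recent = [data["timing/train/total_step_time"][s] for s in steps[-6:-1]]
    return (
        data["train/token_mult_prob_error"]["30"] < 1.1
        and data["train/reward"]["30"] > 0.43
        and mean(recent) < 305  # mean over the [-6:-1] slice, as in the script
    )
```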

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Type signature refactoring (nemo_rl/models/policy/__init__.py): Extensive changes to TypedDict definitions (removal of Disabled variants, NotRequired adjustments, type narrowing/widening) require careful validation of backward compatibility and config parsing logic; a minimal sketch of the new pattern follows this list
  • MegatronPolicyWorker logic changes (nemo_rl/models/policy/megatron_policy_worker.py): Removal of assertions (defer_fp32_logits coupling, per-token loss forcing) and public method deletion need verification that downstream code handles these changes; grad_norm metric collection logic should be reviewed for correctness
  • Submodule pointer updates: Verify that the new Megatron-LM and Megatron-Bridge commits are stable and compatible with the updated configurations
  • Dependency constraint changes (pyproject.toml): Transformer pinning to 4.57.1 and mlflow relaxation should be tested against existing recipes to ensure no regressions
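
As a reference for the first bullet, a minimal sketch of the new typing pattern. This is illustrative only; the exact field sets live in nemo_rl/models/policy/__init__.py, and treating converter_type as optional is an assumption here:

```python
from typing import TypedDict

from typing_extensions import NotRequired  # in stdlib typing on 3.11+; this keeps 3.10 working


class MegatronDDPConfig(TypedDict):
    average_in_collective: bool  # new flag surfaced in this PR
    # ...other DDP fields omitted


class MegatronConfig(TypedDict):
    enabled: bool  # was Literal[True]; a minimal {enabled: false} stub now type-checks
    converter_type: NotRequired[str]  # e.g. "Qwen3NextForCausalLM"
    distributed_data_parallel_config: NotRequired[MegatronDDPConfig]
```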

Possibly related PRs

  • PR #1389: Modifies average_in_collective field in MegatronDDPConfig and changes MegatronPolicyWorker method signatures, directly overlapping with type and implementation changes in this PR
  • PR #1398: Updates Megatron integration code and config typing in the same policy modules, affecting how Megatron configurations are parsed and applied
  • PR #1334: Modifies nemo_rl/models/policy/init.py TypedDict definitions and examples/configs/sft.yaml recipes with overlapping configuration field changes

Suggested labels

CI:L1, Run CICD

Suggested reviewers

  • terrykong
  • guyueh1
  • parthchadha

Pre-merge checks

❌ Failed checks (2 warnings)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Test Results For Major Changes: ⚠️ Warning. The PR claims tests were run locally but provides no test results, metrics, or evidence, and review comments document unresolved blockers (invalid submodule commit hashes, runtime errors, and a syntax error), indicating the code was not actually tested. Resolution: provide actual test execution logs with final metrics for the GRPO and SFT recipes; fix the invalid Megatron-Bridge commit, the KeyError/IndexError in megatron_policy_worker.py, and the test script syntax error; and document baseline metrics for the new test thresholds.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly identifies the main feature being added: qwen3-next support on the mcore (Megatron-core) path, which aligns with the PR's primary objective of adding qwen3-next support and recipes.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

304-310: Move per-file-ignores back to [tool.ruff.lint.per-file-ignores] to align with current ruff configuration standards.

The recommended placement for per-file-ignores in pyproject.toml is under [tool.ruff.lint.per-file-ignores], but the code uses the top-level [tool.ruff.per-file-ignores] placement, which appears in older examples. While ruff still recognizes both formats, using the nested structure maintains consistency with current ruff documentation and best practices.

🧹 Nitpick comments (4)
nemo_rl/models/policy/__init__.py (3)

15-15: Minor: consider unifying union style

Mixing Union[...] (for max_grad_norm) with | unions elsewhere is legal but a bit inconsistent. You may want to standardize on one style for readability, preferably | now that you’re on Python 3.10+.
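
For concreteness, the two spellings being mixed; a minimal illustration, not the actual file contents:

```python
from typing import Union

max_grad_norm: Union[float, int]  # explicit Union style
max_grad_norm_alt: float | int    # PEP 604 style, available since Python 3.10
```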


33-39: SequencePackingConfig now requires algorithm

Making algorithm: str required means every config that turns sequence packing on must supply an algorithm name. That’s fine, but it does tighten the config surface; please double‑check all existing sequence_packing uses (e.g., seqpack SFT/GRPO recipes) specify algorithm, or consider NotRequired[str] if you still want a default.


71-79: Scheduler typings align with milestones pattern but could be clearer

  • lr_decay_iters: int in MegatronSchedulerConfig and milestones: NotRequired[list[int]] in SinglePytorchSchedulerConfig match the “top‑level milestones alongside scheduler config” pattern used in GRPO/DPO/distillation configs.
  • Allowing PolicyConfig.scheduler: list[SinglePytorchSchedulerConfig] | SchedulerMilestones captures both the standard list form and the milestones‑only dict form, but the union is a bit opaque for readers and tools.

If this proves confusing, a follow‑up refactor to introduce a dedicated TypedDict for the “SequentialLR + milestones” shape (instead of raw dict[str, list[int]]) would make the config schema more self‑documenting. Based on learnings.

Also applies to: 133-137, 139-140, 178-179
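
One possible shape for the follow-up refactor suggested above; this is a sketch under the comment's own assumptions, not the repo's current schema:

```python
from typing import Any, TypedDict

from typing_extensions import NotRequired


class SinglePytorchSchedulerConfig(TypedDict):
    name: str
    kwargs: NotRequired[dict[str, Any]]


class SequentialSchedulerConfig(TypedDict):
    # Self-documenting replacement for the raw dict[str, list[int]] form.
    schedulers: list[SinglePytorchSchedulerConfig]
    milestones: list[int]  # transition steps between consecutive schedulers
```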

examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml (1)

1-7: Clarify policy.optimizer: null semantics

You’re inheriting from sft.yaml and explicitly setting policy.optimizer: null while relying on policy.megatron_cfg.optimizer for Megatron. That’s reasonable, but only if downstream code treats optimizer as optional / nullable rather than using 'optimizer' in policy_cfg or assuming it’s always a mapping.

If the intent is “no PyTorch optimizer, only Megatron optimizer”, please confirm the loader logic handles None correctly; otherwise you might prefer removing the key entirely at the base config or gating on megatron_cfg.enabled instead.

Also applies to: 8-19
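
A hedged sketch of loader logic that tolerates policy.optimizer: null; the real NeMo-RL loader may differ, but the key point is to check for None rather than mere key presence:

```python
from typing import Any, Optional


def select_optimizer_cfg(policy_cfg: dict[str, Any]) -> Optional[dict[str, Any]]:
    opt = policy_cfg.get("optimizer")  # YAML null parses to Python None
    if opt is None and policy_cfg.get("megatron_cfg", {}).get("enabled"):
        # No PyTorch optimizer: fall back to Megatron's own optimizer config.
        return policy_cfg["megatron_cfg"].get("optimizer")
    return opt
```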

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fc917f and ae5d7ac.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (21)
  • .gitignore (1 hunks)
  • .gitmodules (1 hunks)
  • 3rdparty/Automodel-workspace/Automodel (1 hunks)
  • 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge (1 hunks)
  • 3rdparty/Megatron-Bridge-workspace/setup.py (1 hunks)
  • 3rdparty/Megatron-LM-workspace/Megatron-LM (1 hunks)
  • examples/configs/grpo_math_1B.yaml (1 hunks)
  • examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml (1 hunks)
  • examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1 hunks)
  • examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml (1 hunks)
  • examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml (1 hunks)
  • examples/configs/sft.yaml (3 hunks)
  • examples/configs/vlm_grpo_3B.yaml (3 hunks)
  • examples/configs/vlm_grpo_3B_megatron.yaml (4 hunks)
  • nemo_rl/models/policy/__init__.py (6 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
  • nemo_rl/models/policy/utils.py (1 hunks)
  • pyproject.toml (4 hunks)
  • tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh (1 hunks)
  • tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh (1 hunks)
  • tests/test_suites/nightly.txt (2 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.

Applied to files:

  • examples/configs/grpo_math_1B.yaml
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env. (The NUM_RUNS arithmetic is ceiling division; see the sketch after this learning.)

Applied to files:

  • tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh
  • tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh
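
The NUM_RUNS formula in the learning above is integer ceiling division; the same arithmetic in Python, with example values assumed:

```python
max_steps, steps_per_run = 200, 50  # example values
num_runs = (max_steps + steps_per_run - 1) // steps_per_run  # ceil(200/50) == 4
assert num_runs == -(-max_steps // steps_per_run)  # equivalent ceiling idiom
```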
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh
📚 Learning: 2025-09-18T13:26:43.307Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:19-26
Timestamp: 2025-09-18T13:26:43.307Z
Learning: In on-policy distillation workflows, validation can use downstream task performance (like math problem solving) as RL-like reward metrics rather than traditional distillation metrics like KL divergence. In this case, "val_reward" with "higher_is_better: true" is the correct checkpoint monitoring configuration.

Applied to files:

  • examples/configs/vlm_grpo_3B.yaml
  • examples/configs/vlm_grpo_3B_megatron.yaml
  • examples/configs/sft.yaml
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • examples/configs/recipes/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.yaml
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.

Applied to files:

  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-19T02:44:38.451Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:73-84
Timestamp: 2025-09-19T02:44:38.451Z
Learning: The scheduler configuration format with a separate "milestones: [20]" entry (not wrapped under name/kwargs) is a valid and established pattern used across GRPO, DPO, and distillation configs in the NeMo RL codebase. This format specifies transition points between different schedulers (e.g., LinearLR for warmup steps, then ConstantLR).

Applied to files:

  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-24T18:36:06.287Z
Learnt from: terrykong
Repo: NVIDIA-NeMo/RL PR: 1024
File: examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp4.yaml:1-1
Timestamp: 2025-09-24T18:36:06.287Z
Learning: In the NVIDIA NeMo RL repository, when working with Hydra config defaults, the scalar string format (defaults: ../../dpo.yaml) is acceptable and preferred over the list format, even though Hydra typically expects defaults to be a list.

Applied to files:

  • examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml
🧬 Code graph analysis (1)
nemo_rl/models/policy/__init__.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • GenerationConfig (118-131)
🪛 GitHub Actions: Automodel Integration and Submodule Checks
3rdparty/Automodel-workspace/Automodel

[error] 1-1: Submodule Automodel has diverged from the main branch. TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/ CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/ This indicates parallel development; manual merge may be required.

🪛 Shellcheck (0.11.0)
tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: semantic-pull-request / semantic-pull-request
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (19)
nemo_rl/models/policy/utils.py (1)

295-295: No changes required; the commented assertion is an intentional workaround.

The commented-out assertion on line 295 is part of a documented temporary workaround for HuggingFace issue #38002 (referenced in the TODO on line 290). This is not dead code but rather a controlled relaxation of validation to support models like qwen3-next that may lack the sliding_window attribute while having use_sliding_window=False.

Backward compatibility is maintained: the function sets "sliding_window": None regardless of whether the attribute exists, and existing tests in tests/unit/models/policy/test_dtensor_worker.py and tests/unit/models/policy/test_dtensor_worker_v2.py extensively exercise this code path with qwen2 and qwen3 models, confirming that models both with and without the attribute continue to work.
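
A minimal sketch of the relaxed behavior described above, assuming the function's contract; the real implementation lives in nemo_rl/models/policy/utils.py and may differ:

```python
from typing import Any


def sliding_window_overwrite(hf_config: Any) -> dict[str, Any]:
    # The hasattr(hf_config, "sliding_window") precondition is now commented
    # out, so models like qwen3-next that set use_sliding_window=False without
    # defining sliding_window pass through (HF issue #38002 workaround).
    if getattr(hf_config, "use_sliding_window", None) is False:
        return {"sliding_window": None}
    return {}
```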

pyproject.toml (3)

54-54: No critical issues found, but verify build-time dependency duplication.

Verification results:

  • Torch 2.8.0 compatibility: transformers ≥4.45.0 (required by rwkv-fla) is compatible with torch 2.8.0
  • Version string intentional: 0.7.202508221413 is a specific date-based build identifier (2025-08-22 14:13)
  • Duplication context: rwkv-fla>=0.7.202508221413 appears in both main dependencies (line 54) and mcore optional-dependencies (line 104) as separate dependency groups. This pattern may be intentional for mcore-specific builds or could be redundant; the codebase doesn't provide clear indication either way.

Recommend confirming with the team whether both main and mcore sections need rwkv-fla, or if mcore inherits from main dependencies.


28-28: Pin is correct and necessary — Qwen3-Next support justifies the specific version.

Qwen3-Next was added to Transformers in v4.57.0 (September 2025), making v4.57.1 the stable patch release required for Qwen3-Next support. Earlier versions are incompatible, so the specific pinning prevents accidental downgrades to unsupported versions. v4.57.1 is a small patch release with bug fixes and no breaking API changes, confirming compatibility with the rest of the dependency stack.
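
If useful, a quick environment check (not part of the repo) to confirm the installed transformers meets the Qwen3-Next floor:

```python
from importlib.metadata import version

from packaging.version import Version

installed = Version(version("transformers"))
assert installed >= Version("4.57.1"), (
    f"transformers {installed} found; Qwen3-Next needs >=4.57.1 (added in 4.57.0)"
)
```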


50-50: Remove the mlflow constraint concern—the original premise is invalid.

MLflow 3.6.0 was released Nov 7, 2025, and MLflow 4.0 has not been released. The constraint relaxation from >=3.5.0,<3.6.0 to bare mlflow simply allows newer 3.x releases without re-pinning, which is reasonable during development. There is no exposure to 4.x breaking changes.

.gitignore (1)

50-52: Good practice to exclude local environment variable files.

Adding .env and .envrc to the gitignore prevents accidental commits of local development configurations and secrets. This aligns with standard development practices.

examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1)

1-3: Minor formatting adjustment to defaults list.

The reordering of the defaults entries has no semantic impact on configuration inheritance; the two entries remain functionally equivalent.

examples/configs/grpo_math_1B.yaml (1)

154-160: Parameter verified as valid—approve changes.

The average_in_collective parameter is confirmed as a valid boolean in Megatron-LM's DistributedDataParallelConfig, where it controls gradient averaging inside the collective operation. The configuration change is appropriate and consistent with Megatron-LM's API.

examples/configs/recipes/llm/grpo-deepscaler-1.5b-16K.yaml (2)

1-3: YAML list syntax correction is correct.

The fix properly restores valid YAML list syntax by removing extraneous dashes from each list item. This enables Hydra to correctly parse the defaults configuration. With multiple defaults items, a proper YAML list structure (single dash per item) is required.


2-3: Configuration references verified and are compatible.

Both config files exist, contain valid YAML syntax, and are compatible:

  • examples/configs/grpo_math_1B.yaml
  • examples/configs/recipes/llm/grpo-deepscaler-1.5b-8K.yaml

The second file correctly uses a defaults inheritance pattern with selective parameter overrides (e.g., adjusted training batch sizes, learning rate, and model specifics), which is a standard and compatible configuration composition approach.

3rdparty/Megatron-LM-workspace/Megatron-LM (1)

1-1: Submodule update is targeted for Qwen3-Next support; verify through test execution.

The Megatron-LM update is tied to git commit f9df826 ("qwen3-next mcore path recipes") on a custom fork branch specifically for RL+Qwen3 integration. Verification confirms:

  • Dependency constraints are in place (megatron-core>=0.15.0a0,<0.17.0) with consistency validation in setup.py
  • Integration is config-driven via Megatron-Bridge abstraction (YAML recipes), not direct code imports, reducing API coupling
  • Functional tests exist (grpo_megatron.sh, sft_megatron.sh, dpo_megatron.sh, distillation_megatron.sh)

Recommended: Run the provided Megatron functional tests locally to validate integration and ensure no regressions with the Qwen3-Next recipes.

.gitmodules (1)

3-4: Confirm the purpose and timeline for the personal fork and feature branch.

The verification confirms these are intentional changes to use your personal forks (ahmadki/Megatron-LM and ahmadki/Automodel) on matching feature branches (ahmadki/rl-qwen3-from-nemo-rl-submodule). This is part of active qwen3-next development, and the previous version also used a personal fork.

However, no documented merge plan or timeline was found in the commit history. To clear this concern:

  • Confirm whether these changes are intended to remain temporary until merged upstream to the official NVIDIA-NeMo repositories, or if they are intended as a longer-term pattern.
  • If temporary, document the expected merge timeline or add a note in the code or PR description for clarity.
3rdparty/Megatron-Bridge-workspace/setup.py (1)

40-40: I'll search for the official release notes and breaking changes for megatron-core 0.16.0.

The first search had limited details. Let me search for more specific information about megatron-core version history and the specific 0.16.0 release.

Let me search for more information about planned Megatron-Core releases and any announced breaking changes.

Verify that the version range actually targets intended megatron-core versions.

The latest released version on PyPI is megatron-core 0.14.0 (October 8, 2025). The dependency specification allows >=0.15.0a0,<0.17.0, which permits versions 0.15.x and 0.16.x that have not yet been released. Since these versions don't exist publicly, it's not possible to verify breaking changes against them at this time.

Clarify whether:

  • You intend to support released versions (currently up to 0.14.0) or wait for future releases
  • The lower bound 0.15.0a0 is correct, or if 0.14.0 is the intended minimum
  • You have specific knowledge of breaking changes planned for 0.15.x or 0.16.x from NVIDIA's roadmap
nemo_rl/models/policy/__init__.py (3)

20-31: DTensor, dynamic batching, and sequence packing gating look consistent

The pattern of required dtensor_cfg / dynamic_batching blocks with an enabled: bool flag (and other fields NotRequired[...]) looks coherent and matches how the new recipes use dtensor_cfg.enabled=false plus megatron_cfg.enabled=true. This should keep the config surface explicit while allowing minimal stubs like {enabled: false}.

Also applies to: 142-153, 169-173


81-88: Megatron config and DDP additions look correct

  • Adding average_in_collective: bool to MegatronDDPConfig matches how new configs set distributed_data_parallel_config.average_in_collective=true.
  • Moving several Megatron fields to NotRequired[...] and widening max_grad_norm to float | int matches real‑world usage and should reduce spurious type errors, while keeping enabled: bool required ensures explicit selection of the Megatron path.

No obvious inconsistencies with the new Megatron Qwen3‑next recipes.

Also applies to: 90-118, 176-176


121-126: Tokenizer chat_template_kwargs and logprob_chunk_size extensions

Typing chat_template_kwargs as NotRequired[dict[str, Any]] and adding optional logprob_chunk_size: NotRequired[int] on PolicyConfig gives enough flexibility for different tokenizers and logprob sharding strategies without over‑constraining the schema. This matches the looser generation config typing pattern shown in GenerationConfig.

Also applies to: 160-166
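
For illustration, how a chat_template_kwargs dict typically flows into tokenization. apply_chat_template is the real Hugging Face API, while the config plumbing and the kwargs shown are assumptions:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Next-80B-A3B-Instruct")
chat_template_kwargs = {"add_generation_prompt": True}  # e.g. sourced from the config
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hello"}],
    tokenize=False,
    **chat_template_kwargs,
)
```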

tests/test_suites/nightly.txt (1)

51-53: Nightly wiring for Qwen3-next GRPO/SFT looks good

The new Qwen3-next GRPO and SFT Megatron scripts are registered under the expected GRPO/SFT sections with consistent naming; this should integrate cleanly into the nightly suite.

Also applies to: 76-78

examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml (1)

8-18: Overall SFT Qwen3-next Megatron recipe looks coherent

  • dtensor is explicitly disabled while Megatron is enabled, aligning with the updated PolicyConfig typings.
  • make_sequence_length_divisible_by: ${policy.megatron_cfg.tensor_model_parallel_size} is consistent with existing recipes that tie divisibility to TP size.
  • Data, logging, and cluster settings are in line with other large-model SFT recipes.

No additional structural issues spotted.

Also applies to: 31-47

tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh (1)

5-13: GRPO Qwen3-next test harness matches existing patterns

The NUM_* config block, exit_if_max_steps_reached, uv run invocation, TB→JSON conversion, and jq/metric checks all mirror existing GRPO Megatron tests, including use of common.env variables and unquoted $@. No additional changes needed.

Also applies to: 16-33
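
The TB-to-JSON step these scripts rely on can be approximated as below; the output shape {tag: {str(step): value}} is an assumption based on how the check_metrics expressions index data, and the repo's actual converter may differ:

```python
import json

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator


def tb_to_json(logdir: str, out_path: str) -> None:
    acc = EventAccumulator(logdir)
    acc.Reload()  # parse event files on disk
    data: dict[str, dict[str, float]] = {}
    for tag in acc.Tags()["scalars"]:
        data[tag] = {str(ev.step): ev.value for ev in acc.Scalars(tag)}
    with open(out_path, "w") as f:
        json.dump(data, f, indent=2)
```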

tests/test_suites/llm/sft-qwen3-next-80ba3b-8n8g-megatron.sh (1)

5-13: SFT Qwen3-next Megatron test script looks solid

The script follows the standard tests/common.env pattern (NUM_* vars, exit_if_max_steps_reached, uv run, TB→JSON, gated metric checks). Metrics keyed at step 200 align with MAX_STEPS=200, and the bash syntax is correct.

Also applies to: 16-40

@@ -1 +1 @@
-Subproject commit a2db048383cd54b3fafc928df4c30bf7bbf7c430
+Subproject commit 45466962771e9aac6f6f26eb05a0b287983a9b33

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Resolve submodule divergence flagged by pipeline.

The GitHub Actions pipeline has detected that the Automodel submodule has diverged from the main branch. The current PR points to commit 45466962771e9aac6f6f26eb05a0b287983a9b33, while main is at a2db048383cd54b3fafc928df4c30bf7bbf7c430. The pipeline indicates "manual merge may be required," which could block PR integration.

Please clarify:

  1. Is this submodule divergence intentional for qwen3-next support?
  2. What commits were added in the new branch and why weren't they in main?
  3. Has this been coordinated with the Automodel maintainers, or does main need to be updated first?

Run the following script to investigate the divergence:


🏁 Script executed:

#!/bin/bash
# Description: Investigate Automodel submodule divergence between branches

# Show commit details for both the current PR commit and main branch commit
echo "=== PR Commit (45466962771e9aac6f6f26eb05a0b287983a9b33) ==="
curl -s https://api.github.com/repos/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33 | jq '{sha: .sha, message: .commit.message, author: .commit.author.name, date: .commit.author.date}'

echo ""
echo "=== Main Branch Commit (a2db048383cd54b3fafc928df4c30bf7bbf7c430) ==="
curl -s https://api.github.com/repos/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430 | jq '{sha: .sha, message: .commit.message, author: .commit.author.name, date: .commit.author.date}'

echo ""
echo "=== Checking commit history to find common ancestor ==="
curl -s "https://api.github.com/repos/NVIDIA-NeMo/Automodel/compare/a2db048383cd54b3fafc928df4c30bf7bbf7c430...45466962771e9aac6f6f26eb05a0b287983a9b33" | jq '{ahead_by: .ahead_by, behind_by: .behind_by, diverged: (.ahead_by > 0 and .behind_by > 0)}'

Length of output: 1152


Resolve submodule divergence—PR targets outdated Automodel commit.

The Automodel submodule has confirmed divergence: the PR points to commit 45466962771e9aac6f6f26eb05a0b287983a9b33 (Sept 10, 2025), while main is at a2db048383cd54b3fafc928df4c30bf7bbf7c430 (Oct 10, 2025). Both branches have evolved independently (109 commits ahead, 3 commits behind).

More importantly, main contains critical updates: transformer-engine 2.5 → 2.8 and CUDA 12.8 → 12.9 (applied Oct 10). The PR's commit predates these by ~1 month.

Action required:

  1. Confirm whether qwen3-next support intentionally requires the Sept 10 commit despite missing the latest CUDA 12.9 and TE 2.8 updates.
  2. If the latest main is compatible with qwen3-next, rebase/update the submodule pointer to a2db048383cd54b3fafc928df4c30bf7bbf7c430.
  3. If the Sept 10 commit is necessary, document why the newer CUDA/TE updates cannot be included.
🧰 Tools
🪛 GitHub Actions: Automodel Integration and Submodule Checks

[error] 1-1: Submodule Automodel has diverged from the main branch. TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/ CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/ This indicates parallel development; manual merge may be required.

🤖 Prompt for AI Agents
In 3rdparty/Automodel-workspace/Automodel around lines 1-1 the PR points the
Automodel submodule at an outdated commit (4546696...) that is diverged from
main (a2db048...); you must either confirm the Sept 10 commit is intentionally
required or update the submodule to main — if qwen3-next does not need the older
commit, fetch the Automodel repo, checkout or fetch main at commit
a2db048383cd54b3fafc928df4c30bf7bbf7c430, update the superproject submodule
pointer to that commit (git submodule update --init; cd
3rdparty/Automodel-workspace/Automodel; git fetch && git checkout a2db048...; cd
../..; git add 3rdparty/Automodel-workspace/Automodel; git commit -m "bump
Automodel submodule to a2db048... (Oct 10)"), run CI/build and any
CUDA/transformer-engine compatibility tests, and push the submodule-pointer
commit; if the older Sept 10 commit is required, add a brief rationale in the PR
describing why the October updates (CUDA 12.9 / TE 2.8) cannot be used and
document any blockers and test results.

@@ -1 +1 @@
-Subproject commit f003cd8ca3e4876853b6097e816f0a94ea8fefc1
+Subproject commit b6728896fd2215da024981169f30baaee1e70e56

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify submodule commit and dependency compatibility with version bumps.

The submodule pointer is being updated alongside dependency constraint changes (megatron-core <0.16.0 → <0.17.0, new transformers>=4.57.1). Ensure the new commit in Megatron-Bridge is compatible with these changes and that no breaking changes exist in the upgraded versions.

Run the following checks to verify the changes:

Additionally, verify that the setup.py dependency constraint updates are intentional and documented:

  • Confirm megatron-core version bump from <0.16.0 to <0.17.0 aligns with Qwen3-Next support
  • Confirm transformers>=4.57.1 requirement is necessary for model tokenization or features

🏁 Script executed:

#!/bin/bash
# Description: Verify Megatron-Bridge submodule commit exists and validate dependency metadata

# Check if the commit hash is valid and retrieve commit info
git ls-remote --heads --tags https://github.com/NVIDIA/Megatron-Bridge.git b6728896fd2215da024981169f30baaee1e70e56 || \
  echo "Commit not found; checking if it exists in the repository..."

# Alternative: Clone and check directly
git clone --depth 1 https://github.com/NVIDIA/Megatron-Bridge.git /tmp/megatron-bridge-check 2>/dev/null && \
  cd /tmp/megatron-bridge-check && \
  git log --oneline -1 b6728896fd2215da024981169f30baaee1e70e56 2>/dev/null || \
  echo "Commit verification: unable to confirm commit exists"

# Verify PyPI availability of dependencies with version constraints
echo "=== Checking PyPI for megatron-core versions ===" 
pip index versions megatron-core 2>/dev/null | head -20

echo "=== Checking PyPI for transformers versions ==="
pip index versions transformers 2>/dev/null | head -20

Length of output: 2025


Let me verify the actual setup.py file changes to understand the dependency constraints better:


🏁 Script executed:

#!/bin/bash
# Find and display setup.py to verify dependency version constraints

find . -name "setup.py" -type f | head -5 | while read setup_file; do
  echo "=== File: $setup_file ==="
  grep -n "megatron-core\|transformers" "$setup_file" || echo "No megatron-core or transformers found"
  echo ""
done

# Also check setup.cfg and pyproject.toml if they exist
find . -name "setup.cfg" -o -name "pyproject.toml" | head -5 | while read config_file; do
  echo "=== File: $config_file ==="
  grep -n "megatron-core\|transformers" "$config_file" || echo "No megatron-core or transformers found"
  echo ""
done

Length of output: 769


Let me check the git history to see what the previous dependency constraint was:


🏁 Script executed:

#!/bin/bash
# Check git history for setup.py changes to Megatron-Bridge-workspace

cd ./git && git log --oneline -20 -- 3rdparty/Megatron-Bridge-workspace/setup.py 2>/dev/null || echo "No git history available"

echo "=== Current setup.py dependency constraints ==="
grep -A2 -B2 "megatron-core\|transformers" 3rdparty/Megatron-Bridge-workspace/setup.py | head -20

echo ""
echo "=== Checking if .gitmodules exists for submodule info ==="
if [ -f .gitmodules ]; then
  echo "=== .gitmodules content ==="
  cat .gitmodules
else
  echo ".gitmodules not found"
fi

Length of output: 1068


Let me verify the commit with the correct repository URL and check the git diff for setup.py changes:


🏁 Script executed:

#!/bin/bash
# Verify commit with CORRECT Megatron-Bridge URL from .gitmodules

echo "=== Verifying commit b6728896fd2215da024981169f30baaee1e70e56 in correct repo ==="
git ls-remote --heads --tags https://github.com/NVIDIA-NeMo/Megatron-Bridge.git b6728896fd2215da024981169f30baaee1e70e56 2>/dev/null || echo "Commit not found as branch/tag"

echo ""
echo "=== Checking if commit exists in repository history ==="
git ls-remote https://github.com/NVIDIA-NeMo/Megatron-Bridge.git | head -5

echo ""
echo "=== Get git diff for setup.py to see what changed ==="
git diff HEAD~1 HEAD -- 3rdparty/Megatron-Bridge-workspace/setup.py 2>/dev/null || echo "Git diff not available or no prior commit"

echo ""
echo "=== Current HEAD info ==="
git log -1 --oneline 2>/dev/null || echo "Git log not available"

Length of output: 557


Let me check the current state of .gitmodules and the submodule pointer to understand what's being changed:


🏁 Script executed:

#!/bin/bash
# Check current submodule state and what pointer is being set

echo "=== Current submodule commit pointers ==="
git config --file .gitmodules --get-regexp path | while read path_line; do
  path=$(echo "$path_line" | awk '{print $2}')
  echo "Path: $path"
  if [ -d "$path/.git" ]; then
    cd "$path" 2>/dev/null && git rev-parse HEAD 2>/dev/null && cd - > /dev/null
  else
    echo "  (not initialized)"
  fi
done

echo ""
echo "=== Check git ls-files for submodule information ==="
git ls-files --stage | grep 160000

echo ""
echo "=== Search for commit hash in codebase (may indicate intended change) ==="
grep -r "b6728896fd2215da024981169f30baaee1e70e56" . --exclude-dir=.git 2>/dev/null || echo "Commit hash not found in codebase"

Length of output: 698


Critical: Megatron-Bridge submodule commit does not exist in remote repository.

The submodule pointer b6728896fd2215da024981169f30baaee1e70e56 cannot be resolved in https://github.com/NVIDIA-NeMo/Megatron-Bridge.git. The current HEAD in that repository is 4ab7b62b40eb0f43f51bb5a323082e6af18173de. Users attempting to clone and initialize submodules will encounter a "commit not found" error, causing build failures.

Verify that:

  • The commit hash is correct
  • The commit has been pushed to the Megatron-Bridge repository main branch
  • Or provide the correct commit hash that exists in the repository

The dependency versions (megatron-core pre-release and transformers 4.57.1) are valid and available.

🤖 Prompt for AI Agents
In 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge around lines 1-1, the
submodule commit pointer b6728896fd2215da024981169f30baaee1e70e56 does not exist
in the remote https://github.com/NVIDIA-NeMo/Megatron-Bridge.git; update the
submodule reference to a valid pushed commit (e.g., the current remote HEAD
4ab7b62b40eb0f43f51bb5a323082e6af18173de) or replace it with the correct commit
hash that exists in the remote, then commit the updated .gitmodules/.git link
(or re-add the submodule) so users can clone and init submodules without a
"commit not found" error.

"transformer-engine[pytorch]>=2.9.0a0,<2.10.0",
"mamba-ssm",
"nvidia-resiliency-ext",
"transformers>=4.57.1",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify transformers version 4.57.1 availability and qwen3-next requirements.

The dependency on transformers>=4.57.1 has been added. Please confirm:

  • This version exists and is available on PyPI
  • This specific version is required for qwen3-next model support
  • There are no known compatibility issues with other dependencies

🌐 Web query:

What is the latest version of transformers library on PyPI?

💡 Result:

The latest version on PyPI is transformers 4.57.1 — uploaded Oct 14, 2025. [1]

Sources
[1] PyPI — transformers release files (transformers‑4.57.1).



---

🏁 Script executed:

```shell
#!/bin/bash
# Search for qwen3-next references and version rationale in codebase
rg -i 'qwen3-next' -A 3 -B 3
```

Length of output: 5094


🏁 Script executed:

#!/bin/bash
# Check full setup.py to see all dependencies and constraints
cat 3rdparty/Megatron-Bridge-workspace/setup.py

Length of output: 3891



---

🏁 Script executed:

```shell
#!/bin/bash
# Check if Megatron-Bridge submodule has pyproject.toml and its transformers spec
cat 3rdparty/Megatron-Bridge/pyproject.toml 2>/dev/null | grep -A 50 dependencies
```

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for any comments or documentation about transformers version in codebase
rg 'transformers' -i --type yaml --type py -B 2 -A 2 | grep -i 'version\|4\.57\|qwen' | head -30

Length of output: 2294



---

🏁 Script executed:

```shell
#!/bin/bash
# View full test_transformer_memory_regression.py to understand memory regression issue
cat tests/unit/test_transformer_memory_regression.py
```

Length of output: 3092


🏁 Script executed:

#!/bin/bash
# Search for skip decorators or conditionals around this test
rg 'test_transformers_version_memory_regression' -B 10 -A 15

Length of output: 2446


🏁 Script executed:

#!/bin/bash
# Check for any recent changes or comments about transformers version bump
git log --oneline --all -S 'transformers>=4.57.1' 2>/dev/null | head -5 || echo "Git log not available"

Length of output: 58


Remove test_transformer_memory_regression.py and reinstate disabled nightly test per the documented cleanup instructions.

The transformers>=4.57.1 constraint in setup.py (added for qwen3-next support) will immediately fail tests/unit/test_transformer_memory_regression.py, which enforces version [4.54, 4.56). This is actually good news—the memory regression is fixed in 4.57.1—but the cleanup steps documented in the test must be completed:

  1. Delete tests/unit/test_transformer_memory_regression.py
  2. Reinstate the disabled nightly test: tests/test_suites/llm/dpo-mistral-nemo-instruct-2407-1n8g-fsdp2tp8-actckpt-long.sh
  3. Update GitHub issue: Dtensor memory regression with TP and long sequence length present in transformers>=4.54,<4.56 #1343
🤖 Prompt for AI Agents
In 3rdparty/Megatron-Bridge-workspace/setup.py around line 45, the bump to
"transformers>=4.57.1" makes the existing unit test
test_transformer_memory_regression.py invalid; follow the documented cleanup:
delete tests/unit/test_transformer_memory_regression.py, re-enable the
previously disabled nightly test
tests/test_suites/llm/dpo-mistral-nemo-instruct-2407-1n8g-fsdp2tp8-actckpt-long.sh
(restore it to the test matrix), and update the referenced GitHub issue
https://github.com/NVIDIA-NeMo/RL/issues/1343 to reflect that the regression is
fixed and cleanup completed.

Comment on lines +19 to +30
megatron_cfg:
  enabled: true
  converter_type: Qwen3NextForCausalLM
  pipeline_model_parallel_size: 2
  expert_model_parallel_size: 8
  optimizer:
    lr: 2.0e-05
    min_lr: 1.99999e-05
    weight_decay: 0.01
    bf16: true
  scheduler:
    lr_warmup_init: 1.9999e-65

⚠️ Potential issue | 🟡 Minor

Double-check lr_warmup_init magnitude

lr: 2.0e-05, min_lr: 1.99999e-05, but scheduler.lr_warmup_init: 1.9999e-65 is extremely close to zero and looks like it might be a -05 vs -65 typo. That won’t break anything but will effectively start from ~0 lr during warmup.

If this wasn’t intentional (e.g., you meant 1.9999e-05 or similar), it’s worth correcting now to avoid confusing future readers.

🤖 Prompt for AI Agents
In examples/configs/recipes/llm/sft-qwen3-next-80ba3b-8n8g-megatron.yaml around
lines 19 to 30, the scheduler.lr_warmup_init value is set to 1.9999e-65 which is
extremely close to zero and likely a typo (should probably be 1.9999e-05);
update lr_warmup_init to the intended magnitude (e.g., 1.9999e-05 or another
value consistent with lr/min_lr) so warmup starts at the correct learning rate
and avoid confusing future readers.

Comment on lines +694 to +696
                average_in_collective=self.cfg["megatron_cfg"][
                    "distributed_data_parallel_config"
                ]["average_in_collective"],

⚠️ Potential issue | 🟠 Major

Restore fallback for average_in_collective.

Directly indexing ["average_in_collective"] will raise a KeyError for any existing config (or user override) that hasn’t been updated with the new flag. That regresses previously working setups. Please retain a default (e.g., False) when the key is absent.

-                average_in_collective=self.cfg["megatron_cfg"][
-                    "distributed_data_parallel_config"
-                ]["average_in_collective"],
+                average_in_collective=self.cfg["megatron_cfg"][
+                    "distributed_data_parallel_config"
+                ].get("average_in_collective", False),
🤖 Prompt for AI Agents
In nemo_rl/models/policy/megatron_policy_worker.py around lines 694 to 696, the
code directly indexes ["average_in_collective"] which will raise KeyError if the
key is missing; update the access to use a safe lookup with a default (e.g., use
.get on the parent dict and then .get("average_in_collective", False) or
equivalent) so that when the key is absent the value falls back to False instead
of raising an exception.

Comment on lines +1142 to +1144
"grad_norm": torch.tensor(
mb_metrics["grad_norm"][-1]
).cpu(), # TODO @sahilj: return an average or something later

⚠️ Potential issue | 🟠 Major

Guard against empty grad-norm history.

When num_global_batches evaluates to zero (e.g., final partial batch smaller than gbs), mb_metrics["grad_norm"] stays empty and the new [-1] access throws. Please fall back to 0.0 (or skip emitting the metric) when the list is empty.

-            "grad_norm": torch.tensor(
-                mb_metrics["grad_norm"][-1]
-            ).cpu(),  # TODO @sahilj: return an average or something later
+            "grad_norm": torch.tensor(
+                (mb_metrics.get("grad_norm") or [0.0])[-1]
+            ).cpu(),  # TODO @sahilj: return an average or something later
🤖 Prompt for AI Agents
In nemo_rl/models/policy/megatron_policy_worker.py around lines 1142 to 1144,
guard against an empty mb_metrics["grad_norm"] list before indexing [-1]; if the
list is non-empty use the last value, otherwise use 0.0 as the fallback (convert
to a tensor and .cpu() as before) so the code does not throw when
num_global_batches == 0.

Comment on lines 34 to 40
Only run metrics if the target step is reached
if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
    uv run tests/check_metrics.py $JSON_METRICS \
        'data["train/token_mult_prob_error"]["30"] < 1.1' \
        'data["train/reward"]["30"] > 0.43' \
        'mean(data["timing/train/total_step_time"], -6, -1) < 305'
fi

⚠️ Potential issue | 🔴 Critical

Fix missing comment marker before metrics gate

Line 34 (Only run metrics if the target step is reached) is missing # and will be executed as a command, causing the script to fail.

Recommend:

-Only run metrics if the target step is reached
+# Only run metrics if the target step is reached
🤖 Prompt for AI Agents
In tests/test_suites/llm/grpo-qwen3-next-80ba3b-8n8g-megatron.sh around lines 34
to 40 the explanatory line "Only run metrics if the target step is reached" is
missing a leading '#' so the shell will try to execute it; edit the file to
prefix that line with '#' to make it a comment (or remove the line entirely) so
the subsequent if-statement runs correctly and the script no longer attempts to
execute the text as a command.

Signed-off-by: Ahmad Kiswani <[email protected]>
@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 8a16bb5 (PR #1530 from ahmadki/qwen3-next)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
CURRENT (PR #1530 from ahmadki/qwen3-next): https://github.com/NVIDIA-NeMo/Automodel/commits/45466962771e9aac6f6f26eb05a0b287983a9b33/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@ahmadki ahmadki (Member, Author) commented Nov 19, 2025

With terrykong/Megatron-LM@0d40160 and after #1541 is merged, I'll rebase this PR, which should reduce it to just configs and tests.
