refactor: refactor env and data processor & add nemotron super 49b recipes #1506

yuki-97 · 2025-11-11T07:53:55Z

Follow-up to #1472. Thanks @nv-mmanohara for adding this!

- Add GRPO support for HelpSteer3 on LlamaNemotron 49B.
- Add SFT support for tulu3 on LlamaNemotron 49B.
- Add CodeJaccard environment.
- Refactor env and data processor.
- Introduce run_grpo.py, which will replace run_grpo_math.py and run_grpo_rm.py in a subsequent PR.
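
For readers unfamiliar with the metric behind the CodeJaccard environment: Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below is only an illustration of that idea applied to code strings, not the implementation in this PR. The function names and the regex-based tokenization are assumptions made for the example.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    # Hypothetical tokenization: runs of word characters, plus each
    # punctuation character as its own token. The PR may tokenize differently.
    return set(re.findall(r"\w+|[^\w\s]", text))


def jaccard_similarity(candidate: str, reference: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over the token sets of two code strings."""
    a, b = tokenize(candidate), tokenize(reference)
    if not a and not b:
        # Assumption: two empty snippets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(jaccard_similarity("x = 1", "x = 2"))
```

A score like this could serve as a reward in the range 0 to 1, with 1 meaning the two token sets match exactly.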

Summary by CodeRabbit

New Features
- Added HelpSteer3 and Tulu3 datasets for training and evaluation.
- Introduced CodeJaccard environment for code-based similarity scoring.
- Enhanced data processor registry system with improved task flexibility.
- Added new GRPO training script and SFT configurations for Nemotron-49B models.
- Support for dynamic task naming across datasets.
Documentation
- Updated custom parallel plan paths.
Tests
- Added new GRPO HelpSteer3 and SFT test suites.
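
To make the "data processor registry" and "dynamic task naming" items above more concrete, here is a minimal sketch of how a name-to-function registry is commonly built in Python. Every name in it (`PROCESSOR_REGISTRY`, `register_processor`, `get_processor`, the `"processor"` config key, and the falling back to a `"default"` entry) is hypothetical and chosen for illustration; it is not the nemo_rl API.

```python
from typing import Any, Callable, Dict

# Hypothetical types and names; this is not the nemo_rl API.
Example = Dict[str, Any]
Processor = Callable[[Example], Example]

PROCESSOR_REGISTRY: Dict[str, Processor] = {}


def register_processor(name: str) -> Callable[[Processor], Processor]:
    """Decorator that stores a processor function under a task name."""

    def decorator(fn: Processor) -> Processor:
        if name in PROCESSOR_REGISTRY:
            raise ValueError(f"processor {name!r} is already registered")
        PROCESSOR_REGISTRY[name] = fn
        return fn

    return decorator


@register_processor("default")
def default_processor(example: Example) -> Example:
    # The task name travels with the example instead of being hard-coded.
    return {"prompt": example["prompt"], "task_name": example.get("task_name", "default")}


@register_processor("preference")
def preference_processor(example: Example) -> Example:
    return {"prompt": example["prompt"], "task_name": example.get("task_name", "preference")}


def get_processor(data_config: Dict[str, Any]) -> Processor:
    """Look up the processor named in the config, assuming a 'default' fallback."""
    name = data_config.get("processor", "default")
    if name not in PROCESSOR_REGISTRY:
        raise ValueError(f"unknown processor {name!r}")
    return PROCESSOR_REGISTRY[name]
```

With a layout like this, supporting a new dataset means registering one more function rather than editing the training script.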

Signed-off-by: Yuki Huang <[email protected]> Signed-off-by: ruit <[email protected]>

… processors. Added raw_dataset.py and path.py for improved dataset processing. Updated project-includes in pyrefly.toml and modified grpo.md to reflect new task-dataset mapping. Cleaned up unused code and configurations in various YAML files. Signed-off-by: ruit <[email protected]>

…or handling
- Introduced documentation for the new Code Jaccard Environment, detailing its functionality, usage, and configuration.
- Updated RawDataset class to provide a default processor if none is specified in the data configuration.
- Enhanced test coverage for the helpsteer3 data processor to ensure correct functionality and output.

Signed-off-by: ruit <[email protected]>

github-actions · 2025-11-20T09:26:10Z

⚠️ File Consistency Check

Check based on commit: eb8aa50 (PR #1506 from yukih/pr-1472)

⚠️ Parallel Plans Synchronization Warning

The file nemo_rl/models/dtensor/parallelize.py was modified in this PR, but neither 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py nor 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py was updated.

Why this matters:
These files contain similar parallel plan implementations that should be kept synchronized to ensure consistency across the codebase.

Action required:

- Please review if the changes in nemo_rl/models/dtensor/parallelize.py should also be applied to 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py or 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py.
- Update the appropriate related file(s) if necessary to maintain functional consistency.
- Request access to the NVIDIA-NeMo/Automodel repository, create a PR against the nemo-rl-submodule branch, and update the Automodel submodule in the nemo-rl index.
- Add @ffrujeri as a reviewer of this PR if you have any questions about the consistency requirements.
- If the files are intentionally different, please add a comment in the PR explaining why.

Files to check:

Modified: nemo_rl/models/dtensor/parallelize.py
Not modified: 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py
Not modified: 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/dtensor_policy_worker.py
nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
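
The rule this bot comment describes (warn when a file changed but none of its related files did) can be sketched as a small function. This is an illustrative reconstruction from the comment text only, not the actual workflow script; the function name `find_unsynced` and the shape of the `related` mapping are assumptions. The file paths are the ones quoted in the comment.

```python
from typing import Dict, Iterable, List, Set


def find_unsynced(modified: Iterable[str], related: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {source: counterparts} for each modified source file
    for which none of the listed counterparts was also modified."""
    changed: Set[str] = set(modified)
    warnings: Dict[str, List[str]] = {}
    for source, counterparts in related.items():
        if source in changed and not any(c in changed for c in counterparts):
            warnings[source] = list(counterparts)
    return warnings


if __name__ == "__main__":
    automodel = "3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/"
    related = {
        "nemo_rl/models/dtensor/parallelize.py": [
            automodel + "optimized_tp_plans.py",
            automodel + "parallelizer.py",
        ],
        "nemo_rl/models/policy/dtensor_policy_worker.py": [
            "nemo_rl/models/policy/dtensor_policy_worker_v2.py",
        ],
    }
    modified = [
        "nemo_rl/models/dtensor/parallelize.py",
        "nemo_rl/models/policy/dtensor_policy_worker.py",
        "nemo_rl/models/policy/dtensor_policy_worker_v2.py",
    ]
    for source, counterparts in find_unsynced(modified, related).items():
        print(f"warning: {source} changed, but none of {counterparts} did")
```

Run against the file list from this PR, a check of this shape would flag parallelize.py and pass the two DTensor policy worker files, which matches the warning and the check mark reported above.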

- Updated CLEVRCoGenTDataset, OpenAIFormatDataset, and SquadDataset to inherit from the RawDataset class for improved dataset handling.
- Added necessary imports for RawDataset in the respective files.

Signed-off-by: ruit <[email protected]>

github-actions · 2025-11-21T03:40:11Z

⚠️ File Consistency Check

Check based on commit: e842bfd (PR #1506 from yukih/pr-1472)

…up for vlm grpo
- Added `env_name` to `vlm_grpo_3B_megatron.yaml` and `vlm_grpo_3B.yaml` for environment specification.
- Modified `setup_data` function in `run_vlm_grpo.py` to use `env_name` for environment configuration, enhancing flexibility in dataset processing.

Signed-off-by: ruit <[email protected]>

github-actions · 2025-11-21T06:47:16Z

⚠️ File Consistency Check

Check based on commit: 6e4393e (PR #1506 from yukih/pr-1472)

github-actions bot added the documentation Improvements or additions to documentation label Nov 11, 2025

yuki-97 force-pushed the yukih/pr-1472 branch from 75f3d5c to 5ebbc73 Compare November 11, 2025 07:54

yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Nov 11, 2025

yuki-97 temporarily deployed to nemo-ci November 11, 2025 07:56 — with GitHub Actions Inactive

yuki-97 force-pushed the yukih/pr-1472 branch 2 times, most recently from c9335d4 to a872ed6 Compare November 11, 2025 09:27

yuki-97 removed the CI:L1 Run doctests, unit tests, and functional tests label Nov 11, 2025

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025

RayenTian temporarily deployed to nemo-ci November 16, 2025 03:31 — with GitHub Actions Inactive

RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025

RayenTian had a problem deploying to nemo-ci November 16, 2025 03:35 — with GitHub Actions Error

RayenTian force-pushed the yukih/pr-1472 branch 2 times, most recently from b7fedb9 to 9078e33 Compare November 16, 2025 03:37

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025

RayenTian temporarily deployed to nemo-ci November 16, 2025 03:38 — with GitHub Actions Inactive

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 16, 2025

RayenTian temporarily deployed to nemo-ci November 16, 2025 08:50 — with GitHub Actions Inactive

RayenTian force-pushed the yukih/pr-1472 branch from c0bfaa6 to ab0ac80 Compare November 17, 2025 08:44

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 17, 2025

RayenTian temporarily deployed to nemo-ci November 17, 2025 08:58 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci November 17, 2025 08:59 — with GitHub Actions Inactive

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 17, 2025

RayenTian temporarily deployed to nemo-ci November 17, 2025 14:21 — with GitHub Actions Inactive

yuki-97 and others added 7 commits November 20, 2025 01:25

refactor yaml

2b30428

Signed-off-by: Yuki Huang <[email protected]> Signed-off-by: ruit <[email protected]>

update custom parallel plan doc

f2ac388

Signed-off-by: Yuki Huang <[email protected]> Signed-off-by: ruit <[email protected]>

revert logger.py

d0ea228

Signed-off-by: Yuki Huang <[email protected]> Signed-off-by: ruit <[email protected]>

unify run_grpo with multiple env

60f48a4

Signed-off-by: ruit <[email protected]>

remove useless code

31205e3

Signed-off-by: Yuki Huang <[email protected]> Signed-off-by: ruit <[email protected]>

RayenTian force-pushed the yukih/pr-1472 branch from f41ee48 to eb8aa50 Compare November 20, 2025 09:25

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 20, 2025

RayenTian temporarily deployed to nemo-ci November 20, 2025 09:30 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci November 20, 2025 09:59 — with GitHub Actions Inactive

RayenTian had a problem deploying to nemo-ci November 20, 2025 15:44 — with GitHub Actions Failure

RayenTian had a problem deploying to nemo-ci November 21, 2025 00:13 — with GitHub Actions Failure

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 21, 2025

RayenTian temporarily deployed to nemo-ci November 21, 2025 03:40 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci November 21, 2025 04:02 — with GitHub Actions Inactive

RayenTian had a problem deploying to nemo-ci November 21, 2025 05:41 — with GitHub Actions Failure

RayenTian requested a review from a team as a code owner November 21, 2025 06:46

RayenTian added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 21, 2025

RayenTian temporarily deployed to nemo-ci November 21, 2025 06:47 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci November 21, 2025 06:48 — with GitHub Actions Inactive

RayenTian had a problem deploying to nemo-ci November 21, 2025 08:27 — with GitHub Actions Failure
