feat: DTensorPolicyV2 GPT-OSS support #1470
Conversation
❌ Submodule Fast-Forward Check Failed

Check based on commit: e936ebf (PR #1470). Submodules that need attention:
- Automodel: ❌ Commits have DIVERGED from a common ancestor

Please ensure all submodule commits are fast-forwards of the main branch before merging.
📝 Walkthrough

The pull request updates the Automodel submodule pointer, integrates the Automodel Checkpointer into DTensorPolicyWorkerV2 for improved checkpoint handling, adds a comprehensive SFT training configuration example, removes the legacy automodel_checkpoint utility module, updates package dependencies (grouped_gemm, transformer-engine, deep_ep), and enhances venv setup with CUDA architecture configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant DTensorPolicyWorkerV2
    participant Checkpointer
    participant FSDP2Manager
    participant HFModel
    participant Cache
    User->>DTensorPolicyWorkerV2: load_checkpoint(path, ...)
    DTensorPolicyWorkerV2->>DTensorPolicyWorkerV2: detect_checkpoint_format(path)
    DTensorPolicyWorkerV2->>DTensorPolicyWorkerV2: _infer_checkpoint_root(path)
    DTensorPolicyWorkerV2->>HFModel: initialize empty model
    DTensorPolicyWorkerV2->>FSDP2Manager: parallelize model (FSDP2)
    DTensorPolicyWorkerV2->>DTensorPolicyWorkerV2: _ensure_checkpointer(config)
    DTensorPolicyWorkerV2->>Checkpointer: load via Automodel API
    Checkpointer->>Cache: fetch or download weights
    Checkpointer->>HFModel: apply loaded state dict
    HFModel-->>DTensorPolicyWorkerV2: model ready
    DTensorPolicyWorkerV2-->>User: checkpoint loaded
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- uv.lock is excluded by !**/*.lock

📒 Files selected for processing (7)
- 3rdparty/Automodel-workspace/Automodel (1 hunks)
- examples/configs/sft_automodel.yaml (1 hunks)
- nemo_rl/models/policy/dtensor_policy_worker_v2.py (20 hunks)
- nemo_rl/models/policy/utils.py (1 hunks)
- nemo_rl/utils/automodel_checkpoint.py (0 hunks)
- nemo_rl/utils/venvs.py (2 hunks)
- pyproject.toml (4 hunks)
💤 Files with no reviewable changes (1)
- nemo_rl/utils/automodel_checkpoint.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- nemo_rl/models/policy/utils.py
- nemo_rl/models/policy/dtensor_policy_worker_v2.py
- nemo_rl/utils/venvs.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- nemo_rl/models/policy/utils.py
- nemo_rl/models/policy/dtensor_policy_worker_v2.py
- nemo_rl/utils/venvs.py
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml
Files:
examples/configs/sft_automodel.yaml
🧠 Learnings (4)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Applied to files:
nemo_rl/models/policy/utils.py
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.
Applied to files:
nemo_rl/models/policy/dtensor_policy_worker_v2.py
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.
Applied to files:
nemo_rl/models/policy/dtensor_policy_worker_v2.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.sh : Use `uv run` to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling `python` directly
Applied to files:
nemo_rl/utils/venvs.py
🧬 Code graph analysis (1)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (3)
- nemo_rl/utils/checkpoint.py (1): CheckpointingConfig (36-67)
- nemo_rl/models/policy/dtensor_policy_worker.py (3): create_context_parallel_ctx (456-480), get_cpu_state_dict (104-134), load_checkpoint (1920-1930)
- nemo_rl/models/dtensor/parallelize.py (1): to_local_if_dtensor (709-715)
🪛 GitHub Actions: Automodel Integration and Submodule Checks
3rdparty/Automodel-workspace/Automodel
[error] 1-1: One or more submodules are not fast-forwarded. Automodel: Commits have DIVERGED from a common ancestor. Please ensure submodule commits are fast-forwards of the main branch.
🪛 Ruff (0.14.3)
nemo_rl/models/policy/dtensor_policy_worker_v2.py
- 2105: Loop control variable `root` not used within loop body; rename unused `root` to `_root` (B007)
- 2105: Loop control variable `dirs` not used within loop body; rename unused `dirs` to `_dirs` (B007)
🔇 Additional comments (1)
nemo_rl/models/policy/utils.py (1)
32-32: Verify the new import path is valid in the updated nemo_automodel submodule.

The import change appears scoped and intentional: only `_transformers` is being reorganized out of `.components`, while other submodules remain unchanged. The try-except block provides appropriate fallback handling, and the imported classes are actively used in `AUTOMODEL_FACTORY` and `resolve_model_class()`. However, I could not locate test files confirming the new import path works. Please verify:
- The new path (`nemo_automodel._transformers.auto_model`) exists and is importable in the updated nemo_automodel submodule version
- Tests pass with this change, or if untested, manually confirm the import succeeds at runtime
- `NEMO_AUTOMODEL_AVAILABLE` resolves to `True` in your environment after this change
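For reference, a minimal sketch of the guarded-import pattern being discussed; the exact class re-exported from `nemo_automodel._transformers.auto_model` is an assumption for illustration, only the module path and the availability flag come from this thread:

```python
# Sketch of the fallback import described above. NeMoAutoModelForCausalLM is an
# assumed symbol name, not taken from the diff.
try:
    from nemo_automodel._transformers.auto_model import NeMoAutoModelForCausalLM
    NEMO_AUTOMODEL_AVAILABLE = True
except ImportError:
    # Submodule missing or still on the old `.components` layout.
    NeMoAutoModelForCausalLM = None
    NEMO_AUTOMODEL_AVAILABLE = False
```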
```diff
@@ -1 +1 @@
-Subproject commit a2db048383cd54b3fafc928df4c30bf7bbf7c430
+Subproject commit 5e995e9535e63cbe3358dc2bd81a8ed3a696cee7
```
🧩 Analysis chain
Critical: Submodule commit divergence blocks integration.
The pipeline reports that the Automodel submodule commits have diverged from a common ancestor and are not fast-forwarded. This is a blocker that must be resolved before merging.
Submodule divergence can:
- Break reproducibility and introduce non-deterministic behavior
- Cause merge conflicts and complicate future submodule updates
- Obscure commit history and make debugging harder
The AI summary indicates significant integration work (Automodel Checkpointer integration in DTensorPolicyWorkerV2, removal of legacy automodel_checkpoint utility). This context makes it critical to understand what changes in the Automodel submodule underpin these refactors and ensure the submodule update is intentional and correct.
Please clarify and resolve the divergence:
- Understand the divergence: Run the verification script below to inspect the commit history and identify which commits diverged and why.
- Align the branch: Consider whether the divergence should be resolved by:
- Rebasing the Automodel branch onto main to make it a proper fast-forward
- Explicitly merging main into the Automodel branch if concurrent work requires it
- Selecting specific commits via cherry-pick if only certain commits are needed (with clear justification in the PR description)
- Document the strategy: Update the PR description to explain why this submodule update was chosen and how it relates to the DTensorPolicyWorkerV2 and Checkpointer integration work.
🏁 Script executed:

```bash
#!/bin/bash
# Description: Inspect the Automodel submodule divergence to understand the commit history.
echo "=== Current submodule pointer ==="
cat 3rdparty/Automodel-workspace/Automodel
echo -e "\n=== Submodule remote info ==="
cd 3rdparty/Automodel-workspace/Automodel
git log --oneline -20
git log --oneline --all --graph | head -30
cd - > /dev/null
echo -e "\n=== Check if current commit is reachable from main ==="
cd 3rdparty/Automodel-workspace/Automodel
if git merge-base --is-ancestor 5e995e9535e63cbe3358dc2bd81a8ed3a696cee7 origin/main; then
  echo "Current commit IS reachable from origin/main (fast-forward possible)"
else
  echo "Current commit is NOT reachable from origin/main (divergence exists)"
  echo "Commits in current branch not in main:"
  git log origin/main..5e995e9535e63cbe3358dc2bd81a8ed3a696cee7 --oneline
  echo -e "\nCommits in main not in current branch:"
  git log 5e995e9535e63cbe3358dc2bd81a8ed3a696cee7..origin/main --oneline | head -20
fi
```

Length of output: 2733
I need to investigate further. The script revealed a critical issue: the target commit doesn't exist in the remote repository. Let me verify the actual state of the submodule.
🏁 Script executed:

```bash
#!/bin/bash
# Verify the actual submodule state and check if the target commit exists locally
cd 3rdparty/Automodel-workspace/Automodel
echo "=== Current HEAD ==="
git rev-parse HEAD
git log -1 --oneline
echo -e "\n=== Check if target commit exists in any local ref ==="
if git cat-file -t 5e995e9535e63cbe3358dc2bd81a8ed3a696cee7 2>/dev/null; then
  echo "Target commit 5e995e9535e63cbe3358dc2bd81a8ed3a696cee7 EXISTS locally"
  git log -1 --oneline 5e995e9535e63cbe3358dc2bd81a8ed3a696cee7
else
  echo "Target commit 5e995e9535e63cbe3358dc2bd81a8ed3a696cee7 DOES NOT exist locally"
fi
echo -e "\n=== Check if old commit exists ==="
if git cat-file -t a2db048383cd54b3fafc928df4c30bf7bbf7c430 2>/dev/null; then
  echo "Old commit a2db048383cd54b3fafc928df4c30bf7bbf7c430 EXISTS locally"
  git log -1 --oneline a2db048383cd54b3fafc928df4c30bf7bbf7c430
else
  echo "Old commit a2db048383cd54b3fafc928df4c30bf7bbf7c430 DOES NOT exist locally"
fi
echo -e "\n=== List all local refs for both commits ==="
git show-ref | grep -E "5e995e9535e63cbe3358dc2bd81a8ed3a696cee7|a2db048383cd54b3fafc928df4c30bf7bbf7c430" || echo "No matching refs found"
echo -e "\n=== Check .git/config for remote URL ==="
git config --get remote.origin.url
```

Length of output: 471
Critical: Submodule pointer references non-existent commit—PR is un-mergeable.
The target commit 5e995e9535e63cbe3358dc2bd81a8ed3a696cee7 does not exist in the Automodel repository. This is not a divergence issue—the commit is simply missing and cannot be resolved.
Current state:
- Submodule HEAD: 7df0cc5 ("uv lock")
- PR target: 5e995e9535e63cbe3358dc2bd81a8ed3a696cee7 (does not exist)
- Remote: https://github.com/NVIDIA-NeMo/RL.git (correct)
This commit pointer must be corrected or regenerated. Options:
- Update the pointer to an existing commit in the Automodel repository (e.g., 7df0cc5 if that's the intended state, or another valid commit)
- Push the intended commit to the Automodel remote and then reference it
- Force-push to update the submodule to the correct branch/commit
Without a valid commit pointer, checkout and CI will fail.
🧰 Tools
🪛 GitHub Actions: Automodel Integration and Submodule Checks
[error] 1-1: One or more submodules are not fast-forwarded. Automodel: Commits have DIVERGED from a common ancestor. Please ensure submodule commits are fast-forwards of the main branch.
🤖 Prompt for AI Agents
In 3rdparty/Automodel-workspace/Automodel around lines 1 to 1, the submodule
HEAD points to a non-existent commit (5e995e9535e63cbe3358dc2bd81a8ed3a696cee7)
which makes the PR unmergeable; fix it by updating the submodule ref to a valid
commit (for example change to the existing 7df0cc5 or another confirmed commit
SHA), or alternatively push the missing commit to the Automodel remote and
update the submodule to reference that new commit, then commit the updated
.gitmodules/.git link and ensure CI/checkout succeeds.
```diff
         self.optimizer = optimizer_cls(
-            self.model.parameters(), **self.cfg["optimizer"]["kwargs"]
+            self.model.parameters(),
+            **self.cfg["optimizer"]["kwargs"],
+            exp_avg_dtype=torch.bfloat16,
+            exp_avg_sq_dtype=torch.bfloat16,
         )
```
Don’t inject Transformer-Engine-only kwargs into arbitrary optimizers.
Lines 423-428 always pass exp_avg_dtype/exp_avg_sq_dtype. That works for TE’s FusedAdam, but any existing config that still points to torch.optim.AdamW (or similar) will now throw TypeError: __init__() got an unexpected keyword argument 'exp_avg_dtype'. Please gate those kwargs behind a TE-specific check so the generic optimizer path keeps working.
```diff
@@
-        self.optimizer = optimizer_cls(
-            self.model.parameters(),
-            **self.cfg["optimizer"]["kwargs"],
-            exp_avg_dtype=torch.bfloat16,
-            exp_avg_sq_dtype=torch.bfloat16,
-        )
+        optimizer_kwargs = dict(self.cfg["optimizer"]["kwargs"])
+        if optimizer_cls.__module__.startswith(
+            "transformer_engine.pytorch.optimizers"
+        ):
+            optimizer_kwargs.setdefault("exp_avg_dtype", torch.bfloat16)
+            optimizer_kwargs.setdefault("exp_avg_sq_dtype", torch.bfloat16)
+
+        self.optimizer = optimizer_cls(
+            self.model.parameters(),
+            **optimizer_kwargs,
+        )
```

🤖 Prompt for AI Agents
+ )🤖 Prompt for AI Agents
In nemo_rl/models/policy/dtensor_policy_worker_v2.py around lines 423-428, avoid
passing Transformer-Engine-specific keyword args to arbitrary optimizers; detect
whether optimizer_cls is the TE fused optimizer (e.g., by checking the
class/module name or a TE-specific attribute) and only include exp_avg_dtype and
exp_avg_sq_dtype when that check is true; for non-TE optimizers call
optimizer_cls(self.model.parameters(), **self.cfg["optimizer"]["kwargs"]) so
existing torch.optim.* classes don’t receive unexpected kwargs.
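A standalone sketch of the gating check proposed above, so TE-only moment dtypes are never forwarded to `torch.optim` classes; the helper name and config shape are illustrative and not the actual worker code:

```python
# Illustrative helper: only pass low-precision moment dtypes when the optimizer
# class comes from Transformer Engine's fused optimizers.
import torch


def build_optimizer(optimizer_cls, params, configured_kwargs: dict):
    kwargs = dict(configured_kwargs)  # don't mutate the config-provided dict
    if optimizer_cls.__module__.startswith("transformer_engine.pytorch.optimizers"):
        kwargs.setdefault("exp_avg_dtype", torch.bfloat16)
        kwargs.setdefault("exp_avg_sq_dtype", torch.bfloat16)
    return optimizer_cls(params, **kwargs)


# e.g. build_optimizer(torch.optim.AdamW, model.parameters(), {"lr": 1e-5})
# still works, while TE's FusedAdam additionally gets bf16 moment states.
```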
| if "TORCH_CUDA_ARCH_LIST" not in env: | ||
| env["TORCH_CUDA_ARCH_LIST"] = "9.0 10.0 12.0" | ||
|
|
Avoid Hopper-only default for TORCH_CUDA_ARCH_LIST.
Line 92 forces TORCH_CUDA_ARCH_LIST to "9.0 10.0 12.0" whenever the variable is unset. That drops support for Ampere (sm80/sm86), which is still a common deployment target; any native extensions built in this venv (grouped_gemm, DeepEP, etc.) will then miss sm80 kernels and crash at runtime with “no kernel image is available for execution on the device.” Please include Ampere in the default (or derive the list from an override/env hint) so older clusters don’t regress.
```diff
@@
-    if "TORCH_CUDA_ARCH_LIST" not in env:
-        env["TORCH_CUDA_ARCH_LIST"] = "9.0 10.0 12.0"
+    if "TORCH_CUDA_ARCH_LIST" not in env:
+        env["TORCH_CUDA_ARCH_LIST"] = os.environ.get(
+            "NEMO_RL_TORCH_CUDA_ARCH_LIST",
+            "8.0 8.6 9.0 10.0",
+        )
```

🤖 Prompt for AI Agents
In nemo_rl/utils/venvs.py around lines 92 to 94, the default
TORCH_CUDA_ARCH_LIST currently sets only Hopper archs ("9.0 10.0 12.0"), which
omits Ampere (sm80/sm86) and causes missing-kernel runtime crashes; update the
logic so the default includes Ampere (e.g., include sm80 and/or sm86) or derive
the list from an env override/hint (check CUDA_VISIBLE_DEVICES or a provided
override var) and set TORCH_CUDA_ARCH_LIST to a broader set (or the override)
instead of the Hopper-only value.
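A hedged sketch of one alternative: derive the architecture from the visible GPU when possible and fall back to a broader list otherwise. The function name and fallback values are assumptions, not the suggested patch itself:

```python
# Illustrative alternative to a hard-coded arch list. torch.cuda.get_device_capability
# is a real PyTorch API; the surrounding env-dict shape mirrors the snippet above.
import torch


def set_default_torch_cuda_arch_list(env: dict) -> None:
    if "TORCH_CUDA_ARCH_LIST" in env:
        return  # respect an explicit user setting
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        env["TORCH_CUDA_ARCH_LIST"] = f"{major}.{minor}"
    else:
        # Broad fallback covering Ampere through Blackwell when no GPU is visible.
        env["TORCH_CUDA_ARCH_LIST"] = "8.0 8.6 9.0 10.0"
```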
```diff
 with get_train_context(False, False, context_parallel_ctx)():
-    with torch.autocast(device_type="cuda", dtype=self.dtype):
+    with nullcontext():
```
Hey @terrykong @hemildesai is this going to be a problem? Do we revert back to autocast?
We should make it configurable and keep the default to be autocast for backward compatibility.
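A minimal sketch of that suggestion, keeping autocast as the default; the `use_autocast` key name is an assumption for illustration and would need a documented default in the exemplar YAMLs per the config guidelines:

```python
# Illustrative: choose between autocast and a plain nullcontext based on config,
# defaulting to autocast for backward compatibility as suggested above.
from contextlib import nullcontext

import torch


def mixed_precision_ctx(policy_cfg: dict, dtype: torch.dtype):
    if policy_cfg.get("use_autocast", True):
        return torch.autocast(device_type="cuda", dtype=dtype)
    return nullcontext()


# Hypothetical usage at the call site shown in the diff above:
# with get_train_context(False, False, context_parallel_ctx)():
#     with mixed_precision_ctx(self.cfg, self.dtype):
#         ...
```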
| name = "deep_ep" | ||
| # This version has to match the version in the commit/rev/tag used | ||
| version = "v1.1.0+e3908bf" | ||
| version = "v1.2.1+bfded34" |
Hey @terrykong @yaoyu-33 can you comment on whether this deepep version bump is ok?
Can we try with the old version with the TORCH_CUDA_ARCH_LIST change?
@adil-a what's the current status of this PR?
What does this PR do?
Adds GPT-OSS SFT using AutoModel custom models + DeepEP.
To run, launch the nightly container and run