Skip to content

Conversation

@jstjohn
Copy link
Collaborator

@jstjohn jstjohn commented Jan 21, 2026

Description

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

  • ciflow:skip - Skip all CI tests for this PR
  • ciflow:notebooks - Run Jupyter notebooks execution tests for bionemo2
  • ciflow:slow - Run slow single GPU integration tests marked as @pytest.mark.slow for bionemo2
  • ciflow:all - Run all tests (unit tests, slow tests, and notebooks) for bionemo2. This label can be used to enforce running tests for all bionemo2.
  • ciflow:all-recipes - Run tests for all recipes (under bionemo-recipes). This label can be used to enforce running tests for all recipes.

Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline.

For more details, see CONTRIBUTING

Note

By default, only basic unit tests are run. Add appropriate labels to enable an additional test coverage.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Triggering Code Rabbit AI Review

To trigger a code review from code rabbit, comment on a pull request with one of these commands:

See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

Summary by CodeRabbit

Release Notes

  • New Features

    • Added no_weight_decay_embeddings configuration parameter for Evo2 training recipes to control embedding weight decay behavior.
  • Chores

    • Updated Megatron-related dependency versions.
  • Tests

    • Improved test fixture scoping for better test isolation.

✏️ Tip: You can customize this high-level summary in your review settings.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Jan 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 21, 2026

📝 Walkthrough

Walkthrough

The PR updates Megatron dependency versions and refactors weight-decay embedding handling in Evo2 by introducing a dedicated HyenaOptimizerConfigOverrideProvider class to manage optimizer parameter-specific overrides, replacing the previous scheduler-based configuration approach with a recipe-integrated solution.

Changes

Cohort / File(s) Summary
Dependency Updates
bionemo-recipes/recipes/evo2_megatron/pyproject.toml
Updated Git revisions for megatron-bridge and megatron-core from 18ef1b61309dd45bc0535fb7c60064b9d8829a35 to c415f4616340a0431c0eae776d0482ab5cc3770e in UV sources map.
Optimizer Provider Implementation
bionemo-recipes/recipes/evo2_megatron/src/bionemo/evo2/models/evo2_provider.py
Added HyenaOptimizerConfigOverrideProvider class implementing build_config_overrides() to emit weight-decay and optional decoupled LR overrides for 1D parameters. Updated imports to support optimizer configuration framework.
Recipe Configuration Integration
bionemo-recipes/recipes/evo2_megatron/src/bionemo/evo2/recipes/evo2.py
Added no_weight_decay_embeddings parameter to Evo2CommonKwargs and _evo2_common(); instantiated HyenaOptimizerConfigOverrideProvider in ConfigContainer; removed scheduler-based weight-decay configuration; imported new provider class.
Training CLI Updates
bionemo-recipes/recipes/evo2_megatron/src/bionemo/evo2/run/train.py
Removed hyena_no_weight_decay_cond_with_embeddings import; propagated no_weight_decay_embeddings via recipe kwargs instead of runtime scheduler modification.
Test Fixture Scope Changes
bionemo-recipes/recipes/evo2_megatron/tests/bionemo/evo2/run/test_train.py
Transitioned mbridge_checkpoint_7b_1m and base_checkpoint fixtures from session-scoped to module-scoped; updated temporary directory naming and docstrings accordingly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Poem

🐰 A bundled approach, so clean and so tight,
Where optimizers dance with overrides right,
No more scheduler tricks in the shadows to hide,
Just config providers with structured pride! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description covers the key refactoring objectives and dependencies but has an incomplete usage section marked as TODO. Complete the Usage section with a code snippet demonstrating the new weight decay behavior, or clarify why this change does not require user-facing code examples.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main refactoring effort: updating weight decay skipping for Hyena to align with Megatron-Bridge's current approach.
Docstring Coverage ✅ Passed Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch jstjohn/update_evo2_mbridge_optimizer_override

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jstjohn
Copy link
Collaborator Author

jstjohn commented Jan 21, 2026

Depends on NVIDIA-NeMo/Megatron-Bridge#2010

@jstjohn
Copy link
Collaborator Author

jstjohn commented Jan 21, 2026

/ok to test dcadb56

@jstjohn
Copy link
Collaborator Author

jstjohn commented Jan 21, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@jstjohn
Copy link
Collaborator Author

jstjohn commented Jan 21, 2026

/ok to test e71d3a3

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@bionemo-recipes/recipes/evo2_megatron/src/bionemo/evo2/recipes/evo2.py`:
- Around line 246-248: The HyenaOptimizerConfigOverrideProvider is hardcoding
no_weight_decay_embeddings=True which ignores the recipe default and CLI flag;
change the call to forward the actual configured value instead of True (e.g.,
no_weight_decay_embeddings=no_weight_decay_embeddings or
no_weight_decay_embeddings=self.config.optimizer.no_weight_decay_embeddings /
the passed-in optimizer flag) so the provider respects the recipe/CLI setting;
update the invocation of HyenaOptimizerConfigOverrideProvider to use the
existing config/parameter name rather than a literal True.

In `@bionemo-recipes/recipes/evo2_megatron/tests/bionemo/evo2/run/test_train.py`:
- Around line 535-539: The docstring for the fixture mbridge_checkpoint_7b_1m
still says "Session-scoped" even though the fixture scope is module; update the
docstring to say "Module-scoped" (or remove the scope mention) and adjust the
following sentence if needed to reflect that the fixture exists for the duration
of the module's tests so it no longer implies session lifetime.

@jstjohn
Copy link
Collaborator Author

jstjohn commented Jan 21, 2026

/ok to test 01f238c

@jstjohn
Copy link
Collaborator Author

jstjohn commented Jan 22, 2026

/ok to test 1f7235d

@jstjohn jstjohn force-pushed the jstjohn/update_evo2_mbridge_optimizer_override branch from a813221 to ca74022 Compare February 2, 2026 17:18
@jstjohn
Copy link
Collaborator Author

jstjohn commented Feb 2, 2026

/ok to test ca74022

@jstjohn jstjohn force-pushed the jstjohn/update_evo2_mbridge_optimizer_override branch from 6fe2513 to 4066f50 Compare February 2, 2026 22:49
@jstjohn
Copy link
Collaborator Author

jstjohn commented Feb 2, 2026

/ok to test 4066f50

@jstjohn jstjohn enabled auto-merge February 3, 2026 00:37
@jstjohn jstjohn added this pull request to the merge queue Feb 3, 2026
Merged via the queue into main with commit 5778321 Feb 3, 2026
19 checks passed
@jstjohn jstjohn deleted the jstjohn/update_evo2_mbridge_optimizer_override branch February 3, 2026 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants