Modular Co-Design Interpolants #554

Closed

wants to merge 21 commits

Conversation

nvdreidenbach
Collaborator

Release of v1.0 of BioNeMo Modular Co-Design (MoCo)

Introduces modular interpolants for various popular generative model frameworks including continuous and discrete diffusion and flow matching.

Summary

Introduces MoCo.

Details

See documentation.md for details.

Usage

```bash
pip install bionemo-moco
```

```python
from bionemo.moco.interpolants import ContinuousFlowMatcher
from bionemo.moco.distributions.time import UniformTimeDistribution
from bionemo.moco.distributions.prior import GaussianPrior

uniform_time = UniformTimeDistribution()
moon_prior = GaussianPrior()
sigma = 0.1
cfm = ContinuousFlowMatcher(time_distribution=uniform_time,
                            prior_distribution=moon_prior,
                            sigma=sigma,
                            prediction_type="velocity")
```

See the examples directory for notebook tutorials.
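For intuition, the conditional flow-matching interpolation that `ContinuousFlowMatcher` wraps can be sketched in plain NumPy. This illustrates the underlying math only, not the library's API; all names below are illustrative:

```python
import numpy as np

def interpolate(x0, x1, t, sigma=0.1, rng=None):
    """Conditional flow-matching interpolant:
    x_t = t * x1 + (1 - t) * x0 + sigma * eps,
    with velocity regression target u_t = x1 - x0.
    """
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(x0.shape)
    # Broadcast the per-sample time over feature dimensions.
    t = np.asarray(t).reshape(-1, *([1] * (x0.ndim - 1)))
    xt = t * x1 + (1.0 - t) * x0 + sigma * eps
    ut = x1 - x0  # target for a prediction_type="velocity" model
    return xt, ut

rng = np.random.default_rng(42)
x0 = rng.standard_normal((8, 2))   # samples from a Gaussian prior
x1 = rng.standard_normal((8, 2))   # data samples (e.g. two-moons points)
t = rng.uniform(size=8)            # uniform time distribution
xt, ut = interpolate(x0, x1, t)
```

A model trained to regress `ut` from `(xt, t)` can then be integrated from prior samples to data at inference time.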

Testing

Unit tests for all key functions.

Tests for these changes can be run via:

```bash
pytest -v tests
```

@nvdreidenbach nvdreidenbach added the SKIP_CI Completely skips the CI pipeline label Dec 24, 2024
@nvdreidenbach nvdreidenbach force-pushed the moco branch 2 times, most recently from 70fc9ce to 0af104d Compare December 24, 2024 18:29
@nvdreidenbach nvdreidenbach changed the title initial commit Modular Co-Design Interpolants Dec 24, 2024
@nvdreidenbach nvdreidenbach force-pushed the moco branch 3 times, most recently from 8c29d92 to 146985a Compare December 24, 2024 20:30
@jstjohn jstjohn removed the SKIP_CI Completely skips the CI pipeline label Dec 24, 2024
@jstjohn
Collaborator

jstjohn commented Dec 24, 2024

/build-ci

@jstjohn
Collaborator

jstjohn commented Jan 2, 2025

/build-ci

@nvdreidenbach
Collaborator Author

/build-ci

nvdreidenbach and others added 21 commits January 2, 2025 15:55
Signed-off-by: Danny <[email protected]>
Signed-off-by: Danny <[email protected]>
When NvFaidx was used on Fasta files containing duplicate sequence ids,
which violates the FASTA spec, it would silently fail and use the
last-seen sequence as an entry.

This PR fails by default and exposes a parameter to ignore sequence_ids
and use integer indexing instead.
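The fail-by-default behavior described above can be sketched as follows (hypothetical helper and parameter names; the real NvFaidx API may differ):

```python
def index_fasta_ids(ids, ignore_sequence_ids=False):
    """Map sequence ids to record positions.

    Duplicate ids violate the FASTA spec, so raise by default instead of
    silently keeping the last-seen record. With ignore_sequence_ids=True,
    fall back to integer indexing.
    """
    if ignore_sequence_ids:
        return {i: i for i in range(len(ids))}
    index = {}
    for pos, seq_id in enumerate(ids):
        if seq_id in index:
            raise ValueError(f"Duplicate sequence id {seq_id!r} at record {pos}")
        index[seq_id] = pos
    return index
```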

Signed-off-by: Danny <[email protected]>
Update DDP config to speed up ESM-2 15B pretraining

Turn off `grad_reduce_in_fp32` in the mixed precision plugin (default is True) to reduce memory consumption, and turn on `overlap_grad_reduce` and `average_in_collective` to improve performance.

Pause `overlap_param_gather=True` until NeMo's fix lands.
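As a sketch, the flags above map onto a Megatron-style DDP/precision configuration roughly like this (illustrative class and field names; consult the actual NeMo/Megatron config classes before use):

```python
from dataclasses import dataclass

@dataclass
class DDPSpeedupConfig:
    # Mixed precision plugin: keep grad reduction in bf16 to save memory.
    grad_reduce_in_fp32: bool = False
    # Overlap gradient reduction with the backward pass.
    overlap_grad_reduce: bool = True
    # Average gradients inside the collective instead of dividing afterwards.
    average_in_collective: bool = True
    # Paused pending NeMo's fix.
    overlap_param_gather: bool = False

cfg = DDPSpeedupConfig()
```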

Signed-off-by: Danny <[email protected]>
Pins mistune to fix a Jupyter notebook build issue introduced in 3.1.0

lepture/mistune#403

Bypassing review rules to fix CI due to holiday OOO

Signed-off-by: Danny <[email protected]>
Bumps [3rdparty/Megatron-LM](https://github.com/NVIDIA/Megatron-LM) from
`99f23d2` to `2da43ef`.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/2da43ef4c1b9e76f03b7567360cf7390e877f1b6"><code>2da43ef</code></a>
Merge branch 'mmodal_eval_in_folder' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/e51a3ac1dcd366f51bcb0339ecca31790c3cfcd1"><code>e51a3ac</code></a>
ADLR/megatron-lm!2491 - Move mmodal evaluation code to its own
folder</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/d3c585e90ebd5937243c8d4c9d5d5cf9d61665d6"><code>d3c585e</code></a>
Merge branch 'jbarker/pp_unfreeze' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/1468ab01c079d5e14888dda97d1c99d2cb62afb2"><code>1468ab0</code></a>
ADLR/megatron-lm!2285 - Support --freeze-LM and --freeze-ViT with ranks
that ...</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/cf25d44037af4e9d5ea723918823de9b2416a30c"><code>cf25d44</code></a>
Merge branch 'boxin/nvlm_ckpt_release' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/1da9dad62b97917caacb1fd271abaed403581caa"><code>1da9dad</code></a>
ADLR/megatron-lm!2494 - Add model checkpoint links</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/25b1f33035ad55eeae6b9a4367f987f1fac804dd"><code>25b1f33</code></a>
Merge branch 'helenn-rope-fusion-mem-layout' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/7bb53792831d80007789ff5c60bc1798cbd34548"><code>7bb5379</code></a>
ADLR/megatron-lm!2469 - Correct strides for bshd layout and revert RoPE
tests...</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/b8420a1909980aa3b6750f75b2d7ab8b23338948"><code>b8420a1</code></a>
Merge branch 'group_topk' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/d0df563d8739e4dfe2b0e90ba190ac389f165157"><code>d0df563</code></a>
ADLR/megatron-lm!1934 - Support Device-Limited Routing and Sequence
Auxiliary...</li>
<li>Additional commits viewable in <a
href="https://github.com/NVIDIA/Megatron-LM/compare/99f23d2f111d12b73b1fbf386c60517101ff8abe...2da43ef4c1b9e76f03b7567360cf7390e877f1b6">compare
view</a></li>
</ul>
</details>
<br />

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)

</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Danny <[email protected]>
Signed-off-by: Danny <[email protected]>
Attempts to update the base image to the most recent 24.12 release

Signed-off-by: Danny <[email protected]>
## Summary
Un-xfails a geneformer H100 test.

## Details
After the base image upgrade to PyTorch framework 24.12
(NVIDIA#553), the H100 Geneformer
issue is fixed.

## Usage and Testing
```bash
pytest ./sub-packages/bionemo-geneformer/tests/bionemo/geneformer/test_model.py::test_geneformer_nemo1_v_nemo2_inference_golden_values
```

Signed-off-by: Danny <[email protected]>
The new Ubuntu base container contains a couple of changes that break
the (untested in CI) base container:
1. it now has a default 1000:1000 `ubuntu` user we can use, instead of
creating a new bionemo user.
2. it uses Python 3.12, which changes some of our copy paths.

---------

Signed-off-by: Peter St. John <[email protected]>
Signed-off-by: Danny <[email protected]>