Modular Co-Design Interpolants #554
Closed
nvdreidenbach requested review from jstjohn, malcolmgreaves, skothenhill-nv, dorotat-nv, pstjohn, trvachov and ohadmo as code owners, December 24, 2024 16:38
nvdreidenbach force-pushed the moco branch 2 times, most recently from 70fc9ce to 0af104d, December 24, 2024 18:29
nvdreidenbach force-pushed the moco branch 3 times, most recently from 8c29d92 to 146985a, December 24, 2024 20:30
/build-ci
jstjohn reviewed Dec 24, 2024
/build-ci
jstjohn reviewed Jan 2, 2025
nvdreidenbach requested review from edawson, cspades and farhadrgh as code owners, January 2, 2025 23:10
nvdreidenbach force-pushed the moco branch 2 times, most recently from 83c26f4 to 970653a, January 2, 2025 23:20
/build-ci
Signed-off-by: Danny <[email protected]>
When NvFaidx was used on FASTA files containing duplicate sequence IDs, which violates the FASTA spec, it would fail silently and keep only the last-seen sequence for that ID. This PR makes NvFaidx fail by default and exposes a parameter to ignore sequence_ids and use integer indexing instead. Signed-off-by: Danny <[email protected]>
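The fail-by-default behavior described in this commit message can be illustrated with a minimal, standard-library sketch. This is not the NvFaidx implementation or its API: `read_fasta`, `build_index`, and the `ignore_ids` parameter are hypothetical names chosen for illustration only.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into (sequence_id, sequence) pairs, in file order."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            # The ID is taken as the first whitespace-delimited token after '>'.
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def build_index(text, ignore_ids=False):
    """Hypothetical index builder: raise on duplicate IDs unless ignore_ids is set."""
    records = read_fasta(text)
    if ignore_ids:
        # Integer indexing: position in the file replaces the sequence ID.
        return {i: seq for i, (_, seq) in enumerate(records)}
    counts = Counter(seq_id for seq_id, _ in records)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Duplicate sequence ids: {duplicates}")
    return dict(records)
```

With a file that repeats an ID, `build_index(text)` raises instead of silently keeping the last entry, while `build_index(text, ignore_ids=True)` keeps every record under its integer position.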
Update DDP config to speed up ESM-2 15B pretraining. Turn off `grad_reduce_in_fp32` in the mixed precision plugin (default is True) to reduce memory consumption, and enable `overlap_grad_reduce` and `average_in_collective` to improve performance. Hold off on `overlap_param_gather=True` until NeMo's fix is available. Signed-off-by: Danny <[email protected]>
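As a rough illustration of the flag changes listed in this commit message, the sketch below uses two stand-in dataclasses. They are hypothetical and are not the NeMo or Megatron classes; only the four flag names come from the commit message, and the `False` defaults on the DDP stand-in are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MixedPrecisionSketch:
    # Stand-in for the mixed precision plugin; the commit message says the default is True.
    grad_reduce_in_fp32: bool = True


@dataclass(frozen=True)
class DdpSketch:
    # Stand-in for the DDP config; these False defaults are assumptions for illustration.
    overlap_grad_reduce: bool = False
    overlap_param_gather: bool = False
    average_in_collective: bool = False


def tuned_settings():
    """Return the settings described in the commit message."""
    # Reduce gradients in lower precision to save memory.
    precision = replace(MixedPrecisionSketch(), grad_reduce_in_fp32=False)
    # Enable the two performance flags; overlap_param_gather stays off for now.
    ddp = replace(DdpSketch(), overlap_grad_reduce=True, average_in_collective=True)
    return precision, ddp
```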
Pins mistune to fix a Jupyter notebook build issue introduced in 3.1.0 (lepture/mistune#403). Bypassing review rules to fix CI due to holiday OOO. Signed-off-by: Danny <[email protected]>
Bumps [3rdparty/Megatron-LM](https://github.com/NVIDIA/Megatron-LM) from `99f23d2` to `2da43ef`.
Commits:
- [`2da43ef`](https://github.com/NVIDIA/Megatron-LM/commit/2da43ef4c1b9e76f03b7567360cf7390e877f1b6) Merge branch 'mmodal_eval_in_folder' into 'main'
- [`e51a3ac`](https://github.com/NVIDIA/Megatron-LM/commit/e51a3ac1dcd366f51bcb0339ecca31790c3cfcd1) ADLR/megatron-lm!2491 - Move mmodal evaluation code to its own folder
- [`d3c585e`](https://github.com/NVIDIA/Megatron-LM/commit/d3c585e90ebd5937243c8d4c9d5d5cf9d61665d6) Merge branch 'jbarker/pp_unfreeze' into 'main'
- [`1468ab0`](https://github.com/NVIDIA/Megatron-LM/commit/1468ab01c079d5e14888dda97d1c99d2cb62afb2) ADLR/megatron-lm!2285 - Support --freeze-LM and --freeze-ViT with ranks that ...
- [`cf25d44`](https://github.com/NVIDIA/Megatron-LM/commit/cf25d44037af4e9d5ea723918823de9b2416a30c) Merge branch 'boxin/nvlm_ckpt_release' into 'main'
- [`1da9dad`](https://github.com/NVIDIA/Megatron-LM/commit/1da9dad62b97917caacb1fd271abaed403581caa) ADLR/megatron-lm!2494 - Add model checkpoint links
- [`25b1f33`](https://github.com/NVIDIA/Megatron-LM/commit/25b1f33035ad55eeae6b9a4367f987f1fac804dd) Merge branch 'helenn-rope-fusion-mem-layout' into 'main'
- [`7bb5379`](https://github.com/NVIDIA/Megatron-LM/commit/7bb53792831d80007789ff5c60bc1798cbd34548) ADLR/megatron-lm!2469 - Correct strides for bshd layout and revert RoPE tests...
- [`b8420a1`](https://github.com/NVIDIA/Megatron-LM/commit/b8420a1909980aa3b6750f75b2d7ab8b23338948) Merge branch 'group_topk' into 'main'
- [`d0df563`](https://github.com/NVIDIA/Megatron-LM/commit/d0df563d8739e4dfe2b0e90ba190ac389f165157) ADLR/megatron-lm!1934 - Support Device-Limited Routing and Sequence Auxiliary...
- Additional commits viewable in [compare view](https://github.com/NVIDIA/Megatron-LM/compare/99f23d2f111d12b73b1fbf386c60517101ff8abe...2da43ef4c1b9e76f03b7567360cf7390e877f1b6)
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Danny <[email protected]>
Attempts to update the base image to the most recent 24.12 release Signed-off-by: Danny <[email protected]>
## Summary
Un-xfails a geneformer H100 test.

## Details
After the base image upgrade to pytorch fw 24.12 (NVIDIA#553), the H100 geneformer issue is fixed.

## Usage and Testing
```bash
pytest ./sub-packages/bionemo-geneformer/tests/bionemo/geneformer/test_model.py::test_geneformer_nemo1_v_nemo2_inference_golden_values
```

Signed-off-by: Danny <[email protected]>
The new ubuntu base container includes a couple of changes that break the (untested in CI) base container:
1. It now has a default 1000:1000 `ubuntu` user we can use, instead of creating a new bionemo user.
2. It uses Python 3.12, which changes some of our copy paths.
---------
Signed-off-by: Peter St. John <[email protected]> Signed-off-by: Danny <[email protected]>
Release of v1.0 of BioNeMo Modular Co-Design (MoCo)
Introduces modular interpolants for various popular generative model frameworks including continuous and discrete diffusion and flow matching.
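To give a sense of what an interpolant does in the continuous flow matching setting, here is a minimal standard-library sketch of a straight-line interpolant between a noise sample and a data sample. It is an illustration under stated assumptions, not the bionemo-moco API: the function names `linear_interpolate`, `target_velocity`, and `sample_training_pair` are hypothetical, and the linear path is just one common choice.

```python
import random


def linear_interpolate(x0, x1, t):
    """Return x_t = (1 - t) * x0 + t * x1 elementwise, for t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path: d x_t / d t = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def sample_training_pair(x1, rng):
    """Draw Gaussian noise x0 and a time t, then return (t, x_t, velocity target)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()
    return t, linear_interpolate(x0, x1, t), target_velocity(x0, x1)
```

In a training loop of this kind, a model would typically be asked to predict the velocity target from `x_t` and `t`; the interpolant only defines how the noisy intermediate sample and its target are constructed.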
Summary
Introduces MoCo.
Details
See documentation.md for details.
Usage
pip install bionemo-moco
See the examples directory for notebook tutorials.
Testing
Unit tests for all key functions.
Tests for these changes can be run via: