New Features
- Small Molecule Featurization
  - Implemented elementary and advanced atom, bond, and full molecule featurizers.
- GH200 Support for BioNeMo
  - Added a `Dockerfile.arm` that builds a BioNeMo container that runs on GH200 machines.
  - Published a version of the BioNeMo container that supports multiple architectures to NGC.
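
To illustrate what an elementary atom featurizer of the kind listed above might compute, here is a minimal sketch in plain Python. It is not the BioNeMo API: the names `ELEMENTS`, `DEGREES`, `one_hot`, and `featurize_atom`, the choice of features, and the small element vocabulary are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical vocabularies; a real featurizer would likely cover more values.
ELEMENTS: Sequence[str] = ("C", "N", "O", "S")
DEGREES: Sequence[int] = (0, 1, 2, 3, 4)


def one_hot(value, choices: Sequence) -> List[int]:
    """One-hot encode `value` over `choices`, with a trailing 'unknown' slot."""
    vector = [0] * (len(choices) + 1)
    # Values outside the vocabulary fall into the last position.
    index = choices.index(value) if value in choices else len(choices)
    vector[index] = 1
    return vector


def featurize_atom(symbol: str, degree: int, is_aromatic: bool) -> List[int]:
    """Concatenate element, degree, and aromaticity features for one atom."""
    return one_hot(symbol, ELEMENTS) + one_hot(degree, DEGREES) + [int(is_aromatic)]
```

Under these assumptions every atom maps to a fixed-length vector of 12 integers (5 element slots, 6 degree slots, 1 aromaticity flag), so per-atom vectors can be stacked into a node-feature matrix for a molecule graph.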
Updates & Improvements
- Single-Cell Dataloader (SCDL)
  - Changed metadata storage to `parquet` files, which yields a 30x speedup when iterating over a large dataset.
  - Added functionality to concatenate several `anndata` files without doubling disk memory usage.
- ESM2
  - Added support for `SIGTERM` preemption checkpoint saving.
  - Moved ESM-2 and Geneformer training scripts to new executables, `train_esm2` and `train_geneformer`, respectively.
  - Moved inference script to a new executable `infer_esm2`, and deprecated the inference example in the fine-tuning tutorial.
  - Added new Jupyter notebook tutorials for inference and zero-shot protein design. These notebooks can be deployed on cloud resources as a brev.dev launchable.
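
The `SIGTERM` preemption checkpoint saving mentioned above follows a common pattern: a signal handler sets a flag, and the training loop saves its state and exits at the next safe point. The sketch below shows that pattern using only the Python standard library. It is not the BioNeMo or NeMo implementation: `PreemptionGuard`, `train`, `demo`, and the text-file "checkpoint" are hypothetical stand-ins.

```python
import os
import signal
import tempfile


class PreemptionGuard:
    """Records a SIGTERM so the training loop can checkpoint and exit cleanly."""

    def __init__(self):
        self.preempted = False
        # Install the handler and remember the previous one for restore().
        self._previous = signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Only set a flag here; the save itself happens in the training loop.
        self.preempted = True

    def restore(self):
        signal.signal(signal.SIGTERM, self._previous)


def train(num_steps, checkpoint_path, guard, preempt_at=None):
    """Hypothetical loop: returns the number of steps completed."""
    completed = 0
    for step in range(1, num_steps + 1):
        completed = step  # stand-in for one optimizer step
        if preempt_at == step:
            # Simulate the cluster scheduler preempting this job.
            signal.raise_signal(signal.SIGTERM)
        if guard.preempted:
            # Persist progress before stopping (a real job would save model state).
            with open(checkpoint_path, "w") as handle:
                handle.write(str(completed))
            break
    return completed


def demo():
    """Run a 10-step job that is preempted at step 3; return (steps, saved text)."""
    guard = PreemptionGuard()
    try:
        with tempfile.TemporaryDirectory() as workdir:
            path = os.path.join(workdir, "last_step.txt")
            completed = train(10, path, guard, preempt_at=3)
            with open(path) as handle:
                saved = handle.read()
    finally:
        guard.restore()
    return completed, saved
```

Keeping the handler down to a flag assignment avoids doing file I/O inside a signal handler, and checking the flag once per step bounds how much work is lost when a job is preempted.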
Known Issues
- Loading a checkpoint for Geneformer inference on H100 has a known regression in accuracy. Work is in progress to resolve this by the next release.
Changes
- Move ESM2 scripts to sub-packages by @farhadrgh in #406
- WAR: sets checkpoint filename to be more unique by @skothenhill-nv in #429
- Update NeMo and Megatron to TOT by @pstjohn in #424
- re-enable merge groups to trigger blossom-ci by @pstjohn in #431
- Revert "re-enable merge groups to trigger blossom-ci" by @pstjohn in #434
- Updated notebook, and nemo2 checkpoint with geneformer by @jstjohn in #430
- add pre-emption callback to esm2 train by @pstjohn in #433
- add rdkit dependency to bionemo-geometric by @sveccham in #432
- eliminate the need for NGC login - bionemo2 by @dorotat-nv in #440
- Add documentation and release info to README by @sirelkhatim in #447
- Bump 3rdparty/Megatron-LM from `aded519` to `5438d15` by @dependabot in #444
- Launchable notebooks in docs! by @jstjohn in #451
- Cache dev build from our nightly public container by @jstjohn in #462
- set num_workers to 1 for esm2 tests by @pstjohn in #461
- ESM2 Tutorial Updates by @farhadrgh in #426
- BugFix: fix bugs on bionemo-size-aware-batching by @guoqing-zhou in #449
- Fix typos in geneformer benchmark description by @jstjohn in #470
- Pillow version bump into main by @polinabinder1 in #465
- Refactor SCDL Row Feature Index for Performance Improvement (Rebased) by @savitha-eng in #466
- pin correct tornado requirement by @polinabinder1 in #474
- Updating Brev.Dev documentation by @polinabinder1 in #483
- Add release notes for v2.1 by @tshimko-nv in #468
- Update VERSION by @polinabinder1 in #488
- Atom and bond features by @sveccham in #453
- Molecule featurizer and molecule graph by @sveccham in #484
- hillst/bionemo noodles by @skothenhill-nv in #458
- update collate mask_value by @pstjohn in #485
- override checkpoint precision by @farhadrgh in #475
- JSON -> YAML for CLI by @skothenhill-nv in #436
- [QA Bug] Remove NGC dependency by @farhadrgh in #494
- Bump 3rdparty/NeMo from `e2b0f0e` to `06e6703` by @dependabot in #486
- Bump 3rdparty/Megatron-LM from `5438d15` to `844119f` by @dependabot in #496
- change source for coverage report by @pstjohn in #495
- Pstjohn/stop and go test non validation by @pstjohn in #476
- Add support on num steps for learning rate scheduler by @sichu2023 in #489
- Initial compatibility testing images by @malcolmgreaves in #438
- Conda-Based Compatibility Test Images by @malcolmgreaves in #507
- Instructions on compatibility image build by @malcolmgreaves in #512
- Formatting by @malcolmgreaves in #513
- Pstjohn/fix ci by @pstjohn in #515
- [FEA][webdatamodule]: support webdataset invocable by @DejunL in #501
- GH200 support by @gagank1 in #369
- Remove quotes for Jupyter command on startup in init guide by @tshimko-nv in #523
- Reduce esm2 and geneformer test burden by @sichu2023 in #499
- [v2.2] Publish release notes for BioNeMo FW v2.2. by @cspades in #522
- Disable validation/test stages in ESM-2 and Geneformer by @sichu2023 in #492
- CI HOTFIX: ignore in run_pytest.sh a notebook by @dorotat-nv in #526
- added NeMoLogger unit tests by @dorotat-nv in #511
- Bump 3rdparty/Megatron-LM from `844119f` to `99f23d2` by @dependabot in #528
- [cye/wandb-fix] Fix WandB issue. by @cspades in #530
- xFail known bad tests on H100 and fix CVEs by @gagank1 in #547
New Contributors
- @sveccham made their first contribution in #432
- @sirelkhatim made their first contribution in #447
Full Changelog: v2.1...v2.2