Commit 4388d0f
Recipe linting, cleaning up dataloader checkpointing (#1245)
Update some comments and remove some unnecessary lines in the training
recipes.
Simplify dataloader checkpointing at the expense of making it less
verbose. We'll see what we need to support here in the future, and can
always revert some of this if needed.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Training preserves model heads so contact outputs remain available during distributed runs.
* **Improved Checkpointing / Resume**
  * Dataloader state handling moved into centralized checkpoint helpers; resume restores dataloader state and advances past the last completed step.
* **Observability**
  * Per-step performance logging added immediately after optimizer updates.
* **Breaking Changes**
  * Checkpoint return types and some checkpoint helper signatures changed; callers must adapt.
* **Tests / Style**
  * Tests updated for checkpoint/dataloader changes; lint rules relaxed for test files.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
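The resume behavior described in the summary (restore dataloader state, then continue from the step after the last completed one) can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: `CountingLoader`, `save_checkpoint`, `load_checkpoint`, and `train` are stand-ins, not the recipe's actual helpers or signatures, and a JSON file stands in for the real checkpoint format.

```python
import json
import os
import tempfile


class CountingLoader:
    """Hypothetical stand-in for a stateful dataloader: yields 0, 1, 2, ..."""

    def __init__(self):
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        batch = self.position
        self.position += 1
        return batch

    def state_dict(self):
        return {"position": self.position}

    def load_state_dict(self, state):
        self.position = state["position"]


def save_checkpoint(path, step, loader):
    """Store the last completed step together with the dataloader state."""
    with open(path, "w") as f:
        json.dump({"step": step, "dataloader": loader.state_dict()}, f)


def load_checkpoint(path, loader):
    """Restore dataloader state and return the step to resume from."""
    if not os.path.exists(path):
        return 0  # no checkpoint yet: start from the first step
    with open(path) as f:
        ckpt = json.load(f)
    loader.load_state_dict(ckpt["dataloader"])
    return ckpt["step"] + 1  # advance past the last completed step


def train(path, num_steps, stop_after=None):
    """Run (or resume) a toy loop; return the (step, batch) pairs it processed."""
    loader = CountingLoader()
    start_step = load_checkpoint(path, loader)
    seen = []
    for step in range(start_step, num_steps):
        batch = next(loader)
        seen.append((step, batch))
        save_checkpoint(path, step, loader)
        if stop_after is not None and step == stop_after:
            break  # simulate an interruption after this step
    return seen
```

Keeping the step counter and the dataloader state in one checkpoint record means a resumed run neither repeats the last completed step nor replays batches it has already consumed.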
---------
Signed-off-by: Peter St. John <[email protected]>

1 parent a8ccbad commit 4388d0f
File tree
8 files changed, +162 -332 lines changed
- bionemo-recipes/recipes
- esm2_native_te
- tests