Release v0.14.0 · mosaicml/llm-foundry

New Features

Load Checkpoint Callback (#1570)

We added support for Composer's LoadCheckpoint callback, which loads a checkpoint at a specified event. This enables use cases such as loading a model's base weights when training with PEFT.

callbacks:
    load_checkpoint:
        load_path: /path/to/your/weights

Breaking Changes

Accumulate over Tokens in a Batch for Training Loss (#1618, #1610, #1595)

We added a new flag, accumulate_train_batch_on_tokens, which specifies whether training loss is accumulated over the number of tokens in a batch rather than the number of samples. It defaults to True. This will slightly change loss curves for models trained with padding. To recover the old behavior, explicitly set the flag to False.
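To see why the loss curve shifts when padding is present, the following minimal sketch compares the two weighting schemes on a toy batch split into two microbatches. It is not Composer's implementation; the function name `accumulate_loss`, the dictionary keys, and the numbers are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only (not Composer's code): combine per-microbatch mean
# losses into one batch loss, weighting either by sample count or by the
# number of loss-generating (non-padding) tokens.

def accumulate_loss(microbatches, on_tokens):
    """Return the weighted average of microbatch losses.

    Each microbatch is a dict with hypothetical keys:
      'loss'    - mean loss over that microbatch
      'samples' - number of samples in it
      'tokens'  - number of loss-generating tokens in it
    """
    key = 'tokens' if on_tokens else 'samples'
    total = sum(mb[key] for mb in microbatches)
    return sum(mb['loss'] * mb[key] / total for mb in microbatches)


# Two microbatches with the same number of samples, but the second one is
# mostly padding and so contributes far fewer loss-generating tokens.
microbatches = [
    {'loss': 2.0, 'samples': 2, 'tokens': 300},
    {'loss': 4.0, 'samples': 2, 'tokens': 100},
]

by_samples = accumulate_loss(microbatches, on_tokens=False)  # 3.0
by_tokens = accumulate_loss(microbatches, on_tokens=True)    # 2.5
print(by_samples, by_tokens)
```

With sample weighting, the heavily padded microbatch counts as much as the dense one. With token weighting, each microbatch contributes in proportion to the tokens that actually produce loss, which is why the two settings report slightly different values on padded data.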

Default Run Name (#1611)

If no run name is provided, we now default to Composer's randomly generated run names. (Previously, the default run name was "llm".)

What's Changed

Update mcli examples to use 0.13.0 by @irenedea in #1594
Pass accumulate_train_batch_on_tokens through to composer by @dakinggg in #1595
Loosen MegaBlocks version pin by @mvpatel2000 in #1597
Add configurability for hf checkpointer register timeout by @dakinggg in #1599
Loosen MegaBlocks to <1.0 by @mvpatel2000 in #1598
Finetuning dataloader validation tweaks by @mvpatel2000 in #1600
Bump onnx from 1.16.2 to 1.17.0 by @dependabot in #1604
Remove TE from dockerfile and instead add as optional dependency by @snarayan21 in #1605
Data prep on multiple GPUs by @eitanturok in #1576
Add env var for configuring the maximum number of processes to use for dataset processing by @irenedea in #1606
Updated error message for cluster check by @nancyhung in #1602
Use fun default composer run names by @irenedea in #1611
Ensure log messages are properly formatted again by @snarayan21 in #1614
Add UC not enabled error for delta to json conversion by @irenedea in #1613
Use a temporary directory for downloading finetuning dataset files by @irenedea in #1608
Bump composer version to 0.26.0 by @irenedea in #1616
Add loss generating token counts by @dakinggg in #1610
Change accumulate_train_batch_on_tokens default to True by @dakinggg in #1618
Bump version to 0.15.0.dev0 by @irenedea in #1621
Add load checkpoint callback by @irenedea in #1570

Full Changelog: v0.13.0...v0.14.0
