Releases: mosaicml/llm-foundry
v0.16.0
What's New
Streaming 0.11.0 🚀 (#1711)
We've upgraded Streaming to 0.11.0. `StreamingDataset` can now be used with custom `Stream` implementations via a registry. See the documentation page for example usage.
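As a rough sketch of what this enables (not taken from the Streaming docs), you can subclass `Stream` and hand instances straight to `StreamingDataset`; the new registry additionally lets config-driven code refer to such subclasses by name. The class and paths below are hypothetical:

```python
# Illustrative sketch only: a custom Stream passed to StreamingDataset.
# FilteredStream and the bucket paths are placeholders; see the Streaming
# 0.11.0 docs for the registry-based registration API.
from streaming import Stream, StreamingDataset

class FilteredStream(Stream):
    """Hypothetical Stream subclass; override Stream hooks here, e.g. to
    customize how shards are fetched or weighted."""
    pass

dataset = StreamingDataset(
    streams=[FilteredStream(remote='s3://my-bucket/data', local='/tmp/data')],
    batch_size=8,
)
```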
What's Changed
- Fix llama3 example yamls by @j316chuck in #1688
- Update example yamls to use newest foundry version by @snarayan21 in #1689
- Update datasets requirement from <2.21,>=2.20.0 to >=2.20.0,<3.2 by @dependabot in #1670
- Catch multiple slashes in source dataset into one slash by @KuuCi in #1697
- Make loaded peft adapters optionally trainable by @snarayan21 in #1701
- Adding preprocessors for QA and messages datasets by @ShashankMosaicML in #1700
- Update pycln by @b-chu in #1704
- Add permission error by @b-chu in #1703
- Update datasets requirement from <3.2,>=2.20.0 to >=2.20.0,<3.3 by @dependabot in #1698
- Bump coverage[toml] from 7.6.4 to 7.6.10 by @dependabot in #1702
- Update mosaicml-streaming to 0.11.0 by @es94129 in #1711
- Bump version to 0.17.0.dev0 by @irenedea in #1712
Full Changelog: v0.15.1...v0.16.0
v0.15.1
What's Changed
- Bump version 0.16.0.dev0 by @j316chuck in #1667
- Update mlflow requirement from <2.18,>=2.14.1 to >=2.14.1,<2.19 by @dependabot in #1673
- Speed up embedding tests by @dakinggg in #1668
- Add mcli yaml version bump by @j316chuck in #1674
- Bump Openai version by @snarayan21 in #1684
- Bump Streaming to v0.10.0 by @snarayan21 in #1685
- Bugfix auto packing with streams + no remote path by @mattyding in #1679
- Bump Composer to v0.28.0 by @snarayan21 in #1687
- Expose `DistributedSampler` RNG seed argument by @janEbert in #1677
- Add llama3 ft example yamls by @j316chuck in #1686
New Contributors
- @janEbert made their first contribution in #1677
Full Changelog: v0.15.0...v0.15.1
v0.15.0
New Features
Open Source Embedding + Contrastive Code (#1615)
LLM Foundry now supports finetuning embedding models with contrastive loss, including various approaches to selecting negative passages, which can be either randomly selected or pre-defined. For more information, please see the README.
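For intuition, an in-batch contrastive (InfoNCE-style) loss over query and passage embeddings looks roughly like the following. This is a generic sketch, not Foundry's exact implementation:

```python
# Generic InfoNCE-style contrastive loss sketch (not Foundry's code).
import torch
import torch.nn.functional as F

def info_nce_loss(q: torch.Tensor, p: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """q: (B, D) query embeddings; p: (B, D) positive passage embeddings.
    Each query's positive is its own row of p; the other B-1 rows act as
    in-batch negatives."""
    q = F.normalize(q, dim=-1)
    p = F.normalize(p, dim=-1)
    logits = q @ p.T / temperature                     # (B, B) cosine sims
    labels = torch.arange(q.size(0), device=q.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)

# Example usage with random embeddings:
loss = info_nce_loss(torch.randn(4, 16), torch.randn(4, 16))
```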
PyTorch 2.5.1 (#1665)
This release updates LLM Foundry to PyTorch 2.5.1, bringing support for its new features and optimizations.
Improved error messages (#1657, #1660, #1623, #1625)
Various error messages have been improved, making it easier to debug user errors.
What's Changed
- Update mcli examples to use 0.14.0 by @irenedea in #1624
- Open Source Embedding + Contrastive Code by @KuuCi in #1615
- Catch delta table not found error by @milocress in #1625
- Add Mlflow 403 PL UserError by @mattyding in #1623
- Catches when data prep cluster fails to start by @milocress in #1628
- Bump mlflow max version by @dakinggg in #1629
- add another cluster connection failure wrapper by @milocress in #1630
- Add MLflow `log_model` option by @nancyhung in #1544
- Move loss generating token counting to the dataloader by @dakinggg in #1632
- Bump databricks-connect from 14.1.0 to 15.4.3 by @dependabot in #1636
- Fix dataset download location by @dakinggg in #1639
- Revert "Bump databricks-connect from 14.1.0 to 15.4.3" by @XiaohanZhangCMU in #1640
- Bump transformers version by @dakinggg in #1631
- Fix gpu tests test_tp_train and test_huggingface_conversion_callback_interval by @irenedea in #1642
- Update datasets requirement from <2.20,>=2.19 to >=2.20.0,<2.21 by @dependabot in #1330
- Add max shard size to transformers save_pretrained by @b-chu in #1648
- Update huggingface-hub requirement from <0.25,>=0.19.0 to >=0.19.0,<0.27 by @dependabot in #1652
- Update accelerate requirement from <0.34,>=0.25 to >=0.25,<1.2 by @dependabot in #1633
- Catch Delta Table Not Found by @KuuCi in #1653
- Add Exception for missing UC column by @milocress in #1654
- Infer step size for Embeddings by @KuuCi in #1647
- Pin FAv2 by @mvpatel2000 in #1656
- Retry catching BlockingIOError by @KuuCi in #1657
- Catch bad data prep by @milocress in #1644
- Update pytest-cov requirement from <6,>=4 to >=4,<7 by @dependabot in #1663
- Bump coverage[toml] from 7.6.1 to 7.6.4 by @dependabot in #1650
- Move transform_model_pre_registration in hf_checkpointer by @irenedea in #1664
- Catch Cluster Permissions Error by @KuuCi in #1660
- Mosaicml version bump by @j316chuck in #1661
- Changes for removing unused terms in CE loss fn by @gupta-abhay in #1643
- Update setuptools requirement from <68.0.0 to <76.0.0 by @dependabot in #1662
- Bump docker version to torch 2.5.1 by @j316chuck in #1665
- Bump ubuntu 22.04 + torch 2.5.1 by @KuuCi in #1666
New Contributors
- @mattyding made their first contribution in #1623
Full Changelog: v0.14.5...v0.15.0
v0.14.5
v0.14.4
v0.14.3
v0.14.2
v0.14.1
New Features
Use log_model for registering models (#1544)
Instead of calling the MLflow register API directly, we now use the intended `log_model` API, which both logs the model to the MLflow run artifacts and registers it to Unity Catalog.
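Conceptually, this amounts to something like the following MLflow call. This is an illustrative sketch, not Foundry's actual checkpointer internals, and the model name is a placeholder:

```python
# Illustrative sketch only; the Unity Catalog name is a placeholder.
import mlflow
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('gpt2')
tokenizer = AutoTokenizer.from_pretrained('gpt2')

mlflow.set_registry_uri('databricks-uc')  # register into Unity Catalog

with mlflow.start_run():
    # log_model both writes the model to the run's artifacts and, via
    # registered_model_name, registers it in the model registry.
    mlflow.transformers.log_model(
        transformers_model={'model': model, 'tokenizer': tokenizer},
        artifact_path='model',
        registered_model_name='main.my_schema.my_model',
    )
```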
What's Changed
- Catch delta table not found error by @milocress in #1625
- Add Mlflow 403 PL UserError by @dakinggg in #1623
- Catches when data prep cluster fails to start by @milocress in #1628
- add another cluster connection failure wrapper by @milocress in #1630
- Use log_model API to register the model by @nancyhung and @dakinggg in #1544
Full Changelog: v0.14.0...v0.14.1
v0.14.0
New Features
Load Checkpoint Callback (#1570)
We added support for Composer's LoadCheckpoint callback, which loads a checkpoint at a specified event. This enables use cases like loading base model weights with PEFT.
```yaml
callbacks:
  load_checkpoint:
    load_path: /path/to/your/weights
```
Breaking Changes
Accumulate over Tokens in a Batch for Training Loss (#1618, #1610, #1595)
We added a new flag `accumulate_train_batch_on_tokens`, which specifies whether the training loss is accumulated over the number of tokens in a batch rather than the number of samples. It is true by default. This will slightly change loss curves for models trained with padding. The old behavior can be recovered by explicitly setting this flag to false, as shown below.
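For example, in a training YAML (a hypothetical snippet, shown only to illustrate the flag):

```yaml
# Set to false to restore the old sample-based loss accumulation.
accumulate_train_batch_on_tokens: false
```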
Default Run Name (#1611)
If no run name is provided, we now default to Composer's randomly generated run names. (Previously, we defaulted to "llm" for the run name.)
What's Changed
- Update mcli examples to use 0.13.0 by @irenedea in #1594
- Pass accumulate_train_batch_on_tokens through to composer by @dakinggg in #1595
- Loosen MegaBlocks version pin by @mvpatel2000 in #1597
- Add configurability for hf checkpointer register timeout by @dakinggg in #1599
- Loosen MegaBlocks to <1.0 by @mvpatel2000 in #1598
- Finetuning dataloader validation tweaks by @mvpatel2000 in #1600
- Bump onnx from 1.16.2 to 1.17.0 by @dependabot in #1604
- Remove TE from dockerfile and instead add as optional dependency by @snarayan21 in #1605
- Data prep on multiple GPUs by @eitanturok in #1576
- Add env var for configuring the maximum number of processes to use for dataset processing by @irenedea in #1606
- Updated error message for cluster check by @nancyhung in #1602
- Use fun default composer run names by @irenedea in #1611
- Ensure log messages are properly formatted again by @snarayan21 in #1614
- Add UC not enabled error for delta to json conversion by @irenedea in #1613
- Use a temporary directory for downloading finetuning dataset files by @irenedea in #1608
- Bump composer version to 0.26.0 by @irenedea in #1616
- Add loss generating token counts by @dakinggg in #1610
- Change accumulate_train_batch_on_tokens default to True by @dakinggg in #1618
- Bump version to 0.15.0.dev0 by @irenedea in #1621
- Add load checkpoint callback by @irenedea in #1570
Full Changelog: v0.13.0...v0.14.0