Releases: CarperAI/trlx
v0.7.0: NeMo PPO, PEFT Migration, and Fixes
The `v0.7.0` release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:
🐠 NeMo PPO and SFT support
This release introduces NeMo-backed PPO and SFT implementations for improved capabilities and system performance in large-scale training.
- NeMo PPO by @cat-state in #472
- Add Supervised Fine-Tuning (SFT) support for NeMo backend by @jon-tow in #353
🦆 PEFT Migration
`trlx` now supports parameter-efficient tuning methods via the `peft` library, which we hope will provide greater access to RLHF training in low-resource settings.
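For illustration, here is a minimal sketch of building a LoRA configuration with `peft`; note that wiring it into trlx via a `peft_config` field on the model config is an assumption here, so check the repository's PEFT examples for the exact setup in your installed version:

```python
# Minimal sketch: building a LoRA config with peft for use alongside trlx.
# Attaching it via `config.model.peft_config` is an assumption; check the
# trlx PEFT examples for the exact field name in your installed version.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # causal-LM fine-tuning
    r=8,                           # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
)
# config.model.peft_config = lora_config  # assumed wiring into a TRLConfig
```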
Fixes and more!
- Set pad_token for all tokenizers in tests by @cat-state in #414
- Convert tensors in the stats dict into scalars by @ZHAOTING in #417
- Add Translation Finetuning Example with T5 by @alexandremuzio in #392
- set torch dependency to version 2.0.0 for CUDA in installation instru… by @cauyxy in #409
- [fix] add `position_ids` to `LlamaModelBranch` by @jon-tow in #418
- fix(CI): use pinned deps for CI testing by @jon-tow in #423
- Minibatch impl by @Dahoas in #364
- [feat] Support tying metadata to each prompt by @maxreciprocate in #421
- feat(examples): revamp simulacra example by @maxreciprocate in #430
- [fix] update pairwise dataloader. by @Chen9154 in #395
- fix(sft_trainer): `total_steps` calculation when running distributed by @maxreciprocate in #432
- fix(base_trainer): gather weights in `save_pretrained` under zero3 by @maxreciprocate in #429
- fix(offline_pipeline): ILQL negative indexing under truncation by @maxreciprocate in #435
- fix(ppo_trainer): compute mean KL sequence-wise by @maxreciprocate in #441
- Create Example training scripts to run in Stability cluster by @alexandremuzio in #419
- Upgrade official released Ray instead of an unstable one. by @jovany-wang in #455
- Pin transformers<=4.27.1 by @jovany-wang in #458
- fix(ppo_gpt): prevent position_ids being None by @li-plus in #451
- fix(trainer): init `self.generate_sweep_kwarg` at `self.__init__` by @mymusise in #460
- Ensure trailing EOS token is added correctly for shorter generated outputs by @mikljohansson in #420
- Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts by @mikljohansson in #422
- docs(base_trainer): fill in missing `prepare_learning` method by @maxreciprocate in #449
- fix(modeling_ppo): invert padding percentage calculation by @maxreciprocate in #450
- fix(base_trainer): flatten tag list for tensorboard hparams logging by @maxreciprocate in #444
- feat(requirements.txt): upgrade dependencies by @maxreciprocate in #465
- fix(offline_pipeline): force `drop_last` only for distributed by @maxreciprocate in #475
- hotfix(bnb): install `scipy` with `bitsandbytes` to avoid `ModuleNotFoundError` by @jon-tow in #492
- fix type hint in `PromptPipeline.__init__` by @g-simmons in #496
- fix(modeling_ilql): single q-head indexing by @maxreciprocate in #471
- Fix deprecated arguments for Accelerate >= v0.20.0 by @iwiwi in #506
- Fix PPO log_ratio bug by @TobiasNorlund in #509
- fix(ppo_trainer): default gen kwargs by @maxreciprocate in #510
New Contributors
- @ZHAOTING made their first contribution in #417
- @cauyxy made their first contribution in #409
- @Chen9154 made their first contribution in #395
- @jovany-wang made their first contribution in #455
- @li-plus made their first contribution in #451
- @mymusise made their first contribution in #460
- @mikljohansson made their first contribution in #420
- @g-simmons made their first contribution in #496
- @iwiwi made their first contribution in #506
- @TobiasNorlund made their first contribution in #509
- @glerzing made their first contribution in #486
Full Changelog: v0.6.0...v0.7.0
v0.6.0: LLaMa (Alpaca), Benchmark Util, T5 ILQL, Tests
The `v0.6.0` release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:
📏 Benchmarking and Improved Unit Tests
This release introduces a new benchmark util to more easily track regressions in our training pipeline, along with improved unit tests with the help of the `hypothesis` package:
- [feat] Add benchmark tools by @reciprocated in #357
- Add `hypothesis` tests for ILQL and fix edge cases by @cat-state in #370
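For readers new to property-based testing, a generic `hypothesis` example (illustrative only, not taken from the trlx test suite) looks like this:

```python
# Illustrative hypothesis test: the library generates many random inputs
# and checks that the asserted property holds for all of them.
from hypothesis import given, strategies as st


@given(st.lists(st.integers(), min_size=1))
def test_sum_is_order_invariant(xs):
    assert sum(xs) == sum(reversed(xs))
```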
🦙 LLaMa and Alpaca PPO/SFT Support
PPO support and examples for LLaMa are now available, and we've baked in an example for instruction fine-tuning models with the Alpaca dataset using our SFT trainer:
- [feat] Add LLaMa Model support for PPO by @PhungVanDuy in #375
- Add Alpaca by @cat-state in #400
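As a rough sketch of what PPO training with a LLaMa checkpoint looks like through the high-level API (the checkpoint path and reward function below are placeholders, and keyword arguments may differ slightly between versions):

```python
# Rough sketch of PPO fine-tuning via trlx's high-level entry point.
# "path/to/llama-checkpoint" and the toy length-based reward are placeholders.
import trlx

trainer = trlx.train(
    "path/to/llama-checkpoint",
    reward_fn=lambda samples, **kwargs: [float(len(s)) for s in samples],
    prompts=["Write a haiku about reinforcement learning."],
)
```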
5️⃣ T5 ILQL Support
T5 models can now be fine-tuned with ILQL:
- Support ILQL for T5 model, Fix PPO T5 for refactored code by @PhungVanDuy in #290
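Unlike PPO, ILQL trains offline from pre-scored samples; a hedged sketch of that flow is below (the argument names are assumptions, and a T5 model additionally needs the seq2seq-aware configuration introduced in the linked PR):

```python
# Hedged sketch of offline ILQL training from (sample, reward) pairs.
# Argument names are assumptions; see the trlx ILQL examples for the exact
# setup, including the T5/seq2seq configuration.
import trlx

trainer = trlx.train(
    samples=["The movie was great!", "The movie was terrible."],
    rewards=[1.0, -1.0],
)
```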
Fixes
- Remove example usage of the deprecated `trlx.train` dataset arg by @jon-tow in #331
- Remove unused `logit_mask` argument by @cat-state in #332
- [fix] Convert the rest of configs from `yml`s by @reciprocated in #346
- fix `default_ilql_config` in notebook by @xu-song in #350
- hot-fix: update PPOConfig import in examples by @jon-tow in #352
- [fix] Update `AdaptiveKLController` with correct KL by @reciprocated in #361
- [fix] Drop `<eos>` from ILQL sample's phrases by @reciprocated in #362
- fixes half exp not implemented error by @Dahoas in #363
- [fix] ILQL `total_steps` calculation when running distributed by @reciprocated in #374
- [fix] split for validation by @hzwer in #369
- fix(docs): Update incorrect `PPORLElement` logprob tensor shape hint by @jon-tow in #377
- [fix] Enable HF downloads from a revision by @reciprocated in #382
- [fix] Fix ILQL head sync under ZeRO3 by @reciprocated in #387
- [fix] Preserve `<eos>` token and in-place it after trimming by @reciprocated in #401
- Nemo ILQL fixes by @cat-state in #404
What's Changed
- Move to Python config classes instead of `yml`s by @cat-state in #306
- Add intermediate checkpointing to `accelerate` trainers by @jon-tow in #349
- Enable infinite dataloader for `prompt_dataloader` in PPO Trainer by @alexandremuzio in #358
- [feat] Add optional dependency list by @reciprocated in #381
- Add some synchronization to the db download in the simulacra example by @dakinggg in #406
New Contributors
- @xu-song made their first contribution in #350
- @hzwer made their first contribution in #369
- @alexandremuzio made their first contribution in #358
- @dakinggg made their first contribution in #406
Full Changelog: v0.5.0...v0.6.0
v0.5.0: Initial NeMo integration, HH example, and improved Hugging Face integration
Highlights
- Initial NeMo ILQL integration, leading the way to large-scale RLHF efforts. See https://github.com/CarperAI/trlx/blob/main/trlx/models/README.md to get started.
- In-depth example showcasing `trlx` usage on AnthropicAI's Helpful & Harmless dataset: https://github.com/CarperAI/trlx/tree/main/examples/hh
- Improved ILQL modeling integration with Hugging Face `transformers`. Users can now work with `AutoModelForCausalLMWithILQLHeads` objects to generate samples and save/load fine-tuned ILQL models that can be quickly pushed to the Hub.
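A short sketch of the save/load workflow this enables (the import path is an assumption; check `trlx.models` in your installed version):

```python
# Sketch of working with an ILQL-headed causal LM.
# The module path is an assumption; verify it against trlx.models.
from trlx.models.modeling_ilql import AutoModelForCausalLMWithILQLHeads

model = AutoModelForCausalLMWithILQLHeads.from_pretrained("gpt2")
model.save_pretrained("./my-ilql-model")  # directory can later be pushed to the Hub
reloaded = AutoModelForCausalLMWithILQLHeads.from_pretrained("./my-ilql-model")
```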
What's Changed
- Add `wandb` group naming by @jon-tow in #188
- Update `reward_fn` signatures in examples by @jon-tow in #190
- Add tokenizer config by @reciprocated in #189
- Fix extraction of `mixed_precision` option for deepspeed by @reciprocated in #197
- Fix `summarize_rlhf` inference checkpoint paths by @jon-tow in #194
- Make the config loading consistent across all example scripts by @shermansiu in #192
- Make `Trainer.save_pretrained` sub-directory optional by @jon-tow in #201
- Update Readme to include T5 models by @aaronrmm in #198
- Make `make_head` accept dtype parameter by @reciprocated in #213
- Enable training with Tensorboard tracking by @marcobellagente93 in #209
- Support nested updates in `merge` by @cat-state in #219
- Fix typo reward normalize summarize by @PhungVanDuy in #221
- Update stale comment from results table by @jon-tow in #222
- Fix undefined trackers property by @alan-cooney in #224
- Fix tokenizer missing from `config.to_dict()` by @alan-cooney in #228
- Make experiment tracking optional by @jon-tow in #226
- read tokenizer path from config correctly by @JustinAWei in #230
- Add devcontainer support by @alan-cooney in #196
- fix: change lora_a:float to lora_r:int by @aaronrmm in #235
- Bump `isort` to hotfix CI code quality workflow by @jon-tow in #237
- Fix optional tracking in `accelerator.log` by @jon-tow in #233
- Improve documentation/comments on the random walk example by @alan-cooney in #208
- Update link to "Learning to Summarize from Human Feedback" by @jon-tow in #241
- Fix deepspeed state saving under `save_best` condition by @reciprocated in #242
- added colab notebook by @smellslikeml in #244
- [style] Increase black's line length by @reciprocated in #250
- Add help string to get_advantages_and_returns by @pesvut in #225
- Filter out empty responses by @reciprocated in #265
- NeMo Integrate by @cat-state in #125
- Add multi-process logger utility for status monitoring by @jon-tow in #254
- Add NeMo support info to `README` by @jon-tow in #275
- Fix distributed dataloaders & deduplicate eval by @reciprocated in #276
- Improve PPO readability by @alan-cooney in #210
- Add T5 to delta modifier map by @aaronrmm in #234
- [fix] Set deepspeed's fp16 `auto_cast` to false by @reciprocated in #279
- Rename remaining `logprobs_from_logits` call by @jon-tow in #281
- [feat] Add Accelerate SFT Trainer by @reciprocated in #280
- Add Colab Notebook for Sentiment by @zswitten in #285
- Remove `pylance` installs from devcontainer by @jon-tow in #296
- Move notebooks to examples dir by @jon-tow in #294
- [fix] Summarize config discrepancy by @reciprocated in #293
- Make Git check optional by @cat-state in #299
- refactor: remove orchestrator abstraction from API by @jon-tow in #289
- Set `add_special_tokens=False` to not add EOS unexpectedly by @cat-state in #287
- [feat] Gather experience samples by @reciprocated in #305
- [fix] Make `gather_for_metrics` usage more strict by @reciprocated in #315
- Add helpful and harmless example by @reciprocated in #128
- Adopt `PreTrainedModelWrapper` for Hugging Face models by @jon-tow in #215
New Contributors
- @shermansiu made their first contribution in #192
- @aaronrmm made their first contribution in #198
- @marcobellagente93 made their first contribution in #209
- @alan-cooney made their first contribution in #224
- @JustinAWei made their first contribution in #230
- @smellslikeml made their first contribution in #244
- @pesvut made their first contribution in #225
- @zswitten made their first contribution in #285
Full Changelog: v0.4...v0.5.0
v0.4
Summary of release notes:
Along with many improvements to experiment tracking, rollout logging, and configuration flexibility, new highlight features include:
- Support for T5-based student models. Check out this example, where we show how to fine-tune a FLAN-T5 model on CNN/DailyMail for summarization.
- Support for parameter-efficient tuning methods. Some of our preliminary results have shown LoRA to be a promising technique for scaling RLHF in low-resource settings, and we hope users get the chance to explore its potential. We've seen a ~30% reduction in memory usage and a ~20% reduction in wallclock time for the same performance (quick report here).
- Out-of-the-box support for 8-bit Adam(W) optimizers via TimDettmers/bitsandbytes, leading to a 15% decrease in memory allocation in one of our baseline examples (related report); a minimal sketch follows below.
Other interesting examples are in the works, so stay tuned!
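For reference, the 8-bit optimizer in question is the standard `bitsandbytes` one; a minimal illustration outside of trlx (the tiny model below is a placeholder, and trlx selects the optimizer through its config rather than by hand):

```python
# Minimal illustration of 8-bit Adam from bitsandbytes.
# The linear layer is a placeholder model for demonstration only.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(16, 16)
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5)

loss = model(torch.randn(4, 16)).sum()
loss.backward()
optimizer.step()
```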
What's Changed
- ILQL indices on wrong device by @cat-state in #105
- Fix ppo ratio inaccuracy by @reciprocated in #108
- Set RNG seeds across multiple dependencies by @jon-tow in #113
- Set seed after default config instantiation by @jon-tow in #114
- Move queries on the device by @reciprocated in #115
- Add ppo randomwalks example by @reciprocated in #119
- Add unit tests to ensure valid example configs by @jon-tow in #120
- updating gptj-config by @Dahoas in #109
- Fix get distributed config by @reciprocated in #122
- Add local rollout logging by @thomfoster in #124
- Add support for more `CausalLM`s by @jon-tow in #103
- Add hydra head support for `GPTNeo` by @jon-tow in #126
- Add `BloomModel` hydra support by @jon-tow in #129
- Simplifying logic to merge configs by @leshanbog in #134
- add: load function for AccelerateRLModel by @dongs0104 in #136
- Add `OptimizerConfig` and `SchedulerConfig` by @jon-tow in #135
- Remove incorrect default config settings by @jon-tow in #137
- Update TRL acknowledgement by @osanseviero in #138
- Fix context overflow by @reciprocated in #131
- Fix seeding per process by @reciprocated in #141
- Set device-specific seeding with global rank by @jon-tow in #143
- Freeze hydra model branches by @jon-tow in #140
- Refactor RL model wrapper into a `trainer` module by @jon-tow in #144
- Logging learning rate by @leshanbog in #147
- Fix instantiating base transformer from a custom config by @reciprocated in #149
- Linear LR scheduler by @leshanbog in #150
- Update `pre-commit` version and add `isort` by @jon-tow in #152
- fix: configure flake8, fix errors, add `trackers` config by @Mistobaan in #157
- Features/use-python-3.8-in-ci by @Mistobaan in #159
- Add `bitsandbytes` optimizer support by @aicrumb in #133
- initial commit for trlx LORA support by @ethankim00 in #110
- Fix default `delta_kwargs` handling by @jon-tow in #171
- Add T5 model by @PhungVanDuy in #145
- Fix wandb.errors.RequireError as reported in #162 by @ayulockin in #167
- Update README.md by @LouisCastricato in #180
- Update ILQL details by @reciprocated in #156
- Add OpenAI Summarize RLHF with trlX by @PhungVanDuy in #175
- Fix HuggingFace `model.save_pretrained` for DDP by @jon-tow in #181
- Update generation utilities by @reciprocated in #172
New Contributors
- @thomfoster made their first contribution in #124
- @leshanbog made their first contribution in #134
- @dongs0104 made their first contribution in #136
- @osanseviero made their first contribution in #138
- @Mistobaan made their first contribution in #157
- @aicrumb made their first contribution in #133
- @ethankim00 made their first contribution in #110
- @PhungVanDuy made their first contribution in #145
Full Changelog: v0.3...v0.4
Pre alpha v0.3
What's Changed
- Download simulacra by @reciprocated in #62
- Update documentation (first review) by @simoninithomas in #64
- Add ckpt/ to gitignore by @ayulockin in #70
- change version in package to match lib by @cat-state in #73
- Docs by @shahbuland in #71
- [fix] Remove stale options from `ppo_gptj.yml` by @jon-tow in #77
- Add `entity` name config for `wandb` logging by @jon-tow in #78
- EXAMPLE: Interpreter grounded Neural Program Synthesis [WIP] by @reshinthadithyan in #81
- Update `TrainConfig` optimizer hyperparameters by @jon-tow in #82
- Add examples tip to contribution guide by @jon-tow in #84
- Fix pipeline's context overflow by @reciprocated in #87
- Refactor PPO objective function by @jon-tow in #88
- Fix slow ilql eval by @reciprocated in #91
- rerun #89 by @cat-state in #92
- Hyperparameter Optimization with Ray Tune and Weights and Biases by @ayulockin in #76
- Update readme instructions by @reciprocated in #93
- Update README to align nomenclature correctness by @ayulockin in #97
- Add optional reward scaling by @reciprocated in #95
- Force class registry via imports by @jon-tow in #100
- Add optional normalization (cont.) by @reciprocated in #98
- Restructure sweeps for reuse by @reciprocated in #102
New Contributors
- @simoninithomas made their first contribution in #64
- @ayulockin made their first contribution in #70
- @reshinthadithyan made their first contribution in #81
Full Changelog: v0.2...v0.3
Alpha v0.2
Complete revamp of our initial release.
New features:
- Hydra models, 20x faster than vanilla PPO with minimal performance hits at large scales
- Massively revamped API with significantly less boilerplate.
- Save/load callbacks.
- Greatly improved orchestrator.
- Better-commented RL code, making it easier to understand what's going on.
- Cool examples, including architext and simulacra.
- Better extensibility and standardized styling.
Features coming soon:
- Megatron support! We're already working on this.
- More interesting examples that are relevant to production use cases of TRLX.
- Better integration of W&B, including sweeps.
- Evaluation and benchmarking.
:)
Autogenerated release notes below:
What's Changed
- Fix typo by @mrm8488 in #2
- Create LICENSE by @LouisCastricato in #3
- QOL fixes by @LouisCastricato in #5
- stage ilql by @reciprocated in #6
- Adds style file and reward function capabilities to ppo orchestrator by @LouisCastricato in #8
- Update ppo value head + print logs by @Dahoas in #11
- Make ilql respect the config & remove sin by @reciprocated in #22
- Docs by @shahbuland in #31
- Implemented hydra heads + adaptive kl by @Dahoas in #33
- Add pre-commit with `black` by @cat-state in #36
- [update] Improve package setup by @jon-tow in #42
- Add initial issue templates by @jon-tow in #45
- Some readme improvements by @thedch in #44
- Add initial GitHub workflows by @jon-tow in #43
- [docs] Add `CONTRIBUTING.md` by @jon-tow in #52
- Simplify api by @reciprocated in #24
New Contributors
- @mrm8488 made their first contribution in #2
- @LouisCastricato made their first contribution in #3
- @reciprocated made their first contribution in #6
- @Dahoas made their first contribution in #11
- @shahbuland made their first contribution in #31
- @cat-state made their first contribution in #36
- @jon-tow made their first contribution in #42
- @thedch made their first contribution in #44
Full Changelog: https://github.com/CarperAI/trlx/commits/v0.2