Releases: CarperAI/trlx
v0.7.0: NeMo PPO, PEFT Migration, and Fixes
The `v0.7.0` release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:
🐠 NeMo PPO and SFT support
This release introduces NeMo-backed PPO and SFT implementations for improved capabilities and system performance in large-scale training.
- NeMo PPO by @cat-state in #472
- Add Supervised Fine-Tuning (SFT) support for NeMo backend by @jon-tow in #353
🦆 PEFT Migration
`trlx` now supports parameter-efficient tuning methods via the `peft` library, which we hope will provide greater access to RLHF training in low-resource settings.
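For illustration, here is a minimal sketch of building a LoRA configuration with `peft`; note that wiring it into trlx via a `peft_config` field on the model config is an assumption here, so check the repository's PEFT examples for the exact setup in your installed version:

```python
# Minimal sketch: building a LoRA config with peft for use alongside trlx.
# Attaching it via `config.model.peft_config` is an assumption; check the
# trlx PEFT examples for the exact field name in your installed version.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # causal-LM fine-tuning
    r=8,                           # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
)
# config.model.peft_config = lora_config  # assumed wiring into a TRLConfig
```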
Fixes and more!
- Set pad_token for all tokenizers in tests by @cat-state in #414
- Convert tensors in the stats dict into scalars by @ZHAOTING in #417
- Add Translation Finetuning Example with T5 by @alexandremuzio in #392
- set torch dependency to version 2.0.0 for CUDA in installation instru… by @cauyxy in #409
- [fix] add `position_ids` to `LlamaModelBranch` by @jon-tow in #418
- fix(CI): use pinned deps for CI testing by @jon-tow in #423
- Minibatch impl by @Dahoas in #364
- [feat] Support tying metadata to each prompt by @maxreciprocate in #421
- feat(examples): revamp simulacra example by @maxreciprocate in #430
- [fix] update pairwise dataloader. by @Chen9154 in #395
- fix(sft_trainer): `total_steps` calculation when running distributed by @maxreciprocate in #432
- fix(base_trainer): gather weights in `save_pretrained` under zero3 by @maxreciprocate in #429
- fix(offline_pipeline): ILQL negative indexing under truncation by @maxreciprocate in #435
- fix(ppo_trainer): compute mean KL sequence-wise by @maxreciprocate in #441
- Create Example training scripts to run in Stability cluster by @alexandremuzio in #419
- Upgrade official released Ray instead of an unstable one. by @jovany-wang in #455
- Pin transformers<=4.27.1 by @jovany-wang in #458
- fix(ppo_gpt): prevent position_ids being None by @li-plus in #451
- fix(trainer): init `self.generate_sweep_kwarg` at `self.__init__` by @mymusise in #460
- Ensure trailing EOS token is added correctly for shorter generated outputs by @mikljohansson in #420
- Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts by @mikljohansson in #422
- docs(base_trainer): fill in missing `prepare_learning` method by @maxreciprocate in #449
- fix(modeling_ppo): invert padding percentage calculation by @maxreciprocate in #450
- fix(base_trainer): flatten tag list for tensorboard hparams logging by @maxreciprocate in #444
- feat(requirements.txt): upgrade dependencies by @maxreciprocate in #465
- fix(offline_pipeline): force `drop_last` only for distributed by @maxreciprocate in #475
- hotfix(bnb): install `scipy` with `bitsandbytes` to avoid `ModuleNotFoundError` by @jon-tow in #492
- fix type hint in `PromptPipeline.__init__` by @g-simmons in #496
- fix(modeling_ilql): single q-head indexing by @maxreciprocate in #471
- Fix deprecated arguments for Accelerate >= v0.20.0 by @iwiwi in #506
- Fix PPO log_ratio bug by @TobiasNorlund in #509
- fix(ppo_trainer): default gen kwargs by @maxreciprocate in #510
New Contributors
- @ZHAOTING made their first contribution in #417
- @cauyxy made their first contribution in #409
- @Chen9154 made their first contribution in #395
- @jovany-wang made their first contribution in #455
- @li-plus made their first contribution in #451
- @mymusise made their first contribution in #460
- @mikljohansson made their first contribution in #420
- @g-simmons made their first contribution in #496
- @iwiwi made their first contribution in #506
- @TobiasNorlund made their first contribution in #509
- @glerzing made their first contribution in #486
Full Changelog: v0.6.0...v0.7.0
v0.6.0: LLaMa (Alpaca), Benchmark Util, T5 ILQL, Tests
The `v0.6.0` release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:
📏 Benchmarking and Improved Unit Tests
This release introduces a new benchmark util to more easily track regressions in our training pipeline, along with improved unit tests with the help of the `hypothesis` package:
- [feat] Add benchmark tools by @reciprocated in #357
- Add `hypothesis` tests for ILQL and fix edge cases by @cat-state in #370
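For readers new to property-based testing, a generic `hypothesis` example (illustrative only, not taken from the trlx test suite) looks like this:

```python
# Illustrative hypothesis test: the library generates many random inputs
# and checks that the asserted property holds for all of them.
from hypothesis import given, strategies as st


@given(st.lists(st.integers(), min_size=1))
def test_sum_is_order_invariant(xs):
    assert sum(xs) == sum(reversed(xs))
```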
🦙 LLaMa and Alpaca PPO/SFT Support
PPO support and examples for LLaMa are now available, and we've baked in an example for instruction fine-tuning models with the Alpaca dataset using our SFT trainer:
- [feat] Add LLaMa Model support for PPO by @PhungVanDuy in #375
- Add Alpaca by @cat-state in #400
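As a rough sketch of what PPO training with a LLaMa checkpoint looks like through the high-level API (the checkpoint path and reward function below are placeholders, and keyword arguments may differ slightly between versions):

```python
# Rough sketch of PPO fine-tuning via trlx's high-level entry point.
# "path/to/llama-checkpoint" and the toy length-based reward are placeholders.
import trlx

trainer = trlx.train(
    "path/to/llama-checkpoint",
    reward_fn=lambda samples, **kwargs: [float(len(s)) for s in samples],
    prompts=["Write a haiku about reinforcement learning."],
)
```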
5️⃣ T5 ILQL Support
T5 models can now be fine-tuned with ILQL:
- Support ILQL for T5 model, Fix PPO T5 for refactored code by @PhungVanDuy in #290
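Unlike PPO, ILQL trains offline from pre-scored samples; a hedged sketch of that flow is below (the argument names are assumptions, and a T5 model additionally needs the seq2seq-aware configuration introduced in the linked PR):

```python
# Hedged sketch of offline ILQL training from (sample, reward) pairs.
# Argument names are assumptions; see the trlx ILQL examples for the exact
# setup, including the T5/seq2seq configuration.
import trlx

trainer = trlx.train(
    samples=["The movie was great!", "The movie was terrible."],
    rewards=[1.0, -1.0],
)
```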
Fixes
- Remove example usage of the deprecated `trlx.train` dataset arg by @jon-tow in #331
- Remove unused `logit_mask` argument by @cat-state in #332
- [fix] Convert the rest of configs from `yml`s by @reciprocated in #346
- fix `default_ilql_config` in notebook by @xu-song in #350
- hot-fix: update PPOConfig import in examples by @jon-tow in #352
- [fix] Update `AdaptiveKLController` with correct KL by @reciprocated in #361
- [fix] Drop `<eos>` from ILQL sample's phrases by @reciprocated in #362
- fixes half exp not implemented error by @Dahoas in #363
- [fix] ILQL `total_steps` calculation when running distributed by @reciprocated in #374
- [fix] split for validation by @hzwer in #369
- fix(docs): Update incorrect `PPORLElement` logprob tensor shape hint by @jon-tow in #377
- [fix] Enable HF downloads from a revision by @reciprocated in #382
- [fix] Fix ILQL head sync under ZeRO3 by @reciprocated in #387
- [fix] Preserve `<eos>` token and in-place it after trimming by @reciprocated in #401
- Nemo ILQL fixes by @cat-state in #404
What's Changed
- Move to Python config classes instead of `yml`s by @cat-state in #306
- Add intermediate checkpointing to `accelerate` trainers by @jon-tow in #349
- Enable infinite dataloader for `prompt_dataloader` in PPO Trainer by @alexandremuzio in #358
- [feat] Add optional dependency list by @reciprocated in #381
- Add some synchronization to the db download in the simulacra example by @dakinggg in #406
New Contributors
- @xu-song made their first contribution in #350
- @hzwer made their first contribution in #369
- @alexandremuzio made their first contribution in #358
- @dakinggg made their first contribution in #406
Full Changelog: v0.5.0...v0.6.0
v0.5.0: Initial NeMo integration, HH example, and improved Hugging Face integration
Highlights
- Initial NeMo ILQL integration, leading the way to large-scale RLHF efforts. See https://github.com/CarperAI/trlx/blob/main/trlx/models/README.md to get started.
- In-depth example showcasing `trlx` usage on AnthropicAI's Helpful & Harmless dataset: https://github.com/CarperAI/trlx/tree/main/examples/hh
- Improved ILQL modeling integration with Hugging Face `transformers`. Users can now work with `AutoModelForCausalLMWithILQLHeads` objects to generate samples and save/load fine-tuned ILQL models that can be quickly pushed to the Hub.
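A short sketch of the save/load workflow this enables (the import path is an assumption; check `trlx.models` in your installed version):

```python
# Sketch of working with an ILQL-headed causal LM.
# The module path is an assumption; verify it against trlx.models.
from trlx.models.modeling_ilql import AutoModelForCausalLMWithILQLHeads

model = AutoModelForCausalLMWithILQLHeads.from_pretrained("gpt2")
model.save_pretrained("./my-ilql-model")  # directory can later be pushed to the Hub
reloaded = AutoModelForCausalLMWithILQLHeads.from_pretrained("./my-ilql-model")
```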
What's Changed
- Add `wandb` group naming by @jon-tow in #188
- Update `reward_fn` signatures in examples by @jon-tow in #190
- Add tokenizer config by @reciprocated in #189
- Fix extraction of `mixed_precision` option for deepspeed by @reciprocated in #197
- Fix `summarize_rlhf` inference checkpoint paths by @jon-tow in #194
- Make the config loading consistent across all example scripts by @shermansiu in #192
- Make `Trainer.save_pretrained` sub-directory optional by @jon-tow in #201
- Update Readme to include T5 models by @aaronrmm in #198
- Make `make_head` accept dtype parameter by @reciprocated in #213
- Enable training with Tensorboard tracking by @marcobellagente93 in #209
- Support nested updates in `merge` by @cat-state in #219
- Fix typo reward normalize summarize by @PhungVanDuy in #221
- Update stale comment from results table by @jon-tow in #222
- Fix undefined trackers property by @alan-cooney in #224
- Fix tokenizer missing from `config.to_dict()` by @alan-cooney in #228
- Make experiment tracking optional by @jon-tow in #226
- read tokenizer path from config correctly by @JustinAWei in #230
- Add devcontainer support by @alan-cooney in #196
- fix: change lora_a:float to lora_r:int by @aaronrmm in #235
- Bump `isort` to hotfix CI code quality workflow by @jon-tow in #237
- Fix optional tracking in `accelerator.log` by @jon-tow in #233
- Improve documentation/comments on the random walk example by @alan-cooney in #208
- Update link to "Learning to Summarize from Human Feedback" by @jon-tow in #241
- Fix deepspeed state saving under `save_best` condition by @reciprocated in #242
- added colab notebook by @smellslikeml in #244
- [style] Increase black's line length by @reciprocated in #250
- Add help string to get_advantages_and_returns by @pesvut in #225
- Filter out empty responses by @reciprocated in #265
- NeMo Integrate by @cat-state in #125
- Add multi-process logger utility for status monitoring by @jon-tow in #254
- Add NeMo support info to `README` by @jon-tow in #275
- Fix distributed dataloaders & deduplicate eval by @reciprocated in #276
- Improve PPO readability by @alan-cooney in #210
- Add T5 to delta modifier map by @aaronrmm in #234
- [fix] Set deepspeed's fp16 `auto_cast` to false by @reciprocated in #279
- Rename remaining `logprobs_from_logits` call by @jon-tow in #281
- [feat] Add Accelerate SFT Trainer by @reciprocated in #280
- Add Colab Notebook for Sentiment by @zswitten in #285
- Remove `pylance` installs from devcontainer by @jon-tow in #296
- Move notebooks to examples dir by @jon-tow in #294
- [fix] Summarize config discrepancy by @reciprocated in #293
- Make Git check optional by @cat-state in #299
- refactor: remove orchestrator abstraction from API by @jon-tow in #289
- Set `add_special_tokens=False` to not add EOS unexpectedly by @cat-state in #287
- [feat] Gather experience samples by @reciprocated in #305
- [fix] Make `gather_for_metrics` usage more strict by @reciprocated in #315
- Add helpful and harmless example by @reciprocated in #128
- Adopt `PreTrainedModelWrapper` for Hugging Face models by @jon-tow in #215
New Contributors
- @shermansiu made their first contribution in #192
- @aaronrmm made their first contribution in #198
- @marcobellagente93 made their first contribution in #209
- @alan-cooney made their first contribution in #224
- @JustinAWei made their first contribution in #230
- @smellslikeml made their first contribution in #244
- @pesvut made their first contribution in #225
- @zswitten made their first contribution in #285
Full Changelog: v0.4...v0.5.0
v0.4
Summary of release notes:
Along with many improvements to experiment tracking, rollout logging, and configuration flexibility, new highlight features include:
- Support for T5-based student models. Check out this example, where we show how to fine-tune a FLAN-T5 model on CNN/DailyMail for summarization.
- Support for parameter-efficient tuning methods. Some of our preliminary results have shown LoRA to be a promising technique for scaling RLHF in low-resource settings, and we hope users get the chance to explore its potential. We've seen a ~30% reduction in memory usage and a ~20% reduction in wallclock time for the same performance (quick report here).
- Out-of-the-box support for 8-bit Adam(W) optimizers via TimDettmers/bitsandbytes, leading to a 15% decrease in memory allocation in one of our baseline examples (related report); a minimal sketch follows below.
Other interesting examples are in the works, so stay tuned!
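For reference, the 8-bit optimizer in question is the standard `bitsandbytes` one; a minimal illustration outside of trlx (the tiny model below is a placeholder, and trlx selects the optimizer through its config rather than by hand):

```python
# Minimal illustration of 8-bit Adam from bitsandbytes.
# The linear layer is a placeholder model for demonstration only.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(16, 16)
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5)

loss = model(torch.randn(4, 16)).sum()
loss.backward()
optimizer.step()
```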
What's Changed
- ILQL indices on wrong device by @cat-state in #105
- Fix ppo ratio inaccuracy by @reciprocated in #108
- Set RNG seeds across multiple dependencies by @jon-tow in #113
- Set seed after default config instantiation by @jon-tow in #114
- Move queries on the device by @reciprocated in #115
- Add ppo randomwalks example by @reciprocated in #119
- Add unit tests to ensure valid example configs by @jon-tow in #120
- updating gptj-config by @Dahoas in #109
- Fix get distributed config by @reciprocated in #122
- Add local rollout logging by @thomfoster in #124
- Add support for more `CausalLM`s by @jon-tow in #103
- Add hydra head support for `GPTNeo` by @jon-tow in #126
- Add `BloomModel` hydra support by @jon-tow in #129
- Simplifying logic to merge configs by @leshanbog in #134
- add: load function for AccelerateRLModel by @dongs0104 in #136
- Add `OptimizerConfig` and `SchedulerConfig` by @jon-tow in #135
- Remove incorrect default config settings by @jon-tow in #137
- Update TRL acknowledgement by @osanseviero in #138
- Fix context overflow by @reciprocated in #131
- Fix seeding per process by @reciprocated in #141
- Set device-specific seeding with global rank by @jon-tow in #143
- Freeze hydra model branches by @jon-tow in #140
- Refactor RL model wrapper into a `trainer` module by @jon-tow in #144
- Logging learning rate by @leshanbog in #147
- Fix instantiating base transformer from a custom config by @reciprocated in #149
- Linear LR scheduler by @leshanbog in #150
- Update `pre-commit` version and add `isort` by @jon-tow in #152
- fix: configure flake8, fix errors, add `trackers` config by @Mistobaan in #157
- Features/use-python-3.8-in-ci by @Mistobaan in #159
- Add `bitsandbytes` optimizer support by @aicrumb in #133
- initial commit for trlx LORA support by @ethankim00 in #110
- Fix default `delta_kwargs` handling by @jon-tow in #171
- Add T5 model by @PhungVanDuy in #145
- Fix wandb.errors.RequireError as reported in #162 by @ayulockin in #167
- Update README.md by @LouisCastricato in #180
- Update ILQL details by @reciprocated in #156
- Add OpenAI Summarize RLHF with trlX by @PhungVanDuy in #175
- Fix HuggingFace `model.save_pretrained` for DDP by @jon-tow in #181
- Update generation utilities by @reciprocated in #172
New Contributors
- @thomfoster made their first contribution in #124
- @leshanbog made their first contribution in #134
- @dongs0104 made their first contribution in #136
- @osanseviero made their first contribution in #138
- @Mistobaan made their first contribution in #157
- @aicrumb made their first contribution in #133
- @ethankim00 made their first contribution in #110
- @PhungVanDuy made their first contribution in #145
Full Changelog: v0.3...v0.4
Pre alpha v0.3
What's Changed
- Download simulacra by @reciprocated in #62
- Update documentation (first review) by @simoninithomas in #64
- Add ckpt/ to gitignore by @ayulockin in #70
- change version in package to match lib by @cat-state in #73
- Docs by @shahbuland in #71
- [fix] Remove stale options from `ppo_gptj.yml` by @jon-tow in #77
- Add `entity` name config for `wandb` logging by @jon-tow in #78
- EXAMPLE: Interpreter grounded Neural Program Synthesis [WIP] by @reshinthadithyan in #81
- Update `TrainConfig` optimizer hyperparameters by @jon-tow in #82
- Add examples tip to contribution guide by @jon-tow in #84
- Fix pipeline's context overflow by @reciprocated in #87
- Refactor PPO objective function by @jon-tow in #88
- Fix slow ilql eval by @reciprocated in #91
- rerun #89 by @cat-state in #92
- Hyperparameter Optimization with Ray Tune and Weights and Biases by @ayulockin in #76
- Update readme instructions by @reciprocated in #93
- Update README to align nomenclature correctness by @ayulockin in #97
- Add optional reward scaling by @reciprocated in #95
- Force class registry via imports by @jon-tow in #100
- Add optional normalization (cont.) by @reciprocated in #98
- Restructure sweeps for reuse by @reciprocated in #102
New Contributors
- @simoninithomas made their first contribution in #64
- @ayulockin made their first contribution in #70
- @reshinthadithyan made their first contribution in #81
Full Changelog: v0.2...v0.3
Alpha v0.2
Complete revamp of our initial release.
New features:
- Hydra models, 20x faster than vanilla PPO with minimal performance hits at large scales
- Massively revamped API with significantly less boilerplate.
- Save/load callbacks.
- Greatly improved orchestrator.
- Better-commented RL code, making it easier to understand what's going on.
- Cool examples, including architext and simulacra.
- Better extensibility and standardized styling.
Features coming soon:
- Megatron support! We're already working on this.
- More interesting examples that are relevant to production use cases of TRLX.
- Better integration of W&B, including sweeps.
- Evaluation and benchmarking.
:)
Autogenerated release notes below:
What's Changed
- Fix typo by @mrm8488 in #2
- Create LICENSE by @LouisCastricato in #3
- QOL fixes by @LouisCastricato in #5
- stage ilql by @reciprocated in #6
- Adds style file and reward function capabilities to ppo orchestrator by @LouisCastricato in #8
- Update ppo value head + print logs by @Dahoas in #11
- Make ilql respect the config & remove sin by @reciprocated in #22
- Docs by @shahbuland in #31
- Implemented hydra heads + adaptive kl by @Dahoas in #33
- Add pre-commit with `black` by @cat-state in #36
- [update] Improve package setup by @jon-tow in #42
- Add initial issue templates by @jon-tow in #45
- Some readme improvements by @thedch in #44
- Add initial GitHub workflows by @jon-tow in #43
- [docs] Add `CONTRIBUTING.md` by @jon-tow in #52
- Simplify api by @reciprocated in #24
New Contributors
- @mrm8488 made their first contribution in #2
- @LouisCastricato made their first contribution in #3
- @reciprocated made their first contribution in #6
- @Dahoas made their first contribution in #11
- @shahbuland made their first contribution in #31
- @cat-state made their first contribution in #36
- @jon-tow made their first contribution in #42
- @thedch made their first contribution in #44
Full Changelog: https://github.com/CarperAI/trlx/commits/v0.2