Feature request

StableAdamW is an optimizer first introduced in Stable and low-precision training for large-scale vision-language models. It is a hybrid of AdamW and AdaFactor that leads to more stable training. Most notably, it was used in the ModernBERT paper:

"StableAdamW’s learning rate clipping outperformed standard gradient clipping on downstream tasks and led to more stable training."
It would be great if this were available as an optimizer in Trainer!
Motivation
More models in the future may use StableAdamW because of its success in training ModernBERT, and having it as an option in Trainer (as optim in TrainingArguments) would be convenient.
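For concreteness, here is a minimal sketch of how this could look from the user side. Note that "stable_adamw" is just the value proposed by this feature request, not an option TrainingArguments accepts today, and the hyperparameters are arbitrary placeholders:

```python
from transformers import TrainingArguments

# Hypothetical: "stable_adamw" is the optim value proposed in this feature
# request; TrainingArguments would currently reject it as an unknown optimizer.
args = TrainingArguments(
    output_dir="stable-adamw-test",
    optim="stable_adamw",
    learning_rate=1e-3,
    weight_decay=0.01,
)
```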
Your contribution
I'm interested in contributing! The ModernBERT paper uses the implementation from optimi, which can be added as an import. I'd love to submit a PR.
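In the meantime, here is a rough sketch of what wiring optimi's StableAdamW into Trainer by hand looks like, assuming the torch-optimi package is installed and exposes StableAdamW; the requested feature would essentially fold this into the optim argument. Hyperparameters are arbitrary placeholders, not values from the paper:

```python
from optimi import StableAdamW
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Build the optimizer directly from optimi and pass it through Trainer's
# existing `optimizers=(optimizer, lr_scheduler)` argument; leaving the
# scheduler as None lets Trainer create its default one.
optimizer = StableAdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="stable-adamw-manual"),
    optimizers=(optimizer, None),
    # train_dataset / eval_dataset omitted for brevity
)
```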