
Add support for StableAdamW optimizer in Trainer #36564

Open · capemox opened this issue Mar 5, 2025 · 2 comments · May be fixed by #36606
Labels: Feature request

Comments

@capemox (Contributor) commented Mar 5, 2025

Feature request

StableAdamW is an optimizer first introduced in *Stable and low-precision training for large-scale vision-language models*. It is an AdamW/AdaFactor hybrid that replaces standard gradient clipping with per-tensor update clipping, leading to more stable training. Most notably, it was used in the ModernBERT paper:

StableAdamW’s learning rate clipping outperformed standard gradient clipping on downstream tasks and led to more stable training

It would be great if this were available as an optimizer in Trainer!
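For reference, here's a rough sketch of the core idea, simplified from the paper (this is illustrative, not optimi's actual code, and names are mine):

```python
# Minimal sketch of StableAdamW's per-tensor learning-rate clipping,
# simplified from the paper. Illustrative only, not optimi's implementation.
import torch

def stable_adamw_step(param, grad, exp_avg, exp_avg_sq, step,
                      lr=1e-3, betas=(0.9, 0.99), eps=1e-6, weight_decay=0.01):
    beta1, beta2 = betas
    # Standard AdamW moment updates
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    # Bias correction
    bias_c1 = 1 - beta1 ** step
    bias_c2 = 1 - beta2 ** step

    # StableAdamW's key change: instead of clipping gradients globally,
    # clip the learning rate per tensor by the RMS of g^2 / max(v, eps^2).
    v_hat = (exp_avg_sq / bias_c2).clamp(min=eps ** 2)
    rms = (grad.pow(2) / v_hat).mean().sqrt()
    lr_t = lr / max(1.0, rms.item())

    # Decoupled weight decay (AdamW), scaled by the clipped learning rate
    param.mul_(1 - lr_t * weight_decay)
    denom = v_hat.sqrt().add_(eps)
    param.addcdiv_(exp_avg / bias_c1, denom, value=-lr_t)
```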

Motivation

Because of its success in training ModernBERT, more models may adopt StableAdamW in the future, and having it as an option in Trainer (via optim in TrainingArguments) would be convenient.
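If this lands, usage could look something like the sketch below. The "stable_adamw" string is a placeholder I made up; the actual value would be whatever the PR registers in OptimizerNames:

```python
from transformers import TrainingArguments

# Hypothetical: "stable_adamw" is a placeholder optim value; the real
# string would be decided by the PR that adds the optimizer.
args = TrainingArguments(
    output_dir="out",
    optim="stable_adamw",
    learning_rate=1e-3,
)
```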

Your contribution

I'm interested in contributing! The ModernBERT paper uses the implementation from optimi, which could be pulled in as an import. I'd love to submit a PR.
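In the meantime, something like the following should already work by handing the optimizer to Trainer directly (assuming the optimi package is installed and `model` / `train_dataset` are defined elsewhere; argument names follow optimi's StableAdamW as I understand them):

```python
# Workaround sketch: pass optimi's StableAdamW into Trainer directly.
# Assumes `pip install torch-optimi`, plus `model` and `train_dataset`
# defined elsewhere.
from optimi import StableAdamW
from transformers import Trainer, TrainingArguments

optimizer = StableAdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,
    # Trainer accepts an (optimizer, lr_scheduler) tuple; passing None for
    # the scheduler lets Trainer create its default schedule.
    optimizers=(optimizer, None),
)
trainer.train()
```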

@capemox added the Feature request label on Mar 5, 2025
@Rocketknight1 (Member) commented

cc @muellerzr @SunMarc

@SunMarc (Member) commented Mar 6, 2025

Hi @capemox, feel free to submit a PR for that!

@capemox linked pull request #36606 on Mar 7, 2025 that will close this issue