
Feature request: GaLore optimizer #1028

Open
gil2rok opened this issue Jul 26, 2024 · 4 comments

gil2rok commented Jul 26, 2024

Feature request for the Gradient Low-Rank Projection (GaLore) optimizer.

The GaLore optimizer projects gradients into a low-rank subspace to dramatically reduce optimizer memory. The arXiv paper is here and the original GitHub implementation is here.

GaLore is quite popular: HuggingFace has implemented it here, and PyTorch Lightning is working on an implementation here. The original implementation has 1K+ GitHub stars.

A good starting point may be this clean PyTorch implementation here in the PyTorch Optimizers library.

Lastly, readers should also be aware of the improved Q-GaLore paper and repository, here and here respectively.
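To make the memory argument concrete, here is a minimal NumPy sketch of GaLore's core idea. This is an illustrative toy, not the paper's full algorithm: in practice the projector is refreshed only periodically, and the optimizer state (e.g. Adam moments) is what gets stored at the reduced size.

```python
import numpy as np

def galore_sgd_step(param, grad, rank=2, lr=0.1, proj=None):
    # Hypothetical sketch of GaLore's core idea (not the official API).
    # 1. Build an orthonormal projector P from the top-r left singular
    #    vectors of the gradient (the paper refreshes this periodically).
    if proj is None:
        U, _, _ = np.linalg.svd(grad, full_matrices=False)
        proj = U[:, :rank]               # (m, r) projection matrix
    # 2. Project the gradient into the low-rank subspace. Any optimizer
    #    state (momentum, Adam moments, ...) would live at this (r, n)
    #    size, which is where the memory savings come from.
    low_rank_grad = proj.T @ grad        # (r, n)
    # 3. Project the update back to full size and apply it.
    update = proj @ low_rank_grad        # (m, n)
    return param - lr * update, proj

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
G = rng.normal(size=(8, 4))
W_new, P = galore_sgd_step(W, G, rank=2)
print(W_new.shape, P.shape)  # (8, 4) (8, 2)
```

For an m×n weight matrix, the per-parameter optimizer state shrinks from O(mn) to O(rn), plus the O(mr) projector shared across the optimizer's moments.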


vroulet commented Jul 27, 2024

Hello @gil2rok,

Thanks for pointing this out! Would you be willing to contribute an implementation?
If not, we'll add it to our todo list, but any contributions are super welcome! :)


gil2rok commented Jul 29, 2024

At the moment, I do not have the bandwidth to add this optimizer. If I find some unexpected time, I will try to come back to this, because I think it'd be super cool to implement.

Thanks for all you guys do for this library!

@viralvgupta

Hi all,
My colleague and I have a working prototype of a GaLore implementation and are running some final tests. Can you suggest whether we should follow the PR diff? Please point us to any clear guidelines. We'd super appreciate the help.


vroulet commented Oct 6, 2024

That's great news!
Yes, the current workflow would be to:

  1. add this optimizer in the contrib folder; note that the PR you pointed out introduced some errors (e.g. in the dtypes at initialization), so take a look at the latest versions of e.g. prodigy or schedule_free.
  2. add this optimizer to the common_test of that folder, in particular making sure it supports optax functionalities like MultiSteps.
  3. if some specific functionalities of this optimizer require their own tests, feel free to add a file with those tests (schedule_free, for example, has its own specific tests).
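For context on what step 1 involves: optax expresses every optimizer as a GradientTransformation, i.e. an (init, update) pair of pure functions, and a contrib GaLore would implement that interface. Below is a minimal pure-Python sketch of the pattern, using a plain NamedTuple rather than optax's real types (the real `update` also threads `params` and other arguments through):

```python
from typing import Any, Callable, NamedTuple, Tuple

class GradientTransformation(NamedTuple):
    # Mirrors optax's (init_fn, update_fn) pair in plain Python.
    init: Callable[[Any], Any]
    update: Callable[[Any, Any], Tuple[Any, Any]]

def sgd(lr: float) -> GradientTransformation:
    def init(params):
        return ()                                  # SGD keeps no optimizer state
    def update(grads, state):
        updates = {k: -lr * g for k, g in grads.items()}
        return updates, state                      # updates are *added* to params
    return GradientTransformation(init, update)

opt = sgd(0.1)
state = opt.init({"w": 1.0})
updates, state = opt.update({"w": 2.0}, state)
print(updates)  # {'w': -0.2}
```

Because wrappers like MultiSteps only see this interface, an optimizer that keeps its state inside `state` (rather than in hidden globals) composes with them for free, which is what the common_test in step 2 checks.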

Feel free to start a PR and send it to us so that we can help review and integrate it.

Thank you for this!
