Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory usage: new dynamic cache for models supporting sliding window attention #33619

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

Cyrilvallez
Copy link
Member

@Cyrilvallez Cyrilvallez commented Sep 20, 2024

What does this PR do?

This PR introduces DynamicSlidingWindowCache, a new kind of DynamicCache that will stop growing once its size is equal to the sliding window. This allows models using it to have a (dynamic) fix-sized cache, which is a big win for large inputs.
The idea is that it becomes the default for models with sliding window attention when no cache arguments are given in generate. Let me know what you think @gante @ArthurZucker

Here is a simple visual representation of the new cache for mistralai/Mistral-7B-v0.1 (we stop growing after hitting the 4096 sliding window):
old.pdf
new.pdf

BTW: SlidingWindowCache (the static one) is completely broken atm, cannot even be instantiated. Will take a look.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Cyrilvallez Cyrilvallez changed the title New dynamic cache for models supporting sliding window Memory usage: new dynamic cache for models supporting sliding window Sep 20, 2024
@Cyrilvallez Cyrilvallez changed the title Memory usage: new dynamic cache for models supporting sliding window Memory usage: new dynamic cache for models supporting sliding window attention Sep 20, 2024
@gante
Copy link
Member

gante commented Sep 20, 2024

BTW: SlidingWindowCache (the static one) is completely broken atm, cannot even be instantiated. Will take a look.

@Cyrilvallez in a recent PR, extra super().__init__() lines slipped in. I started working on it on #33297, where it is fixed and a test that checks all caches is added. That PR is actually about an often-requested feature, cache reuse.

I don't think I will be able to continue it next week -- would you like to finish that PR for me? :p (the test needs to be finished, a few cache classes are not yet working according to the test)

@gante
Copy link
Member

gante commented Sep 20, 2024

@Cyrilvallez suggestion: you can paste images directly to a PR header/comment, which will render the image directly here. It's more convenient for the reader than downloading a file 🤗

like this: screenshot -> drag image file into this text box
(old)
Screenshot 2024-09-20 at 16 31 11

(new)
Screenshot 2024-09-20 at 16 31 43

Copy link
Member

@gante gante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for adding this cache 💛

Have you confirmed that slow tests for relevant models (like mistral) are passing? Or, at least, they introduce no new failure, in case some tests are failing on main

@gante
Copy link
Member

gante commented Sep 20, 2024

It seems Qwen2 is not happy with these changes :)

@Cyrilvallez
Copy link
Member Author

Cyrilvallez commented Sep 20, 2024

BTW: SlidingWindowCache (the static one) is completely broken atm, cannot even be instantiated. Will take a look.

@Cyrilvallez in a recent PR, extra super().__init__() lines slipped in. I started working on it on #33297, where it is fixed and a test that checks all caches is added. That PR is actually about an often-requested feature, cache reuse.

I don't think I will be able to continue it next week -- would you like to finish that PR for me? :p (the test needs to be finished, a few cache classes are not yet working according to the test)

Sure, I'll have a look into it next week 🤗

Slow tests with Mistral were passing, but indeed Qwen2 started to complain, I'm investigating

@ArthurZucker
Copy link
Collaborator

ping me once ready for review! 🤗

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants