
Cache: don't show warning in forward passes when past_key_values is None #33541

Merged
merged 5 commits into huggingface:main from avoid_warning on Sep 19, 2024

Conversation

@gante (Member) commented on Sep 17, 2024

What does this PR do?

Because of the transition from tuple of tuples to Cache instances, we were throwing a warning when converting past_key_values to the new cache format in the forward passes.

One of those situations was when use_cache=True and past_key_values is None... but there is nothing to convert there. In fact, most of the time, the user didn't even specify the argument (see test script below). Moreover, after the transition is complete, we want to keep the default past_key_values=None argument.

As such, this PR removes the warning when past_key_values=None.
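
Concretely, the guard in the forward pass now looks roughly like this (a minimal sketch of the pattern, not the exact transformers source):

return_legacy_cache = False
if use_cache and not isinstance(past_key_values, Cache):
    return_legacy_cache = True
    if past_key_values is None:
        # nothing to convert: start a fresh cache and stay silent
        past_key_values = DynamicCache()
    else:
        # an actual legacy tuple-of-tuples cache: convert it and warn
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
        logger.warning_once(
            "We detected that you are passing `past_key_values` as a tuple of tuples. ..."
        )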

Fixes #33489


Test script:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

inputs = tokenizer(["The quick brown"], return_tensors="pt")
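# note: past_key_values is not passed (it defaults to None) and use_cache defaults to True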
gen_out = model(**inputs)

Before:

We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/internal/generation_utils#transformers.Cache)

Now: no warning :)
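
The warning is still emitted when a legacy tuple-of-tuples cache is actually passed. A hedged continuation of the script above, assuming the transitional behavior where the model returns the cache in legacy format when none was passed in:

out = model(**inputs, use_cache=True)
# during the transition, out.past_key_values is still a tuple of tuples here
out = model(**inputs, past_key_values=out.past_key_values)  # warning is emitted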

past_key_values = DynamicCache.from_legacy_cache(past_key_values)
logger.warning_once(
    "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
    "will be removed in v4.47. Please use an appropriate `Cache` class "
    "(https://huggingface.co/docs/transformers/internal/generation_utils#transformers.Cache)"
)
@gante (Member, author):

Bumped the deprecation to v4.47, as some key models like T5 are still missing the new cache support.

Comment on lines +772 to +774

next_cache = next_decoder_cache if use_cache else None
if return_legacy_cache:
    next_cache = next_cache.to_legacy_cache()
@gante (Member, author):

copy/paste from llama

(on some models, this pattern was slightly different)

"Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/internal/generation_utils#transformers.Cache)"
)
return_legacy_cache = False
if use_cache and not isinstance(past_key_values, Cache):
@gante (Member, author):

Note: `not self.training` was removed from the condition.

If we are training and pass `past_key_values` as a tuple of tuples, we definitely want to see the warning: the code will break in the near future.
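
In other words, a sketch of the condition change (not the exact diff):

# before: the warning was suppressed during training
if use_cache and not isinstance(past_key_values, Cache) and not self.training:
    ...

# after: training code paths also see the deprecation warning
if use_cache and not isinstance(past_key_values, Cache):
    ...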

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member) left a comment:

Thanks for fixing, this is much better than checking for `self.training`.

@LysandreJik (Member) left a comment:

Thanks Joao!

logger.warning_once(
    "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
    "will be removed in v4.47. Please use an appropriate `Cache` class "
    "(https://huggingface.co/docs/transformers/internal/generation_utils#transformers.Cache)"
)
@LysandreJik (Member):

(Nit, not really related to the PR but to the link, which was already here before.)

Linking to the Cache class is cool, but you have to scroll down a bit to see an example. Would it be possible to link to a migration doc/example showing how code previously written with past key values as a tuple of tuples can be adapted before being sent to the model?

The more copy-pastable the example, the less friction there will be here.

@gante (Member, author) commented on Sep 18, 2024:

@LysandreJik good point!

I've added a tiny section to our cache docs about the legacy cache and how to convert it to/from the new format, with an example (cc @zucchini-nlp). This warning now points to that section in the docs.

(will merge after confirming the docs with the doc builder)

EDIT: for some reason, the doc builder is not updating its contents, despite the doc job being successful 🤔 I'm going to merge and double-check the merged results

EDIT2: it worked :) https://huggingface.co/docs/transformers/main/en/kv_cache#legacy-cache-format
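
For reference, the new docs section demonstrates conversion along these lines (a sketch based on that section, not copied verbatim):

from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
inputs = tokenizer("The quick brown", return_tensors="pt")

# run a forward pass with an explicit Cache instance
cache = DynamicCache()
out = model(**inputs, past_key_values=cache, use_cache=True)

# convert the returned Cache to the legacy tuple-of-tuples format...
legacy_cache = out.past_key_values.to_legacy_cache()

# ...and back to a Cache instance, e.g. before the next forward pass
cache = DynamicCache.from_legacy_cache(legacy_cache)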

@gante merged commit 80b774e into huggingface:main on Sep 19, 2024
17 checks passed
@gante deleted the avoid_warning branch on September 19, 2024 at 11:02
itazap pushed a commit to NielsRogge/transformers that referenced this pull request Sep 20, 2024
Successfully merging this pull request may close these issues:

passing past_key_values as a tuple is deprecated, but unclear how to resolve (#33489)