
Cache: don't show warning in forward passes when past_key_values is None #33541

Merged
merged 5 commits into huggingface:main from avoid_warning on Sep 19, 2024

Conversation

@gante (Member) commented on Sep 17, 2024

What does this PR do?

Because of the transition from tuple of tuples to Cache instances, we were throwing a warning when converting past_key_values to the new cache format in the forward passes.

One of those situations was when use_cache=True and past_key_values is None... but there is nothing to convert there. In fact, most of the time, the user didn't even specify the argument (see test script below). Moreover, after the transition is complete, we want to keep the default past_key_values=None argument.

As such, this PR removes the warning when past_key_values=None.
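
Concretely, the guard in the forward pass now looks roughly like this (a minimal sketch of the pattern, not the exact transformers source):

return_legacy_cache = False
if use_cache and not isinstance(past_key_values, Cache):
    return_legacy_cache = True
    if past_key_values is None:
        # nothing to convert: start a fresh cache and stay silent
        past_key_values = DynamicCache()
    else:
        # an actual legacy tuple-of-tuples cache: convert it and warn
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
        logger.warning_once(
            "We detected that you are passing `past_key_values` as a tuple of tuples. ..."
        )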

Fixes #33489


Test script:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

inputs = tokenizer(["The quick brown"], return_tensors="pt")
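# note: past_key_values is not passed (it defaults to None) and use_cache defaults to True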
gen_out = model(**inputs)

Before:

We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/internal/generation_utils#transformers.Cache)

Now: no warning :)
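
The warning is still emitted when a legacy tuple-of-tuples cache is actually passed. A hedged continuation of the script above, assuming the transitional behavior where the model returns the cache in legacy format when none was passed in:

out = model(**inputs, use_cache=True)
# during the transition, out.past_key_values is still a tuple of tuples here
out = model(**inputs, past_key_values=out.past_key_values)  # warning is emitted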

past_key_values = DynamicCache.from_legacy_cache(past_key_values)
logger.warning_once(
    "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
    "will be removed in v4.47. Please use an appropriate `Cache` class "
    "(https://huggingface.co/docs/transformers/internal/generation_utils#transformers.Cache)"
)
@gante (Member, author):

Bumped the deprecation to v4.47, as some key models like T5 are still missing the new cache support.

Comment on lines +772 to +774

next_cache = next_decoder_cache if use_cache else None
if return_legacy_cache:
    next_cache = next_cache.to_legacy_cache()
@gante (Member, author):

copy/paste from llama

(on some models, this pattern was slightly different)

"Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/internal/generation_utils#transformers.Cache)"
)
return_legacy_cache = False
if use_cache and not isinstance(past_key_values, Cache):
@gante (Member, author):

Note: `not self.training` was removed from the condition.

If we are training and pass `past_key_values` as a tuple of tuples, we definitely want to see the warning: the code will break in the near future.
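
In other words, a sketch of the condition change (not the exact diff):

# before: the warning was suppressed during training
if use_cache and not isinstance(past_key_values, Cache) and not self.training:
    ...

# after: training code paths also see the deprecation warning
if use_cache and not isinstance(past_key_values, Cache):
    ...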

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member) left a comment:

Thanks for fixing, this is much better than checking for `self.training`.

@LysandreJik (Member) left a comment:

Thanks Joao!

logger.warning_once(
    "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
    "will be removed in v4.47. Please use an appropriate `Cache` class "
    "(https://huggingface.co/docs/transformers/internal/generation_utils#transformers.Cache)"
)
@LysandreJik (Member):

(Nit, not really related to the PR but to the link, which was already here before.)

Linking to the Cache class is cool, but you have to scroll down a bit to see an example. Would it be possible to link to a migration doc/example showing how code previously written with past key values as a tuple of tuples can be adapted before being sent to the model?

The more copy-pastable the example, the less friction there will be here.

@gante (Member, author) commented on Sep 18, 2024:

@LysandreJik good point!

I've added a tiny section to our cache docs about the legacy cache and how to convert it to/from the new format, with an example (cc @zucchini-nlp). This warning now points to that section in the docs.

(will merge after confirming the docs with the doc builder)

EDIT: for some reason, the doc builder is not updating its contents, despite the doc job being successful 🤔 I'm going to merge and double-check the merged results

EDIT2: it worked :) https://huggingface.co/docs/transformers/main/en/kv_cache#legacy-cache-format
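
For reference, the new docs section demonstrates conversion along these lines (a sketch based on that section, not copied verbatim):

from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
inputs = tokenizer("The quick brown", return_tensors="pt")

# run a forward pass with an explicit Cache instance
cache = DynamicCache()
out = model(**inputs, past_key_values=cache, use_cache=True)

# convert the returned Cache to the legacy tuple-of-tuples format...
legacy_cache = out.past_key_values.to_legacy_cache()

# ...and back to a Cache instance, e.g. before the next forward pass
cache = DynamicCache.from_legacy_cache(legacy_cache)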

@gante merged commit 80b774e into huggingface:main on Sep 19, 2024
17 checks passed
@gante deleted the avoid_warning branch on September 19, 2024 at 11:02
itazap pushed a commit to NielsRogge/transformers that referenced this pull request Sep 20, 2024
Successfully merging this pull request may close these issues:

passing past_key_values as a tuple is deprecated, but unclear how to resolve (#33489)