Paligemma: fix static cache test #33941

Merged (3 commits into huggingface:main on Oct 5, 2024)

Conversation

zucchini-nlp (Member, Author):
What does this PR do?

Fixes the flaky test on paligemma from #33630

```diff
@@ -378,7 +378,7 @@ def _update_causal_mask(
         if is_training:
             causal_mask = torch.triu(causal_mask, diagonal=1)
         else:
-            causal_mask = torch.zeros_like(causal_mask)
+            causal_mask[:, :sequence_length] = 0.0
```
zucchini-nlp (Member, Author):

This was the cause: the old line was not masking the dummy tokens from the static cache, so we always ended up with no mask on those token positions.
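A minimal standalone sketch of the failure mode (shapes are illustrative; `target_length` is an assumed name for the static cache width, and the mask layout mirrors the diff above):

```python
import torch

# Illustrative sizes: 4 real tokens, static cache padded to 8 slots.
sequence_length, target_length = 4, 8
min_dtype = torch.finfo(torch.float32).min

# The mask starts fully masked over every static-cache position.
causal_mask = torch.full((sequence_length, target_length), min_dtype)

# Old code: zeroing the whole mask also unmasks the dummy cache
# positions beyond the real sequence.
buggy = torch.zeros_like(causal_mask)
assert (buggy[:, sequence_length:] == 0).all()  # dummy slots attended to -> flaky test

# Fixed code: only real token positions are unmasked; dummy slots
# [:, sequence_length:] stay at min_dtype and are never attended to.
fixed = causal_mask.clone()
fixed[:, :sequence_length] = 0.0
assert (fixed[:, sequence_length:] == min_dtype).all()
```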

Contributor:

aah gotcha, good catch

```diff
@@ -604,8 +603,6 @@ def prepare_inputs_for_generation(
             min_dtype=min_dtype,
             cache_position=cache_position,
             batch_size=batch_size,
-            is_training=is_training,
```
zucchini-nlp (Member, Author):

If we come to prepare the static cache from here, then we cannot be in training mode. I don't think it is common to pass labels through generation, right?
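A simplified, hypothetical sketch of the decoding loop (the real logic lives in `GenerationMixin.generate`) illustrates the point: labels never reach `prepare_inputs_for_generation` on this path, so `is_training` is always False there.

```python
import torch

def greedy_generate_sketch(model, input_ids, max_new_tokens=5):
    """Hypothetical, stripped-down stand-in for GenerationMixin.generate."""
    past_key_values = None
    for _ in range(max_new_tokens):
        # Only generation-time kwargs are assembled here; labels are a
        # loss-time input and never appear, so the static-cache mask can
        # always be built with is_training=False.
        model_inputs = model.prepare_inputs_for_generation(
            input_ids, past_key_values=past_key_values
        )
        outputs = model(**model_inputs)
        next_token = outputs.logits[:, -1:].argmax(dim=-1)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        past_key_values = outputs.past_key_values
    return input_ids
```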

Contributor:

I'm not seeing many use cases indeed, except for maybe constrained generation and RL?

zucchini-nlp (Member, Author):

Guess so, let's see what generation master (gante) thinks 😄

Member:

If labels in paligemma have the usual meaning (a tensor with which we compute the loss, with no further uses), then generate will never use labels :D

zucchini-nlp (Member, Author):

Nice, yes those are normal labels :)

molbap (Contributor) left a comment:

LGTM, added a comment on the training case for generation :)

gante (Member) left a comment:

LGTM, thank you for fixing 🤗

gante requested a review from LysandreJik (October 4, 2024).
ArthurZucker (Collaborator) left a comment:

Thanks 🤗

zucchini-nlp merged commit 612065e into huggingface:main on Oct 5, 2024. 19 checks passed.

NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request on Oct 21, 2024.