VLMs: enable generation tests #33533

zucchini-nlp · 2024-09-17T09:35:54Z

What does this PR do?

Part of #33374. This PR adds GenerationTesterMixin in all VLMs (except BLIP) and modifies tests to accept all input kwargs, not only input_ids. That way we can test the whole logic, including merging image and text embeddings.

I also ran a few tests for other models, and nothing seemed broken locally.

cc @amyeroberts if you want to take a look :)

zucchini-nlp · 2024-09-17T09:54:02Z

tests/generation/test_utils.py

@@ -98,9 +98,16 @@ class GenerationTesterMixin:

    def _get_input_ids_and_config(self, batch_size=2):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
-        input_ids = inputs_dict[self.input_name]
+        input_ids = inputs_dict.pop(self.input_name)[:batch_size, :]


I'd say we use model.main_input_name here, but let's leave it for the next PR. Seems it will cause many tests to fail 😓

zucchini-nlp · 2024-09-17T09:56:06Z

tests/models/llava/test_modeling_llava.py

@@ -80,7 +81,7 @@ def __init__(
            "initializer_range": 0.02,
            "num_labels": 3,
            "num_choices": 4,
-            "pad_token_id": 0,
+            "pad_token_id": 1,


I changed this so that it doesn't clash with the special image token. That way, when we artificially pad the input sequence, it won't get more image placeholders than needed.
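To illustrate the clash, here is a minimal sketch in plain Python. It assumes the tester's image token id is 0, which is what the old pad id would collide with; `IMAGE_TOKEN_ID`, `left_pad` and `count_image_placeholders` are hypothetical names for illustration, not code from the test suite.

```python
# Assumption for illustration: the special image token uses id 0.
IMAGE_TOKEN_ID = 0


def left_pad(sequence, target_len, pad_token_id):
    """Left-pad a list of token ids with pad_token_id up to target_len."""
    return [pad_token_id] * (target_len - len(sequence)) + sequence


def count_image_placeholders(sequence):
    """Count how many positions would be treated as image placeholders."""
    return sum(1 for token in sequence if token == IMAGE_TOKEN_ID)


# One image placeholder followed by three text tokens.
input_ids = [IMAGE_TOKEN_ID, 5, 6, 7]

# Pad id equal to the image token id: every pad looks like an image placeholder.
clashing = left_pad(input_ids, 6, pad_token_id=0)
# Pad id distinct from the image token id: only the real placeholder remains.
safe = left_pad(input_ids, 6, pad_token_id=1)

print(count_image_placeholders(clashing))  # 3
print(count_image_placeholders(safe))      # 1
```

With the clashing pad id, the number of placeholders no longer matches the number of image features the model expects to merge, which is the failure the new pad id avoids.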

zucchini-nlp · 2024-09-17T10:55:40Z

tests/generation/test_utils.py

+        # we don't want encoder-decoder models to start from filled decoder ids
+        _ = inputs_dict.pop("decoder_input_ids", None)
+        _ = inputs_dict.pop("decoder_attention_mask", None)
+
+        # we'll set cache use in each test differently
+        _ = inputs_dict.pop("use_cache", None)


Hope it's okay this way; otherwise we can override this in each model that returns decoder inputs. I don't think we need a new helper like get_input_for_generation, as the current one already works well.
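Roughly, the idea can be sketched as below. This is a simplified stand-in that uses nested lists instead of tensors, so it slices with `[:batch_size]` rather than `[:batch_size, :]`; `split_generation_inputs` is a hypothetical name, and the sketch does not trim the remaining kwargs to the batch size.

```python
def split_generation_inputs(inputs_dict, input_name="input_ids", batch_size=2):
    """Separate the main input from the extra kwargs passed on to generate().

    Hypothetical helper for illustration only.
    """
    inputs = dict(inputs_dict)  # shallow copy so the caller's dict is untouched
    main_input = inputs.pop(input_name)[:batch_size]

    # Encoder-decoder models should not start from pre-filled decoder ids.
    inputs.pop("decoder_input_ids", None)
    inputs.pop("decoder_attention_mask", None)

    # Each test decides for itself whether to use the cache.
    inputs.pop("use_cache", None)

    return main_input, inputs


example_inputs = {
    "input_ids": [[1, 2], [3, 4], [5, 6]],
    "pixel_values": "placeholder for image features",
    "decoder_input_ids": [[0], [0], [0]],
    "use_cache": True,
}
main_input, extra_kwargs = split_generation_inputs(example_inputs)
print(main_input)            # [[1, 2], [3, 4]]
print(sorted(extra_kwargs))  # ['pixel_values']
```

Everything left in `extra_kwargs` (for example the pixel values) can then be forwarded alongside the main input, so the image/text merging logic is exercised too.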

HuggingFaceDocBuilderDev · 2024-09-17T15:19:06Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

gante

🔥🔥

(having tests makes me happy!)

gante · 2024-09-18T14:23:50Z

tests/generation/test_utils.py

@@ -98,9 +98,16 @@ class GenerationTesterMixin:

    def _get_input_ids_and_config(self, batch_size=2):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
-        input_ids = inputs_dict[self.input_name]
+        input_ids = inputs_dict.pop(self.input_name)[:batch_size, :]


let's add a todo then! :)

tests/generation/test_utils.py

gante · 2024-09-18T14:36:38Z

tests/generation/test_utils.py

+            if any(name in model.__class__.__name__.lower() for name in ("blip", "llava", "paligemma")):
+                self.skipTest(
+                    "For VLMs inputs embeds won't match input ids unless images are encoded and merged with ids properly"
+                )
+


This tags the whole test as skipped, which doesn't seem to be the goal (if the skip is reached, then the test above passes).

Let's simply run the rest of the test only if an if condition is satisfied, and leave a TODO to work on these models.

Oh, btw, just so you know, the part below is not running right now because the name should be cache_position, in the singular. I didn't fix it here because fixing it exposes many hidden failures. I can work on it in a separate PR, or feel free to fix it if you have bandwidth 😄

oh no -- let's work on it in a separate PR
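For reference, a small standalone sketch of the skip behaviour using only the standard library. `SkipDemo` and the `supports_embeds` flag are hypothetical and not part of the test suite; the point is that calling skipTest partway through reports the whole test as skipped, even though the earlier assertions ran and passed, while a conditional block keeps the test reported as passed.

```python
import unittest


class SkipDemo(unittest.TestCase):
    def test_skip_after_checks(self):
        self.assertEqual(1 + 1, 2)  # this check runs and passes
        # Raising a skip here marks the whole test as skipped.
        self.skipTest("inputs_embeds not comparable for this model")

    def test_conditional_block(self):
        self.assertEqual(1 + 1, 2)  # this check runs and passes
        supports_embeds = False  # hypothetical flag standing in for the model check
        if supports_embeds:
            self.fail("the embeds checks would run here")
        # TODO: enable the embeds checks once the model supports them


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipDemo)
result = unittest.TestResult()
suite.run(result)

skipped_names = [test.id().split(".")[-1] for test, _reason in result.skipped]
print(skipped_names)           # ['test_skip_after_checks']
print(result.wasSuccessful())  # True
```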

gante · 2024-09-18T14:37:22Z

tests/generation/test_utils.py

-            outputs_from_ids = model.generate(input_ids)
-            self.assertEqual(outputs_from_ids.shape, (2, 20))
+            outputs_from_ids = model.generate(input_ids, max_new_tokens=5)
+            self.assertEqual(outputs_from_ids.shape, (input_ids.shape[0], input_ids.shape[1] + 5))


yes, let's not rely on defaults 👍

Co-authored-by: Joao Gante <[email protected]>

zucchini-nlp · 2024-09-18T17:28:33Z

I'll try to find the root cause of the flaky tests; otherwise we'll need to override them and add is_flaky for VLMs. It may be related to VLMs generating special image tokens: the weights are not trained, so technically anything can be generated.

gante · 2024-09-19T11:37:43Z

@zucchini-nlp the PR is still missing a review from a core maintainer :P

(cc @LysandreJik for a quick look at the PR)

zucchini-nlp · 2024-09-19T11:44:26Z

Omg! O_o you're right, sorry, completely missed that

* add tests
* fix whisper
* update
* nit
* add qwen2-vl
* more updates!
* better this way
* fix this one
* fix more tests
* fix final tests, hope so
* fix led
* Update tests/generation/test_utils.py
  Co-authored-by: Joao Gante <[email protected]>
* pr comments
* not pass pixels and extra for low-mem tests, very flaky because of visio tower

---------

Co-authored-by: Joao Gante <[email protected]>

zucchini-nlp added 2 commits September 17, 2024 11:35

add tests

acff489

fix whisper

ef6f1b4

update

49e7ce5

zucchini-nlp added 3 commits September 17, 2024 13:45

nit

1c7f4e4

add qwen2-vl

efdf143

more updates!

0d19798

zucchini-nlp added 5 commits September 17, 2024 17:26

better this way

d26cc4a

fix this one

c436d62

fix more tests

7b88bfe

fix final tests, hope so

790eccb

fix led

05a33aa

zucchini-nlp requested a review from gante September 18, 2024 14:06

zucchini-nlp mentioned this pull request Sep 18, 2024

Adding mplugdocowl #31792

Open

5 tasks

gante approved these changes Sep 18, 2024

zucchini-nlp and others added 2 commits September 18, 2024 17:07

Update tests/generation/test_utils.py

c8ede3a

Co-authored-by: Joao Gante <[email protected]>

pr comments

b7c39a2

not pass pixels and extra for low-mem tests, very flaky because of visio tower

b3f5509

zucchini-nlp merged commit d7975a5 into huggingface:main Sep 19, 2024
23 checks passed

This was referenced Sep 19, 2024

Fix missing test in torch_job #33593

Merged

VLM generate: tests can't generate image/video tokens #33623

Merged

gante mentioned this pull request Sep 20, 2024

VLM Generate: tag test_static_cache_matches_dynamic as flaky #33630

Merged

zucchini-nlp mentioned this pull request Oct 4, 2024

Track progress for VLMs refactoring #33374

Open

16 tasks
