
Uniformize kwargs for Layoutlm (2, 3, X) processors #32180

Open
leloykun wants to merge 3 commits into main from fc--uniform-kwargs-layoutlmv
Conversation

leloykun (Contributor) commented Jul 24, 2024

What does this PR do?

  • Uniformizes kwargs for LayoutLM (2, 3, X) processors, as discussed in Uniform kwargs for processors #31911 (a rough sketch of the pattern follows below)
  • Also fixes a bug in the tests_exotic_models CI job that was preventing any tests from being run
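
For context, the uniform-kwargs pattern from #31911 looks roughly like the sketch below: each processor declares a ProcessingKwargs subclass whose per-modality defaults are merged with the tokenizer/image-processor init kwargs and the caller's kwargs at __call__ time. The defaults shown here are illustrative only, not the exact values used in this PR.

from transformers.processing_utils import ProcessingKwargs

class LayoutLMv3ProcessorKwargs(ProcessingKwargs, total=False):
    # Illustrative defaults only; the real values live in the processor module.
    _defaults = {
        "text_kwargs": {
            "add_special_tokens": True,
            "padding": False,
        },
        "images_kwargs": {},
    }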


Who can review?

@zucchini-nlp @molbap @NielsRogge

zucchini-nlp (Member) left a comment:


Thanks for working on this! I left a few comments regarding the apply_ocr arg. I get your point, but we need to test it and make sure it works for the different cases.

So can you add the Mixin test to run the general tests, plus write a special test for apply_ocr covering the different scenarios?
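
For illustration, a minimal sketch of what such a test could look like, assuming ProcessorTesterMixin-style helpers (get_processor and prepare_image_inputs are assumptions here, and this is not the test actually added in the PR):

def test_apply_ocr_scenarios(self):
    # Hypothetical sketch, not the PR's actual test.
    processor = self.get_processor()  # assumed helper returning a LayoutLM-style processor
    image = self.prepare_image_inputs()[0]  # assumed helper returning PIL images

    # apply_ocr=True (the default): the image processor runs Tesseract itself,
    # so passing only images still yields token ids and bounding boxes.
    encoding = processor(images=image, return_tensors="pt")
    self.assertIn("input_ids", encoding)
    self.assertIn("bbox", encoding)

    # apply_ocr=False: the caller must provide the words and boxes explicitly.
    processor.image_processor.apply_ocr = False
    encoding = processor(
        images=image,
        text=["hello", "world"],
        boxes=[[1, 2, 3, 4], [5, 6, 7, 8]],
        return_tensors="pt",
    )
    self.assertIn("bbox", encoding)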

src/transformers/processing_utils.py — 3 review threads (outdated, resolved)
zucchini-nlp mentioned this pull request on Aug 7, 2024
leloykun force-pushed the fc--uniform-kwargs-layoutlmv branch 2 times, most recently from 1fafbe9 to 85b4aec on August 15, 2024
output_kwargs = self._merge_kwargs(
    LayoutLMv3ProcessorKwargs,
    tokenizer_init_kwargs=self.tokenizer.init_kwargs,
    image_processor_init_kwargs=self.image_processor.init_kwargs,
leloykun (Contributor, Author) commented Aug 15, 2024:

@zucchini-nlp @NielsRogge

The test group tests_exotic_models may be broken: it isn't running any tests at all. And the tests for the LayoutLM models can only be run in this test group (because they require pytesseract, which is only installed in the docker container for this job).

I've already fixed the bug here; note that while it was present, it didn't cause any of the tests to fail.

@NielsRogge are the layoutlmv3 models deprecated?

leloykun (Contributor, Author) commented:

Note for reviewer(s): the failing tests are caused by the Jamba model

Comment on lines 366 to 307
-    "tests/models/*layoutlmv*",
-    "tests/models/*nat",
-    "tests/models/deta",
-    "tests/models/udop",
-    "tests/models/nougat",
+    *glob.glob("tests/models/layoutlm*/*.py", recursive=True),
+    *glob.glob("tests/models/layoutxlm/*.py", recursive=True),
+    *glob.glob("tests/models/*nat/*.py", recursive=True),
+    *glob.glob("tests/models/deta/*.py", recursive=True),
+    *glob.glob("tests/models/udop/*.py", recursive=True),
+    *glob.glob("tests/models/nougat/*.py", recursive=True),
leloykun (Contributor, Author) commented:

@zucchini-nlp @NielsRogge

Update: I've just fixed it!

This test group may have been broken for months now (perhaps up to 22 months). I'm not sure why, but I think it has to do with CircleCI's test-filtering mechanism. Directly listing out the files to test fixed the issue.
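
As a rough illustration of why listing the files explicitly helps (the exact CircleCI behavior is an assumption on my part): the old entries were shell-style patterns handed to the test collector as-is, whereas glob.glob expands them inside the config script into concrete file paths, so nothing downstream has to do any pattern matching.

import glob

# Expands locally to an explicit list of test files, e.g.
# ['tests/models/layoutlmv2/test_processor_layoutlmv2.py', ...]
layoutlm_tests = glob.glob("tests/models/layoutlm*/*.py", recursive=True)
print(layoutlm_tests)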

Member commented:

cc @ydshieh for CI

zucchini-nlp (Member) left a comment:

Looks good to me overall! One thing left is to move all the kwargs to ModalityKwargs and handle backward compatibility for that.
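
For reference, moving the arguments into the modality-specific kwargs classes could look roughly like the sketch below. The class and field names follow the pattern in processing_utils.py but are illustrative, not copied from this PR.

from typing import Optional
from transformers.processing_utils import ImagesKwargs, ProcessingKwargs

class LayoutLMv2ImagesKwargs(ImagesKwargs, total=False):
    # OCR-related options that previously lived on the call signature.
    apply_ocr: bool
    ocr_lang: Optional[str]
    tesseract_config: Optional[str]

class LayoutLMv2ProcessorKwargs(ProcessingKwargs, total=False):
    images_kwargs: LayoutLMv2ImagesKwargs
    _defaults = {
        "text_kwargs": {"add_special_tokens": True},
        "images_kwargs": {"apply_ocr": True},
    }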

Comment on lines 1009 to 1049
if len(args):
    warnings.warn(
        "Passing positional arguments to the processor call is now deprecated and will be disallowed in future versions. "
        "Please pass all arguments as keyword arguments."
    )
if len(args) > len(self.optional_call_args):
    raise ValueError(
        f"Expected *at most* {len(self.optional_call_args)} optional positional arguments in processor call but received {len(args)}. "
        "Passing positional arguments to the processor call is not recommended."
    )
return {arg_name: arg_value for arg_value, arg_name in zip(args, self.optional_call_args)}
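
A rough usage sketch of how a processor could wire this helper into its __call__ (the names prepare_and_validate_optional_call_args, optional_call_args, and SomeProcessorKwargs are assumptions for illustration, not necessarily what this PR ends up with):

from transformers.processing_utils import ProcessorMixin

class SomeProcessor(ProcessorMixin):  # hypothetical processor, for illustration only
    # Hypothetical: the positional extras this processor historically accepted after `text`.
    optional_call_args = ["boxes", "word_labels"]

    def __call__(self, images=None, text=None, *args, **kwargs):
        # Legacy positional extras are mapped back onto their keyword names
        # (emitting the deprecation warning above), then merged with the regular kwargs.
        kwargs.update(self.prepare_and_validate_optional_call_args(*args))
        output_kwargs = self._merge_kwargs(
            SomeProcessorKwargs,  # the processor's ProcessingKwargs subclass
            tokenizer_init_kwargs=self.tokenizer.init_kwargs,
            **kwargs,
        )
        ...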
Member commented:

Cool! We can modify this later, after everyone agrees on this workaround. Overall looks good to me, except for some nits I left in another PR.

IMO we can get a core maintainer review on this PR, which is easier to iterate on, and then use the agreed deprecation format in the "all processors with extra args" PR.

And of course we'll need a test here 😄

leloykun (Contributor, Author) commented:

@zucchini-nlp @yonigozlan I've just rebased this onto main.

The failing tests are due to Llava-Next:
[screenshot: failing Llava-Next tests]

But yeah, this PR should now be ready for review.
