-
Notifications
You must be signed in to change notification settings - Fork 26.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uniformize kwargs for Layoutlm (2, 3, X) processors #32180
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I left a few comments, regarding the apply_ocr
arg. I got you point, but we need to test it and make sure it's working for different cases.
So can you add the Mixin test to run general tests + write a special test for apply_ocr
with different scenarios ?
1fafbe9
to
85b4aec
Compare
output_kwargs = self._merge_kwargs( | ||
LayoutLMv3ProcessorKwargs, | ||
tokenizer_init_kwargs=self.tokenizer.init_kwargs, | ||
image_processor_init_kwargs=self.image_processor.init_kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test group tests_exotic_models may be broken. It's not running any tests at all. And the tests for the layoutlm models can only be ran on this test group (cuz they require pytesseract
which is only installed in the docker container for this test).
I've already fixed the bug above but while it was there, it didn't cause any of the tests to fail.
@NielsRogge are the layoutlmv3 models deprecated?
Note for reviewer(s): the failing tests are caused by the Jamba model |
.circleci/create_circleci_config.py
Outdated
"tests/models/*layoutlmv*", | ||
"tests/models/*nat", | ||
"tests/models/deta", | ||
"tests/models/udop", | ||
"tests/models/nougat", | ||
*glob.glob("tests/models/layoutlm*/*.py", recursive=True), | ||
*glob.glob("tests/models/layoutxlm/*.py", recursive=True), | ||
*glob.glob("tests/models/*nat/*.py", recursive=True), | ||
*glob.glob("tests/models/deta/*.py", recursive=True), | ||
*glob.glob("tests/models/udop/*.py", recursive=True), | ||
*glob.glob("tests/models/nougat/*.py", recursive=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I've just fixed it!
This test may have been broken for months now (perhaps up to 22 months). I'm not sure why, but I think it has to do with CircleCI's filtering mechanism. Directly listing out the files to test fixed the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @ydshieh for CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall! One thing left is to move all kwargs to ModalityKwargs
and handle backwards compatibility for that
src/transformers/processing_utils.py
Outdated
if len(args): | ||
warnings.warn( | ||
"Passing positional arguments to the processor call is now deprecated and will be disallowed in future versions. " | ||
"Please pass all arguments as keyword arguments." | ||
) | ||
if len(args) > len(self.optional_call_args): | ||
raise ValueError( | ||
f"Expected *at most* {len(self.optional_call_args)} optional positional arguments in processor call but received {len(args)}." | ||
"Passing positional arguments to the processor call is not recommended" | ||
) | ||
return {arg_name: arg_value for arg_value, arg_name in zip(args, self.optional_call_args)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! We can modify this later after everyone agrees on this workaround. Overall looks good to me, except for some nits I left in another PR.
IMO we can get a core maintainer review on this one which is easier to iterate and then use the agreed deprecation format in PR "all processors with extra args"
And ofc we'll need a test here 😄
a7e623f
to
8dc5463
Compare
@zucchini-nlp @yonigozlan I've just rebased this to main the failing tests are due to Llava-Next: but yah, this PR should now be ready for review |
964be32
to
1ad69a5
Compare
What does this PR do?
tests_exotic_models
which totally prevents any test from being runFixes # (issue)
Who can review?
@zucchini-nlp @molbap @NielsRogge