
Uniformize model processors (models *with* special arg names) #32841

Open · wants to merge 10 commits into base: main
Conversation

leloykun
Contributor

@leloykun leloykun commented Aug 16, 2024

What does this PR do?

  • Uniformizes kwargs for the ClipSeg, Nougat, Owlv2, and OwlViT processors, as discussed in Uniform kwargs for processors #31911
  • Adds backward compatibility for special call arguments passed as positional arguments. Special call args are arguments that carry data (e.g. a negative prompt, segmentation images, etc.) but are not one of [text, images, audio, videos] and are not config values for the tokenizer, image processor, etc.
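The backward-compatibility mechanism can be sketched roughly as follows. This is a minimal, illustrative sketch modeled on the helper discussed in this PR; the class name and exact validation logic are assumptions, not the merged code:

```python
class ProcessorSketch:
    # Per-model list of "special" data-carrying arguments that older code
    # may still pass positionally, e.g. ["visual_prompt"] for CLIPSeg.
    optional_call_args = ["visual_prompt"]

    def prepare_and_validate_optional_call_args(self, *args):
        # Map leftover positional arguments onto their named slots so they
        # can be merged into the keyword-only processing path.
        if len(args) > len(self.optional_call_args):
            raise ValueError(
                f"Expected at most {len(self.optional_call_args)} optional positional "
                f"arguments ({', '.join(self.optional_call_args)}), got {len(args)}."
            )
        return dict(zip(self.optional_call_args, args))
```

With this in place, a legacy call like `processor(image, text, visual_prompt)` can forward its trailing positional value through `prepare_and_validate_optional_call_args(visual_prompt)` and merge the result into the keyword path.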


TODO:

  • add tests
    • ClipSeg
    • Nougat
    • Owlv2
    • OwlViT


Who can review?

@zucchini-nlp @molbap @NielsRogge

@leloykun leloykun mentioned this pull request Aug 16, 2024

@molbap molbap left a comment


Thanks for the contribution! I think handling the `text_pair` etc. kwargs is a bit more challenging; we should be able to handle them solely with TypedDicts and without adding extra args.

src/transformers/models/nougat/processing_nougat.py (outdated, resolved)
Comment on lines 107 to 115
if output_kwargs["text_kwargs"].get("text_pair") is not None and audio is not None:
raise ValueError(
"You cannot provide `text_pair` as a positional argument and as a keyword argument at the same time."
"Please provide it only as a keyword argument (i.e. `text_pair=...`)."
)
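For context on why the check keys off `audio`: under the uniform `(text, images, audio, videos)` signature, a legacy positional `text_pair` silently lands in the `audio` slot. A minimal sketch of that behavior (the function name and logic are illustrative, not the actual Nougat processor):

```python
def call_sketch(text=None, images=None, audio=None, videos=None, **kwargs):
    # Legacy callers wrote processor(text, images, text_pair); under the new
    # uniform signature that third positional value silently fills `audio`,
    # which Nougat never uses -- hence checking `audio is not None`.
    if audio is not None and kwargs.get("text_pair") is not None:
        raise ValueError("`text_pair` was passed both positionally and as a keyword.")
    return kwargs.get("text_pair", audio)
```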
Contributor


Hm, it is not robust to rely on audio here - or at least we have to explicitly comment why we are doing that.

Member


I think that's what we also do in UDOP. Maybe a comment explaining is enough as audio arg-name isn't expected in Nougat

Contributor Author


yup, I just followed the code here: #32544 (comment)

Without this, old code that passed these args positionally would break.

I'll add comments & warnings for now, but let me know what you think is best.

Contributor


I commented on the other PR as well. I missed it for Udop, but unfortunately I don't think it's a good precedent: it's a bit hacky as it stands. I'm trying to see if we can do it some other, cleaner way.

Contributor Author


Thanks for raising this! I have a new approach, which I described in more detail here: #31911 (comment)

Please let me know what you think about it!

src/transformers/models/nougat/processing_nougat.py (outdated, resolved)
@leloykun leloykun changed the title Uniformize kwargs for Nougat processor Uniformize model processors (models *with* special arg names) Aug 17, 2024

@zucchini-nlp zucchini-nlp left a comment


Thanks for this awesome work!

The workaround seems a bit hacky, but I guess we need it for BC. Can we add tests for the newly added prepare_and_validate_optional_call_arg in the models that support extra args? And let's add a version at which we'll stop handling BC for users; it can be v4.47, which gives users two minor versions.

Comment on lines 1015 to 1018
raise ValueError(
f"Expected *at most* {len(self.optional_call_args)} optional positional arguments in processor call but received {len(args)}."
"Passing positional arguments to the processor call is not recommended"
)
Member


This message, imo, isn't very informative for users who have no idea what is happening under the hood. Maybe we could show the optional_call_args names and say we couldn't map the positional args to them.
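For illustration, a message that names the slots might look like this (hypothetical wording and helper name, not the text that was merged):

```python
def optional_args_error(optional_call_args, args):
    # Name the slots so users can see which arguments we tried to map.
    return (
        f"Expected *at most* {len(optional_call_args)} optional positional arguments "
        f"({', '.join(optional_call_args)}) in the processor call, but received "
        f"{len(args)}. Please pass these as keyword arguments instead."
    )
```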

Contributor Author


I've just changed the error message

@zucchini-nlp can you check if the new one's okay?

Comment on lines +78 to +79
text: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
images: Optional[ImageInput] = None,
Member


Another thing we're doing now is swapping the arg order so that it is images, text, audio, videos. And that needs another deprecation cycle...

BTW, I am quite out of the loop: do we need this order-swapping for the pipeline, @yonigozlan?

src/transformers/models/clipseg/processing_clipseg.py (outdated, resolved)
tests/test_processing_common.py (outdated, resolved)
@@ -232,6 +233,7 @@ def thumbnail(
The channel dimension format of the input image. If not provided, it will be inferred.
"""
input_height, input_width = get_image_size(image, channel_dim=input_data_format)
size = get_size_dict(size)
Member


not clear why we needed these changes, was this causing CI failure?

Contributor Author


these are utils for old processors that don't yet support the new image size format

we might as well add them here since (1) they help with backwards compatibility, (2) they make the image-text-to-text pipeline easier to implement, and (3) they're a no-op if size already follows the new format
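To illustrate the no-op behavior, here is a simplified reimplementation of the idea behind `get_size_dict` (the real helper in `transformers.image_processing_utils` handles more formats and defaults; this sketch only shows the normalization concept):

```python
def get_size_dict_sketch(size):
    # Already in the new dict format: return unchanged, i.e. a no-op.
    if isinstance(size, dict):
        return size
    # Old-style int: a square image of that side length.
    if isinstance(size, int):
        return {"height": size, "width": size}
    # Old-style (height, width) tuple or list.
    if isinstance(size, (tuple, list)) and len(size) == 2:
        return {"height": size[0], "width": size[1]}
    raise ValueError(f"Unsupported size format: {size!r}")
```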

Comment on lines +76 to +77
text: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
images: Optional[ImageInput] = None,
Member


Oh, and also, these two should be swapped so that the order is image, text, kwargs

Comment on lines +92 to +94
text: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
images: Optional[ImageInput] = None,
# The following is to capture `visual_prompt` argument that may be passed as a positional argument.
Member


Same here for swapping

Comment on lines +111 to +113
text_pair: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]]
text_target: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]
text_pair_target: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]]
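As a sketch, housing these in a shared text-kwargs TypedDict would look roughly like this (simplified types; the real `TextKwargs` in `transformers.processing_utils` uses the full tokenizer input unions):

```python
from typing import List, Optional, TypedDict, Union

TextLike = Union[str, List[str]]

class TextKwargsSketch(TypedDict, total=False):
    # General tokenizer args: accepted by all tokenizers,
    # even if a given processor never uses them.
    text_pair: Optional[TextLike]
    text_target: TextLike
    text_pair_target: Optional[TextLike]
```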
Member


Should these be added here and not be treated as model-specific args?

Contributor Author


they're in the base tokenizer class, so I figured it's okay to put them here

let me know if I should remove them

Member


Personally, I'm okay with having them here, as they're general args accepted by all tokenizers, though not used by all processors.

Member


Sounds good, that will remove the need for special args for Udop so that's great :)

@leloykun
Contributor Author

I'll also swap the text & images args in a separate PR

This PR should now be ready for review

@yonigozlan
Member

yonigozlan commented Sep 9, 2024

Hi @leloykun! Pinging this PR, as the requirements blocking it should all be completed soon! Notably, the function that checks that the images and text inputs are in the correct order is merged. Here's how it is used in LLaVA, for example.

Also, I think your test_processing_common in this PR is great and generalizes better to most models than the current one. Would you mind opening a separate PR just for that, so it can be merged quickly and other PRs can use it?
I'd say the same for processing_utils.
Thanks a lot!
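The images/text order check can be approximated like this. This is a simplified, illustrative sketch under the assumption that legacy callers can be detected by input type; it is not the merged helper, which handles more input shapes:

```python
def validate_images_text_order(images, text):
    # Detect the legacy (text, images) call order by type and swap, so both
    # orders keep working during the deprecation window.
    def looks_like_text(x):
        return isinstance(x, str) or (
            isinstance(x, list) and all(isinstance(t, str) for t in x)
        )

    if looks_like_text(images) and not looks_like_text(text):
        return text, images  # caller used the legacy (text, images) order
    return images, text
```

Either call order then yields the canonical `(images, text)` pair, which is what lets processors swap their signatures without immediately breaking old code.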

@yonigozlan
Member

Hi again @leloykun, I opened a PR with some of your changes to processing_utils.py and test_processing_common.py here: #33479.
One notable change is in test_processing_common, where I replaced the testing of image kwargs with size and crop_size by do_rescale and rescale_factor, since not all image processors accept size or crop_size.

If you think you'll have the bandwidth to work on this, feel free to open a PR and I'll close mine. In any case, thanks a lot for your contributions on this!
