Standardize image-text-to-text-models outputs #32471

yonigozlan · 2024-08-06T15:04:43Z

What does this PR do?

Standardize the outputs of existing image-text-to-text models by adding a post_process_image_text_to_text method to each model's processor.
This PR blocks the image-text-to-text pipeline PR.
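
For readers unfamiliar with the idea, here is a minimal sketch of what a shared post-processing hook on a processor could look like. ToyTokenizer and ToyProcessor are hypothetical stand-ins written for this illustration, not classes from the library, and the default behavior shown (decode while skipping special tokens) is an assumption rather than a description of what each model's processor does in this PR.

```python
from typing import Dict, List, Sequence


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer; maps token ids to words."""

    def __init__(self, vocab: Dict[int, str], special_ids: Sequence[int]):
        self.vocab = vocab
        self.special_ids = set(special_ids)

    def batch_decode(self, sequences, skip_special_tokens: bool = False) -> List[str]:
        decoded = []
        for ids in sequences:
            words = [
                self.vocab[i]
                for i in ids
                if not (skip_special_tokens and i in self.special_ids)
            ]
            decoded.append(" ".join(words))
        return decoded


class ToyProcessor:
    """Hypothetical processor exposing the shared post-processing hook."""

    def __init__(self, tokenizer: ToyTokenizer):
        self.tokenizer = tokenizer

    def post_process_image_text_to_text(self, generated_outputs) -> List[str]:
        # Assumed default: decode and drop special tokens. A model-specific
        # processor could override this, e.g. to parse structured output.
        return self.tokenizer.batch_decode(generated_outputs, skip_special_tokens=True)


tokenizer = ToyTokenizer({0: "<bos>", 1: "a", 2: "cat", 3: "<eos>"}, special_ids=[0, 3])
processor = ToyProcessor(tokenizer)
texts = processor.post_process_image_text_to_text([[0, 1, 2, 3], [0, 2, 3]])
print(texts)
```

With every processor exposing the same method name, a pipeline could call it without knowing which model it is wrapping.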

The following models' processors need to be modified:

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@molbap @amyeroberts

HuggingFaceDocBuilderDev · 2024-08-06T15:24:42Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

leloykun · 2024-08-06T18:28:30Z

@yonigozlan Chameleon can also do image-text-to-text

yonigozlan · 2024-08-06T19:08:09Z

> @yonigozlan Chameleon can also do image-text-to-text

Thanks! Will add it to the list

molbap

Thanks for working on this! Left a few comments; moving on to 32472 now.

src/transformers/models/donut/processing_donut.py

src/transformers/models/fuyu/processing_fuyu.py

src/transformers/models/idefics/processing_idefics.py

src/transformers/models/paligemma/processing_paligemma.py

amyeroberts

Very nice! Looking forward to having all of the processing behaviour more standardized ❤️

Main comment is on the handling of the legacy behaviour

src/transformers/models/paligemma/processing_paligemma.py

src/transformers/models/pix2struct/processing_pix2struct.py

src/transformers/models/donut/processing_donut.py

src/transformers/models/paligemma/processing_paligemma.py

src/transformers/models/udop/processing_udop.py

src/transformers/models/idefics/processing_idefics.py

add post_process_image_text_to_text to chameleon and cleanup
Fix legacy kwarg behavior and deprecation warning
add post_process_image_text_to_text to qwen2_vl and llava_onevision
Add post_process_image_text_to_text to idefics3, mllama, pixtral processor

yonigozlan · 2024-10-15T09:12:12Z

@LysandreJik This should be ready for a final review, and should significantly reduce the LOC count and number of files changed for the image-text-to-text pipeline PR :).
One thing to note is the use of a "legacy" kwarg for BC, as some changes are needed in the preprocessing of some models if we want them to work seamlessly with the image-text-to-text pipeline. I'm open to suggestions if there is a better way to handle this.
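
To make the backward-compatibility pattern concrete, here is a rough sketch of how a "legacy" kwarg can gate a change in preprocessing defaults while emitting a deprecation-style warning. The helper name resolve_add_special_tokens, the choice of add_special_tokens as the affected setting, and the warning text are all assumptions made for illustration; this is not the code from the PR.

```python
import warnings
from typing import Optional


def resolve_add_special_tokens(
    has_images: bool,
    add_special_tokens: Optional[bool] = None,
    legacy: Optional[bool] = None,
) -> bool:
    """Hypothetical helper: choose a tokenizer flag based on a `legacy` switch."""
    if legacy is None:
        # Unset means the caller has not opted in yet: keep the old behavior, but warn.
        warnings.warn(
            "Legacy preprocessing is in use; pass legacy=False to opt in to the new behavior.",
            FutureWarning,
        )
        legacy = True
    if add_special_tokens is not None:
        return add_special_tokens  # an explicit value always wins
    if legacy:
        return True  # assumed old default
    # Assumed new default: no extra special tokens when text accompanies images.
    return not has_images
```

Existing callers keep the old behavior and see a warning, while a pipeline can pass legacy=False explicitly to get the new defaults.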

LysandreJik · 2024-10-17T15:21:20Z

cc @molbap can you do an initial review please?

LysandreJik · 2024-10-17T15:21:37Z

Maybe not initial, but pre-final? 😁

yonigozlan · 2024-10-31T19:49:36Z

The changes from this PR were merged in #34170

yonigozlan marked this pull request as ready for review August 6, 2024 15:36

yonigozlan requested a review from molbap August 6, 2024 15:36

yonigozlan mentioned this pull request Aug 7, 2024

Uniform kwargs for processors #31911

Open

40 tasks

molbap reviewed Aug 7, 2024

amyeroberts reviewed Aug 13, 2024

yonigozlan force-pushed the standardize-inputs-outputs branch from f25eb1d to 7074649 Compare August 14, 2024 15:08

yonigozlan force-pushed the standardize-inputs-outputs branch from 7074649 to aa2b417 Compare September 12, 2024 15:21

yonigozlan force-pushed the standardize-inputs-outputs branch 2 times, most recently from 4137b24 to 04fb918 Compare October 3, 2024 11:18

yonigozlan added 3 commits October 11, 2024 09:23

nit var name post_process_image_text_to_text udop

70926ae

nit fix deprecation warnings

bc5cf3c

yonigozlan force-pushed the standardize-inputs-outputs branch from 04fb918 to bc5cf3c Compare October 11, 2024 09:39

yonigozlan mentioned this pull request Oct 15, 2024

Add image text to text pipeline #34170

Merged

2 tasks

yonigozlan requested a review from LysandreJik October 15, 2024 09:12

yonigozlan requested a review from molbap October 17, 2024 15:24

yonigozlan closed this Oct 31, 2024
