- 
                Notifications
    You must be signed in to change notification settings 
- Fork 31k
Uniformize kwargs for Idefics/2 processors #32568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uniformize kwargs for Idefics/2 processors #32568
Conversation
| The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of logic is needed for backward compatibility, as idefics used to take only prompts where text and images inputs would be interleaved. This added logic preserve supports for these kind of inputs (where prompts is replaced by text arg), while adding support for usual text and images inputs as in other image-text-to-text models. This will also be useful to support idefics in the image-text-to-text pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also looks good to me. Just want to clarify what will be the new format for Idefics to make the pipeline happy. Maybe we can add a test for that new format :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this code block is for new processing behavior when users pass images and text.
Not very sure this is a good idea to repeat text several times. Suppose user has one prompt with interleaved images-text, then we would replicate the prompt several times and cause error in downstream modeling code. For ex:
processor(text=["User: What do you see here? Assistant: a cat. User: what about this image?"], images=[image1, image2])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's a good point. Although interleaved images-text is not really supported when providing both images and text for Idefics, as there is no way to indicate where to put the images in the prompt. Maybe I should add a warning here instead of automatically duplicating the prompts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see now, indeed Idefics is a bit peculiar.
Yes interleaving like that is not, but providing more than 1 image per prompt like in multi-turn conversation is okey, as in the dosctring of call method. Then we should expect users to pass as many images as prompts, and they would have to wrap images as a batched list if there's more than one per prompt.
I think we can even raise an error, as we cannot know for sure what is the user expecting with these inputs. An error explaining what kind of input we want and let the user fix it, otherwise users who never read warnings might start complaining in the issues :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added support for multiple images per prompt, and this warning to make it clearer what input format we expect when using image-text-to-text format:
https://github.com/huggingface/transformers/blob/8b171a777bac10bbb9c9a13bd36d6ffd10be9b9d/src/transformers/models/idefics/processing_idefics.py#L353-L358
9799303    to
    8b171a7      
    Compare
  
    747fbe1    to
    6da6fa2      
    Compare
  
    | This should now be ready for review! Cc @molbap @amyeroberts . | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this processor!
Let's split up the changes in the processor tests from the changing of the processor.
6da6fa2    to
    6a62786      
    Compare
  
    | Hey! 🤗 Thanks for your contribution to the  Before merging this pull request, slow tests CI should be triggered. To enable this: 
 (For maintainers) The documentation for slow tests CI on PRs is here. | 
| Hey @ArthurZucker ! 
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM appart from mentioned breaking change for idefics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a breaking change here no? prompts will be passed by previous workflows (ex: prompts=xxx), and we don't check if prompt in kwargs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be handled by https://github.com/huggingface/transformers/blob/7a63c6f7c11ecdb423bedb5558b2cbe32c43ed37/src/transformers/models/idefics/processing_idefics.py#L236
with prompts being replaced by text automatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right! missed this indeed, good to go then!
…ess uniformization
7a63c6f    to
    0065697      
    Compare
  
    * Add uniformize idefics processor kwargs and tests * Uniformize idefics2 processor kwargs * add image_processor tests idefics * add BC args order change idefics2 processor and update doc * Add support for multiple images per prompt in image-text-to-text mode idefics * Fix processor input args in idefics tests * improve test processing common, remove unnecessary tests, update process uniformization * fix doctrings idefics * fix tests processors idefics/2
What does this PR do?
Adds uniformized processors kwargs following #31911 for the following models:
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@molbap @zucchini-nlp @amyeroberts