
Add Idefics 3! #32473

Status: Open. Wants to merge 51 commits into main.
Conversation

@andimarafioti (Member) commented on Aug 6, 2024

What does this PR do?

Adding the Idefics 3 model.

There are still a few things to do before merging this PR. The results are not exactly the same as with our codebase and the tests are not done. We are opening it to unblock our release.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Models:

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@leloykun (Contributor) left a comment:

Just have a few comments regarding consistency with other VLMs like Chameleon, plus some nits.

@zucchini-nlp (Member) left a comment:

Yaaay, Idefics3 is here!

I just left a few comments to make the ongoing work on unifying VLMs a bit easier for us, but didn't really review the whole PR.

src/transformers/models/idefics3/configuration_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/modeling_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/modeling_idefics3.py (outdated review thread, resolved)
Comment on lines 940 to 953
    past_seen_tokens = 0
    return_legacy_cache = False
    if use_cache:
        if not isinstance(past_key_values, Cache):  # kept for BC (non `Cache` `past_key_values` inputs)
            return_legacy_cache = True
            past_key_values = DynamicCache.from_legacy_cache(past_key_values)
        past_seen_tokens = past_key_values.get_seq_length()

    if inputs_embeds is not None and input_ids is None and past_seen_tokens == 0:
Member:

We don't have to support old-style cache for new models, can go directly with DynamicCache. Finally deprecated tuple cache in all decoder-only models, yay :)

And btw, the past cache length should be obtained from cache_position; afaik we'll stop using get_seq_length() some time in the future.
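The relationship behind that suggestion can be sketched as follows. This is a hedged illustration, not `transformers` library code: `past_seen_tokens_from_cache_position` is a made-up helper name, and `cache_position` is modeled here as a plain list of the positions assigned to the incoming tokens.

```python
# Sketch: in decoder forward passes, `cache_position` holds the positions of
# the tokens currently being processed. Because positions are contiguous, the
# first entry equals the number of tokens already in the cache, which is what
# get_seq_length() would otherwise report.
def past_seen_tokens_from_cache_position(cache_position):
    return cache_position[0]

# 5 tokens already cached, 3 new tokens arriving at positions 5, 6, 7:
print(past_seen_tokens_from_cache_position([5, 6, 7]))  # 5
```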

Member Author:

Awesome! I removed the support for the old-style cache.

But I tried to use cache_position and got this error:
`AttributeError: 'DynamicCache' object has no attribute 'cache_position'`
So I reverted to get_seq_length().

Comment:

Because past_key_values can be None, this raises an exception if we load the model with use_cache=False, e.g.:

Idefics3ForConditionalGeneration.from_pretrained(base_model_id, use_cache=False)

Member:

I'll take care of cache-related changes after the PR is merged, as part of the ongoing work on new-cache compatibility.

And yes, past_kv can be None, but the logic with get_seq_length() should work as long as we check for Noneness :)

src/transformers/models/idefics3/modeling_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/processing_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/processing_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/processing_idefics3.py (outdated review thread, resolved)
@amyeroberts (Collaborator) left a comment:

Thanks for all the work adding this model!

The main comment is for the tests to be properly aligned with the new model behaviours, in particular the processor and image processor.

Some general nits for the modeling file, the main one being that all classes and methods which come from Idefics2 should have # Copied from comments.

Before the PR can be merged, all slow tests will need to be run & pass. These should be triggered in subsequent commits (I might need to approve the workflow for them to run)

docs/source/en/model_doc/idefics3.md (outdated review thread, resolved)
tests/models/idefics3/test_image_processing_idefics3.py (outdated review thread, resolved)
tests/models/idefics3/test_image_processing_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/configuration_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/configuration_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/modeling_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/processing_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/processing_idefics3.py (outdated review thread, resolved)
src/transformers/models/idefics3/processing_idefics3.py (outdated review thread, resolved)
Comment on lines 155 to 162
    if isinstance(image, Image.Image):
        width, height = image.size
Collaborator:

The images should never be Image.Image here

Member Author:

I left this here as it is a custom transform for Idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.

Member Author:

I'm also raising a warning (once) if the input is not PIL images, but it will still work for numpy arrays and other input types.

Member Author:

Removed the warning as now it works perfectly with numpy arrays :)

Collaborator:

This still should be removed - we shouldn't be processing PIL images

Member Author:

Removed!

Comment on lines 213 to 214
    if isinstance(image, Image.Image):
        cropped_image = image.crop((start_x, start_y, end_x, end_y))
Collaborator:

Same here - there shouldn't be any PIL code for the transformations

Member Author:

I left this here as it is a custom transform for Idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.

Member Author:

After further discussion with Amy, I added a large change to support processing the images as numpy arrays. Here: 5a0c0f4
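The numpy-based approach can be sketched roughly as follows. This is an illustration, not the actual code from commit 5a0c0f4: `crop_numpy` is a hypothetical helper, and channels-last (height, width, channels) arrays are assumed.

```python
import numpy as np


def crop_numpy(image, start_x, start_y, end_x, end_y):
    # PIL's image.crop((start_x, start_y, end_x, end_y)) becomes plain array
    # slicing once the image is a channels-last numpy array: rows are the y
    # axis, columns are the x axis.
    return image[start_y:end_y, start_x:end_x]


image = np.zeros((100, 200, 3), dtype=np.uint8)  # height=100, width=200
height, width = image.shape[:2]   # replaces PIL's `width, height = image.size`
patch = crop_numpy(image, 10, 20, 60, 70)
print(patch.shape)  # (50, 50, 3)
```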

@andimarafioti (Member Author):

updated main

@amyeroberts (Collaborator):

@andimarafioti I can see that you re-requested review, but there are still some debugging commits being pushed, so I'll hold off on reviewing until this has been resolved. I'm going to unsubscribe to avoid getting notifications for every push; as soon as you have a question or want to let me know it's ready, just ping me with @amyeroberts and I'll get a notification :)

@andimarafioti force-pushed the idefics3 branch 3 times, most recently from ef576cb to 483d5d8 (August 16, 2024, 07:36)
@andimarafioti (Member Author):

@amyeroberts ready to review! There are still the multi-GPU tests that are queued, but if those fail I would skip them. They run OOM in the CI on a single GPU, and there is an open issue for the same on Idefics2: #32288. If my fix here works, I have already opened a PR to apply it there as well: #32840.

@andimarafioti (Member Author):

Talked to @molbap and he said that there is an issue with the multi-gpu workers getting stuck in the queue. But it's not related to this PR.

@amyeroberts (Collaborator):

@andimarafioti Yes, unfortunately we're having issues at the moment with single GPU and multi-GPU runners taking a long time to run / never running cc @ydshieh.

It looks like the multi-GPU tests did eventually run, and there's at least one currently failing test to be addressed.

andimarafioti and others added 23 commits September 18, 2024 12:30
@andimarafioti (Member Author):

I rebased on main
