Add Idefics 3! #32473
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
just have a few comments regarding consistency with other VLMs like Chameleon + nits
Yaaay, Idefics3 is here!
I just left a few comments to make the ongoing work on unifying VLMs a bit easier for us, but I didn't really review the whole PR
```python
past_seen_tokens = 0
return_legacy_cache = False
if use_cache:
    if not isinstance(past_key_values, Cache):  # kept for BC (non `Cache` `past_key_values` inputs)
        return_legacy_cache = True
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
    past_seen_tokens = past_key_values.get_seq_length()

if inputs_embeds is not None and input_ids is None and past_seen_tokens == 0:
```
We don't have to support the old-style cache for new models; we can go directly with DynamicCache. We finally deprecated the tuple cache in all decoder-only models, yay :)
And btw, the past cache length should be obtained from `cache_position`; afaik we'll stop using `get_seq_length()` some time in the future.
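As a rough illustration of the suggested pattern (names here are mine, not from the PR): with `cache_position`, the positions of the new tokens are derived from how many tokens the cache already holds, instead of calling `get_seq_length()` everywhere. A minimal pure-Python sketch:

```python
# Hypothetical sketch of the cache_position pattern. The real transformers
# code builds a torch.arange tensor; a plain list stands in for it here.
def make_cache_position(past_seen_tokens, num_new_tokens):
    """Positions of the new tokens, given how many tokens the cache holds."""
    return list(range(past_seen_tokens, past_seen_tokens + num_new_tokens))

# Prefill: 4 prompt tokens, empty cache.
print(make_cache_position(0, 4))  # [0, 1, 2, 3]
# One decode step afterwards: the cache now holds 4 tokens.
print(make_cache_position(4, 1))  # [4]
```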
Awesome! I removed the support for the old-style cache.
But when I tried to use `cache_position` I got this error:

AttributeError: 'DynamicCache' object has no attribute 'cache_position'

So I reverted to `get_seq_length()`.
Because `past_key_values` can be None, it results in an exception if we load the model without specifying `use_cache=False`. E.g.,

Idefics3ForConditionalGeneration.from_pretrained(base_model_id, use_cache=False)
I'll take care of cache-related changes after the PR is merged, as part of the ongoing "new-cache compatibility" work.
And yes, past_kv can be None, but the logic with `get_seq_length()` should work as long as we check for Noneness :)
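A minimal sketch of that None check (FakeCache below is an illustration-only stand-in for transformers' DynamicCache, not real API):

```python
# Hedged sketch: guard against a None past_key_values before asking it for
# its length, as suggested above.
def get_past_seen_tokens(past_key_values):
    if past_key_values is None:
        return 0
    return past_key_values.get_seq_length()

class FakeCache:
    """Illustration-only stand-in for transformers' DynamicCache."""
    def __init__(self, seq_length):
        self._seq_length = seq_length

    def get_seq_length(self):
        return self._seq_length

print(get_past_seen_tokens(None))          # 0
print(get_past_seen_tokens(FakeCache(7)))  # 7
```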
Thanks for all the work adding this model!
The main comment is for the tests to be properly aligned with the new model behaviours, in particular the processor and image processor.
Some general nits for the modeling file - the main one being that all classes and methods which come from idefics2 should have `# Copied from` comments.
Before the PR can be merged, all slow tests will need to be run & pass. These should be triggered in subsequent commits (I might need to approve the workflow for them to run).
```python
if isinstance(image, Image.Image):
    width, height = image.size
```
The images should never be Image.Image here
I left this here as it is a custom transform for idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.
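For illustration, here is a hypothetical size helper (not the PR's code) that handles both input types, since they store dimensions differently: PIL's `.size` is `(width, height)`, while a numpy array's shape is `(H, W, C)`:

```python
import numpy as np

# Hypothetical helper, not from the PR: read (width, height) from either a
# PIL image or an (H, W, C) numpy array.
def get_width_height(image):
    if isinstance(image, np.ndarray):
        height, width = image.shape[:2]  # numpy stores (H, W, C)
        return width, height
    return image.size  # PIL's .size is already (width, height)

print(get_width_height(np.zeros((480, 640, 3))))  # (640, 480)
```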
I'm also raising a warning once if the inputs are not PIL images. But it will still work for numpy arrays or other types of input.
Removed the warning as now it works perfectly with numpy arrays :)
This still should be removed - we shouldn't be processing PIL images
removed!
```python
if isinstance(image, Image.Image):
    cropped_image = image.crop((start_x, start_y, end_x, end_y))
```
Same here - there shouldn't be any PIL code for the transformations
I left this here as it is a custom transform for idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.
After further discussion with Amy, I added a large change to support processing the images as numpy arrays. Here: 5a0c0f4
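As a sketch of what processing numpy arrays looks like (a hypothetical helper, not the PR's actual function): a crop on an `(H, W, C)` array is plain slicing, mirroring PIL's `image.crop((start_x, start_y, end_x, end_y))`:

```python
import numpy as np

# Illustration only: numpy-array equivalent of PIL's
# image.crop((start_x, start_y, end_x, end_y)) on an (H, W, C) array.
# Note the axis order: rows (y) come first, columns (x) second.
def crop_array(image, start_x, start_y, end_x, end_y):
    return image[start_y:end_y, start_x:end_x]

img = np.arange(4 * 6 * 3).reshape(4, 6, 3)
print(crop_array(img, 1, 0, 4, 2).shape)  # (2, 3, 3)
```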
updated main
@andimarafioti I can see that you re-requested review, but there are still some debugging commits being pushed, so I will hold off reviewing until this has been resolved. I'm going to unsubscribe to prevent getting notifications for every push - as soon as you have a question or want to let me know it's ready, just ping me.
@amyeroberts ready to review! There are still the multi-GPU tests that are queued, but if those fail I would skip them. They run OOM in the CI on a single GPU and there is an issue open for the same on idefics2 #32288. If my fix here works, I already opened a PR for that fix as well: #32840
Talked to @molbap and he said that there is an issue with the multi-GPU workers getting stuck in the queue. But it's not related to this PR.
@andimarafioti Yes, unfortunately we're having issues at the moment with single-GPU and multi-GPU runners taking a long time to run / never running cc @ydshieh. It looks like the multi-GPU tests did eventually run, and there's at least one currently failing test to be addressed.
Co-authored-by: amyeroberts <[email protected]>
I rebased on main
What does this PR do?
Adding the Idefics 3 model.
There are still a few things to do before merging this PR. The results are not exactly the same as with our codebase and the tests are not done. We are opening it to unblock our release.
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Models: