add uniform processors for altclip + chinese_clip #31198
Conversation
Thanks for working on this!
Overall looks good to me. My main thought is that we probably want to move a lot of the kwarg preparation logic outside of the main __call__ method; it can probably be abstracted into something more general shared across most processors.
Regarding tests, it would be good to check that defaults and updates are correctly passed for all of the possible kwargs, in particular that values default to tokenizer.init_kwargs and that these can be overridden.
Would be good to get a second opinion from @qubvel here!
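For illustration, a hedged sketch of the kind of default/override test suggested above; get_component, prepare_image_inputs, and the processor wiring are assumed test-mixin helpers, not necessarily the PR's actual test code:

# Hypothetical sketch of the suggested default/override test; helper names are assumptions.
def test_tokenizer_defaults_preserved_and_overridable(self):
    # Build a tokenizer whose init_kwargs carry a non-default max_length.
    tokenizer = self.get_component("tokenizer", max_length=117)
    image_processor = self.get_component("image_processor")
    processor = self.processor_class(tokenizer=tokenizer, image_processor=image_processor)

    image_input = self.prepare_image_inputs()

    # With no explicit kwarg, tokenizer.init_kwargs should provide the default.
    inputs = processor(text="lower newer", images=image_input, padding="max_length")
    self.assertEqual(len(inputs["input_ids"][0]), 117)

    # An explicit kwarg in __call__ should override that default.
    inputs = processor(
        text="lower newer", images=image_input, padding="max_length", max_length=42
    )
    self.assertEqual(len(inputs["input_ids"][0]), 42)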
Thanks! I'm currently moving the kwargs merging into a common method, actually, and documenting the order of operations with respect to kwargs priorities. Will be in #31197 in a minute
Great work 🤗 Very structured and clean code!
I have the same concern as @amyeroberts: the merging logic is worth splitting out into separate methods, as it is a bit hard to follow the merging steps for the different kwargs.
What do you think about grouping the merging strategy per modality instead of per step?
Not sure if it's possible, because some kwargs probably have to be popped in advance.
Something like this:
images_kwargs = {} if images_kwargs is None else images_kwargs
default_images_kwargs = ChineseClipProcessorKwargs._defaults.get("images_kwargs", {}).copy()
# merging vision kwargs
images_kwargs = {**default_images_kwargs, **images_kwargs} # with default
images_kwargs = {**images_kwargs, **kwargs.pop("images_kwargs", {})} # with passed dict
images_kwargs = merge_with_kwargs_by_key(images_kwargs, ...)
images_kwargs = merge_with_common(images_kwargs, common_kwargs)
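As a hedged follow-up, here is a self-contained version of the sketch above that makes the intended precedence concrete; merge_images_kwargs and all kwarg names are illustrative, not the helper the PR actually ships:

# Runnable sketch of the merge order: defaults < structured dict < flat kwargs.
def merge_images_kwargs(default_images_kwargs, images_kwargs=None, **kwargs):
    # 1. Start from the processor-level defaults.
    merged = dict(default_images_kwargs)
    # 2. The structured dict passed as images_kwargs={...} overrides the defaults.
    merged.update(images_kwargs or {})
    # 3. Flat kwargs passed directly to __call__ win last; only keys the
    #    image processor knows about are routed here.
    merged.update({k: v for k, v in kwargs.items() if k in default_images_kwargs})
    return merged

# The flat kwarg overrides both the default and the structured dict:
merged = merge_images_kwargs(
    {"do_resize": True, "size": 224},
    images_kwargs={"size": 336},
    size=288,
)
assert merged == {"do_resize": True, "size": 288}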
@amyeroberts I think this one is ok for a review!
Nice! Thanks for all the work on this 🤗
Re-pinging @amyeroberts on that one, and optionally @qubvel and @zucchini-nlp if you want to take another look - not much has changed; I was just between sprints and didn't merge that one 😬 It should be good to go now (I know you approved, Amy, but that was a while ago!)
Tests that were working are broken since passing from
Thanks for adding!
* add initial design for uniform processors + align model
* add uniform processors for altclip + chinese_clip
* fix mutable default 👀
* add configuration test
* handle structured kwargs w defaults + add test
* protect torch-specific test
* fix style
* fix
* rebase
* update processor to generic kwargs + test
* fix style
* add sensible kwargs merge
* update test
* fix assertEqual
* move kwargs merging to processing common
* rework kwargs for type hinting
* just get Unpack from extensions
* run-slow[align]
* handle kwargs passed as nested dict
* add from_pretrained test for nested kwargs handling
* [run-slow]align
* update documentation + imports
* update audio inputs
* protect audio types, silly
* try removing imports
* make things simpler
* simplerer
* move out kwargs test to common mixin
* [run-slow]align
* skip tests for old processors
* [run-slow]align, clip
* !$#@!! protect imports, darn it
* [run-slow]align, clip
* [run-slow]align, clip
* update common processor testing
* add altclip
* add chinese_clip
* add pad_size
* [run-slow]align, clip, chinese_clip, altclip
* remove duplicated tests
* fix
* update doc
* improve documentation for default values
* add model_max_length testing: this parameter depends on the tokenizers received
* Raise if kwargs are specified in two places
* fix
* match defaults
* force padding
* fix tokenizer test
* clean defaults
* move tests to common
* remove try/catch block
* deprecate kwarg
* format
* add copyright + remove unused method
* [run-slow]altclip, chinese_clip
* clean imports
* fix version
* clean up deprecation
* fix style
* add corner case test on kwarg overlap
* resume processing - add Unpack as importable
* add tmpdirname
* fix altclip
* fix up
* add back crop_size to specific tests
* generalize tests to possible video_processor
* add back crop_size arg
* fixup overlapping kwargs test for qformer_tokenizer
* remove copied from
* fixup chinese_clip tests values
* fixup tests - qformer tokenizers
* [run-slow] altclip, chinese_clip
* remove prepare_image_inputs
What does this PR do?
Adds two models for #30511; see the parent PR for more details.
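For context, a sketch of the kind of call this uniformization enables, assuming the merged API routes both flat kwargs and nested per-modality dicts; the checkpoint name and kwarg values are examples only:

# Illustrative usage; exact routing behavior is an assumption based on this PR's goal.
from PIL import Image
import requests
from transformers import AltCLIPProcessor

processor = AltCLIPProcessor.from_pretrained("BAAI/AltCLIP")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Flat kwargs are routed to the matching sub-processor...
inputs = processor(text=["a photo of a cat"], images=image, padding="max_length", max_length=64)

# ...and the same call can be expressed with a nested per-modality dict.
inputs = processor(
    text=["a photo of a cat"],
    images=image,
    text_kwargs={"padding": "max_length", "max_length": 64},
)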