
Add support for args to ProcessorMixin for backward compatibility #33479

Merged: 8 commits into huggingface:main, Sep 20, 2024

Conversation

@yonigozlan (Member) commented Sep 13, 2024:

What does this PR do?

Split from #32841 by @leloykun.
Adds support for positional args in processor calls for backward compatibility and improves generalization in ProcessorTesterMixin.
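For context, here is a minimal sketch of the pattern this enables. The class and argument names below are illustrative only; the real hook is the `prepare_and_validate_optional_call_args` method added to `ProcessorMixin` in this PR, and the behavior shown mirrors the tests added further down.

```python
# Minimal sketch (illustrative names) of mapping legacy positional call args back
# to keyword arguments so that old-style processor(...) calls keep working.
class SketchProcessor:
    # Names of positional arguments that older callers may still pass to __call__.
    optional_call_args = ["legacy_arg_1", "legacy_arg_2"]

    def prepare_and_validate_optional_call_args(self, *args):
        """Map extra positional args onto their legacy names; error on excess args."""
        if len(args) > len(self.optional_call_args):
            raise ValueError(
                f"Expected at most {len(self.optional_call_args)} optional positional "
                f"arguments, got {len(args)}."
            )
        return dict(zip(self.optional_call_args, args))

    def __call__(self, images=None, text=None, *args, **kwargs):
        # Old-style call: processor(images, text, legacy_value) still resolves.
        kwargs.update(self.prepare_and_validate_optional_call_args(*args))
        return kwargs
```

In this sketch, `SketchProcessor()(None, "hi", "value")` yields `{"legacy_arg_1": "value"}`, while passing three extra positional args raises a ValueError, matching the test added in this PR.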

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

        videos: VideoInput = None,
-        **kwargs,
+        **kwargs: Unpack[LlavaOnevisionProcessorKwargs],
yonigozlan (Member Author):

@zucchini-nlp It seems that the Unpack was in the init instead of the call function of the processor. Do you think it's worth a separate PR or can we include it here?
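For illustration, a rough sketch of what typing `**kwargs` on `__call__` (rather than `__init__`) with `Unpack` looks like. The TypedDict fields and class names here are trimmed and hypothetical, not the full set the LlavaOnevision processor actually accepts:

```python
from typing import TypedDict

from typing_extensions import Unpack  # typing.Unpack on Python >= 3.11


class SketchProcessorKwargs(TypedDict, total=False):
    # Hypothetical subset of call-time options a processor might expose.
    padding: bool
    do_rescale: bool
    rescale_factor: float


class SketchProcessor:
    def __call__(
        self,
        images=None,
        text=None,
        audio=None,
        videos=None,
        **kwargs: Unpack[SketchProcessorKwargs],
    ):
        # Annotating **kwargs on __call__ lets type checkers validate the keyword
        # arguments that callers pass, which annotating __init__ does not.
        return dict(kwargs)
```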

Member:

Oops, yes, it can be here, thanks! Regarding adding audio=None before videos even though audio won't be used for this model: that seems counterintuitive to me.

@molbap welcome back! We had questions about changing the order of the main input args in your absence, and this is also distantly related to that. Should we be adding unused args in this order?

@molbap (Contributor) commented Sep 16, 2024:

Hey! Glad to be back, GitHub notifications are hefty 😁
Keeping audio=None is a bit strange, but it's the price for always having the same constant-size signature. I'd also be in favor of keeping the same order everywhere: it makes integrating transformers into other tools (a bit) easier and leaves more room for processors to have impact in the future. But if you can think of something smart along the lines of informing a user which modalities/types a model accepts, I'm open to it!
(edit: pressed enter too soon)

@@ -82,7 +82,7 @@ class LlavaOnevisionProcessor(ProcessorMixin):
        "image_token",
        "video_token",
    ]
-    image_processor_class = "AutoImageProcessor"
+    image_processor_class = "LlavaOnevisionImageProcessor"
yonigozlan (Member Author):

@zucchini-nlp This is because right now, using AutoImageProcessor.from_pretrained after saving the models in the kwargs tests will actually load the last preprocessor saved, which is a LlavaOnevisionVideoProcessor here. This looks like a larger problem so I'll open a separate issue.
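For illustration, a rough sketch of the failure mode described above (the checkpoint name is only an example, and the reload behavior shown is the bug being reported, not guaranteed API behavior):

```python
import tempfile

from transformers import AutoImageProcessor, LlavaOnevisionProcessor

# Saving a composite processor writes the configs of all of its components.
processor = LlavaOnevisionProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-0.5b-ov-hf")

with tempfile.TemporaryDirectory() as tmpdir:
    processor.save_pretrained(tmpdir)
    # Because the image and video processors share a preprocessor config file,
    # AutoImageProcessor can resolve to whichever component was saved last
    # (here a LlavaOnevisionVideoProcessor), which is the issue mentioned above.
    reloaded = AutoImageProcessor.from_pretrained(tmpdir)
    print(type(reloaded).__name__)
```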

Member:

Yes, sorry for not noticing that when merging the model. Are there any errors caused apart from cases where the image and video processors have different config values?

Collaborator:

I think we can remove these changes now that this has been resolved for the onevision processor.

Comment on lines +155 to +166
processor_components["image_processor"] = self.get_component(
    "image_processor", do_rescale=True, rescale_factor=-1
)
@yonigozlan (Member Author) commented Sep 13, 2024:

I changed the image kwargs tests from size and crop_size to do_rescale and rescale_factor, since not all image processors accept size or crop_size, which made it hard to generalize to most models.

(edit: hadn't finished the last sentence 😅)

Contributor:

That's a nice workaround if do_rescale is indeed present for all models with image processors. So what's tested now is

self.assertLessEqual(inputs[self.images_data_arg_name].mean(), 0)

With a rescale_factor of -1 as you set it, will that work in all cases? Or rather, is it enough to say that the image has indeed been rescaled?

yonigozlan (Member Author):

Since we know the images will be those from the prepare_inputs function and that their values are randomly set in 0...255, this seems sufficient to me. I agree it's a hacky workaround, but I really struggle to find something that would work with most image processors and be easy to test...
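A short sketch of the reasoning, with a random array standing in for the images produced by the test's prepare function (names and shapes are illustrative):

```python
import numpy as np

# Stand-in for a test image: random uint8 values in 0..255, so the mean is positive.
image = np.random.randint(0, 256, size=(3, 30, 400), dtype=np.uint8)

# With do_rescale=True and rescale_factor=-1, rescaling multiplies pixel values
# by -1, so every value (and therefore the mean) becomes non-positive.
rescaled = image.astype(np.float32) * -1

assert image.mean() > 0
assert rescaled.mean() <= 0  # mirrors assertLessEqual(inputs[...].mean(), 0)
```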

Contributor:

Yes, fair enough - my point was that prepare_image_inputs is necessary and thus inseparable from this test. I think do_rescale is used in 62 models and not used in 3 (out of 65 models with image processing capabilities), namely:

  • pix2struct
  • imagegpt
  • layoutlmv2

Two of these see less usage now, and we can handle their processing on the side. In conclusion, OK for me, even though it's strange to see; maybe just add a small docstring to explain why this test exists.

yonigozlan (Member Author):

Thanks for pointing out which models do not support do_rescale; I will make sure to use other tests for those.
And you are right, I will add a small docstring to explain what is happening.
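For example, the explanatory docstring could read roughly as follows (the test name and wording are suggestions only, not the exact text added in the PR):

```python
def test_image_processor_defaults_preserved_by_image_kwargs(self):
    """
    We use do_rescale=True and rescale_factor=-1 so the effect of the image
    processor kwargs is observable on any model: the prepared test images are
    in 0..255, so after rescaling by -1 all pixel values (and their mean) are
    non-positive, which the assertion can check without relying on
    model-specific kwargs such as size or crop_size.
    """
    ...
```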

@yonigozlan (Member Author):
Looks like the test errors were unrelated to this PR, so it should be good to review now :). Also glad to see you back @molbap!

@molbap (Contributor) left a review comment:

Tricky tricky stuff. Nice, looking forward to having this merged, will reduce complexity a fair bit


Comment on lines 58 to 60
text_data_arg_name = "input_ids"
images_data_arg_name = "pixel_values"
videos_data_arg_name = "pixel_values_videos"
Contributor:

I'm not sure it adds clarity to have these as class attributes rather than keeping the names around (I had to scroll up to figure out what they were). Nit though, it's fine.

yonigozlan (Member Author):

It comes in handy when a model uses different names, such as "decoder_input_ids" for text_data_arg_name. Then we can just change those attributes in the model's processor tester class instead of rewriting all the tests.
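For illustration, a sketch of how a model-specific tester could override the attribute (the test class name is hypothetical, and the import path depends on where the test file lives):

```python
import unittest

# Model test files under tests/models/<model>/ import the mixin relatively;
# adjust the path to match the location of your test file.
from ...test_processing_common import ProcessorTesterMixin


class HypotheticalSeq2SeqProcessorTest(ProcessorTesterMixin, unittest.TestCase):
    # Shared mixin defaults (after the rename later in this PR):
    #   text_input_name = "input_ids"
    #   images_input_name = "pixel_values"
    # A processor that returns decoder_input_ids only needs to override the name;
    # the shared kwargs tests keep working unchanged.
    text_input_name = "decoder_input_ids"
```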

Collaborator:

I agree that, looking through the tests, it's a bit confusing (I also had to scroll up to find it), but like this it makes the tests more easily configurable.

Perhaps we could think of a different name? "data_arg_name" doesn't make it obvious what it is. Maybe e.g. model_input_text or text_input_name?

@amyeroberts (Collaborator) left a review comment:

Very nice - thanks for adding and for improving the tests!

src/transformers/processing_utils.py (outdated comment, resolved)
@@ -958,6 +964,64 @@ def validate_init_kwargs(processor_config, valid_kwargs):
        unused_kwargs = {k: processor_config[k] for k in unused_keys}
        return unused_kwargs

    def prepare_and_validate_optional_call_args(self, *args):
        """
Collaborator:

Very nice docstring :)

yonigozlan (Member Author):

That's from @leloykun :)

src/transformers/processing_utils.py (outdated comment, resolved)


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan (Member Author):

Hey @molbap! I made a few new modifications, mostly to fit Pixtral, but it still works for the other processors. I also removed unnecessary tests for newly merged uniformized processors.
My main concern is the rescale_factor/do_rescale part. Although it is a bit hacky, I do think it's the most generalizable test we can do on image kwargs.
Let me know if you think we can merge this :)

@molbap (Contributor) left a review comment:

Hey @yonigozlan! Added a comment or two about a couple of last things, but LGTM!


Comment on lines 347 to 405
class MyProcessor(ProcessorMixin):
    attributes = ["image_processor", "tokenizer"]
    image_processor_class = "CLIPImageProcessor"
    tokenizer_class = ("CLIPTokenizer", "CLIPTokenizerFast")
    optional_call_args = ["optional_1", "optional_2"]

    def __init__(self, image_processor=None, tokenizer=None, processor_attr_1=1, processor_attr_2=True):
        super().__init__(image_processor, tokenizer)

        self.processor_attr_1 = processor_attr_1
        self.processor_attr_2 = processor_attr_2


@require_tokenizers
@require_vision
class ProcessorTest(unittest.TestCase):
    processor_class = MyProcessor

    def prepare_processor_dict(self):
        return {"processor_attr_1": 1, "processor_attr_2": False}

    def get_processor(self):
        image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
        tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-large-patch14")
        processor = MyProcessor(image_processor, tokenizer, **self.prepare_processor_dict())

        return processor

    def test_processor_to_json_string(self):
        processor = self.get_processor()
        obj = json.loads(processor.to_json_string())
        for key, value in self.prepare_processor_dict().items():
            self.assertEqual(obj[key], value)
            self.assertEqual(getattr(processor, key, None), value)

    def test_processor_from_and_save_pretrained(self):
        processor_first = self.get_processor()

        with tempfile.TemporaryDirectory() as tmpdirname:
            saved_file = processor_first.save_pretrained(tmpdirname)[0]
            check_json_file_has_correct_format(saved_file)
            processor_second = self.processor_class.from_pretrained(tmpdirname)

        self.assertEqual(processor_second.to_dict(), processor_first.to_dict())

    def test_prepare_and_validate_optional_call_args(self):
        processor = self.get_processor()
        # test all optional call args are given
        optional_call_args = processor.prepare_and_validate_optional_call_args("optional_1", "optional_2")
        self.assertEqual(optional_call_args, {"optional_1": "optional_1", "optional_2": "optional_2"})
        # test only one optional call arg is given
        optional_call_args = processor.prepare_and_validate_optional_call_args("optional_1")
        self.assertEqual(optional_call_args, {"optional_1": "optional_1"})
        # test no optional call arg is given
        optional_call_args = processor.prepare_and_validate_optional_call_args()
        self.assertEqual(optional_call_args, {})
        # test too many optional call args are given
        with self.assertRaises(ValueError):
            processor.prepare_and_validate_optional_call_args("optional_1", "optional_2", "optional_3")
Contributor:

I realize this was removed in a previous PR, because it only applied to one model I suppose? Can we take the tests here and move them to the tester class above instead, wdyt?

yonigozlan (Member Author):

You are right, it probably existed to exercise the tester mixin before any processor using the mixin was merged, but I guess it is not useful anymore.
I moved test_prepare_and_validate_optional_call_args to the tester class above (the other tests were already there).

Comment on lines +58 to +60
text_input_name = "input_ids"
images_input_name = "pixel_values"
videos_input_name = "pixel_values_videos"
Contributor:

much better!

Contributor:

thanks for cleaning these up!

@yonigozlan merged commit c0c6815 into huggingface:main on Sep 20, 2024
21 checks passed