[Zero-shot image classification pipeline] Remove tokenizer_kwargs #33174
```diff
@@ -97,9 +97,6 @@ def __call__(self, images: Union[str, List[str], "Image", List["Image"]], **kwargs):
                 The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and
                 the call may block forever.
 
-            tokenizer_kwargs (`dict`, *optional*):
-                Additional dictionary of keyword arguments passed along to the tokenizer.
-
```
Review thread on lines -100 to -102:

- Reviewer: I don't think this should be removed. We might be able to remove it in the tests, but it should stay here.
- Author: Hmm, could you clarify?
- Reviewer: Ah, OK, in this case we can remove it!
```diff
         Return:
             A list of dictionaries containing one entry per proposed label. Each dictionary contains the
             following keys:
@@ -109,16 +106,14 @@ def __call__(self, images: Union[str, List[str], "Image", List["Image"]], **kwargs):
         """
         return super().__call__(images, **kwargs)
 
-    def _sanitize_parameters(self, tokenizer_kwargs=None, **kwargs):
+    def _sanitize_parameters(self, **kwargs):
         preprocess_params = {}
         if "candidate_labels" in kwargs:
             preprocess_params["candidate_labels"] = kwargs["candidate_labels"]
         if "timeout" in kwargs:
             preprocess_params["timeout"] = kwargs["timeout"]
         if "hypothesis_template" in kwargs:
             preprocess_params["hypothesis_template"] = kwargs["hypothesis_template"]
-        if tokenizer_kwargs is not None:
-            preprocess_params["tokenizer_kwargs"] = tokenizer_kwargs
 
         return preprocess_params, {}, {}
```
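For context, this is how the pipeline is invoked after the change: `_sanitize_parameters` routes the recognised keyword arguments to `preprocess`, and `tokenizer_kwargs` is simply no longer among them. A minimal usage sketch; the checkpoint and image path are illustrative, not taken from this PR:

```python
from transformers import pipeline

# Illustrative checkpoint; any CLIP/SigLIP-style dual encoder works here.
classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)

# candidate_labels and hypothesis_template are picked up by
# _sanitize_parameters and forwarded to preprocess(); after this PR,
# a tokenizer_kwargs argument is no longer routed anywhere.
predictions = classifier(
    "cat.png",  # illustrative local image path
    candidate_labels=["cat", "dog", "car"],
    hypothesis_template="This is a photo of {}.",
)
print(predictions)  # e.g. [{"score": ..., "label": "cat"}, ...]
```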
```diff
@@ -128,18 +123,15 @@ def preprocess(
         candidate_labels=None,
         hypothesis_template="This is a photo of {}.",
         timeout=None,
-        tokenizer_kwargs=None,
     ):
-        if tokenizer_kwargs is None:
-            tokenizer_kwargs = {}
         image = load_image(image, timeout=timeout)
         inputs = self.image_processor(images=[image], return_tensors=self.framework)
         if self.framework == "pt":
             inputs = inputs.to(self.torch_dtype)
         inputs["candidate_labels"] = candidate_labels
         sequences = [hypothesis_template.format(x) for x in candidate_labels]
         padding = "max_length" if self.model.config.model_type == "siglip" else True
-        text_inputs = self.tokenizer(sequences, return_tensors=self.framework, padding=padding, **tokenizer_kwargs)
+        text_inputs = self.tokenizer(sequences, return_tensors=self.framework, padding=padding)
         inputs["text_inputs"] = [text_inputs]
         return inputs
```
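The `padding` switch kept in `preprocess` exists because SigLIP checkpoints are trained with fixed-length (`max_length`) padding, while CLIP-style models pad dynamically to the longest sequence. A standalone sketch of that logic, using the public `AutoConfig`/`AutoTokenizer` APIs; the helper name is hypothetical:

```python
from transformers import AutoConfig, AutoTokenizer

def build_text_inputs(model_name, labels, template="This is a photo of {}."):
    # Hypothetical helper mirroring the pipeline's padding choice:
    # SigLIP expects fixed "max_length" padding, while other dual
    # encoders (e.g. CLIP) pad dynamically to the longest hypothesis.
    config = AutoConfig.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    padding = "max_length" if config.model_type == "siglip" else True
    sequences = [template.format(label) for label in labels]
    return tokenizer(sequences, return_tensors="pt", padding=padding)
```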
Review thread on the `model_input_names` workaround:

- Reviewer: This is pretty hacky. It should at the very least only be done in the `BertTokenizer` branch to make explicit why this is needed (including an explanatory comment).
- Author: I don't see what's hacky about this; `model_input_names` is a genuine attribute of `PreTrainedTokenizer` that has been there since BERT/GPT-2 came out (it's not used a lot). As can be seen here, a tokenizer will return `token_type_ids` if it's present in the `model_input_names`. A less "hacky" way would be to pass `model_input_names` directly in the `from_pretrained` method, which the Transformers library also supports.
- Reviewer: What I'm trying to say is that one can avoid lines like these by appropriately setting the `model_input_names` attribute of the tokenizer, which is something we can still do for the `Blip2ImageTextRetrieval` models on the Hub, as they were just added and are not yet in a stable release.
- Author: Ah, yes, I see. Although both modifying `model_input_names` and `return_token_ids` are somewhat hacky: they modify a class attribute which might be depended upon for other behaviours, or assumed to have certain properties. Passing `model_input_names` to the init is better, as this is more likely to correctly propagate any changes to any other dependent attributes and variables.
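As a concrete illustration of the alternative discussed above (the checkpoint name is a stand-in), `model_input_names` can be passed to `from_pretrained`, which keeps `token_type_ids` out of the tokenizer's output without mutating a class attribute after the fact:

```python
from transformers import AutoTokenizer

# Passing model_input_names at load time overrides the class default,
# so the tokenizer never emits token_type_ids in its encodings.
tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased",  # stand-in checkpoint for illustration
    model_input_names=["input_ids", "attention_mask"],
)
encoding = tokenizer("This is a photo of a cat.")
assert "token_type_ids" not in encoding
```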