Fix ASR pipeline bug when using some kwargs #34663

jhj0517 · 2024-11-09T02:41:05Z

What does this PR do?

Thanks, as always, for your work.
In AutomaticSpeechRecognitionPipeline, when the temperature, no_speech_threshold, and logprob_threshold kwargs are used, token_ids ends up as a list of floats, which raises the following TypeError:

  File "C:\Whisper_Project\Whisper-WebUI\venv\Lib\site-packages\transformers\tokenization_utils_fast.py", line 657, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument 'ids': 'float' object cannot be interpreted as an integer

Reproduction

from transformers import pipeline

input_path = "test_audio.mp3"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",
    torch_dtype="float16",
    device="cuda",
)

kwargs = {
    "language": "en",
    "task": "transcribe",
    "temperature": 0.7,
    "no_speech_threshold": 0.6,
    "logprob_threshold": -1.0,
}
segments = pipe(
    inputs=[input_path],
    return_timestamps=True,
    chunk_length_s=30,
    batch_size=24,
    generate_kwargs=kwargs
)

This PR fixes the error by casting token_ids to a list of int when it contains floats.
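For illustration only, the kind of cast described here could look like the sketch below. It is not the actual diff from this PR: the helper name `cast_token_ids_to_int` is hypothetical, and the choice to reject floats with a fractional part is an assumption made for the example.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def cast_token_ids_to_int(token_ids: Sequence[Number]) -> List[int]:
    """Hypothetical helper: return token ids as plain Python ints.

    Floats that hold whole numbers (e.g. 50258.0) are converted to int.
    A float with a fractional part is rejected, since it cannot be a
    valid token id (an assumption made for this sketch).
    """
    result: List[int] = []
    for token_id in token_ids:
        if isinstance(token_id, float):
            if not token_id.is_integer():
                raise ValueError(f"token id {token_id!r} is not a whole number")
            token_id = int(token_id)
        result.append(token_id)
    return result


# Mixed input: floats are cast, ints pass through unchanged.
cleaned = cast_token_ids_to_int([50258.0, 50259.0, 440])
```

Checking `is_integer()` before casting keeps the conversion from silently truncating a value that was never a valid id in the first place.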

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Thanks!!

tokenizers: @ArthurZucker
speech models: @ylacombe, @eustlb

jhj0517 · 2024-11-09T02:48:45Z

This may be caused by to_py_obj():

transformers/src/transformers/tokenization_utils_base.py

Line 3818 in a06a0d1

token_ids = to_py_obj(token_ids)

token_ids is converted to a list of floats when the temperature, no_speech_threshold, and logprob_threshold parameters are used in the pipeline, which then causes the TypeError.
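The failure mode can be mimicked in pure Python, as a rough analogy rather than the real tokenizer code path. In the sketch below, `require_integer_ids` and `try_decode` are hypothetical stand-ins for an integer-only decode boundary; `operator.index` is the standard-library call that accepts ints but refuses floats, even whole-number ones.

```python
import operator
from typing import List, Optional, Sequence, Tuple


def require_integer_ids(token_ids: Sequence) -> List[int]:
    """Hypothetical stand-in for a decode step that only accepts integers.

    operator.index() raises TypeError for floats, including 50258.0.
    """
    return [operator.index(token_id) for token_id in token_ids]


def try_decode(token_ids: Sequence) -> Tuple[Optional[List[int]], Optional[str]]:
    """Return (ids, None) on success or (None, error message) on failure."""
    try:
        return require_integer_ids(token_ids), None
    except TypeError as exc:
        return None, str(exc)


ok_ids, ok_err = try_decode([50258, 50259])        # plain ints are accepted
bad_ids, bad_err = try_decode([50258.0, 50259.0])  # floats are rejected
```

The point of the sketch is that a value like 50258.0 is numerically a valid id, yet it is still rejected at an integer-only boundary, so the float list has to be cast before decoding.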

ArthurZucker · 2024-11-25T14:51:51Z

cc @eustlb can you have a look?

eustlb · 2025-01-17T14:09:41Z

Hey there, sorry for the delay and thanks a lot for your patience 🤗
I am unable to reproduce the issue; everything runs fine with v4.48.0. Whisper underwent a few changes and bug fixes, and this one might have been fixed in the process. Can you confirm?

jhj0517 · 2025-01-17T15:16:59Z

@eustlb I've just confirmed that this is not reproducible with transformers==4.48.0.

Thanks for your hard work. I'm closing since this is no longer reproducible!

Cast to int if it's float

d67cbbf

jhj0517 mentioned this pull request Nov 9, 2024

argument 'ids': 'float' object cannot be interpreted as an integer jhj0517/Whisper-WebUI#379

Closed

jhj0517 added 6 commits November 12, 2024 00:49

Merge branch 'main' into fix/asr-bug

43da36b

Merge branch 'main' into fix/asr-bug

96c13f7

Merge branch 'main' into fix/asr-bug

2679f65

Merge branch 'main' into fix/asr-bug

9792df7

Merge branch 'main' into fix/asr-bug

ec1f246

Merge branch 'main' into fix/asr-bug

e3d3a4d

Merge branch 'main' into fix/asr-bug

644ede6

ylacombe requested a review from eustlb November 26, 2024 15:31

jhj0517 closed this Jan 17, 2025
