Hey @bnestor, thanks a lot for raising this issue.
Indeed, the problem arises where you've spotted it. It's linked to #33082 and to the PR I already opened to fix it, #33512. I had to write tests for it before merging, but we then went through another bug-fixing effort (#34535, #34537, #34111) that stalled the PR. Sorry for the delay! Nevertheless, it's the next item on the Whisper bug-fixing roadmap, so it should be solved quickly 🤗
System Info
python=3.10.13
transformers==4.44.1
torch==2.1.2
Who can help?
@sanchit-gandhi @ylacombe @eustlb @ArthurZ
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Decoding with timestamps produces unexpected results when the vocabulary is extended
The problem arises in transformers/src/transformers/models/whisper/tokenization_whisper.py, line 546 (at commit 9613933):
see issue 20225
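One plausible mechanism, sketched under stated assumptions: if the tokenizer locates the first timestamp token as "id of the last special token + 1", then any special token added later (which receives the next free id, after the timestamp tokens) shifts that offset past every real timestamp id. The toy vocabulary and the helper names below (build_vocab, timestamp_begin_from_specials, timestamp_begin_from_lookup, decode_times) are hypothetical and are not the transformers implementation. Looking the offset up by token name is just one possible fix, not necessarily the one in #33512.

```python
# Hypothetical toy model of a Whisper-like vocabulary; NOT the transformers code.
TIME_PRECISION = 0.02  # seconds per timestamp step (Whisper uses 20 ms steps)


def build_vocab(n_regular, specials, n_timestamps):
    """Assign ids in order: regular tokens, special tokens, then timestamp tokens."""
    vocab = {}
    for i in range(n_regular):
        vocab[f"tok{i}"] = len(vocab)
    for name in specials:
        vocab[name] = len(vocab)
    for k in range(n_timestamps):
        vocab[f"<|{k * TIME_PRECISION:.2f}|>"] = len(vocab)
    return vocab


def timestamp_begin_from_specials(special_ids):
    # Fragile (assumed failure mode): timestamp ids start right after the last special id.
    return special_ids[-1] + 1


def timestamp_begin_from_lookup(vocab):
    # Robust alternative: look up the id of the first timestamp token by its name.
    return vocab["<|0.00|>"]


def decode_times(token_ids, timestamp_begin):
    """Return the time in seconds for every id treated as a timestamp token."""
    return [round((t - timestamp_begin) * TIME_PRECISION, 2)
            for t in token_ids if t >= timestamp_begin]


specials = ["<|endoftext|>", "<|notimestamps|>"]
vocab = build_vocab(n_regular=10, specials=specials, n_timestamps=101)
special_ids = [vocab[s] for s in specials]
ids = [vocab["<|0.00|>"], vocab["tok3"], vocab["<|1.00|>"]]

before = decode_times(ids, timestamp_begin_from_specials(special_ids))

# Extend the vocabulary: the new special token gets the next free id,
# which lies after all timestamp tokens.
vocab["<|custom|>"] = len(vocab)
special_ids.append(vocab["<|custom|>"])

fragile_after = decode_times(ids, timestamp_begin_from_specials(special_ids))
robust_after = decode_times(ids, timestamp_begin_from_lookup(vocab))

print(before)         # [0.0, 1.0]
print(fragile_after)  # []
print(robust_after)   # [0.0, 1.0]
```

After the extension, the fragile offset (114 in this toy) exceeds every timestamp id, so the timestamps silently vanish from the decoded output, while the by-name lookup is unaffected by where new tokens are placed.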
Expected behavior
I would expect the timestamps to remain consistent between tokenizing and decoding, even after the vocabulary is extended.