Token length limit #96

Open
rose-jinyang opened this issue Oct 24, 2024 · 1 comment

Comments

@rose-jinyang

Hello,
How are you?
Thanks for contributing to this project.
I am trying to fine-tune the Whisper model for Telugu, an Indian language, on the google/fleurs dataset, using this command:

torchrun --nproc_per_node=2 finetune.py --base_model=openai/whisper-large-v2 --language=None

During training, I ran into the following error:

File "/opt/conda/lib/python3.10/site-packages/transformers/models/whisper/modeling_whisper.py", line 1757, in forward
raise ValueError(
ValueError: Labels' sequence length 495 cannot exceed the maximum allowed length of 448 tokens.

What could be causing this?

@yeyupiaoling
Owner

@rose-jinyang Whisper limits the length of the label text: the transcript for each audio clip cannot exceed 448 tokens. This project only filters samples by audio length, so you need to filter out the samples with overly long transcripts yourself.
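
A minimal sketch of that filtering step, in case it helps. It is not code from this project: `filter_by_label_length`, the `text` key, and the whitespace tokenizer are hypothetical placeholders so the example runs without downloading anything. For real data you would pass the Whisper tokenizer's encoding function instead, as noted in the comments, and use whatever key holds the transcript in your dataset.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# Limit taken from the ValueError in this issue (labels may not exceed 448 tokens).
MAX_LABEL_TOKENS = 448


def filter_by_label_length(
    samples: Iterable[Dict[str, str]],
    encode: Callable[[str], Sequence],
    text_key: str = "text",  # hypothetical key; use your dataset's transcript field
    max_tokens: int = MAX_LABEL_TOKENS,
) -> List[Dict[str, str]]:
    """Keep only the samples whose tokenized transcript fits within max_tokens."""
    return [s for s in samples if len(encode(s[text_key])) <= max_tokens]


def whitespace_encode(text: str) -> List[str]:
    """Stand-in tokenizer for the demo only: one token per whitespace-separated word."""
    return text.split()


# With the real tokenizer (assumption: Hugging Face transformers is installed),
# the encode function could look like this:
#   from transformers import WhisperTokenizer
#   tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v2")
#   encode = lambda text: tokenizer(text).input_ids

samples = [
    {"text": "a short transcript"},
    {"text": " ".join(["word"] * 495)},  # too long, like the sample in the error
]
kept = filter_by_label_length(samples, whitespace_encode)
print(f"kept {len(kept)} of {len(samples)} samples")
```

Counting tokens with the same tokenizer that builds the labels matters here, because the special tokens it adds also count toward the 448 limit.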
