Missing sentences in Fine-Tuned whisper-medium CT2 model when audio exceeds 30 seconds #1298
VamsiMarriwada started this conversation in General
Replies: 0 comments
I'm experiencing an issue with my fine-tuned whisper-medium model after converting it to CT2 format for use with faster-whisper. When transcribing audio files longer than 30 seconds, some sentences are consistently missing from the output. However, the same audio files work perfectly with the original model (before CT2 conversion).
I've already tried increasing the chunk_length parameter, but this makes the transcription worse, with up to half of the sentences missing.
Is there a way to automatically chunk audio files into segments under 30 seconds and then combine the transcriptions at the end? Or are there other parameters I should adjust to fix this issue with longer audio files in the CT2 format?
Thank you for any suggestions!
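For reference, here is a rough sketch of the chunk-and-combine idea I have in mind. It assumes the audio is already decoded to 16 kHz mono samples, and the helper names (`chunk_bounds`, `transcribe_in_chunks`, `faster_whisper_fn`) and the 25-second limit are my own placeholders, not part of faster-whisper. It cuts at fixed offsets, so a word that straddles a boundary may be split; cutting on silence would probably be safer.

```python
from typing import Callable, List, Sequence, Tuple

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio


def chunk_bounds(total_samples: int,
                 sample_rate: int = SAMPLE_RATE,
                 max_seconds: float = 25.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample offsets, each span at most max_seconds long."""
    step = int(max_seconds * sample_rate)
    if step <= 0:
        raise ValueError("max_seconds * sample_rate must be at least one sample")
    return [(start, min(start + step, total_samples))
            for start in range(0, total_samples, step)]


def transcribe_in_chunks(audio: Sequence,
                         transcribe_fn: Callable[[Sequence], str],
                         sample_rate: int = SAMPLE_RATE,
                         max_seconds: float = 25.0) -> str:
    """Transcribe each chunk separately and join the non-empty results in order."""
    texts = []
    for start, end in chunk_bounds(len(audio), sample_rate, max_seconds):
        text = transcribe_fn(audio[start:end]).strip()
        if text:
            texts.append(text)
    return " ".join(texts)


def faster_whisper_fn(model) -> Callable[[Sequence], str]:
    """Wrap a faster_whisper.WhisperModel so it maps one audio chunk to one string."""
    def _fn(chunk) -> str:
        # model.transcribe returns (segments, info); segments is a lazy iterable.
        segments, _info = model.transcribe(chunk)
        return " ".join(segment.text.strip() for segment in segments)
    return _fn
```

With faster-whisper installed, I would expect to call it roughly as `transcribe_in_chunks(decode_audio("long.wav", sampling_rate=16000), faster_whisper_fn(WhisperModel("path/to/ct2-model")))`, where both paths are placeholders. I also have not yet tested whether the existing `vad_filter=True` or `condition_on_previous_text=False` options of `transcribe` change the behavior on long files.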