First, I want to thank the author of this tool for simplifying the process of using OpenAI Whisper. Thanks to you, Fauzan, far more people are able to use the features of Whisper via a clean GUI.
As a feature request, I would love to see your program support WhisperX (https://github.com/m-bain/whisperX), which is a greatly improved version of OpenAI's Whisper.
WhisperX comes from a research group at the University of Oxford. It reportedly reaches around 70x real-time transcription speed, requires much less GPU memory to run the models, has a lower word error rate, and avoids most of the hallucinations, drifting and repetitions that standard OpenAI Whisper is prone to. It detects silence, and it can also detect multiple speakers and identify each one uniquely, even with overlapping voices. It also produces far more accurate timestamps, down to the level of individual letters within words.
As it processes a recording, it splits the audio into chunks of about 30 seconds and then processes them as a batch for a dramatic speed increase. It appears to differ from WhisperJAX (https://github.com/sanchit-gandhi/whisper-jax) in that the released version of WhisperJAX splits the audio for batch processing without proper context. The cuts sometimes land in the middle of words, so WhisperJAX ends up transcribing partial words, which raises the word error rate. WhisperX does not do this. It scans the audio before splitting and detects where speech starts and stops, so the cuts fall in the silences between words.
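To make the "scan first, then cut in the silences" idea concrete, here is a minimal sketch. It is not WhisperX's actual code: it assumes some voice activity detector has already produced a list of speech regions as (start, end) times in seconds, and the function name `merge_into_chunks` is made up for illustration. It greedily merges neighbouring speech regions into chunks of at most 30 seconds, so every chunk boundary lies in a gap between regions.

```python
from typing import List, Optional, Tuple

# One detected speech region: (start_sec, end_sec).
# Assumption: these come from some VAD step that is not shown here.
Segment = Tuple[float, float]


def merge_into_chunks(speech: List[Segment], max_len: float = 30.0) -> List[Segment]:
    """Greedily merge consecutive speech regions into chunks of at most max_len seconds.

    Chunk boundaries only ever fall between two speech regions, i.e. in silence.
    A single region longer than max_len is kept whole (a real system would
    need an extra rule for that case).
    """
    chunks: List[Segment] = []
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in speech:
        if cur_start is None:
            # First region opens the first chunk.
            cur_start, cur_end = start, end
        elif end - cur_start <= max_len:
            # Region still fits: extend the current chunk across the gap.
            cur_end = end
        else:
            # Region would overflow: close the chunk at the previous region's end.
            chunks.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks


if __name__ == "__main__":
    regions = [(0.0, 12.0), (13.0, 25.0), (26.5, 40.0), (41.0, 50.0)]
    print(merge_into_chunks(regions))
```

Each resulting chunk could then be padded to a fixed length and sent through the model as one batch. A fixed-stride splitter, by contrast, would cut at 30.0 s in this example, which is inside the third speech region.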
I have been reading that WhisperX does a much better job translating various languages than OpenAI's version. That makes me think that continuing with the version of Whisper I have been using is fairly pointless, because the results would be inferior to WhisperX and I would need to redo them later.
The problem is that I have been unable to get WhisperX running properly on my machine. I don't know which version or update of which dependency has broken the installation. I have reinstalled things multiple times and spent many hours trying to troubleshoot it. I know that many others are experiencing similar problems. It would be great if you could provide support for either WhisperX or even Faster-Whisper (https://github.com/guillaumekln/faster-whisper), which is not as advanced as WhisperX but is still an improvement over regular OpenAI Whisper.
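For anyone hitting the same kind of broken install, one small thing that helps when reporting it is a list of the exact package versions in the environment. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The package names in the example list are my guess at the relevant ones, not an official dependency list, and `installed_versions` is just an illustrative helper name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {package name: installed version, or None if it is not installed}."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumption: these are the packages most likely to matter; adjust as needed.
    guessed = ["torch", "torchaudio", "whisperx", "faster-whisper", "ctranslate2"]
    for pkg, ver in installed_versions(guessed).items():
        print(f"{pkg}: {ver or 'not installed'}")
```

Pasting that output into a bug report makes it much easier to compare a broken environment against a working one.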
Ideally, users would have the option to choose between OpenAI's standard version and a huge improvement like WhisperX. Combining the improvements of WhisperX with your GUI would be wonderful!
That is long and very detailed, thank you. And yeah, I'll try to add WhisperX later on, because I'd like to let the user choose which backend to use as an option.
I tried adding it before, but it seems it would need a lot of refactoring, so I decided to use whisper_timestamped stable-ts for now. Development has been kind of slow lately because of a lot of personal stuff I'm dealing with right now aside from developing this app.
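A rough sketch of what a user-selectable backend could look like, just to illustrate the idea and not how this app is actually structured. Everything here is hypothetical: the registry, the backend names, and the placeholder transcribers, which return a tagged string instead of calling any real library.

```python
from typing import Callable, Dict

# A backend is any callable that takes an audio file path and returns text.
Transcriber = Callable[[str], str]

# Maps a backend name to a factory, so a heavy library would only be
# imported when the user actually selects that backend.
_BACKENDS: Dict[str, Callable[[], Transcriber]] = {}


def register(name: str):
    """Decorator that registers a backend factory under the given name."""
    def decorator(factory: Callable[[], Transcriber]):
        _BACKENDS[name] = factory
        return factory
    return decorator


@register("whisper")
def _make_whisper() -> Transcriber:
    # Placeholder: a real factory would import and load the model here.
    return lambda path: f"[whisper] {path}"


@register("whisperx")
def _make_whisperx() -> Transcriber:
    # Placeholder: a real factory would import and load the model here.
    return lambda path: f"[whisperx] {path}"


def get_backend(name: str) -> Transcriber:
    """Look up the backend chosen in the settings and build it."""
    try:
        factory = _BACKENDS[name]
    except KeyError:
        choices = ", ".join(sorted(_BACKENDS))
        raise ValueError(f"Unknown backend {name!r}; choose from: {choices}") from None
    return factory()


if __name__ == "__main__":
    transcribe = get_backend("whisperx")
    print(transcribe("meeting.wav"))
```

With something like this, the GUI would only need a dropdown populated from the registry keys, and the rest of the app would call the returned transcriber without caring which backend is behind it.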
More info:
https://github.com/m-bain/whisperX (WhisperX GitHub source)
https://www.slashcam.com/news/single/WhisperX--Free-audio-transcription-with-speaker-re-17704.html
https://web.archive.org/web/20230301023005/https://www.swyx.io/transcribe-podcasts-with-whisper
https://arxiv.org/abs/2303.00747