-
Hello, I am curious to know whether WER numbers have been reported for whisper.cpp versus the OpenAI Python code. I compared both on some broadcast data and found that the WER with whisper.cpp is 50% higher than with the OpenAI code (using the same model). Is there any way to tune whisper.cpp to get a WER comparable to the OpenAI code?
-
The default parameters of whisper.cpp and OpenAI differ. If you have not already, test OpenAI with beam_size=1, best_of=2, and temperature=[0.0, 0.4, 0.8], which are whisper.cpp's default parameters; see the sketch below. While the output is not identical, I get qualitatively similar accuracy with these settings.
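For reference, a minimal sketch of that test using the openai-whisper Python package; the model size and audio filename are placeholders, not anything prescribed in this thread:

```python
import whisper

# Hypothetical model and input file, for illustration only.
model = whisper.load_model("medium")

# Decode with settings close to whisper.cpp's defaults:
# greedy decoding (beam_size=1), best_of=2, and a temperature
# fallback schedule of 0.0 / 0.4 / 0.8. transcribe() applies
# beam_size at temperature 0 and best_of at temperatures > 0.
result = model.transcribe(
    "broadcast.wav",
    beam_size=1,
    best_of=2,
    temperature=(0.0, 0.4, 0.8),
)
print(result["text"])
```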
-
I found older messages about the same topic. I understand that whisper.cpp does not implement the same decoding strategy as the OpenAI code, meaning we should not expect identical accuracy. Still, a 30% relative difference in WER is a huge gap when comparing different decoders with the same model; decoding differences in speech recognizers usually do not affect the WER by more than a few percent relative. Are there any plans to reduce this large gap?
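To make the "relative difference" measurement concrete, here is a minimal sketch using the jiwer package; the transcript strings are invented placeholders, not data from this comparison:

```python
from jiwer import wer

# Placeholder reference and hypotheses, for illustration only.
reference = "the quick brown fox jumps over the lazy dog"
hyp_openai = "the quick brown fox jumps over a lazy dog"
hyp_cpp = "the quick brown fox jump over a lazy dogs"

wer_openai = wer(reference, hyp_openai)
wer_cpp = wer(reference, hyp_cpp)

# Relative WER difference: how much higher whisper.cpp's WER is
# than the OpenAI decoder's WER on the same reference.
relative_gap = (wer_cpp - wer_openai) / wer_openai
print(f"OpenAI WER: {wer_openai:.3f}, whisper.cpp WER: {wer_cpp:.3f}")
print(f"Relative WER gap: {relative_gap:.0%}")
```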
-
We've wrapped up our analysis comparing the log_mel_spectrogram generation between whisper.cpp and OpenAI's Whisper. To summarize the main issues we found in whisper.cpp:

- The Stage-1 padding (zero padding) is inadequate. Whi…
On top of these, whisper.cpp presents a couple of secondary concerns:
With these findings in hand, we're set to fix whisper.cpp.
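As an illustration of the Stage-1 padding point above, here is a minimal NumPy sketch of the difference between zero padding and the reflect padding that OpenAI's log_mel_spectrogram inherits from torch.stft with center=True; this is an assumed comparison for illustration, not whisper.cpp's actual code:

```python
import numpy as np

N_FFT = 400  # Whisper's FFT size (25 ms frames at 16 kHz)

# Dummy signal standing in for the start of an audio clip.
audio = np.linspace(-1.0, 1.0, 1600).astype(np.float32)

# OpenAI's log_mel_spectrogram uses torch.stft(center=True), which
# reflect-pads N_FFT // 2 samples on each side before framing.
reflected = np.pad(audio, N_FFT // 2, mode="reflect")

# Zero padding fills the same region with silence instead.
zeroed = np.pad(audio, N_FFT // 2, mode="constant")

# The first analysis frames therefore see different samples, which
# shifts the mel features near the clip boundaries.
print(np.abs(reflected[:N_FFT] - zeroed[:N_FFT]).max())
```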