Better transcription in more languages: implement Massively Multilingual Speech (MMS), Meta's open-source model with less than half of Whisper's error rate
#20 · Open · menelic opened this issue on Jun 30, 2023 · 2 comments
Please consider implementing Meta's MMS, which offers speech recognition support for over 1,000 languages at a drastically reduced error rate compared to Whisper.
Thank you, very interesting. I didn't know about this new model. Crazy times...
However, the above comparison might be a bit misleading. Whisper's WER can be as low as 3.0 in its best-supported languages (Spanish, in this case); see here: https://github.com/openai/whisper#available-models-and-languages
I think there is a tradeoff: Whisper seems to be the leading model for the most widely spoken languages, while MMS is far more inclusive. It would be best to include both... We will see.
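For readers comparing the numbers quoted above: word error rate (WER) is just the word-level edit distance between a reference transcript and the model's output, divided by the length of the reference. A minimal sketch (illustrative only, not NoScribe code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One word dropped out of a six-word reference:
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

So a WER of 3.0 in the Whisper table means roughly 3 word errors per 100 reference words.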
Thanks for the swift reply and the clarification about Whisper's per-language WER, well noted. However, I'd still say that for researchers working with material from non-Western contexts, adding MMS could truly be a game changer. It might have another benefit over Whisper: in many research contexts, interviewees switch languages or use English technical jargon while the rest of the interview is in another language. Whisper does not handle this well; if MMS is better at that, it would be a boon. That said, I do not know whether there might be a way to chunk the audio as part of NoScribe's processing, as that would make it easier for Whisper to recognise a language switch and follow it, rather than switching into translation mode.
find it here:
https://github.com/facebookresearch/fairseq/tree/main/examples/mms
https://ai.facebook.com/blog/multilingual-model-speech-recognition/