
Better transcription in more languages: implement Massively Multilingual Speech (MMS), Meta's open-source model with less than half of Whisper's error rate #20

Open
menelic opened this issue Jun 30, 2023 · 2 comments

menelic commented Jun 30, 2023

Please consider implementing Meta's MMS, which supports speech recognition in over 1,000 languages at a drastically reduced error rate compared to Whisper:

[image: chart comparing the word error rate (WER) of MMS and Whisper]

Find it here:

https://github.com/facebookresearch/fairseq/tree/main/examples/mms

https://ai.facebook.com/blog/multilingual-model-speech-recognition/
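For context, the MMS checkpoints are also published on Hugging Face, so a first experiment wouldn't require the fairseq repo. A minimal sketch of transcribing one file with the mms-1b-all checkpoint via transformers might look like this (the file name and the language code "fra" are just illustrative, and the API details are as I understand them from the model card):

```python
# Minimal MMS transcription sketch (assumes: pip install transformers torch librosa)
import librosa
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"  # MMS checkpoint with per-language adapters
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Pick the target language by its ISO 639-3 code ("fra" here is illustrative);
# MMS swaps in a small per-language adapter instead of loading a new model.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

# MMS expects 16 kHz mono audio; "interview.wav" is a placeholder file name.
audio, _ = librosa.load("interview.wav", sr=16_000, mono=True)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# CTC decoding: take the most likely token at each frame, then collapse.
ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```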

kaixxx (Owner) commented Jun 30, 2023

Thank you, very interesting. I didn't know about this new model. Crazy times...

However, the above comparison might be a bit misleading. The WER of Whisper can be as low as 3.0 in its best-supported languages (Spanish in this case), see here: https://github.com/openai/whisper#available-models-and-languages
I think there is a tradeoff: Whisper seems to be the leading model for the most widely spoken languages, while MMS is far more inclusive. It would be best to include both... We will see.

menelic (Author) commented Jun 30, 2023

Thanks for the swift reply and the clarification about Whisper's WER by language, well noted. However, I'd still say that for researchers working with material from non-Western contexts, adding MMS could truly be a game changer. It might also have another benefit over Whisper: in many research contexts, interviewees switch languages or use English technical jargon while the rest of the interview is in another language. Whisper does not handle this well; if MMS is better at that, it would be a boon. That said, I don't know whether there might be a way to chunk the audio as part of NoScribe's processing, as that would make it easier for Whisper to recognise a language switch and follow it, rather than switching into translation mode.
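Just to make the chunking idea concrete: a rough sketch (not NoScribe's actual pipeline; chunk length, model size and file name are arbitrary) could run Whisper's language detection per 30-second chunk and pin the detected language for that chunk, so a mid-interview switch doesn't tip the model into translating:

```python
# Rough per-chunk language-detection sketch (assumes: pip install openai-whisper)
import whisper

model = whisper.load_model("base")           # model size is arbitrary here
audio = whisper.load_audio("interview.wav")  # placeholder file name; 16 kHz mono

CHUNK = 30 * whisper.audio.SAMPLE_RATE       # Whisper works on 30-second windows

for start in range(0, len(audio), CHUNK):
    piece = audio[start:start + CHUNK]
    # Detect the language of this chunk only...
    mel = whisper.log_mel_spectrogram(whisper.pad_or_trim(piece)).to(model.device)
    _, probs = model.detect_language(mel)
    lang = max(probs, key=probs.get)
    # ...then transcribe the chunk with that language pinned, so Whisper
    # stays in transcription mode instead of drifting into translation.
    result = model.transcribe(piece, language=lang, task="transcribe")
    print(f"[{start // whisper.audio.SAMPLE_RATE:>5d}s] ({lang}) {result['text'].strip()}")
```

A real implementation would want some overlap or VAD-based boundaries so words aren't cut mid-chunk, but it shows the idea.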
