
Better transcription in more languages: implement Massively Multilingual Speech (MMS), Meta's open-source model with less than half of Whisper's error rate #20

Open
menelic opened this issue Jun 30, 2023 · 2 comments

menelic commented Jun 30, 2023

Please consider implementing Meta's MMS, which supports speech recognition in over 1,000 languages at a drastically reduced error rate compared to Whisper:

[image: chart comparing the word error rate (WER) of MMS and Whisper]

Find it here:

https://github.com/facebookresearch/fairseq/tree/main/examples/mms

https://ai.facebook.com/blog/multilingual-model-speech-recognition/
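For context, the MMS checkpoints are also published on Hugging Face, so a first experiment wouldn't require the fairseq repo. A minimal sketch of transcribing one file with the mms-1b-all checkpoint via transformers might look like this (the file name and the language code "fra" are just illustrative, and the API details are as I understand them from the model card):

```python
# Minimal MMS transcription sketch (assumes: pip install transformers torch librosa)
import librosa
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"  # MMS checkpoint with per-language adapters
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Pick the target language by its ISO 639-3 code ("fra" here is illustrative);
# MMS swaps in a small per-language adapter instead of loading a new model.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

# MMS expects 16 kHz mono audio; "interview.wav" is a placeholder file name.
audio, _ = librosa.load("interview.wav", sr=16_000, mono=True)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# CTC decoding: take the most likely token at each frame, then collapse.
ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```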

kaixxx (Owner) commented Jun 30, 2023

Thank you, very interesting. I didn't know about this new model. Crazy times...

However, the above comparison might be a bit misleading. The WER of Whisper can be as low as 3.0 in its best-supported languages (Spanish in this case), see here: https://github.com/openai/whisper#available-models-and-languages
I think there is a tradeoff: Whisper seems to be the leading model for the most widely spoken languages, while MMS is far more inclusive. It would be best to include both... We will see.

menelic (Author) commented Jun 30, 2023

Thanks for the swift reply and the clarification about Whisper's WER by language, well noted. However, I'd still say that for researchers working with material from non-Western contexts, adding MMS could truly be a game changer. It might also have another benefit over Whisper: in many research contexts, interviewees switch languages or use English technical jargon while the rest of the interview is in another language. Whisper does not handle this well; if MMS is better at that, it would be a boon. That said, I don't know whether there might be a way to chunk the audio as part of NoScribe's processing, as that would make it easier for Whisper to recognise a language switch and follow it, rather than switching into translation mode.
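Just to make the chunking idea concrete: a rough sketch (not NoScribe's actual pipeline; chunk length, model size and file name are arbitrary) could run Whisper's language detection per 30-second chunk and pin the detected language for that chunk, so a mid-interview switch doesn't tip the model into translating:

```python
# Rough per-chunk language-detection sketch (assumes: pip install openai-whisper)
import whisper

model = whisper.load_model("base")           # model size is arbitrary here
audio = whisper.load_audio("interview.wav")  # placeholder file name; 16 kHz mono

CHUNK = 30 * whisper.audio.SAMPLE_RATE       # Whisper works on 30-second windows

for start in range(0, len(audio), CHUNK):
    piece = audio[start:start + CHUNK]
    # Detect the language of this chunk only...
    mel = whisper.log_mel_spectrogram(whisper.pad_or_trim(piece)).to(model.device)
    _, probs = model.detect_language(mel)
    lang = max(probs, key=probs.get)
    # ...then transcribe the chunk with that language pinned, so Whisper
    # stays in transcription mode instead of drifting into translation.
    result = model.transcribe(piece, language=lang, task="transcribe")
    print(f"[{start // whisper.audio.SAMPLE_RATE:>5d}s] ({lang}) {result['text'].strip()}")
```

A real implementation would want some overlap or VAD-based boundaries so words aren't cut mid-chunk, but it shows the idea.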
