Non-English translations #526
sanjaymk908
started this conversation in
General
Replies: 2 comments 7 replies
-
Did you add the --language option?
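A minimal sketch of what passing the language explicitly could look like, written as a small Python helper that only builds the command line and does not run it. The --language flag is the one asked about above. The binary name ./main, the model path, the file name clip.wav, and the -m / -f arguments are assumptions about a typical whisper.cpp checkout and may differ in your build.

```python
def build_whisper_command(audio_path, language="hi",
                          model_path="models/ggml-tiny.bin",
                          binary="./main"):
    """Return the argument list for one whisper.cpp run with an explicit language.

    All defaults are illustrative assumptions: "hi" is the ISO 639-1 code
    for Hindi, and the binary and model paths depend on your local build.
    """
    return [binary, "-m", model_path, "-f", audio_path, "--language", language]


# Build (but do not execute) the command for a hypothetical clip.wav.
cmd = build_whisper_command("clip.wav")
print(" ".join(cmd))
```

The list form can be handed to subprocess.run as-is, which avoids shell quoting problems with file names that contain spaces.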
-
I believe Hindi and maybe Tamil were the only languages I've worked with in whisper.cpp where I had to specify something other than UTF-8.
-
Thanks @ggerganov for a phenomenal piece of software! OpenAI has definitely created fantastic models (more below). But my inference tests using the OpenAI Whisper implementation, both on my puny MacBook Air and on beefy AWS servers, were phenomenally bad :( e.g. 11 minutes for a 30-second audio clip.
Compare OpenAI Whisper with whisper.cpp:
- Same puny MacBook Air
- Tiny model (77 MB on disk)
But I have a question about the output for non-English audio: why is it transliterated into English (Latin script)? For example, "box office pe asa tandav kiya hai fund ne" is whisper.cpp's output, but that is a transliteration of "बॉक्स ऑफिस पर ऐसा तांडव किया है फंड ने".
Does this mean OpenAI's models were trained on transliterated dialogs? One can use other APIs to convert the text back to Hindi, but such conversions are always somewhat lossy, so it would be best to go from speech -> Hindi directly. Or am I missing something here?
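To make the symptom easy to check across many clips, here is a rough sketch that reports which script dominates a transcript. It relies only on Python's standard unicodedata module and takes the first word of each character's Unicode name (e.g. LATIN, DEVANAGARI) as the script label. That is a heuristic assumption, not an official script lookup, and the function name dominant_script is made up for this example.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script label in text, or None if no letters are found.

    The label is the first word of the Unicode character name, which is a
    heuristic: it works for names like "LATIN SMALL LETTER B" or
    "DEVANAGARI LETTER BA", but it is not a real Unicode Script property lookup.
    """
    counts = Counter()
    for ch in text:
        # Count letters plus combining marks (vowel signs, virama, anusvara).
        if ch.isalpha() or unicodedata.category(ch).startswith("M"):
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split(" ")[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


# The two strings from the post: whisper.cpp's output and the expected Hindi.
print(dominant_script("box office pe asa tandav kiya hai fund ne"))
print(dominant_script("बॉक्स ऑफिस पर ऐसा तांडव किया है फंड ने"))
```

A check like this only flags that the output is in Latin script; it does not say why, and it does not convert anything back to Devanagari.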