Hello
I tried to transcribe an audio file of an interview with a health professional, recorded in a mix of Telugu and English.
When I set the language to auto-detection, it displayed: Detected language 'te' with probability 0.933493
But the output transcription was in Tamil (another South Indian language).
When I set the language to Telugu (te) explicitly, the output transcription was still in Tamil.
Kindly help me resolve this issue.
Thanks and Regards,
Dr Manoj Aravind,
Assistant Professor,
Community Medicine, Andhra Medical College, Visakhapatnam,
Andhra Pradesh, India.
First, support for Indian languages is not the best in Whisper, the underlying AI model from OpenAI that I use; see: https://qxf2.com/blog/testing-openai-whisper-support-for-indian-languages/ (Note, however, that this test uses the "medium" model. I use the "large" one, which has slightly better quality.)
Second, mixed-language content is not supported very well by Whisper. Even if you get the first issue sorted, Whisper will probably struggle with your mixed languages and start to translate the English passages of your interview into Telugu. The only solution would be to split up your interview and transcribe the different languages separately.
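As a rough illustration of the splitting step, here is a minimal Python sketch that plans the cuts. It assumes you have already marked by hand where each language starts and ends. The timestamps and the function name merge_language_segments are hypothetical and not part of noScribe or Whisper. The sketch only merges neighbouring passages in the same language so that you end up with as few audio pieces as possible.

```python
def merge_language_segments(segments):
    """Merge consecutive (start, end, language) segments that share a language.

    The segments are assumed to be hand-labelled and non-overlapping;
    times are in seconds.
    """
    merged = []
    for start, end, lang in sorted(segments):
        if merged and merged[-1][2] == lang:
            # Same language as the previous chunk: extend that chunk.
            prev_start, _, _ = merged[-1]
            merged[-1] = (prev_start, end, lang)
        else:
            merged.append((start, end, lang))
    return merged


# Hypothetical, hand-labelled example: "te" = Telugu, "en" = English.
segments = [
    (0.0, 12.5, "te"),
    (12.5, 30.0, "te"),
    (30.0, 41.0, "en"),
    (41.0, 55.0, "te"),
]
chunks = merge_language_segments(segments)
for start, end, lang in chunks:
    print(f"{lang}: {start:.1f}s to {end:.1f}s")
```

Each resulting chunk could then be cut out with an audio editor and transcribed on its own, with the language set explicitly instead of auto-detected.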
To solve the first issue, you can try this version of the large Whisper model, which has been specially trained to support Telugu: https://huggingface.co/vasista22/whisper-telugu-large-v2
To use it with noScribe, you first have to convert this model into the format for "faster-whisper", the particular implementation I use. Follow the instructions here: https://github.com/SYSTRAN/faster-whisper#model-conversion (section "Model conversion").
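The conversion described in that section is done with the ct2-transformers-converter command-line tool from the ctranslate2 package. The following Python sketch only assembles the command for the Telugu model. The output folder name is an arbitrary choice, and the files passed to --copy_files are an assumption: check which tokenizer files the model repository actually contains, because the converter fails if a listed file is missing.

```python
import shlex


def build_conversion_command(model_id, output_dir,
                             copy_files=("tokenizer.json",),
                             quantization="float16"):
    """Assemble a ct2-transformers-converter call as an argument list."""
    cmd = ["ct2-transformers-converter",
           "--model", model_id,
           "--output_dir", output_dir]
    if copy_files:
        # Assumption: these files exist in the model repository.
        cmd += ["--copy_files", *copy_files]
    if quantization:
        cmd += ["--quantization", quantization]
    return cmd


command = build_conversion_command(
    "vasista22/whisper-telugu-large-v2",
    "whisper-telugu-large-v2-ct2",  # arbitrary output folder name
)
print(shlex.join(command))
```

You can paste the printed command into a terminal, or run it from Python with subprocess.run(command, check=True). Either way, the transformers and ctranslate2 packages need to be installed first, and the conversion downloads the full model.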
Now go to the folder of your noScribe installation (on Windows: "C:\Program Files (x86)\noScribe") and replace the contents of the subfolder "models\faster-whisper-large-v2" with the corresponding files from your converted model. From now on, if you select the "precise" quality in noScribe, the new model will be used, which will hopefully have better support for Telugu.
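If you would rather script the file replacement than do it by hand, a minimal Python sketch could look like this. The function name replace_model_files and the ".bak" backup folder are my own choices for illustration; noScribe does not require a backup, but it lets you return to the original model if the Telugu one does not work out.

```python
import shutil
from pathlib import Path


def replace_model_files(converted_dir, target_dir, backup_suffix=".bak"):
    """Swap the contents of target_dir for those of converted_dir.

    The existing target folder is first moved to a sibling backup folder.
    Returns the backup path, or None if there was nothing to back up.
    """
    converted = Path(converted_dir)
    target = Path(target_dir)
    if not converted.is_dir():
        raise FileNotFoundError(f"Converted model not found: {converted}")

    backup = None
    if target.exists():
        backup = target.with_name(target.name + backup_suffix)
        if backup.exists():
            raise FileExistsError(f"Backup already exists: {backup}")
        shutil.move(str(target), str(backup))

    # Copy the converted model into place under the original folder name.
    shutil.copytree(converted, target)
    return backup
```

On Windows you would call it with the converted model folder and r"C:\Program Files (x86)\noScribe\models\faster-whisper-large-v2". Writing to that location may require running Python with administrator rights.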