collaboration effort : regarding speech recognition #269

Open
atototenten opened this issue Jan 6, 2025 · 4 comments
Labels
discussion: Discussions or plans for the future
question: Further information is requested
tts&stt: Speech-to-text, text-to-speech and wakeword requests or bugs, including Vosk

Comments

@atototenten

Hi,

I sincerely admire the project's author and want them and their creations to succeed.

I'm a user of another FOSS app, github.com/ElishaAz/Sayboard (a speech-to-text virtual keyboard), which in my opinion has superior speech recognition.

Since speech is the most important and most difficult component of both projects, why don't the projects join forces in a single TTS and STT library project?

I have found that the Linux world is also struggling with speech recognition, which is indeed quite difficult.

I think Mozilla also has an interest in speech, since they have an active but incomplete TTS engine project. They also released Orbit (a Mistral-LLM-based virtual assistant) for Firefox, which would gain immensely from a speech interface for communicating with humans.

Hoping for positive action.

Thanks,

A well-wisher, anon.

@primesun

I haven't tried Sayboard, but I am surprised that it has "superior" speech recognition, as it also uses Vosk.

Anyway, I do think that Dicio's speech recognition is not good: most of the time (80% or more) it doesn't understand what I'm saying. In comparison, FUTO's speech-to-text voice input method is much more accurate.

So I hope that Dicio switches to it (or at least offers it as an option). See #197.

@paolo-caroni

Externalizing STT (ASR) to an external app (the default engine on Android, via the STT API) is on the roadmap.
If stypox spends time on an STT engine, we will lose effort on Dicio itself.

In the Linux world there are many very good FOSS STT engines.
My current favourite is sherpa-onnx ASR (also available on Android, but not yet as a system engine).
FUTO is not yet FOSS and is only an IME; it does not support the STT API.

@Stypox
Owner

Stypox commented Feb 26, 2025

@atototenten thanks for the kind words :-) Are you sure Sayboard works better? As @primesun said, Sayboard also uses Vosk. One difference is that Sayboard implements Vosk at a lower level, which might allow them more control (though I don't see any big difference).

https://github.com/ElishaAz/Sayboard/blob/81f4e4ce57cd274d73f1d2518153c423ca0d2abc/app/src/main/java/com/elishaazaria/sayboard/recognition/recognizers/sources/VoskLocal.kt#L49

Stypox added the question, discussion, and tts&stt labels on Feb 26, 2025
@atototenten
Author

Maybe it's just my experience.

I cannot be very sure, since in my opinion the speech recognition in both is less than okay.

Otherwise I agree with @paolo-caroni.

Thanks
