collaboration effort : regarding speech recognition #269

atototenten · 2025-01-06T08:04:05Z

Hi,

I sincerely admire the project's author, and want them and their creations to succeed.

I'm a user of another FOSS app, github.com/ElishaAz/Sayboard (a speech-to-text virtual keyboard), which has superior speech recognition, in my opinion.

Since the most important and difficult component of both projects is speech, why don't the projects join forces in a single TTS and STT library project?

I found that the Linux world is also struggling with speech recognition, which is indeed quite difficult.

I think Mozilla also has an interest in speech, since they have an active but incomplete TTS engine project. They also released Orbit (a Mistral-LLM-based virtual assistant) for Firefox, which would gain immensely from a speech interface for communicating with humans.

Hoping for positive action.

Thanks,

a well-wishing anon.

primesun · 2025-01-29T06:40:09Z

I haven't tried sayboard, but I am surprised that it has "superior" speech recognition, as it also uses Vosk.

Anyway, I do think that the speech recognition of Dicio is not good: most of the time (80% or more) it doesn't understand what I'm saying. In comparison, the FUTO speech-to-text voice input method is so much more accurate.

So I hope that Dicio switches to it (or at least allows the option). See #197

paolo-caroni · 2025-02-26T10:55:54Z

Externalizing STT (ASR) to an external app (the default engine on Android with the STT API) is on the roadmap.
If Stypox spends time on an STT engine, we will lose effort on Dicio itself.

In the Linux world there are many very good FOSS STT engines.
My current favourite is sherpa-onnx ASR (also available on Android, but not yet as a system engine).
FUTO is not yet FOSS and is only an IME; it does not support the STT API.

Stypox · 2025-02-26T18:58:22Z

@atototenten thanks for the kind words :-) Are you sure Sayboard works better? As @primesun said, Sayboard also uses Vosk. One difference is that Sayboard implements Vosk at a lower level, which might give them more control (though I don't see any big change).

https://github.com/ElishaAz/Sayboard/blob/81f4e4ce57cd274d73f1d2518153c423ca0d2abc/app/src/main/java/com/elishaazaria/sayboard/recognition/recognizers/sources/VoskLocal.kt#L49

atototenten · 2025-02-27T02:19:57Z

Maybe it's just my experience. I cannot be very sure, since the speech part of both is less than okay, in my opinion.

Otherwise I agree with @paolo-caroni.

Thanks

atototenten mentioned this issue on Jan 6, 2025: "collab. effort : regarding speech recog." ElishaAz/Sayboard#93 (Open)

Stypox added the labels question (Further information is requested), discussion (Discussions or plans for the future), and tts&stt (Speech-to-text, text-to-speech and wakeword requests or bugs, including Vosk) on Feb 26, 2025
