websocket wrapper for Android+Konele voice input/search #457

rpdrewes · 2023-01-29T17:03:11Z

For some time I have been using Kaldi voice recognition as a service on my home server. My de-Googled Android phone (LineageOS) talks to this server through the open-source Konele app from the F-Droid repository. This gives me voice input and voice search similar to "Ok Google" and Siri, in a free and open-source environment completely under my control. It works quite well, though I don't think Kaldi is up to Whisper's level as a speech recognizer.

Konele (the Android voice input part) talks to the voice server backend over WebSockets, and the protocol doesn't seem too complex. I am considering writing a similar WebSocket server wrapper around whisper.cpp that could stand in for the Kaldi-based backend and work with the existing Konele frontend. I searched around, and it doesn't seem that anyone has done this yet.
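The following is a sketch under the assumption that Konele's WebSocket mode speaks the kaldi-gstreamer-server protocol: the client puts the audio format in a `content-type` query parameter of the request URL, streams binary audio frames, sends the text message "EOS" when done, and expects JSON replies with a `status` and a list of `hypotheses`. The helper names `parse_content_type` and `make_result` are hypothetical, and the URL layout and JSON field names should be checked against what Konele actually sends before relying on them.

```python
import json
from urllib.parse import urlsplit, parse_qs


def parse_content_type(path: str) -> dict:
    """Pull audio parameters out of an (assumed) kaldi-gstreamer-server style URL.

    Example path:
    /client/ws/speech?content-type=audio/x-raw,+layout=(string)interleaved,+rate=(int)16000,+format=(string)S16LE,+channels=(int)1
    """
    # parse_qs decodes '+' to a space and percent-escapes such as %28 to '('.
    query = parse_qs(urlsplit(path).query)
    ctype = query.get("content-type", [""])[0]
    fields = [f.strip() for f in ctype.split(",")]
    # The first field is the MIME type; the rest look like key=(type)value.
    params = {"mime": fields[0]}
    for field in fields[1:]:
        if "=" not in field:
            continue
        key, value = field.split("=", 1)
        if value.startswith("("):
            vtype, _, raw = value[1:].partition(")")
            value = int(raw) if vtype == "int" else raw
        params[key] = value
    return params


def make_result(transcript: str, final: bool = True) -> str:
    """Build a JSON reply in the (assumed) kaldi-gstreamer-server result format."""
    return json.dumps({
        "status": 0,  # 0 = success in that protocol
        "result": {"hypotheses": [{"transcript": transcript}], "final": final},
    })
```

With the format parsed up front, the server knows how to interpret the raw bytes it buffers (for example, 8000 Hz, S16LE, one channel).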

Anyone have any thoughts on this?

Rich

rpdrewes · 2023-01-30T22:41:39Z

I could not resist doing a trial implementation, and it is already fantastic! With the free Konele app as the client, my implementation provides a simple, self-hosted, server-based voice recognition service for my de-Googled Android phone, and it works really well.

My current WebSocket wrapper for whisper.cpp is very stupid and simple, but with Whisper's accuracy and ggerganov's fantastic implementation, the result is still impressive!

3 replies

strangelearning Feb 7, 2023

@rpdrewes could you share your websocket implementation?

What format are you sending the audio in? I suppose that's the only really tricky bit.

michael-prosper Feb 7, 2023

@

rpdrewes Feb 7, 2023
Author

The Android front end is an existing free app on the F-Droid repository called Konele (or K6nele), which can send the audio in a variety of formats. It acts as the audio input interface on the phone side and is invoked when, for example, you hit the microphone button on the virtual keyboard while composing a text message. I didn't initially realize how flexible Konele is about the audio format it sends, so I just started with the default, which turned out to be 8 kHz (8,000 samples per second) at 16 bits per sample (S16_LE).

On the server side, I capture the audio to a file, use a simple sox command to convert it to the format that ggerganov's whisper.cpp "main" program wants (16-bit, 16 kHz WAV), invoke "main" in a subprocess from my Python WebSocket wrapper, and send the result back on the WebSocket. It is stupid and simple, but it works amazingly well! The reason it works so well is that Whisper is such an outstanding recognizer and ggerganov's implementation is so efficient.
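The capture, sox, and "main" pipeline described above could look roughly like this. It is a minimal sketch, not the code from the repo: the paths `WHISPER_MAIN` and `WHISPER_MODEL` and the helper names are hypothetical. The sox input options describe headerless 8 kHz, signed 16-bit, little-endian mono audio, and the output is resampled to 16 kHz WAV; `-m`, `-f`, and `-nt` are whisper.cpp main's model, input-file, and no-timestamps options.

```python
import subprocess
from pathlib import Path

# Hypothetical locations; adjust for your own whisper.cpp build and model.
WHISPER_MAIN = "./main"
WHISPER_MODEL = "models/ggml-base.en.bin"


def sox_command(raw_path, wav_path, in_rate=8000):
    """sox argv converting headerless S16_LE mono audio to a 16 kHz WAV file."""
    return [
        "sox",
        # Input description (the raw file has no header to read it from).
        "-t", "raw", "-r", str(in_rate), "-e", "signed-integer", "-b", "16",
        "-c", "1", "--endian", "little",
        str(raw_path),
        # Output: resample to 16 kHz mono; WAV is inferred from the extension.
        "-r", "16000", "-c", "1",
        str(wav_path),
    ]


def whisper_command(wav_path, main=WHISPER_MAIN, model=WHISPER_MODEL):
    """argv for whisper.cpp main; -nt keeps timestamps out of the transcript."""
    return [main, "-m", model, "-f", str(wav_path), "-nt"]


def transcribe(raw_path):
    """Convert the captured audio, run main in a subprocess, return its text."""
    raw_path = Path(raw_path)
    wav_path = raw_path.with_suffix(".wav")
    subprocess.run(sox_command(raw_path, wav_path), check=True)
    out = subprocess.run(whisper_command(wav_path), check=True,
                         capture_output=True, text=True)
    # Collapse the per-segment lines and extra spaces into one string.
    return " ".join(out.stdout.split())
```

Building the argument lists separately from running them keeps the commands easy to inspect, and passing a list (rather than a shell string) avoids quoting problems with file names.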

It would not be much harder to send incremental recognition results on the WebSocket rather than a single final result at the end. Incremental seems to be what most people like, but I actually prefer the batch approach. On my Android phone, when composing a text message or email, I can speak for, say, 30 seconds, wait a few seconds for the recognition result to come back, and then review and edit it manually. That's the way I prefer to do it.
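A minimal sketch of the batch approach: buffer every binary frame until the client signals the end of speech, then run the recognizer once and send a single final result. It assumes the kaldi-gstreamer-server conventions of an "EOS" text message and a JSON reply with `"final": true`; `BatchSession` is a hypothetical name, and `transcribe` stands for any callable that turns raw audio bytes into text (for example, one that writes the bytes to a file and runs a sox/main pipeline). An incremental server would instead send periodic replies with `"final": false` before the last one.

```python
import json


class BatchSession:
    """Collect one utterance of raw audio and answer with a single final result."""

    BYTES_PER_SAMPLE = 2  # S16_LE: 16-bit samples; mono is assumed throughout

    def __init__(self, transcribe, rate=8000):
        self.transcribe = transcribe  # hypothetical callable: bytes -> str
        self.rate = rate              # samples per second (Konele default here)
        self.audio = bytearray()

    def duration_seconds(self):
        """Length of the buffered audio in seconds, assuming mono S16_LE."""
        return len(self.audio) / (self.rate * self.BYTES_PER_SAMPLE)

    def on_message(self, message):
        """Handle one WebSocket message; return a reply string or None."""
        if isinstance(message, (bytes, bytearray)):
            # Binary frame: more audio, keep buffering and send nothing yet.
            self.audio.extend(message)
            return None
        if message == "EOS":
            # End of speech (assumed convention): recognize the whole utterance once.
            text = self.transcribe(bytes(self.audio))
            self.audio.clear()
            return json.dumps({
                "status": 0,
                "result": {"hypotheses": [{"transcript": text}], "final": True},
            })
        return None  # ignore any other text messages
```

At the default 8 kHz, 16-bit mono format, a 30-second utterance is 30 × 8000 × 2 = 480,000 bytes, so holding a whole utterance in memory before transcribing it is cheap.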

@strangelearning do you have a recommendation for how to share it? Should I create a new git repo on GitHub? The code is embarrassingly quick and dirty, but quite effective. It is less than 100 lines of code, modified from the websocket demo program. I am using a Python WebSocket library called "wsocket" (pip install --upgrade wsocket).

rpdrewes · 2023-02-08T23:40:54Z

I created a git repo to help anyone interested in setting this up. It really works great!

https://github.com/rpdrewes/whisper-websocket-server
