heads up : whisper is easier to run on android than ever #169

Open
thiswillbeyourgithub opened this issue Mar 16, 2023 · 12 comments
Labels
tts&stt Speech-to-text, text-to-speech and wakeword requests or bugs, including Vosk

Comments

@thiswillbeyourgithub

Hi,

Just thought people here might be interested in this repo https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.android

@Stypox
Owner

Stypox commented Apr 3, 2023

It seems like they only have English models, though: https://github.com/ggerganov/whisper.cpp/tree/master/models

@thiswillbeyourgithub
Author

No, they don't? I don't know if I'm missing something, though.
[image]

@Stypox
Owner

Stypox commented Apr 3, 2023

What is the multilingual model and how well does it work?

@thiswillbeyourgithub
Author

Well, I think it ranges from "quick and fast" to "large and not far from SOTA", depending on the model size.

The model takes a language as input, for example "--lang=french", btw.
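
For reference, a rough sketch of how the language ends up on the command line. It assumes the whisper.cpp example CLI, which as far as I know takes the language as a short code via -l (e.g. fr) rather than the informal spelling above, together with -m for the model file and -f for the audio file. The binary name ./main and the file paths are placeholders, and build_whisper_args is a hypothetical helper that is not part of any project mentioned here.

```python
from typing import List


def build_whisper_args(
    audio_path: str,
    language: str = "fr",
    model_path: str = "models/ggml-base.bin",  # placeholder path; assumed multilingual (no ".en" suffix)
    binary: str = "./main",  # placeholder; the executable name depends on the whisper.cpp version
) -> List[str]:
    """Build an argument list for a whisper.cpp-style CLI call.

    Assumes the -m (model), -l (language) and -f (input file) flags.
    """
    return [binary, "-m", model_path, "-l", language, "-f", audio_path]


if __name__ == "__main__":
    # The resulting list could be passed to subprocess.run(...) on a machine
    # where the binary and the model file actually exist.
    print(" ".join(build_whisper_args("sample.wav", language="fr")))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names.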

@Stypox
Owner

Stypox commented Apr 3, 2023

We can go as high as the base model on phones, I guess.

@thiswillbeyourgithub
Author

I would be so interested in this, I must say :)

Regarding model size: lots of quantized versions exist now.

Stypox added the tts&stt label (Speech-to-text, text-to-speech and wakeword requests or bugs, including Vosk) on Apr 20, 2023
@Stypox
Owner

Stypox commented Jul 21, 2024

Posted by LjL on the Matrix channel:

There is an open source demo app at https://github.com/vilassn/whisper_android but it's not the one I've used

There is also https://github.com/usefulsensors/openai-whisper/tree/main/android_app/Whisper-TFLIte-Android-Example

@Stypox
Owner

Stypox commented Feb 26, 2025

Posted by @tdbe in #293

Dicio is amazing and I might contribute with adding a language or two that is missing, but it's so limiting that it locks itself to listening for only one language.

It is particularly bad because it's probably the only private System Voice Input app. (System > Languages > Speech > Voice Input support)

I don't know if it's too much work, but WhisperIME can already detect the spoken language in their app before transcribing or translating (though they don't support being set as System Voice Input).

@woheller69

The new Beta supports RecognizerIntent.ACTION_RECOGNIZE_SPEECH:

woheller69/whisperIME#53

You can try it.

@Stypox
Owner

Stypox commented Feb 26, 2025

I opened #294 for this, which seems to work. However, whisperIME always returns just "you" as speech, and in general I don't understand when I am supposed to talk: I expect there is some time to load the model at the beginning (?), but the indefinite loading indicator just stays like that. Note that my code could be the problem too, since I hacked it together this evening.

@thiswillbeyourgithub
Author

thiswillbeyourgithub commented Feb 26, 2025

Whisper models usually return "you" when no input is heard. Check your microphone permission.

Maybe the indicator is slow because it is downloading the model?
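
If it helps while debugging, here is a minimal sketch of how a caller could treat a bare "you" as "probably no audio was captured" instead of showing it as a result. Only the "you" case comes from this thread; the function name looks_like_no_input and the idea of keeping a small set of such artifacts are my own assumptions, not how whisperIME or Dicio handle it.

```python
import string

# Assumption: transcripts that likely mean "nothing was heard".
# Only "you" is taken from the discussion above; extend with care.
SILENCE_ARTIFACTS = {"you"}


def looks_like_no_input(transcript: str) -> bool:
    """Return True if the transcript is empty or only a known silence artifact."""
    # Drop surrounding whitespace and punctuation, then compare case-insensitively.
    normalized = transcript.strip().strip(string.punctuation).strip().lower()
    return normalized == "" or normalized in SILENCE_ARTIFACTS


if __name__ == "__main__":
    for text in [" You. ", "", "thank you", "you are late"]:
        print(repr(text), looks_like_no_input(text))
```

This only matches the whole transcript, so real sentences that merely contain "you" are left alone. A legitimate one-word "you" would be dropped as well, which is the trade-off of this kind of filter.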

@woheller69

Press and hold while speaking

3 participants