Replies: 4 comments 8 replies
-
I'm currently implementing this idea on AOSP. Where do you want the text to go after it is transcribed? What is the user experience you are trying to build?
-
So I decided not to use the stream in my app; instead I am recording 10-second audio chunks and putting them in a queue, where I transcribe them one at a time. Any idea why, while the app is running, it is not transcribing the audio, and waits until the app goes to the background first? Also, to confirm the behavior, I installed your demo app on the same device and it behaves the same way: when I click the TranscribeSample button, it reads the audio but doesn't show anything; when I press the back button so the app goes to the background, and then open the app again, I see the transcribed text on the screen.
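For what it's worth, a symptom like "results only appear after backgrounding" is often a threading issue: the heavy native call runs on, or blocks, the UI thread, so the screen only refreshes on the pause/resume cycle. Below is a minimal sketch of the chunk-queue pattern with transcription kept on a single background worker. This is plain Java, and the `Transcriber` interface is a hypothetical stand-in for LibWhisper's transcribe call, not its real API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class TranscriptionQueue {
    /** Hypothetical stand-in for LibWhisper's native transcribe call. */
    public interface Transcriber { String transcribe(float[] samples); }

    private final BlockingQueue<float[]> chunks = new LinkedBlockingQueue<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public TranscriptionQueue(Transcriber transcriber, Consumer<String> onResult) {
        worker.submit(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    float[] chunk = chunks.take();               // blocks until a chunk arrives
                    String text = transcriber.transcribe(chunk); // heavy call, off the UI thread
                    onResult.accept(text);                       // deliver result to the caller
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();              // worker shutting down
            }
        });
    }

    /** Called from the audio recorder each time a 10-second chunk is ready. */
    public void enqueue(float[] samples) { chunks.add(samples); }

    public void shutdown() { worker.shutdownNow(); }
}
```

On Android, `onResult` should hop back to the main thread (e.g. via a `Handler` or `runOnUiThread`) before touching any views; updating views from the worker thread can produce exactly this kind of stale UI.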
-
I don't know why, but you can put logs in between your method calls to narrow it down. I am using android.telecom to make calls.
On Tue, Mar 21, 2023 at 6:40 PM usfaa444 ***@***.***> wrote:

So I decided not to use the stream in my app; instead I am recording 10-second audio chunks and putting them in a queue, where I transcribe them one at a time.

The problem I encountered is the following. When I test on a mobile phone like my Samsung Galaxy S22 Ultra, everything is fine, and every chunk is transcribed within 3 to 4 seconds. But on some other devices, when transcribing the first chunk, the transcribeData function in LibWhisper gets stuck and doesn't return anything until I close the app (the app goes to the background); at that moment the transcription result is returned, and the other audio in the queue gets transcribed as well within 3 to 4 seconds.

So any idea why, while the app is running, it is not transcribing the audio, and waits until the app goes to the background first?

Also, to confirm the behavior, I installed your demo app on the same device and it behaves the same way: when I click the TranscribeSample button, it reads the audio but doesn't show anything; when I press the back button so the app goes to the background, and then open the app again, I see the transcribed text on the screen.
-
I have been using whisper.cpp on my de-googled Android phone daily for months now, for dictation into SMS, email, and web search. My preferred method is to use the Konele app as the voice-input frontend, which sends the audio to my own private server running a very simple Python wrapper around whisper.cpp. Here is my project: https://github.com/rpdrewes/whisper-websocket-server

Another alternative is this port of whisper.cpp that runs natively on Android. I have tested it and it also works well, but it is not as fast as my server-based approach in my environment (if you have a slower server and/or a faster phone, it might be better for you). It uses Konele as the input frontend as well, I believe. See: https://github.com/alex-vt/WhisperInput
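If anyone wants to roll a similar server themselves, the core of the wrapper is small: receive audio, invoke the whisper.cpp binary, and return its stdout. Here is a minimal Java sketch of just that subprocess step; the binary path and flags in the comment are assumptions about a typical whisper.cpp invocation, not the actual code of the project linked above (which is a Python websocket server):

```java
import java.io.IOException;

public class WhisperCli {
    /**
     * Run an external command and capture its combined stdout/stderr.
     * In a real server this would be something like:
     *   run("./main", "-m", "ggml-base.en.bin", "-f", "/tmp/upload.wav");
     */
    public static String run(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)   // merge stderr into stdout
                .start();
        String out = new String(p.getInputStream().readAllBytes());
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("command failed with exit code " + exit + ": " + out);
        }
        return out.trim();
    }
}
```

The rest is transport (websocket, HTTP upload, etc.) plus converting the incoming audio to the 16 kHz mono WAV that whisper.cpp expects.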
-
So I was going through this awesome library, and by the way, thank you very much for such great work; I am sure it will help thousands of developers, including me.
I want to use streaming to transcribe any speech detected from the microphone. In the stream demo you gave in the video, you typed a command in the CLI. Is it possible to have this functionality implemented on Android, please?