I have an issue where when I am using real time transcription, when I am not talking, it seems like it parses random text. #4

Open
heromanofe opened this issue Dec 7, 2023 · 23 comments

Comments

@heromanofe

I was able to set up the model and it works really well. My code is:

`private fun testAudio() {
    // Initialize Whisper
    val mWhisper = Whisper(this) // Create Whisper instance

    // Load model and vocabulary for Whisper
    val basePath = Global.fileOperations.getOutputDirectory("/Models", this)!!.path
    val modelPath = basePath + "/whisper-tiny.tflite" // Provide model file path
    val vocabPath: String = basePath +
        "/filters_vocab_multilingual.bin" // Provide vocabulary file path
    println("PATHS: ")
    println(modelPath)
    println(vocabPath)
    mWhisper.loadModel(modelPath, vocabPath, true) // Load model and set multilingual mode

    // Set a listener for Whisper to handle updates and results
    mWhisper.setListener(object : IWhisperListener {
        override fun onUpdateReceived(message: String?) {
            Log.i("TRANSCRIBE_WHISPER", "New State: $message")
            // Handle Whisper status updates
        }

        override fun onResultReceived(result: String?) {
            Log.i("TRANSCRIBE_WHISPER", result ?: "")
            // Handle transcribed results
        }
    })

    // Initialize Recorder
    val mRecorder = Recorder(this) // Create Recorder instance

    // Set a listener for Recorder to handle updates and audio data
    mRecorder.setListener(object : IRecorderListener {
        override fun onUpdateReceived(message: String) {
            // Handle Recorder status updates
        }

        override fun onDataReceived(samples: FloatArray) {
            // Handle audio data received during recording
            // You can forward this data to Whisper for live recognition using writeBuffer()
            mWhisper.writeBuffer(samples)
        }
    })

    mRecorder.start() // Start recording
}`

and the `onResultReceived` callback above seemed to return:

[audioRecordData][fine] 5s(f:5014 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1

I'll make a hole in the hole.
2 times this:

[audioRecordData][fine] 10s(f:10000 m:0 s:0) : pid 8824 uid 10419 sessionId 41305 sr 16000 ch 1 fmt 1
then
I'll be back with a little .... <== repeated a lot

Thanks for your hard work :P

@vilassn
Owner

vilassn commented Dec 7, 2023

This can be fixed with voice activity detection (VAD) support, but VAD is not yet implemented.

@ITHealer

ITHealer commented Dec 7, 2023

> This can be fixed with voice activity detection (VAD) support, but VAD is not yet implemented.

image

I am trying to add VAD to the C++ source of my project, taking ideas from this file: https://github.com/vilassn/whisper_android/blob/master/app/src/main/cpp/silent_detection.cpp

I calculated the dB level of each input audio chunk of BUFFER_SIZE samples and appended only the chunks that contain speech to outputBuffer. I then used this vector to calculate log_mel_spectrogram(...). However, the test results gave me a completely different sentence from the original one.

This is the result when I choose the threshold as -45.0:
image

This is the result when I choose the threshold as -40.0:
image

This is the result when I choose the threshold as -35.0:
image

  • Can you help me assess where the problem might be?
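For readers following along, the per-chunk dB gating described above can be sketched roughly as follows. This is only an illustration in Java, not the code from silent_detection.cpp or from the poster's project: the class and method names (`DbGate`, `rmsDb`, `keepLoudChunks`) are made up, and it assumes float samples in the range [-1, 1] and an RMS level measured in dBFS.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: energy-based gating on fixed-size chunks. */
final class DbGate {
    /** RMS level of one chunk in dBFS; samples are assumed to be floats in [-1, 1]. */
    static double rmsDb(float[] samples, int offset, int length) {
        double sumSquares = 0.0;
        for (int i = offset; i < offset + length; i++) {
            sumSquares += (double) samples[i] * samples[i];
        }
        double rms = Math.sqrt(sumSquares / length);
        // Floor the value so that digital silence does not produce -Infinity.
        return 20.0 * Math.log10(Math.max(rms, 1e-10));
    }

    /**
     * Keeps only the chunks whose level is above thresholdDb, in their original order.
     * A trailing partial chunk (shorter than chunkSize) is dropped.
     */
    static float[] keepLoudChunks(float[] input, int chunkSize, double thresholdDb) {
        List<float[]> kept = new ArrayList<>();
        for (int start = 0; start + chunkSize <= input.length; start += chunkSize) {
            if (rmsDb(input, start, chunkSize) > thresholdDb) {
                float[] chunk = new float[chunkSize];
                System.arraycopy(input, start, chunk, 0, chunkSize);
                kept.add(chunk);
            }
        }
        // Concatenate the kept chunks into one output buffer.
        float[] out = new float[kept.size() * chunkSize];
        int pos = 0;
        for (float[] chunk : kept) {
            System.arraycopy(chunk, 0, out, pos, chunkSize);
            pos += chunkSize;
        }
        return out;
    }
}
```

One thing to keep in mind with this approach: cutting chunks out shortens the audio and can split words at chunk borders, so the transcription of the gated audio may differ from that of the original. That is a possible explanation to check, not a confirmed diagnosis.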

@heromanofe
Author

Yeah, but I don't understand how VAD can fix random text being detected. I will check what audio is recorded and report back.

@vilassn
Owner

vilassn commented Dec 7, 2023

@heromanofe A window of 512 samples is used to decide whether the audio is silent; at 16 kHz that is 32 ms. If a sequence of windows is silent, let's say 16 windows in a row, then consider that there is no voice activity (i.e. silence).

In short, check for about 500 ms of silence instead of a single 32 ms window. 16 consecutive windows is 16 × 32 ms = 512 ms.

I hope this works. I should check this too.
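The idea of requiring a run of silent windows can be sketched as below. This is a minimal Java illustration under stated assumptions, not the repository's implementation: `SilenceTracker` is a hypothetical class, the per-window decision is a simple RMS level in dBFS compared with a threshold, and samples are assumed to be floats in [-1, 1]. At 16 kHz a 512-sample window lasts 32 ms, so 16 windows is roughly half a second.

```java
/** Hypothetical sketch: report silence only after N consecutive quiet windows. */
final class SilenceTracker {
    private final int windowsForSilence; // e.g. 16 windows of 512 samples
    private final double thresholdDb;    // e.g. -40.0 dBFS (an assumed value)
    private int quietRun;                // consecutive quiet windows seen so far

    SilenceTracker(int windowsForSilence, double thresholdDb) {
        this.windowsForSilence = windowsForSilence;
        this.thresholdDb = thresholdDb;
        // Start in the silent state until the first loud window arrives.
        this.quietRun = windowsForSilence;
    }

    /** Feeds one window; returns true while voice activity is still assumed. */
    boolean isVoiceActive(float[] window) {
        double sumSquares = 0.0;
        for (float s : window) {
            sumSquares += (double) s * s;
        }
        double rms = Math.sqrt(sumSquares / window.length);
        double db = 20.0 * Math.log10(Math.max(rms, 1e-10));
        if (db > thresholdDb) {
            quietRun = 0; // a loud window resets the run
        } else if (quietRun < windowsForSilence) {
            quietRun++;   // capped so the counter cannot overflow
        }
        return quietRun < windowsForSilence;
    }
}
```

With this shape, short pauses between words (fewer than 16 quiet windows) are still treated as speech, so only longer gaps are withheld from the recognizer.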

@heromanofe
Author

I've noticed an interesting thing: I have the multilingual model, and it translates my speech when I think it shouldn't.

@vilassn
Owner

vilassn commented Dec 7, 2023

@heromanofe Yes. This is the default behaviour for other languages: it translates to English if the input language is not English. We need to regenerate the model with the required configuration.

@heromanofe
Author

Speaking of which, I would be interested in generating those .bin and .tflite files myself, or at least in having a place where I can download other models. I will check in 1-2 hours what Whisper receives from the recorder.

@heromanofe
Author

Here is the audio (OneDrive link; if you want, I can send the file some other way):
https://1drv.ms/u/s!AgXqUQNVnl-xmZ07Nq71pVUibaZUOg?e=blb6zR

Here is the output from my app:

2023-12-07 18:18:47.100 16170-16184 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I Compiler allocated 6018KB to compile java.lang.Object com.MyStudio.MyAppName~.model.XMLRPC.exeKwSafe(java.lang.String, java.lang.String, java.lang.Object, java.util.Map, com.MyStudio.MyAppName~.Permissions, boolean, boolean, boolean, kotlin.coroutines.Continuation)
2023-12-07 18:18:47.504 16170-16360 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 5s(f:5000 m:0 s:0) : pid 16170 uid 10419 sessionId 41849 sr 16000 ch 1 fmt 1
2023-12-07 18:18:47.959 16170-16351 System.out com.MyStudio.MyAppName~ I task refresh start
2023-12-07 18:18:47.964 16170-16463 System.out com.MyStudio.MyAppName~ I Already Running...!
2023-12-07 18:18:49.186 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I .
2023-12-07 18:18:49.217 16170-16185 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I NativeAlloc concurrent copying GC freed 96451(6415KB) AllocSpace objects, 28(668KB) LOS objects, 50% free, 12MB/25MB, paused 434us,49us total 127.362ms
2023-12-07 18:18:51.738 16170-16185 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I NativeAlloc concurrent copying GC freed 61908(2417KB) AllocSpace objects, 3(228KB) LOS objects, 50% free, 16MB/33MB, paused 1.142ms,1.206ms total 263.029ms
2023-12-07 18:18:51.848 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I I'll make a hole in the hole
2023-12-07 18:18:52.115 16170-16181 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I Thread[6,tid=16181,WaitingInMainSignalCatcherLoop,Thread*=0xb400007c9343d000,peer=0x13d00000,"Signal Catcher"]: reacting to signal 3
2023-12-07 18:18:52.115 16170-16181 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I
2023-12-07 18:18:52.306 16170-16181 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I Wrote stack traces to tombstoned
2023-12-07 18:18:52.504 16170-16360 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 10s(f:10000 m:0 s:0) : pid 16170 uid 10419 sessionId 41849 sr 16000 ch 1 fmt 1
2023-12-07 18:18:54.449 16170-16464 System.out com.MyStudio.MyAppName~ I widget tasks! Took: 7Seconds, 438Milliseconds
2023-12-07 18:18:54.458 16170-16464 System.out com.MyStudio.MyAppName~ I Tag Was Removed...!
2023-12-07 18:18:54.460 16170-16170 Choreographer com.MyStudio.MyAppName~ I Skipped 875 frames! The application may be doing too much work on its main thread.
2023-12-07 18:18:54.504 16170-16185 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I NativeAlloc concurrent copying GC freed 64933(2171KB) AllocSpace objects, 1(188KB) LOS objects, 50% free, 19MB/39MB, paused 2.278ms,1.462ms total 385.249ms
2023-12-07 18:18:54.569 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I I'll make a little more of the dough.
2023-12-07 18:18:55.591 16170-16658 ProfileInstaller com.MyStudio.MyAppName~ D Installing profile for com.MyStudio.MyAppName~
2023-12-07 18:18:55.615 16170-16199 OpenGLRenderer com.MyStudio.MyAppName~ I Davey! duration=8462ms; Flags=0, FrameTimelineVsyncId=368239147, IntendedVsync=1075489537039394, Vsync=1075496847566394, InputEventId=0, HandleInputStart=1075496856064781, AnimationStart=1075496856077020, PerformTraversalsStart=1075496860344781, DrawStart=1075497881552749, FrameDeadline=1075489549372727, FrameInterval=1075496855368999, FrameStartTime=8354888, SyncQueued=1075497972425353, SyncStart=1075497972721343, IssueDrawCommandsStart=1075497974028738, SwapBuffers=1075497993450405, FrameCompleted=1075497999888686, DequeueBufferDuration=31823, QueueBufferDuration=672396, GpuCompleted=1075497999888686, SwapBuffersCompleted=1075497994771238, DisplayPresentTime=0, CommandSubmissionCompleted=1075497993450405,
2023-12-07 18:18:55.654 16170-16170 Choreographer com.MyStudio.MyAppName~ I Skipped 142 frames! The application may be doing too much work on its main thread.
2023-12-07 18:18:55.732 16170-16199 OpenGLRenderer com.MyStudio.MyAppName~ I Davey! duration=1261ms; Flags=0, FrameTimelineVsyncId=368245973, IntendedVsync=1075496863178718, Vsync=1075498049451830, InputEventId=0, HandleInputStart=1075498049999468, AnimationStart=1075498050005770, PerformTraversalsStart=1075498050892228, DrawStart=1075498083264780, FrameDeadline=1075496883866087, FrameInterval=1075498049743947, FrameStartTime=8354036, SyncQueued=1075498098311811, SyncStart=1075498098412801, IssueDrawCommandsStart=1075498100124311, SwapBuffers=1075498116940457, FrameCompleted=1075498124907593, DequeueBufferDuration=76666, QueueBufferDuration=345781, GpuCompleted=1075498124907593, SwapBuffersCompleted=1075498117692801, DisplayPresentTime=0, CommandSubmissionCompleted=1075498116940457,
2023-12-07 18:18:56.889 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I I'll make a hole in the hole
2023-12-07 18:18:57.504 16170-16360 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 15s(f:15000 m:0 s:0) : pid 16170 uid 10419 sessionId 41849 sr 16000 ch 1 fmt 1
2023-12-07 18:18:59.690 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I I'll make a small piece of cake with a little bit of sugar.
2023-12-07 18:19:02.506 16170-16360 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 20s(f:20002 m:0 s:0) : pid 16170 uid 10419 sessionId 41849 sr 16000 ch 1 fmt 1
2023-12-07 18:19:02.543 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I I'll make a hole in the hole.
2023-12-07 18:19:05.311 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I you
2023-12-07 18:19:07.504 16170-16360 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 25s(f:25000 m:0 s:0) : pid 16170 uid 10419 sessionId 41849 sr 16000 ch 1 fmt 1
2023-12-07 18:19:08.509 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I I'll make a hole in the hole.
2023-12-07 18:19:11.442 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I I'll make a hole in the hole
2023-12-07 18:19:12.261 16318-16338 System com.MyStudio.MyAppName~ W A resource failed to call close.
2023-12-07 18:19:12.505 16170-16360 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 30s(f:30000 m:0 s:0) : pid 16170 uid 10419 sessionId 41849 sr 16000 ch 1 fmt 1
2023-12-07 18:19:12.562 16170-16360 AudioRecord com.MyStudio.MyAppName~ D stop mSessionID=41849
2023-12-07 18:19:12.563 16170-16360 AudioRecord com.MyStudio.MyAppName~ D stop(10025): mActive:1
2023-12-07 18:19:12.607 16170-16360 AudioRecord com.MyStudio.MyAppName~ D stop mSessionID=41849
2023-12-07 18:19:12.607 16170-16360 AudioRecord com.MyStudio.MyAppName~ D stop(10025): mActive:0
2023-12-07 18:19:12.607 16170-16360 AudioRecord com.MyStudio.MyAppName~ D stop mSessionID=41849
2023-12-07 18:19:12.607 16170-16360 AudioRecord com.MyStudio.MyAppName~ D stop(10025): mActive:0
2023-12-07 18:19:12.664 16170-16360 Recorder com.MyStudio.MyAppName~ D Recorded file: /storage/emulated/0/Android/media/com.MyStudio.MyAppName~/MyAppName~/Models/test.wav
2023-12-07 18:19:14.681 16170-16359 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I I'll make a small piece of cake with a little bit of sugar.

@heromanofe
Author

Okay, you were so right :D I remembered that I had looked into VAD before. I added https://github.com/gkonovalov/android-vad
to my project, using
implementation 'org.tensorflow:tensorflow-lite-task-audio:0.4.0'
implementation 'com.github.gkonovalov.android-vad:yamnet:2.0.4'
and in your code (Recorder), before the while loop:

VadYamnet vad = Vad.builder()
        .setContext(mContext)
        .setSampleRate(SampleRate.SAMPLE_RATE_16K)
        .setFrameSize(FrameSize.FRAME_SIZE_487)
        .setMode(Mode.NORMAL)
        .setSilenceDurationMs(200)
        .setSpeechDurationMs(30)
        .build();

and inside the while loop:

SoundCategory soundCategory = vad.classifyAudio(samples);
Log.d(TAG, soundCategory.getLabel());
Log.d(TAG, String.valueOf(soundCategory.getScore()));
// Send samples for transcription only when the classifier reports speech
if (soundCategory.getLabel().equals("Speech") && soundCategory.getScore() > 0.5) {
    sendData(samples);
}

The result is this:

2023-12-07 19:27:45.835 7830-8027 Recorder com.MyStudio.MyAppName~ D Silence
2023-12-07 19:27:45.835 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.0
2023-12-07 19:27:45.999 7830-8018 System.out com.MyStudio.MyAppName~ I Main User Logging (Auto-Login) Took: 1Seconds, 782Milliseconds
2023-12-07 19:27:46.268 7830-7830 DecorView[] com.MyStudio.MyAppName~ D onWindowFocusChanged hasWindowFocus false
2023-12-07 19:27:46.330 30975-31175 ActivityManagerWrapper com.mi.android.globallauncher E getRecentTasks: mainTaskId=3824 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.MyStudio.MyAppName~/com.MyStudio.MyAppName~.MainActivity} }
2023-12-07 19:27:46.349 30975-31175 ActivityManagerWrapper com.mi.android.globallauncher E getRecentTasks: mainTaskId=3824 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.MyStudio.MyAppName~/com.MyStudio.MyAppName~.MainActivity} }
2023-12-07 19:27:46.456 7830-7830 com.github...orActivity com.MyStudio.MyAppName~ D Detect NFC state changes while previously enabled
2023-12-07 19:27:46.456 7830-7830 com.github...orActivity com.MyStudio.MyAppName~ D NFC state remains enabled
2023-12-07 19:27:46.458 7830-7830 System.out com.MyStudio.MyAppName~ I task refresh start
2023-12-07 19:27:46.478 7830-7830 DecorView[] com.MyStudio.MyAppName~ D onWindowFocusChanged hasWindowFocus true
2023-12-07 19:27:46.505 7830-7830 HandWritingStubImpl com.MyStudio.MyAppName~ I refreshLastKeyboardType: 1
2023-12-07 19:27:46.505 7830-7830 HandWritingStubImpl com.MyStudio.MyAppName~ I getCurrentKeyboardType: 1
2023-12-07 19:27:46.506 30975-31175 ActivityManagerWrapper com.mi.android.globallauncher E getRecentTasks: mainTaskId=3824 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.MyStudio.MyAppName~/com.MyStudio.MyAppName~.MainActivity} }
2023-12-07 19:27:46.551 30975-31175 ActivityManagerWrapper com.mi.android.globallauncher E getRecentTasks: mainTaskId=3824 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.MyStudio.MyAppName~/com.MyStudio.MyAppName~.MainActivity} }
2023-12-07 19:27:46.965 7830-7849 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I Compiler allocated 6018KB to compile java.lang.Object com.MyStudio.MyAppName~.model.NetixXMLRPC.exeKwSafe(java.lang.String, java.lang.String, java.lang.Object, java.util.Map, com.MyStudio.MyAppName~.Permissions, boolean, boolean, boolean, kotlin.coroutines.Continuation)
2023-12-07 19:27:47.234 7830-8093 System.out com.MyStudio.MyAppName~ I task refresh start
2023-12-07 19:27:47.236 7830-8095 System.out com.MyStudio.MyAppName~ I Already Running...!
2023-12-07 19:27:47.681 7830-8027 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 5s(f:5019 m:0 s:0) : pid 7830 uid 10419 sessionId 42009 sr 16000 ch 1 fmt 1
2023-12-07 19:27:48.782 7830-8027 Recorder com.MyStudio.MyAppName~ D Silence
2023-12-07 19:27:48.782 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.0
2023-12-07 19:27:49.617 7830-7845 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I Thread[6,tid=7845,WaitingInMainSignalCatcherLoop,Thread*=0xb400007c9343d000,peer=0x13c803d0,"Signal Catcher"]: reacting to signal 3
2023-12-07 19:27:49.617 7830-7845 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I
2023-12-07 19:27:49.755 7830-7845 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I Wrote stack traces to tombstoned
2023-12-07 19:27:51.730 7830-8027 Recorder com.MyStudio.MyAppName~ D Speech
2023-12-07 19:27:51.730 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.95703125
2023-12-07 19:27:52.171 7830-7845 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I Thread[6,tid=7845,WaitingInMainSignalCatcherLoop,Thread*=0xb400007c9343d000,peer=0x13c803d0,"Signal Catcher"]: reacting to signal 3
2023-12-07 19:27:52.171 7830-7845 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I
2023-12-07 19:27:52.367 7830-8017 System.out com.MyStudio.MyAppName~ I widget tasks! Took: 5Seconds, 518Milliseconds
2023-12-07 19:27:52.368 7830-8017 System.out com.MyStudio.MyAppName~ I Tag Was Removed...!
2023-12-07 19:27:52.370 7830-7830 Choreographer com.MyStudio.MyAppName~ I Skipped 658 frames! The application may be doing too much work on its main thread.
2023-12-07 19:27:52.418 7830-7845 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I Waiting for a blocking GC ObjectsAllocated
2023-12-07 19:27:52.551 7830-7850 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I NativeAlloc concurrent copying GC freed 29251(1167KB) AllocSpace objects, 4(412KB) LOS objects, 50% free, 19MB/39MB, paused 145us,59us total 257.088ms
2023-12-07 19:27:52.551 7830-7845 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I WaitForGcToComplete blocked ObjectsAllocated on NativeAlloc for 133.523ms
2023-12-07 19:27:52.552 7830-7845 MyStudio.MyAppName~ com.MyStudio.MyAppName~ I Wrote stack traces to tombstoned
2023-12-07 19:27:52.681 7830-8027 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 10s(f:10020 m:0 s:0) : pid 7830 uid 10419 sessionId 42009 sr 16000 ch 1 fmt 1
2023-12-07 19:27:53.282 7830-7866 OpenGLRenderer com.MyStudio.MyAppName~ I Davey! duration=6403ms; Flags=0, FrameTimelineVsyncId=373222770, IntendedVsync=1079629264489237, Vsync=1079634761097501, InputEventId=0, HandleInputStart=1079634765932890, AnimationStart=1079634765937525, PerformTraversalsStart=1079634767145181, DrawStart=1079635577064712, FrameDeadline=1079629276822570, FrameInterval=1079634765744192, FrameStartTime=8353508, SyncQueued=1079635649329816, SyncStart=1079635649478150, IssueDrawCommandsStart=1079635650830285, SwapBuffers=1079635662302577, FrameCompleted=1079635668211848, DequeueBufferDuration=43594, QueueBufferDuration=365625, GpuCompleted=1079635668211848, SwapBuffersCompleted=1079635663156587, DisplayPresentTime=0, CommandSubmissionCompleted=1079635662302577,
2023-12-07 19:27:53.307 7830-8167 ProfileInstaller com.MyStudio.MyAppName~ D Installing profile for com.MyStudio.MyAppName~
2023-12-07 19:27:53.369 7830-7830 Choreographer com.MyStudio.MyAppName~ I Skipped 119 frames! The application may be doing too much work on its main thread.
2023-12-07 19:27:53.470 7830-7866 OpenGLRenderer com.MyStudio.MyAppName~ I Davey! duration=1089ms; Flags=0, FrameTimelineVsyncId=373233395, IntendedVsync=1079634769368090, Vsync=1079635763385681, InputEventId=0, HandleInputStart=1079635764478931, AnimationStart=1079635764496691, PerformTraversalsStart=1079635765899764, DrawStart=1079635814580702, FrameDeadline=1079634790054512, FrameInterval=1079635763763462, FrameStartTime=8353089, SyncQueued=1079635834791327, SyncStart=1079635834967316, IssueDrawCommandsStart=1079635837195545, SwapBuffers=1079635852484764, FrameCompleted=1079635859409816, DequeueBufferDuration=51198, QueueBufferDuration=531875, GpuCompleted=1079635859409816, SwapBuffersCompleted=1079635854016743, DisplayPresentTime=0, CommandSubmissionCompleted=1079635852484764,
2023-12-07 19:27:54.352 7830-8026 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I hello this is test to use
2023-12-07 19:27:54.714 7830-8027 Recorder com.MyStudio.MyAppName~ D Speech
2023-12-07 19:27:54.714 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.95703125
2023-12-07 19:27:56.644 7830-8026 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I 16k sample rate and
2023-12-07 19:27:57.698 7830-8027 Recorder com.MyStudio.MyAppName~ D Speech
2023-12-07 19:27:57.699 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.98046875
2023-12-07 19:27:57.699 7830-8027 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 15s(f:15038 m:0 s:0) : pid 7830 uid 10419 sessionId 42009 sr 16000 ch 1 fmt 1
2023-12-07 19:27:59.592 7830-8026 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I and the frame size 487.
2023-12-07 19:28:00.706 7830-8027 Recorder com.MyStudio.MyAppName~ D Speech
2023-12-07 19:28:00.706 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.96875
2023-12-07 19:28:02.532 7830-8026 TRANSCRIBE_WHISPER com.MyStudio.MyAppName~ I with normal mode. Speech detection.
2023-12-07 19:28:02.682 7830-8027 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 20s(f:20020 m:0 s:0) : pid 7830 uid 10419 sessionId 42009 sr 16000 ch 1 fmt 1
2023-12-07 19:28:03.696 7830-8027 Recorder com.MyStudio.MyAppName~ D Silence
2023-12-07 19:28:03.696 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.0
2023-12-07 19:28:06.685 7830-8027 Recorder com.MyStudio.MyAppName~ D Silence
2023-12-07 19:28:06.685 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.0
2023-12-07 19:28:07.681 7830-8027 AudioRecord com.MyStudio.MyAppName~ D [audioRecordData][fine] 25s(f:25020 m:0 s:0) : pid 7830 uid 10419 sessionId 42009 sr 16000 ch 1 fmt 1
2023-12-07 19:28:09.681 7830-8027 Recorder com.MyStudio.MyAppName~ D Animal
2023-12-07 19:28:09.681 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.4140625
2023-12-07 19:28:12.526 7987-8007 System com.MyStudio.MyAppName~ W A resource failed to call close.
2023-12-07 19:28:12.670 7830-8027 Recorder com.MyStudio.MyAppName~ D Silence
2023-12-07 19:28:12.671 7830-8027 Recorder com.MyStudio.MyAppName~ D 0.0
2023-12-07 19:28:12.680 7830-8027 AudioRecord com.MyStudio.MyAppName~ D stop mSessionID=42009
2023-12-07 19:28:12.680 7830-8027 AudioRecord com.MyStudio.MyAppName~ D stop(10055): mActive:1
2023-12-07 19:28:12.740 7830-8027 AudioRecord com.MyStudio.MyAppName~ D stop mSessionID=42009
2023-12-07 19:28:12.740 7830-8027 AudioRecord com.MyStudio.MyAppName~ D stop(10055): mActive:0
2023-12-07 19:28:12.741 7830-8027 AudioRecord com.MyStudio.MyAppName~ D stop mSessionID=42009
2023-12-07 19:28:12.741 7830-8027 AudioRecord com.MyStudio.MyAppName~ D stop(10055): mActive:0
2023-12-07 19:28:12.753 7830-8027 Recorder com.MyStudio.MyAppName~ D Recorded file: /storage/emulated/0/Android/media/com.MyStudio.MyAppName~/MyAppName~/Models/test.wav

Here is the OneDrive link to the file:
https://1drv.ms/u/s!AgXqUQNVnl-xmZ086HH_M3ekp8XeUQ?e=dQyePD

@vilassn
Owner

vilassn commented Dec 7, 2023

Has your problem been solved?

@heromanofe
Author

> Has your problem been solved?

It was a VAD problem, though I wouldn't celebrate just yet. I noticed that some speech is detected as silence instead :D I need to fine-tune it, but apart from that it's working 100% :P Thanks for your work.

@ITHealer

ITHealer commented Dec 8, 2023

> > Has your problem been solved?
>
> It was a VAD problem, though I wouldn't celebrate just yet. I noticed that some speech is detected as silence instead :D I need to fine-tune it, but apart from that it's working 100% :P Thanks for your work.

Could you guide me on how to run the project from this repo: https://github.com/gkonovalov/android-vad
Is that okay?

I ran it, but when I clicked record, the result was "Noise detected" even though I was still talking. I don't understand how it works.

image

@heromanofe
Author

I don't know about the app; all I did was this (in the Recorder.java file):
image
image
image

@heromanofe
Author

Quick update on my situation: I decided to write Kotlin code for real-time recognition. It works very simply: I take your recording system and just leave out the 1-second chunks part. Then in my code I have a system for tracking timeouts.
There are 2 timeouts. First: if I don't talk for 5 seconds after activating, it times out.
Second: when I stop talking, a 2-second timeout starts.
When the 2nd timeout happens, I gather all the FloatArrays I've collected and push them to Whisper for recognition. The result is this:

2023-12-11 19:53:26.300 30867-31012 WHISPER: New State com.ERPStudio.ErpDroid W READY
2023-12-11 19:53:26.309 30867-31012 WHISPER: New State com.ERPStudio.ErpDroid W LISTENING_WAITING
2023-12-11 19:53:29.591 30867-31039 WHISPER: New State com.ERPStudio.ErpDroid W LISTENING_RECORDING
2023-12-11 19:53:34.509 30867-31039 System.out com.ERPStudio.ErpDroid I Whisper: recognizing text....
2023-12-11 19:53:34.509 30867-31039 WHISPER: New State com.ERPStudio.ErpDroid W READY
2023-12-11 19:53:37.363 30867-31038 TRANSCRIBE_WHISPER com.ERPStudio.ErpDroid I Test Test 1,2,3, Test Test
So Whisper took about 3 seconds to transcribe that text.
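The two-timeout flow described above could look roughly like the sketch below. It is written in Java for illustration and is not the poster's Kotlin code: `UtteranceCollector` and its methods are hypothetical names, the speech flag is assumed to come from a VAD, and timestamps are passed in explicitly so the logic can be exercised without a real clock. The state names mirror the ones in the log.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the two-timeout flow; not the poster's actual code. */
final class UtteranceCollector {
    enum State { LISTENING_WAITING, LISTENING_RECORDING, READY }

    private final long startTimeoutMs;   // give up if no speech starts, e.g. 5000
    private final long silenceTimeoutMs; // finish after this much silence, e.g. 2000
    private final List<float[]> chunks = new ArrayList<>();
    private State state = State.LISTENING_WAITING;
    private long lastEventMs; // activation time, then time of the last speech chunk

    UtteranceCollector(long nowMs, long startTimeoutMs, long silenceTimeoutMs) {
        this.lastEventMs = nowMs;
        this.startTimeoutMs = startTimeoutMs;
        this.silenceTimeoutMs = silenceTimeoutMs;
    }

    State state() {
        return state;
    }

    /**
     * Feeds one chunk. Returns null while still listening. When a timeout fires,
     * returns all collected audio (an empty array if speech never started).
     */
    float[] onChunk(float[] samples, boolean isSpeech, long nowMs) {
        if (state == State.READY) {
            return null; // already finished; ignore further input
        }
        if (isSpeech) {
            state = State.LISTENING_RECORDING;
            lastEventMs = nowMs;
        }
        if (state == State.LISTENING_RECORDING) {
            chunks.add(samples.clone()); // keep short pauses inside the utterance
        }
        long limit = (state == State.LISTENING_WAITING) ? startTimeoutMs : silenceTimeoutMs;
        if (nowMs - lastEventMs >= limit) {
            state = State.READY;
            return concat();
        }
        return null;
    }

    private float[] concat() {
        int total = 0;
        for (float[] c : chunks) total += c.length;
        float[] out = new float[total];
        int pos = 0;
        for (float[] c : chunks) {
            System.arraycopy(c, 0, out, pos, c.length);
            pos += c.length;
        }
        return out;
    }
}
```

A non-null, non-empty result would then be handed to the recognizer in one piece. Note that the silence timeout itself adds directly to the perceived latency, on top of the transcription time.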

I am making a 2bl app and I need both: TTS, which like here can be slow, and commands (like "start X", "do Y"), which ideally should be very quick, so this 3-second delay is too much for me. What can you suggest for speed optimization? Keep in mind that I am currently using whisper-tiny.tflite, i.e. the multilingual model. Would using the English model speed things up?

@vilassn
Owner

vilassn commented Dec 14, 2023

Transcription time varies from device to device; on a high-end device it will be lower.

You can debug which step takes more time: the mel spectrogram calculation or the inference.
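One simple way to see which stage dominates is to wrap each stage in a timer. The snippet below is a generic Java sketch, not code from the repository: `StageTimer` and `timed` are hypothetical names, and it prints to standard output, where an Android app would more likely use `Log.d`.

```java
import java.util.function.Supplier;

/** Hypothetical helper for measuring how long one pipeline stage takes. */
final class StageTimer {
    /** Runs a stage, prints its wall-clock duration, and returns its result. */
    static <T> T timed(String label, Supplier<T> stage) {
        long start = System.nanoTime();
        T result = stage.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " took " + elapsedMs + " ms");
        return result;
    }
}
```

For example, a call such as `StageTimer.timed("mel", () -> computeMel(samples))` followed by `StageTimer.timed("inference", () -> runModel(mel))` would print one line per stage; `computeMel` and `runModel` are placeholders for whatever the app actually calls.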

@matanel-6over6

Hi, first of all, thanks for the hard work. Is there a solution to the silence issue? When I don't speak and there is complete silence, words still come back to me.

@heromanofe
Author

@matanel-6over6 Scroll up for the screenshots. Here is the library: https://github.com/gkonovalov/android-vad
You need VAD, and that library was a pretty good solution for me.

@matanel-6over6

@heromanofe Thanks for the quick reply. What should I take from the project you mentioned and add to vilassn's project?

@heromanofe
Author

You add that library in Gradle:
implementation 'org.tensorflow:tensorflow-lite-task-audio:0.4.0'
implementation 'com.github.gkonovalov.android-vad:yamnet:2.0.4'
and in this project, Recorder.java is the file where you add the VAD.

@matanel-6over6

Do I need to add what you marked to the Recorder class?

@heromanofe
Author

The code in the screenshots goes there; the implementation lines go in Gradle (app/build.gradle).

@matanel-6over6

@heromanofe Yes, I understand, thank you very much.

@matanel-6over6

@heromanofe Working great. Thank you very much.
