Poor VAD end with background noise #113

Open
dslugPX opened this issue May 26, 2023 · 8 comments
Labels
1.0: Issues to address for 1.0 release
audio: Issues related to audio wake, speech recognition, audio quality, etc.
dynamic configuration: Config and behavior changes for post-dynamic configuration support

Comments

@dslugPX

dslugPX commented May 26, 2023

As mentioned in issue 112, we have a relatively noisy environment because we have music running 24/7.

We have noticed that Willow sometimes seems to be listening to the music in addition to our voices.
I believe it may be causing some of the issues mentioned in 112, in particular scenario 3.

I have also seen it (only once) pick up what I think was a drum beat as the command "No no no".

Happy to help by providing whatever I can for you.

Also, I should mention I have two ESP32s in flight now and one more still in a box, so I can certainly try some different settings and the like as well.

Cheers!

@kristiankielhofner
Contributor

Thank you for filing separate issues. We'll be addressing them in commits for you to test later today.

As I've noted previously, of all the reports we are getting, you seem to be having the most usability issues. It's reassuring to us that, even with these initial and very early problems, your experience is still positive enough to order more devices!

@dslugPX
Author

dslugPX commented May 26, 2023 via email

@kristiankielhofner
Contributor

Wow, yeah... Now that I'm hearing about the network situation, I suppose I'm even happier Willow works as well as it does, especially for hardware that is 2.4 GHz only. Do you have any plans to address some of that? I wouldn't ask you to do it for Willow - after all, we aim to be the best speech solution in the world, and it's good to know it's being used in environments that are... Let's just go with "suboptimal" for a speech recognition device connected over a wireless network. Don't take this the wrong way, but from the sound of it someone couldn't purposely design more of an environmental nightmare for a solution like Willow ;). I'm almost surprised it works at all.

I'm very sorry to hear you have severe tinnitus. I don't have it myself but from what I understand it's dramatically life-impacting.

Yes, background noise is always a challenge. The ESP BOX and the various libraries do (IMO) a very good job with it, but at the end of the day you can start to run out of magic. That said, we have plenty of knobs to tweak and we'll get the full set to you later today.

@dslugPX
Author

dslugPX commented May 26, 2023 via email

@stintel
Collaborator

stintel commented May 30, 2023

My apartment is also relatively noisy and I too run into AUDIO_REC_VAD_END not triggering. With 7da6d73 we force the stream to end after CONFIG_WILLOW_STREAM_TIMEOUT seconds, which avoids an endless stream. Setting it to 5 works around the problem somewhat.
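
To illustrate the timeout fallback described above, here is a minimal host-side sketch in Python. It is not Willow's firmware code: the function name `stream_end`, the 30 ms frame length and the 300 ms silence window are assumptions made for illustration. Only the idea of ending the stream after a fixed number of seconds when the VAD end never arrives comes from this comment.

```python
def stream_end(vad_flags, frame_ms=30, silence_ms=300, timeout_s=5):
    """Decide when a simulated audio stream stops.

    vad_flags: iterable of booleans, one per frame (True = VAD says speech).
    timeout_s: stands in for CONFIG_WILLOW_STREAM_TIMEOUT.
    frame_ms and silence_ms are hypothetical values, not Willow defaults.
    Returns (frames_consumed, reason).
    """
    heard_speech = False
    silent_ms = 0
    count = 0
    for is_speech in vad_flags:
        count += 1
        if is_speech:
            heard_speech = True
            silent_ms = 0
        else:
            silent_ms += frame_ms
        # Normal case: enough trailing silence after speech ends the stream.
        if heard_speech and silent_ms >= silence_ms:
            return count, "vad_end"
        # Fallback: background noise keeps the VAD active, so force the end.
        if count * frame_ms >= timeout_s * 1000:
            return count, "timeout"
    return count, "exhausted"
```

In a quiet room the trailing silence ends the stream. With constant music every frame is flagged as speech, so only the timeout stops it.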

Last night I wondered if reducing the mic gain would help in noisy environments, so I added a Kconfig option to set mic gain in 00f0d1b. Could you please test whether reducing the mic gain helps in noisy environments? I'm currently travelling so I can't test it myself.
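
The intuition behind the mic gain question can be sketched with a toy calculation. This assumes a detector that compares frame energy against a fixed level, which may not be how the VAD on the device behaves, so treat it as a thought experiment and not as a prediction. The helper names, sample amplitudes and the threshold of 300 are all made up for illustration.

```python
import math


def apply_gain_db(samples, gain_db):
    """Scale samples by a gain in decibels (negative values attenuate)."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Hypothetical signals: quiet background music and louder nearby speech.
music = [400, -400] * 100
speech = [4000, -4000] * 100
threshold = 300  # hypothetical fixed energy threshold

for gain_db in (0, -6):
    music_level = rms(apply_gain_db(music, gain_db))
    speech_level = rms(apply_gain_db(speech, gain_db))
    print(gain_db, music_level > threshold, speech_level > threshold)
```

In this toy setup, lowering the gain by 6 dB drops the music below the threshold while speech stays above it. Whether the real pipeline reacts the same way is what the test request above is meant to find out.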

@kristiankielhofner
Contributor

@dslugPX As shown in the commit reference I also just added a parameter exposed under "Advanced Configuration" to configure the "aggressiveness" of VAD - higher values mean it will be more selective in considering what constitutes speech. In my initial testing VAD_MODE_4 (most aggressive) helps with this issue, but you may want to play with the various levels in your environment.
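
As a rough illustration of what "more selective" means, the sketch below models each VAD mode as a required margin above the noise floor. This is a toy model and not the algorithm used on the device. The margins in `MARGIN_DB`, the function name `speech_flags` and the example levels are assumptions. Only the mode numbering up to 4 comes from this comment.

```python
# Hypothetical margins in dB above the noise floor for each mode.
# Higher mode = more aggressive = a frame must be louder to count as speech.
MARGIN_DB = {0: 3.0, 1: 6.0, 2: 9.0, 3: 12.0, 4: 15.0}


def speech_flags(levels_db, noise_floor_db, mode):
    """Flag each frame as speech if it exceeds the floor by the mode's margin."""
    margin = MARGIN_DB[mode]
    return [level - noise_floor_db >= margin for level in levels_db]


# Five frames of speech at -30 dB followed by five frames of music at -42 dB,
# over an assumed noise floor of -50 dB.
levels = [-30.0] * 5 + [-42.0] * 5

for mode in sorted(MARGIN_DB):
    flagged = sum(speech_flags(levels, -50.0, mode))
    print(f"mode {mode}: {flagged} of {len(levels)} frames flagged as speech")
```

With these made-up numbers, the two least aggressive modes treat the music as speech, so the end of speech would never be detected. The higher modes flag only the louder frames.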

@dslugPX
Author

dslugPX commented Jun 7, 2023

> @dslugPX As shown in the commit reference I also just added a parameter exposed under "Advanced Configuration" to configure the "aggressiveness" of VAD - higher values mean it will be more selective in considering what constitutes speech. In my initial testing VAD_MODE_4 (most aggressive) helps with this issue, but you may want to play with the various levels in your environment.

Nice. I'll have a little time coming up in the next few days to do some updates and try a few more things.
We are still using this daily and trying to note what we find. Drums are definitely a source of trouble, but I have only done the one update since we put the devices online, so most of your more interesting changes aren't in use yet.
Will follow up again soon!

Btw, a bunch of ESP32 boxes hit Adafruit this afternoon, so I'm guessing you will have a new run of users coming at you soon!

@kristiankielhofner
Contributor

Thanks, appreciate it!

Yep, we saw a bunch come into Mouser too!

kristiankielhofner added the audio, dynamic configuration, and 1.0 labels Jun 16, 2023
3 participants