Audio recording enhancement #1341

Open
wants to merge 2 commits into
base: master
29 changes: 25 additions & 4 deletions MAVProxy/modules/mavproxy_chat/chat_voice_to_text.py
@@ -6,6 +6,8 @@
'''

import time
import math
import struct

try:
import pyaudio # install using, "sudo apt-get install python3-pyaudio"
@@ -15,6 +17,15 @@
print("chat: failed to import pyaudio, wave or openai. See https://ardupilot.org/mavproxy/docs/modules/chat.html")
exit()

def rms( data ):
Contributor

Hi, thanks for this. A few things to fix here:

  1. Could you add comments above the function?
  2. I think we should give the function a more specific name so its purpose is clear, perhaps calc_audio_volume(). Maybe it should also return the decibels directly instead of making the caller do the conversion.
  3. Our normal style is no spaces inside parentheses, so change "def rms( data )" to "def rms(data)".
  4. I think count could be zero if the microphone is not working. In any case we should protect against the divide-by-zero that would happen if count is zero.
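
For illustration only, here is a rough sketch of how points 1, 2 and 4 might look together. The name calc_audio_volume() comes from the suggestion above; returning None for an empty or silent chunk is just one assumed way to signal "no volume" and is not something the PR defines.

```python
import math
import struct


# calculate the volume of a chunk of 16-bit audio samples in decibels
# returns None if the chunk holds no samples or is completely silent (assumed convention)
def calc_audio_volume(data):
    # each sample is a signed 16-bit integer (2 bytes)
    count = len(data) // 2
    if count == 0:
        # no samples, e.g. the microphone returned nothing
        return None

    # unpack only whole samples, ignoring any trailing odd byte
    samples = struct.unpack("%dh" % count, data[:count * 2])

    # root-mean-square of the samples scaled to the range [-1, 1)
    sum_squares = 0.0
    for sample in samples:
        n = sample * (1.0 / 32768)
        sum_squares += n * n
    rms = math.sqrt(sum_squares / count)

    # log10(0) is undefined so treat pure silence as "no volume"
    if rms == 0.0:
        return None
    return 20 * math.log10(rms)
```

With something like this the caller would only need to compare the returned value against the threshold, after checking for None.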

    count = len(data)/2
    format = "%dh"%(count)
    shorts = struct.unpack( format, data )
    sum_squares = 0.0
    for sample in shorts:
        n = sample * (1.0/32768)
        sum_squares += n*n
    return math.sqrt( sum_squares / count )

class chat_voice_to_text():
def __init__(self):
@@ -34,7 +45,7 @@ def check_connection(self):
try:
self.client = OpenAI()
except Exception:
print("chat: failed to connect to OpenAI")
print("chat: failed to connect to OpenAI - 4")
Contributor

I think this change to the print statement can be removed.

return False

# return True if connected
@@ -55,15 +66,25 @@ def record_audio(self):

# calculate time recording should stop
curr_time = time.time()
time_stop = curr_time + 5
time_stop = curr_time + 3

# record until specified time
frames = []
while curr_time < time_stop:

# logic for recording sound until someone is speaking.
isSpeaking = True
Contributor

Hi, I think our normal style is to use underscores between the words in variable names, so let's change "isSpeaking" to "is_speaking".

Also maybe change the comment to "record sound while user is speaking".
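
To show the renamed variable and comment in context, here is a small sketch of the recording loop pulled out into a standalone function. The function record_while_speaking() and its read_chunk, is_loud and now parameters are hypothetical and exist only so the loop can be run without a microphone. In the PR the equivalent steps are stream.read(1024), the volume check and time.time().

```python
import time


# collect audio chunks until silence_timeout seconds pass without speech
# read_chunk, is_loud and now are hypothetical stand-ins for the stream, volume check and clock
def record_while_speaking(read_chunk, is_loud, silence_timeout=3.0, now=time.time):
    frames = []

    # calculate time recording should stop
    curr_time = now()
    time_stop = curr_time + silence_timeout

    # record sound while user is speaking
    is_speaking = True
    while curr_time < time_stop or is_speaking:
        data = read_chunk()
        frames.append(data)
        is_speaking = is_loud(data)
        if is_speaking:
            # speech detected so push the stop time back
            time_stop = now() + silence_timeout
        curr_time = now()
    return frames
```

Passing the clock in as a parameter is only there to make the timing easy to check. The real method could keep calling time.time() directly.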

while curr_time < time_stop or isSpeaking:
    data = stream.read(1024)
    frames.append(data)
    rms1 = rms(data)
    if rms1!=0.0:
        decibel = 20 * math.log10(rms1)
        isSpeaking = decibel>-80.0 # -80 is the hardcoded threshold. higher number means louder. Set threshold in the range (-100,0)
Contributor

If possible let's move this -80 to be a definition at the top of the file where it will be easier to find.
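
Something along these lines is what I have in mind. This is only a sketch. The constant name SPEAKING_THRESHOLD_DB and the helper is_speaking_volume() are made up for illustration and can be named however fits best.

```python
import math

# volume in decibels above which the user is considered to be speaking
# higher number means louder, set in the range (-100, 0)
SPEAKING_THRESHOLD_DB = -80.0


# return True if the RMS level (0 to 1) is louder than the threshold
def is_speaking_volume(rms_value, threshold_db=SPEAKING_THRESHOLD_DB):
    # log10 is undefined for zero so treat silence as not speaking
    if rms_value <= 0.0:
        return False
    return 20 * math.log10(rms_value) > threshold_db
```

Keeping the threshold as a module-level definition means it can be tuned in one place without reading through record_audio().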

        if isSpeaking:
            time_stop = time.time()+3
    else:
        isSpeaking = False
    curr_time = time.time()

# Stop and close the stream
stream.stop_stream()
stream.close()