feat(core): add user transcription timeout #6182
base: main
Contributor
🚩 New event not forwarded in SessionHost remote transport
(Refers to lines 366-379)
🚩 Transcription timeout suppressed during agent speech with no re-arm
When `_on_transcription_timeout` fires while the agent is speaking (audio_recognition.py:1809), the callback returns silently without emitting the event and without scheduling a retry. As a result, if VAD detects user speech that produces no transcript, and the timeout happens to fire while the agent is still speaking (e.g. the user spoke over the agent near the end of its turn), the timeout event is permanently lost for that speech burst. The rationale is likely that speech detected during agent output is often echo or noise, but with AEC enabled it could be genuine user speech. Whether this is acceptable depends on the use case.
Since there is no user content, we should still emit the signal so the agent can check.
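
For illustration only, here is a minimal sketch of the two behaviours discussed in this thread. It is not the code in this PR: the class `TimeoutHandlerSketch`, the flag `emit_while_agent_speaking`, and the event name `"user_transcription_timeout"` are all hypothetical names chosen for the example, and the real callback in audio_recognition.py may be structured quite differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TimeoutHandlerSketch:
    """Hypothetical stand-in for the recognition component; not the PR's code."""

    # Callable that reports whether the agent is currently speaking (assumed).
    agent_speaking: Callable[[], bool]
    # True: emit even during agent speech (the behaviour suggested in the reply).
    # False: return silently during agent speech (the behaviour flagged above).
    emit_while_agent_speaking: bool = True
    # Recorded events, standing in for a real event emitter.
    events: List[str] = field(default_factory=list)

    def on_transcription_timeout(self) -> None:
        if self.agent_speaking() and not self.emit_while_agent_speaking:
            # Flagged behaviour: nothing is emitted and nothing is re-armed,
            # so the timeout for this speech burst is lost.
            return
        # Suggested behaviour: always surface the signal and let the agent decide.
        self.events.append("user_transcription_timeout")


# Usage: compare both policies while the agent is speaking.
suppressing = TimeoutHandlerSketch(lambda: True, emit_while_agent_speaking=False)
suppressing.on_transcription_timeout()

emitting = TimeoutHandlerSketch(lambda: True)
emitting.on_transcription_timeout()
```

With the flag off, the handler records nothing for a timeout that lands during agent speech; with it on, the event is recorded and the decision about what to do is left to the consumer.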