Deepgram streaming #101

Merged Mar 25, 2024 (5 commits)

matthewkennedy5 (Contributor) commented Mar 25, 2024

User description

Attempting to be backwards compatible with Whisper.

When ASR_METHOD == "deepgram", the transcription is accumulated in self.transcription as Deepgram streams results, and start_response() reads it from there.

Next, I think Deepgram has VAD functionality, so we can get rid of Silero VAD and call start_response() once Deepgram has a transcript ready.
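
A rough sketch of how that follow-up could look. Deepgram's streaming results carry `is_final` and `speech_final` flags, so a handler could buffer finalized segments and fire a callback once `speech_final` marks the end of an utterance. The `UtteranceTrigger` class, the `make_result` helper, and the plain-dict result shape are hypothetical stand-ins for illustration; they are not the SDK's objects or code from this PR.

```python
from typing import Callable, Dict, List


class UtteranceTrigger:
    """Hypothetical helper: buffers finalized segments, fires on end of speech."""

    def __init__(self, on_utterance: Callable[[str], None]) -> None:
        self._segments: List[str] = []
        self._on_utterance = on_utterance

    def handle_result(self, result: Dict) -> None:
        # Assumed result shape, modeled on Deepgram's streaming JSON:
        # {"is_final": bool, "speech_final": bool,
        #  "channel": {"alternatives": [{"transcript": str}]}}
        text = result["channel"]["alternatives"][0]["transcript"].strip()
        # Interim results are ignored; only finalized text is kept.
        if result.get("is_final") and text:
            self._segments.append(text)
        # End of utterance: hand the joined text to the callback and reset.
        if result.get("speech_final") and self._segments:
            utterance = " ".join(self._segments)
            self._segments.clear()
            self._on_utterance(utterance)


def make_result(text: str, is_final: bool, speech_final: bool) -> Dict:
    """Build a fake result dict for demonstration purposes."""
    return {
        "is_final": is_final,
        "speech_final": speech_final,
        "channel": {"alternatives": [{"transcript": text}]},
    }


if __name__ == "__main__":
    heard: List[str] = []
    trigger = UtteranceTrigger(heard.append)  # stand-in for start_response()
    trigger.handle_result(make_result("hello", False, False))       # interim
    trigger.handle_result(make_result("hello there", True, False))  # finalized
    trigger.handle_result(make_result("how are you", True, True))   # utterance end
    print(heard)
```

In the real agent the callback would presumably schedule start_response() on the event loop rather than append to a list.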


Description

  • Integrated Deepgram's live transcription service into the ResponseAgent class, replacing the previous static transcription setup.
  • Added a new transcript attribute to ResponseAgent to store the ongoing transcription result from Deepgram.
  • Set up event handling for live transcription results, updating the transcript attribute accordingly.
  • Modified the receive_audio method to send audio data to Deepgram's live transcription service.
  • Updated the start_response method to use the live transcription result when Deepgram is selected as the ASR method.
  • Changed the default recording behavior in the connect_daily function to not record by default.
  • Changed the TTS provider from ElevenLabs to local in the connect_daily function.
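
The data flow described in the list above can be sketched without the Deepgram SDK. In this minimal example, `FakeLiveConnection` is a hypothetical stand-in for the live connection, the `"transcript"` event name is made up, and the append-to-transcript behavior in `on_message` is an assumption about how the handler updates state. The real methods are async and the real handler receives a result object rather than a string.

```python
from typing import Callable, Dict, List


class FakeLiveConnection:
    """Hypothetical stand-in for a streaming ASR connection (not the Deepgram SDK)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[str], None]]] = {}
        self.sent: List[bytes] = []

    def on(self, event: str, handler: Callable[[str], None]) -> None:
        # Register a callback for a named event.
        self._handlers.setdefault(event, []).append(handler)

    def send(self, audio: bytes) -> None:
        # Record the audio that would be streamed to the service.
        self.sent.append(audio)

    def emit(self, event: str, text: str) -> None:
        # Demo hook: pretend the service returned a transcript.
        for handler in self._handlers.get(event, []):
            handler(text)


class Agent:
    """Toy agent: audio goes out, transcript events come back, the response reads them."""

    def __init__(self, connection: FakeLiveConnection) -> None:
        self.transcript = ""
        self.connection = connection
        self.connection.on("transcript", self.on_message)

    def on_message(self, text: str) -> None:
        # Assumption: each event's text is appended to the running transcript.
        if text:
            self.transcript = f"{self.transcript} {text}".strip()

    def receive_audio(self, message: bytes) -> None:
        self.connection.send(message)

    def start_response(self) -> str:
        # Consume the transcript so the next turn starts clean.
        text, self.transcript = self.transcript, ""
        return text


if __name__ == "__main__":
    conn = FakeLiveConnection()
    agent = Agent(conn)
    agent.receive_audio(b"\x00\x01")
    conn.emit("transcript", "hello")
    conn.emit("transcript", "world")
    print(agent.start_response())
```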

Changes walkthrough

Relevant files
Enhancement
response_agent.py
Integrate Deepgram Live Transcription                                                   

openduck-py/openduck_py/response_agent.py

  • Replaced PrerecordedOptions and FileSource with
    LiveTranscriptionEvents and LiveOptions for DeepgramClient.
  • Initialized DeepgramClient outside of the conditional check for
    ASR_METHOD.
  • Added live transcription setup with event handling for Deepgram.
  • Removed commented-out initialization of SileroVad.
  • Added a new transcript attribute to store ongoing transcription.
  • Updated receive_audio method to send audio data to Deepgram live
    transcription.
  • Modified start_response to use the stored transcript when ASR_METHOD
    is set to "deepgram".
  • Added on_message method to handle live transcription events and update
    the transcript.
  • +41/-13

Configuration changes
voice.py
Update Voice Router Configuration Defaults

openduck-py/openduck_py/routers/voice.py

  • Changed the default value of record from True to False in
    connect_daily function.
  • Updated the TTSConfig provider from "elevenlabs" to "local" in
    connect_daily function.
  • +2/-2

Comment on lines 254 to 271

    self.dg_connection = deepgram.listen.live.v("1")
    options = LiveOptions(
        model="nova-2",
        punctuate=True,
        language="en-US",
        encoding="linear16",
        channels=1,
        sample_rate=WS_SAMPLE_RATE,
        interim_results=True,
        utterance_end_ms="1000",
        vad_events=True,
    )

    self.dg_connection.on(
        LiveTranscriptionEvents.Transcript,
        lambda x, result, **kwargs: self.on_message(result),
    )
    self.dg_connection.start(options)

Contributor

Let's put this inside if ASR_METHOD == "deepgram":

    @@ -483,3 +504,9 @@ async def speak_response(
                     audio=np.frombuffer(audio_chunk_bytes, dtype=np.int16),
                     latency=t_styletts - t_normalize,
                 )
    +
    +    def on_message(self, result):

Contributor

Call it on_deepgram_message or on_streaming_asr_message?

    @@ -279,7 +279,7 @@ async def connect_daily(
             session_id=session_id,
             record=record,
             input_audio_format="int16",
    -        tts_config=TTSConfig(provider="elevenlabs", voice_id=voice_id),
    +        tts_config=TTSConfig(provider="local", voice_id=voice_id),

Contributor

Can we revert the changes in this file before merging? In prod it's nice to have recording and the ElevenLabs voice for now.

Contributor (Author)

ya

    @@ -271,6 +288,9 @@ async def interrupt(self, task: asyncio.Task):
             self.is_responding = False

         async def receive_audio(self, message: bytes):

    +        self.dg_connection.send(message)

Contributor

Same thing here, should gate on ASR_METHOD == "deepgram"
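
A minimal sketch of the gating the reviewer asks for, covering both the connection setup and `receive_audio`. Several things here are assumptions made for illustration: `RecordingConnection` is a hypothetical stand-in for the Deepgram connection, the ASR method is passed as a constructor argument instead of being read from the module-level `ASR_METHOD`, the non-Deepgram path is assumed to buffer audio locally, and the method is synchronous while the real one is async.

```python
from typing import List, Optional


class RecordingConnection:
    """Hypothetical stand-in that records what would be streamed."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, audio: bytes) -> None:
        self.sent.append(audio)


class GatedAgent:
    """Only creates and uses the streaming connection when Deepgram is selected."""

    def __init__(self, asr_method: str) -> None:
        self.asr_method = asr_method
        self.dg_connection: Optional[RecordingConnection] = None
        self.audio_buffer: List[bytes] = []
        if asr_method == "deepgram":
            # Connection setup lives inside the gate, so other ASR
            # methods never open a streaming session.
            self.dg_connection = RecordingConnection()

    def receive_audio(self, message: bytes) -> None:
        if self.asr_method == "deepgram" and self.dg_connection is not None:
            self.dg_connection.send(message)
        else:
            # Assumed fallback: keep audio locally for batch transcription.
            self.audio_buffer.append(message)


if __name__ == "__main__":
    streaming = GatedAgent("deepgram")
    streaming.receive_audio(b"abc")
    batch = GatedAgent("whisper")
    batch.receive_audio(b"abc")
    print(streaming.dg_connection.sent, batch.dg_connection, batch.audio_buffer)
```

Keeping the `None` check next to the method check means a misconfigured agent fails over to the buffered path instead of raising an AttributeError.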

Labels: bug_fix, enhancement