How do you stream audio bidirectionally? #1

Open
akdeb opened this issue Aug 19, 2024 · 2 comments

Comments


akdeb commented Aug 19, 2024

Hey! Your project is super cool, and we are drawing inspiration from it for our own open-source project.
We are trying to stream audio bidirectionally over a full-duplex WebSocket, with STT and TTS handled by Deepgram on a Python server.

In your demo picture are you using two ESP32 boards? Are you using multiple FreeRTOS tasks to handle audio streaming?
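
For readers following along, here is a minimal sketch of what "full duplex" means on the Python side: one coroutine keeps sending microphone frames while another keeps receiving synthesized audio, and neither waits for the other to finish. Everything in it is an assumption for illustration. The two `asyncio.Queue` objects stand in for the two directions of a WebSocket, and `fake_server` is a hypothetical placeholder for an STT/LLM/TTS backend that simply echoes frames back upper-cased.

```python
import asyncio

END = None  # sentinel marking the end of a stream


async def uplink(mic_frames, to_server):
    """Send microphone frames as they are captured, then signal end of stream."""
    for frame in mic_frames:
        await to_server.put(frame)
        await asyncio.sleep(0)  # yield control so the other tasks can run in between
    await to_server.put(END)


async def downlink(from_server, played):
    """Receive audio frames from the server and 'play' them as they arrive."""
    while (frame := await from_server.get()) is not END:
        played.append(frame)


async def fake_server(to_server, from_server):
    """Hypothetical stand-in for the backend: echoes every frame back upper-cased."""
    while (frame := await to_server.get()) is not END:
        await from_server.put(frame.upper())
    await from_server.put(END)


async def run_duplex(mic_frames):
    """Run both directions concurrently and return the frames that were played."""
    to_server, from_server = asyncio.Queue(), asyncio.Queue()
    played = []
    await asyncio.gather(
        uplink(mic_frames, to_server),
        downlink(from_server, played),
        fake_server(to_server, from_server),
    )
    return played


if __name__ == "__main__":
    print(asyncio.run(run_duplex([b"aa", b"bb", b"cc"])))
```

With a real WebSocket library the queues would be replaced by the connection's send and receive calls, but the structure (two independent tasks per connection) would likely stay the same.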

Owner

kaloprojects commented Aug 19, 2024

Hi, thanks for the compliment. Full-duplex WebSocket streaming sounds interesting too! It might be a follow-up on my side as well, instead of sending prerecorded audio (but it is pretty complex on the ESP32 using C++/Arduino IDE; a Python server might be the easier option).

The reason I am using 2 ESP32 boards is simpler: the right one (the one we are talking about here) is a pure (and versatile) STT and TTS device, while the left ESP32 handles all other tasks (in my current projects, e.g. an OpenAI device via the GPT-4o chat API). The two exchange text/commands via Serial Tx/Rx (UART2). So the right one is just a 'voice assistant', an I/O extension (covering STT and TTS) for other dedicated (existing) projects. Maybe I will combine everything into one, but using 2 ESP32 boards is just easier, more flexible, and more structured for my use cases at the moment.
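
To illustrate the idea of two boards exchanging text/commands over a serial link, here is a minimal sketch of a newline-framed command parser. The wire format is not described in this thread, so the `COMMAND:argument` layout and the `STT`/`TTS` command names below are purely hypothetical. The point is only that serial bytes arrive in arbitrary pieces, so the receiver has to buffer until it sees a full line.

```python
class LineFramer:
    """Accumulates raw serial bytes and returns complete newline-terminated commands."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data: bytes):
        """Add received bytes; return a list of (command, argument) tuples."""
        self._buf.extend(data)
        messages = []
        # Extract every complete line currently in the buffer.
        while (idx := self._buf.find(b"\n")) != -1:
            line = self._buf[:idx].decode("utf-8", errors="replace").strip()
            del self._buf[: idx + 1]
            if line:
                # Hypothetical format: "COMMAND:argument"
                cmd, _, arg = line.partition(":")
                messages.append((cmd.strip().upper(), arg.strip()))
        return messages


if __name__ == "__main__":
    framer = LineFramer()
    print(framer.feed(b"STT:hello wo"))       # incomplete line, nothing yet
    print(framer.feed(b"rld\r\nTTS:hi\r\n"))  # two complete commands
```

The same buffering logic carries over to C++ on the ESP32, where bytes would be read from the UART in a loop and appended to a buffer until a line terminator shows up.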

Owner

kaloprojects commented Nov 1, 2024

Hi @akdeb!

I just reopened this issue to stay in contact with you. I just found you in the Starmoon project: https://github.com/StarmoonAI/Starmoon and https://www.starmoon.app/.
This project is awesome, I love it!! Really well done, so cool :)

You know, what I see there is pretty similar to what I started (but never finalized until today). The whole KALO-ESP32-Voice-Assistant code was more of a starter toolkit for my own (private) OpenAI chat project (using speech-to-speech on the ESP32), to chat with virtual friends (just for fun):

  • so I built a PCB for my ESP32, same as in my picture (using the INMP441 I2S microphone and the MAX98357A I2S audio amp)
  • implemented an OpenAI chat device (STT via Deepgram, TTS via OpenAI TTS and, more recently, SpeechGen.IO TTS)
  • btw: I use SpeechGen.IO because I LOVE the (German) child voice Gisela (like your Azure 'Twinkle'); it is easy to use because they respond with a URL to a generated WAV ;)
  • coded several agents (via system prompts); I call them by their name, and they respond in their role (with their voice). Gisela is one of them
  • one more detail: I have since also coded optional access to the Perplexity LLM / model 'llama-3.1-sonar-small-128k-online' (in addition to the OpenAI model gpt-4o-mini), allowing me to ask current, real-time questions about today (weather, politics, etc.)

=> All of this is done in pure C code on the ESP32 and works well. BUT the big issue is latency, because I never implemented real STT 'streaming': everything is done by sending a pre-recorded WAV to Deepgram via an HTTP POST request, similar to what I did in my KALO-ESP32-Voice-Assistant. This concept has reached its limit for 'human' conversations. Streaming in C++ on the ESP32 is a nightmare; you need a Python server and WebSockets, and I do not have that skill set.
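
To make the latency point concrete: with record-then-POST, transcription cannot start until the whole utterance has been recorded and uploaded, whereas a streaming setup sends small frames while the user is still speaking. A minimal sketch of the framing step is below. The audio format is an assumption (16 kHz, 16-bit, mono PCM, 20 ms frames); the thread does not state the actual recording parameters.

```python
def pcm_frames(pcm: bytes, sample_rate: int = 16000, sample_width: int = 2, frame_ms: int = 20):
    """Yield fixed-duration chunks of mono PCM audio; the last chunk may be shorter."""
    # Bytes per frame = samples per second * bytes per sample * frame duration in seconds.
    frame_bytes = sample_rate * sample_width * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


if __name__ == "__main__":
    three_seconds = bytes(16000 * 2 * 3)  # 3 s of silence under the assumed format
    frames = list(pcm_frames(three_seconds))
    print(len(three_seconds), "bytes ->", len(frames), "frames of", len(frames[0]), "bytes")
```

Under these assumptions a 3-second utterance is 96000 bytes, or 150 frames of 640 bytes each. In the streaming case the first frame can leave the device 20 ms after the user starts talking, instead of after the full 3 seconds plus upload time.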

Then I found your Starmoon project. So amazing!! :) So I might stop my private project (OpenAI chat) and go with your idea :)

That’s exactly what I was looking for: 'compact AI-enabled device, you can take anywhere and converse with one of your 10 friends' .. in an empathetic conversation, just like a lovely friend. Btw: I also planned to build it into cuddly toys for my friends' kids :). So nice.

I will join your Discord for sure, and I am also planning to order one of those lovely Starmoon AI devices. I might also ask you a ‘thousand’ more questions (in Discord?) about how to set up your device (I say this because I have some skills in hardware and ESP32 C coding, but I am a newbie with Docker, Python servers, GitHub clones, etc. LOL)

Really well done, @akdeb!! I am happy that my KALO-ESP32-Voice-Assistant project could maybe help you a little bit in the past (in the I2S coding, I assume?)
