How do you stream audio bidirectionally? #1
Comments
Hi, thanks for the compliment. Full-duplex websocket streaming sounds interesting too! It might be a follow-up on my side as well, instead of sending prerecorded audio (but it is pretty complex on ESP32 using C++/Arduino IDE; a py server might be the easier option). The reason I am using 2 ESP32 boards is simpler: the right one (the one we are talking about here) is a pure (and versatile) STT and TTS device, while the left ESP32 handles all other tasks (in my current projects, e.g., an Open AI device via the chat 4o API). Both communicate via Serial Tx/Rx (UART2) using text/commands. So the right one is just a 'voice assistant', an I/O extension (covering STT and TTS) for other dedicated (existing) projects. Maybe I will combine everything into one, but using 2 ESP32 boards is just easier, more flexible, and more structured for my use cases at the moment.
Hi @akdeb! I just reopened this issue to get and stay in contact with you. I just found you in the Starmoon project: https://github.com/StarmoonAI/Starmoon and https://www.starmoon.app/. From what I have seen, it is pretty similar to what I started (but never finished). The whole KALO-ESP32-Voice-Assistant code was more of a starter toolkit for my own (private) Open AI chat project (using speech-to-speech on ESP32) to chat with virtual friends (just for fun):
=> Everything is done in pure C code on the ESP32 and works well, BUT the big issue is latency, because I never implemented real STT 'streaming'. Everything is done by sending a pre-recorded WAV to Deepgram via an HTTP POST request, similar to what I did in my KALO-ESP32-Voice-Assistant. But this concept has reached its limit for 'human' conversations. Streaming in C++ on the ESP32 is a nightmare: you need a py server and websockets, and I do not have this skill set. Then I found your Starmoon project. So amazing!! :) So I might stop my private (Open AI chat) project and go with your idea :) That's exactly what I was looking for: 'compact AI-enabled device, you can take anywhere and converse with one of your 10 friends', in an empathic conversation, just like a lovely friend. Btw: I also planned to build this into cuddly toys for the kids of my friends :). So nice. I will join your Discord for sure, and I am also planning to order one of those lovely Starmoon AI devices. I might ask you a 'thousand' more questions (in Discord?) about how to set up your device (saying this as I have some skills in hardware and ESP32 C coding, but I am a newbie with Docker, py servers, GitHub clones, etc. LOL). Well done, @akdeb!! I am happy that my KALO-ESP32-Voice-Assistant project could maybe help you a little bit in the past (I assume with the I2S coding, right?)
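To make the difference between one big HTTP POST and streaming a bit more concrete, here is a minimal Python sketch of the first step any streaming sender needs: cutting raw PCM audio into small fixed-size frames that can be sent as they are recorded. The sample rate, sample width, and 20 ms frame length are assumptions for illustration only, not values from this thread, and the helper names `frame_size_bytes` and `split_into_frames` are hypothetical. No Deepgram call is shown; this is only the framing step.

```python
# Assumed audio format (not taken from the thread): 16 kHz, 16-bit mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
FRAME_MS = 20  # assumed frame length in milliseconds


def frame_size_bytes(sample_rate_hz=SAMPLE_RATE_HZ,
                     bytes_per_sample=BYTES_PER_SAMPLE,
                     frame_ms=FRAME_MS):
    """Number of bytes in one frame of audio of the given duration."""
    return sample_rate_hz * bytes_per_sample * frame_ms // 1000


def split_into_frames(pcm, frame_bytes):
    """Yield consecutive frames; the last frame may be shorter."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence
    frames = list(split_into_frames(one_second, frame_size_bytes()))
    print(len(frames), len(frames[0]))
```

With these assumed values, the server can start working on the first frame while the user is still speaking, instead of waiting for the complete recording to arrive.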
Hey! Your project is super cool and we are using it to draw inspiration for our own open source project.
We are trying to stream audio bidirectionally over a full-duplex websocket, with STT and TTS via Deepgram on a py server.
In your demo picture, are you using two ESP32 boards? Are you using multiple FreeRTOS tasks to handle audio streaming?
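For readers wondering what "full-duplex" means in practice here, the following is a rough Python sketch using only `asyncio` from the standard library. Two in-memory queues stand in for the two directions of a websocket, and `fake_server` is a hypothetical placeholder for the STT/TTS side; none of this is the actual Starmoon or KALO code, and no real websocket or Deepgram API is used. The point is only that sending and receiving run as separate concurrent tasks, so replies can arrive while audio is still being sent.

```python
import asyncio

END = None  # sentinel marking the end of a stream (assumption for this sketch)


async def send_audio(frames, uplink):
    """Push microphone frames to the server, then signal end of stream."""
    for frame in frames:
        await uplink.put(frame)
    await uplink.put(END)


async def receive_audio(downlink, received):
    """Collect reply chunks from the server until the stream ends."""
    while True:
        chunk = await downlink.get()
        if chunk is END:
            break
        received.append(chunk)


async def fake_server(uplink, downlink):
    """Hypothetical stand-in for STT + TTS: replies once per incoming frame."""
    while True:
        frame = await uplink.get()
        if frame is END:
            await downlink.put(END)
            break
        await downlink.put(b"reply:" + str(len(frame)).encode())


async def run_session(frames):
    """Run sender, receiver, and server concurrently; return the replies."""
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    received = []
    await asyncio.gather(
        send_audio(frames, uplink),
        receive_audio(downlink, received),
        fake_server(uplink, downlink),
    )
    return received


if __name__ == "__main__":
    print(asyncio.run(run_session([b"\x00" * 640, b"\x00" * 320])))
```

In a real setup the two queues would be replaced by one websocket connection, and on the ESP32 side the sender and receiver would roughly correspond to separate tasks, but that mapping is an assumption and not something confirmed in this thread.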