I hope you're doing well. I wanted to check if our Demo Web currently supports batch inference for Realtime Voice Calls, allowing multiple users to use it simultaneously. If not, could you provide guidance on how to modify the system to enable real-time call support for multiple users?
I noticed that the get_audio_embedding_streaming function explicitly states that it only supports batch_size=1, so I’m wondering if this limitation affects multi-user support in the demo.
Looking forward to your insights!
Thank you for your feedback!
It is possible to support batch inference for realtime voice calls, but it would be hard to implement because the pipeline has three submodules. Currently, each instance only supports one user.
If you need batch inference, you can consider the following parts:
Modify Whisper, the language model, and the TTS to support batch inference separately. This means handling padding correctly across different sessions (some users have long sessions while others have short ones). Serving the three modules in vLLM is probably the best practice, but it needs investigation, and it would require 3 GPUs.
Implement a scheduler to pass information among all the vLLM-served modules, for example: chunked input audio -> vLLM Whisper (each session has its own KV cache) -> audio embeddings -> vLLM LLM (each session has its own KV cache) -> speech embeddings and text tokens -> vLLM TTS (each session has its own KV cache) -> chunked output audio.
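The padding point above can be sketched roughly as follows. This is a minimal illustration, not code from this repo: `pad_batch` is a hypothetical helper that right-pads variable-length per-user feature sequences to a common length and builds an attention mask so each module can ignore the padded frames of the shorter sessions.

```python
import numpy as np

def pad_batch(sessions, pad_value=0.0):
    """Right-pad variable-length session sequences to a common length.

    `sessions` is a list of (seq_len, feat_dim) arrays, one per user.
    Returns the padded batch and a boolean mask marking real frames,
    so attention can skip the padding of shorter sessions.
    """
    max_len = max(s.shape[0] for s in sessions)
    feat_dim = sessions[0].shape[1]
    batch = np.full((len(sessions), max_len, feat_dim), pad_value, dtype=np.float32)
    mask = np.zeros((len(sessions), max_len), dtype=bool)
    for i, s in enumerate(sessions):
        batch[i, : s.shape[0]] = s
        mask[i, : s.shape[0]] = True
    return batch, mask

# Example: one long session (50 frames) and one short session (10 frames)
batch, mask = pad_batch([np.ones((50, 80)), np.ones((10, 80))])
print(batch.shape)        # (2, 50, 80)
print(mask.sum(axis=1))   # [50 10]
```

Each of the three modules (Whisper, LLM, TTS) would need this kind of masking applied independently, which is why serving them through an engine like vLLM, which already handles variable-length batching, is attractive.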
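The scheduler idea above could be structured as an async pipeline like the sketch below. The stage functions are stubs standing in for the vLLM-served Whisper/LLM/TTS endpoints (their names and the string "embeddings" are illustrative, not the project's API); the key point is that each stage keeps per-session state keyed by a session id, mirroring the per-session KV caches, and `asyncio.gather` lets sessions of different lengths run concurrently.

```python
import asyncio

# Stub stages: in a real deployment each would call a vLLM-served model.
# Each stage keeps per-session state (standing in for a per-session KV cache).

async def whisper_stage(session_id, audio_chunk, cache):
    cache.setdefault(session_id, []).append(audio_chunk)
    return f"emb({audio_chunk})"          # chunked audio -> audio embedding

async def llm_stage(session_id, audio_emb, cache):
    cache.setdefault(session_id, []).append(audio_emb)
    return f"tok({audio_emb})"            # audio embedding -> text tokens

async def tts_stage(session_id, tokens, cache):
    cache.setdefault(session_id, []).append(tokens)
    return f"wav({tokens})"               # tokens -> chunked output audio

async def run_session(session_id, chunks, caches):
    out = []
    for chunk in chunks:                  # stream chunks through the pipeline
        emb = await whisper_stage(session_id, chunk, caches["whisper"])
        tok = await llm_stage(session_id, emb, caches["llm"])
        wav = await tts_stage(session_id, tok, caches["tts"])
        out.append(wav)
    return out

async def main():
    caches = {"whisper": {}, "llm": {}, "tts": {}}
    # Two concurrent users with sessions of different lengths
    return await asyncio.gather(
        run_session("user-a", ["a0", "a1", "a2"], caches),
        run_session("user-b", ["b0"], caches),
    )

results = asyncio.run(main())
print(results[0][0])  # wav(tok(emb(a0)))
```

A real scheduler would additionally batch the pending requests of all sessions at each stage before calling the model, which is where the padding/masking from the previous point comes in.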
We will also consider this feature in the future release.