I hope you're doing well. I wanted to check if our Demo Web currently supports batch inference for Realtime Voice Calls, allowing multiple users to use it simultaneously. If not, could you provide guidance on how to modify the system to enable real-time call support for multiple users?
I noticed that the get_audio_embedding_streaming function explicitly states that it only supports batch_size=1, so I’m wondering if this limitation affects multi-user support in the demo.
Looking forward to your insights!
Thank you for your feedback!
It is possible to support batch inference for realtime voice calls, but it would be hard to implement because the pipeline has three submodules. Currently, each instance only supports one user.
If you need batch inference, you can consider the following parts:
Modify Whisper, the language model, and the TTS to support batch inference separately. This means handling padding correctly across different sessions (some users have long sessions while others have short ones). Serving the three modules in vLLM is probably the best practice, but it needs investigation, and it would require 3 GPUs.
Implement a scheduler to pass information among all the vLLM-served modules, for example: chunked input audio -> vLLM Whisper (each session has its own KV cache) -> audio embeddings -> vLLM LLM (each session has its own KV cache) -> speech embeddings and text tokens -> vLLM TTS (each session has its own KV cache) -> chunked output audio.
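The padding point above can be sketched roughly as follows. This is a minimal illustration, not code from this repo: `pad_batch` is a hypothetical helper that right-pads variable-length per-user feature sequences to a common length and builds an attention mask so each module can ignore the padded frames of the shorter sessions.

```python
import numpy as np

def pad_batch(sessions, pad_value=0.0):
    """Right-pad variable-length session sequences to a common length.

    `sessions` is a list of (seq_len, feat_dim) arrays, one per user.
    Returns the padded batch and a boolean mask marking real frames,
    so attention can skip the padding of shorter sessions.
    """
    max_len = max(s.shape[0] for s in sessions)
    feat_dim = sessions[0].shape[1]
    batch = np.full((len(sessions), max_len, feat_dim), pad_value, dtype=np.float32)
    mask = np.zeros((len(sessions), max_len), dtype=bool)
    for i, s in enumerate(sessions):
        batch[i, : s.shape[0]] = s
        mask[i, : s.shape[0]] = True
    return batch, mask

# Example: one long session (50 frames) and one short session (10 frames)
batch, mask = pad_batch([np.ones((50, 80)), np.ones((10, 80))])
print(batch.shape)        # (2, 50, 80)
print(mask.sum(axis=1))   # [50 10]
```

Each of the three modules (Whisper, LLM, TTS) would need this kind of masking applied independently, which is why serving them through an engine like vLLM, which already handles variable-length batching, is attractive.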
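The scheduler idea above could be structured as an async pipeline like the sketch below. The stage functions are stubs standing in for the vLLM-served Whisper/LLM/TTS endpoints (their names and the string "embeddings" are illustrative, not the project's API); the key point is that each stage keeps per-session state keyed by a session id, mirroring the per-session KV caches, and `asyncio.gather` lets sessions of different lengths run concurrently.

```python
import asyncio

# Stub stages: in a real deployment each would call a vLLM-served model.
# Each stage keeps per-session state (standing in for a per-session KV cache).

async def whisper_stage(session_id, audio_chunk, cache):
    cache.setdefault(session_id, []).append(audio_chunk)
    return f"emb({audio_chunk})"          # chunked audio -> audio embedding

async def llm_stage(session_id, audio_emb, cache):
    cache.setdefault(session_id, []).append(audio_emb)
    return f"tok({audio_emb})"            # audio embedding -> text tokens

async def tts_stage(session_id, tokens, cache):
    cache.setdefault(session_id, []).append(tokens)
    return f"wav({tokens})"               # tokens -> chunked output audio

async def run_session(session_id, chunks, caches):
    out = []
    for chunk in chunks:                  # stream chunks through the pipeline
        emb = await whisper_stage(session_id, chunk, caches["whisper"])
        tok = await llm_stage(session_id, emb, caches["llm"])
        wav = await tts_stage(session_id, tok, caches["tts"])
        out.append(wav)
    return out

async def main():
    caches = {"whisper": {}, "llm": {}, "tts": {}}
    # Two concurrent users with sessions of different lengths
    return await asyncio.gather(
        run_session("user-a", ["a0", "a1", "a2"], caches),
        run_session("user-b", ["b0"], caches),
    )

results = asyncio.run(main())
print(results[0][0])  # wav(tok(emb(a0)))
```

A real scheduler would additionally batch the pending requests of all sessions at each stage before calling the model, which is where the padding/masking from the previous point comes in.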
We will also consider this feature in the future release.