
Inquiry About Batch Inference Support for Realtime Voice Call in Demo Web #854

Open
tienanh28122000 opened this issue Feb 24, 2025 · 1 comment

Comments

@tienanh28122000

I hope you're doing well. I wanted to check if our Demo Web currently supports batch inference for Realtime Voice Calls, allowing multiple users to use it simultaneously. If not, could you provide guidance on how to modify the system to enable real-time call support for multiple users?

I noticed that the get_audio_embedding_streaming function explicitly states that it only supports batch_size=1, so I’m wondering if this limitation affects multi-user support in the demo.

Looking forward to your insights!

@bokesyo
Collaborator

bokesyo commented Feb 26, 2025

Thank you for your feedback!
It is possible to support batch inference for realtime voice calls, but it would be hard to implement because the pipeline has 3 submodules. Currently, each instance supports only one user.

If you need batch inference, you can consider the following:

  • Modify whisper, the language model, and tts to support batch inference separately, which means correct padding to handle different sessions (some users have long sessions, while others have short ones; see the padding sketch after this list). Serving the 3 modules in vllm is probably the best practice, but it needs investigation, and it would require 3 GPUs.
  • Implement a scheduler to pass information among all the vllm-served modules, for example: chunked input audio -> vllm whisper (each session has its own KV cache) -> audio embeddings -> vllm llm (each session has its own KV cache) -> speech embeddings, text tokens -> vllm tts (each session has its own KV cache) -> chunked output audio. A rough sketch of such a scheduler is shown below.
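
To make the padding point concrete, here is a minimal sketch in PyTorch of right-padding variable-length per-session feature chunks into one batch with an attention mask. The function name and shapes are assumptions for illustration, not part of this repo:

```python
import torch

def pad_audio_batch(features: list, pad_value: float = 0.0):
    # features: list of (num_frames_i, feat_dim) float tensors, one per session.
    max_len = max(f.shape[0] for f in features)
    feat_dim = features[0].shape[1]
    batch = torch.full((len(features), max_len, feat_dim), pad_value)
    mask = torch.zeros(len(features), max_len, dtype=torch.bool)
    for i, f in enumerate(features):
        batch[i, : f.shape[0]] = f   # copy the real frames
        mask[i, : f.shape[0]] = True # True marks real frames, False marks padding
    return batch, mask
```

The mask would then be passed along as the encoder attention mask so padded frames from short sessions do not contribute to the computation for long ones.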
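For the scheduler, a rough asyncio sketch is below, assuming each module is exposed as its own vllm-style HTTP service that keeps a per-session KV cache keyed by session_id. All URLs, endpoints, and payload fields here are hypothetical:

```python
import asyncio
import aiohttp

# Hypothetical endpoints, one per vllm-served module (one GPU each).
WHISPER_URL = "http://whisper-host:8000/encode"
LLM_URL = "http://llm-host:8001/generate"
TTS_URL = "http://tts-host:8002/synthesize"

async def process_chunk(http: aiohttp.ClientSession,
                        session_id: str, audio_chunk: bytes) -> bytes:
    # chunked input audio -> vllm whisper (per-session KV cache) -> audio embeddings
    async with http.post(WHISPER_URL, json={"session_id": session_id,
                                            "audio": audio_chunk.hex()}) as r:
        audio_emb = (await r.json())["embeddings"]
    # audio embeddings -> vllm llm (per-session KV cache) -> speech embeddings + text tokens
    async with http.post(LLM_URL, json={"session_id": session_id,
                                        "audio_embeddings": audio_emb}) as r:
        out = await r.json()
    # speech embeddings, text tokens -> vllm tts (per-session KV cache) -> chunked output audio
    async with http.post(TTS_URL, json={"session_id": session_id,
                                        "speech_embeddings": out["speech_embeddings"],
                                        "text_tokens": out["text_tokens"]}) as r:
        return bytes.fromhex((await r.json())["audio"])

async def main():
    # Many sessions run concurrently; the actual batching happens inside
    # each served module, which interleaves requests across sessions.
    async with aiohttp.ClientSession() as http:
        await asyncio.gather(
            process_chunk(http, "user-a", b"\x00\x01"),
            process_chunk(http, "user-b", b"\x02\x03"),
        )

asyncio.run(main())
```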

We will also consider this feature in the future release.
