---
title: Core Models
sidebarTitle: Core Models
description: The three core components of Vapi's voice AI pipeline.
---
At its core, Vapi is an orchestration layer over three modules: the transcriber, the model, and the voice.

Each of these three modules can be swapped out for any provider of your choosing: OpenAI, Groq, Deepgram, ElevenLabs, PlayHT, etc. You can even plug in your own server to act as the LLM.
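As a rough sketch, an assistant configuration that picks a provider for each of the three modules might look like the following. The field names, provider identifiers, and model names here are illustrative assumptions, not an exhaustive or authoritative schema:

```typescript
// A minimal sketch of a three-module assistant config.
// Field names and provider/model values are illustrative assumptions.
interface AssistantConfig {
  transcriber: { provider: string; model?: string };
  model: { provider: string; model: string };
  voice: { provider: string; voiceId: string };
}

const assistant: AssistantConfig = {
  // Speech-to-text: turns the caller's audio into a live transcript.
  transcriber: { provider: "deepgram", model: "nova-2" },
  // LLM: decides what the assistant says next.
  model: { provider: "openai", model: "gpt-4o" },
  // Text-to-speech: renders the reply back into audio.
  voice: { provider: "elevenlabs", voiceId: "example-voice-id" },
};
```

Because each module is just a provider plus a model/voice selection, swapping one out is a one-line change that leaves the rest of the pipeline untouched.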
Vapi takes these three modules, optimizes latency, manages scaling & streaming, and orchestrates the conversation flow to make it sound human.

The idea is to perform each phase in real time (latency-sensitive down to the 50-100ms level), streaming between every layer. Ideally, the whole voice-to-voice flow clocks in at under 500-700ms.
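To make the latency target concrete, here is a hypothetical per-stage budget. The stage names and the specific millisecond figures are illustrative assumptions, chosen only so the voice-to-voice total lands inside the 500-700ms window described above:

```typescript
// Hypothetical per-stage latency budget (ms); numbers are illustrative
// assumptions, picked so the voice-to-voice total stays under ~700 ms.
const budgetMs = {
  transcription: 100, // streaming STT emits partials while the caller speaks
  endpointing: 100,   // deciding the caller has finished their turn
  llmFirstToken: 300, // time to the model's first streamed token
  ttsFirstByte: 150,  // time to the first synthesized audio chunk
};

const totalMs = Object.values(budgetMs).reduce((a, b) => a + b, 0);
console.log(totalMs); // 650
```

Streaming is what makes a budget like this feasible: each stage starts consuming the previous stage's output before it finishes, so the stages overlap rather than run strictly in sequence.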
Vapi pulls all these pieces together, ensuring a smooth & responsive conversation, and provides you with a simple set of tools to manage these inner workings.