| title | Custom transcriber |
|---|---|
| subtitle | Integrate your own transcription service with Vapi |
| slug | customization/custom-transcriber |
A custom transcriber lets you use your own transcription service with Vapi, instead of a built-in provider. This is useful if you need more control, want to use a specific provider like Deepgram, or have custom processing needs.
This guide shows you how to set up Deepgram as your custom transcriber. The same approach can be adapted for other providers.
You'll learn how to:
- Stream audio from Vapi to your server
- Forward audio to Deepgram for transcription
- Return real-time transcripts back to Vapi
- Flexibility: Integrate with your preferred transcription service.
- Control: Implement specialized processing that isn't available with built‑in providers.
- Cost Efficiency: Leverage your existing transcription infrastructure while maintaining full control over the pipeline.
- Customization: Tailor the handling of audio data, transcript formatting, and buffering according to your specific needs.
The optional `transcriptType` field controls how Vapi handles the transcript:
- **`"final"`** (default) — the transcription is definitive.
- **`"partial"`** — the transcription is provisional and may be superseded by a later message. Each partial replaces the previous one until a `"final"` arrives.
If omitted, `transcriptType` defaults to `"final"` for backward compatibility.
<CodeBlocks>
```bash title="npm"
npm install ws express dotenv @deepgram/sdk
```
```bash title="yarn"
yarn add ws express dotenv @deepgram/sdk
```
```bash title="pnpm"
pnpm add ws express dotenv @deepgram/sdk
```
```bash title="bun"
bun add ws express dotenv @deepgram/sdk
```
</CodeBlocks>
Create a `.env` file with the following content:
```env
DEEPGRAM_API_KEY=your_deepgram_api_key
PORT=3001
```
**transcriptionService.js**
```js
const { createClient, LiveTranscriptionEvents } = require("@deepgram/sdk");
const EventEmitter = require("events");
const PUNCTUATION_TERMINATORS = [".", "!", "?"];
const MAX_RETRY_ATTEMPTS = 3;
const DEBOUNCE_DELAY_IN_SECS = 3;
const DEBOUNCE_DELAY = DEBOUNCE_DELAY_IN_SECS * 1000;
const DEEPGRAM_API_KEY = process.env["DEEPGRAM_API_KEY"] || "";
class TranscriptionService extends EventEmitter {
constructor(config, logger) {
super();
this.config = config;
this.logger = logger;
this.flowLogger = require("./fileLogger").createNamedLogger(
"transcriber-flow.log"
);
if (!DEEPGRAM_API_KEY) {
throw new Error("Missing Deepgram API Key");
}
this.deepgramClient = createClient(DEEPGRAM_API_KEY);
this.logger.logDetailed(
"INFO",
"Initializing Deepgram live connection",
"TranscriptionService",
{
model: "nova-2",
sample_rate: 16000,
channels: 2,
}
);
this.deepgramLive = this.deepgramClient.listen.live({
encoding: "linear16",
channels: 2,
sample_rate: 16000,
model: "nova-2",
smart_format: true,
interim_results: true,
endpointing: 800,
language: "en",
multichannel: true,
});
this.finalResult = { customer: "", assistant: "" };
this.audioBuffer = [];
this.retryAttempts = 0;
this.lastTranscriptionTime = Date.now();
this.pcmBuffer = Buffer.alloc(0);
this.deepgramLive.addListener(LiveTranscriptionEvents.Open, () => {
this.logger.logDetailed(
"INFO",
"Deepgram connection opened",
"TranscriptionService"
);
this.deepgramLive.on(LiveTranscriptionEvents.Close, () => {
this.logger.logDetailed(
"INFO",
"Deepgram connection closed",
"TranscriptionService"
);
this.emitTranscription();
this.audioBuffer = [];
});
this.deepgramLive.on(LiveTranscriptionEvents.Metadata, (data) => {
this.logger.logDetailed(
"DEBUG",
"Deepgram metadata received",
"TranscriptionService",
data
);
});
this.deepgramLive.on(LiveTranscriptionEvents.Transcript, (event) => {
this.handleTranscript(event);
});
this.deepgramLive.on(LiveTranscriptionEvents.Error, (err) => {
this.logger.logDetailed(
"ERROR",
"Deepgram error received",
"TranscriptionService",
{ error: err }
);
this.emit("transcriptionerror", err);
});
});
}
send(payload) {
if (payload instanceof Buffer) {
this.pcmBuffer =
this.pcmBuffer.length === 0
? payload
: Buffer.concat([this.pcmBuffer, payload]);
} else {
this.logger.warn("TranscriptionService: Received non-Buffer data chunk.");
}
if (this.deepgramLive.getReadyState() === 1 && this.pcmBuffer.length > 0) {
this.sendBufferedData(this.pcmBuffer);
this.pcmBuffer = Buffer.alloc(0);
}
}
sendBufferedData(bufferedData) {
try {
this.logger.logDetailed(
"INFO",
"Sending buffered data to Deepgram",
"TranscriptionService",
{ bytes: bufferedData.length }
);
this.deepgramLive.send(bufferedData);
this.audioBuffer = [];
this.retryAttempts = 0;
} catch (error) {
this.logger.logDetailed(
"ERROR",
"Error sending buffered data",
"TranscriptionService",
{ error }
);
this.retryAttempts++;
if (this.retryAttempts <= MAX_RETRY_ATTEMPTS) {
setTimeout(() => {
this.sendBufferedData(bufferedData);
}, 1000);
} else {
this.logger.logDetailed(
"ERROR",
"Max retry attempts reached, discarding data",
"TranscriptionService"
);
this.audioBuffer = [];
this.retryAttempts = 0;
}
}
}
handleTranscript(transcription) {
if (!transcription.channel || !transcription.channel.alternatives?.[0]) {
this.logger.logDetailed(
"WARN",
"Invalid transcript format",
"TranscriptionService",
{ transcription }
);
return;
}
const text = transcription.channel.alternatives[0].transcript.trim();
if (!text) return;
const currentTime = Date.now();
const channelIndex = transcription.channel_index
? transcription.channel_index[0]
: 0;
const channel = channelIndex === 0 ? "customer" : "assistant";
this.logger.logDetailed(
"INFO",
"Received transcript",
"TranscriptionService",
{ channel, text }
);
if (transcription.is_final || transcription.speech_final) {
this.finalResult[channel] += ` ${text}`;
this.emitTranscription();
} else {
this.finalResult[channel] += ` ${text}`;
if (currentTime - this.lastTranscriptionTime >= DEBOUNCE_DELAY) {
this.logger.logDetailed(
"INFO",
`Emitting transcript after ${DEBOUNCE_DELAY_IN_SECS}s inactivity`,
"TranscriptionService"
);
this.emitTranscription();
}
}
this.lastTranscriptionTime = currentTime;
}
emitTranscription() {
for (const chan of ["customer", "assistant"]) {
if (this.finalResult[chan].trim()) {
const transcript = this.finalResult[chan].trim();
this.logger.logDetailed(
"INFO",
"Emitting transcription",
"TranscriptionService",
{ channel: chan, transcript }
);
this.emit("transcription", transcript, chan);
this.finalResult[chan] = "";
}
}
}
}
module.exports = TranscriptionService;
```
**server.js**
```js
const express = require("express");
const http = require("http");
const TranscriptionService = require("./transcriptionService");
const FileLogger = require("./fileLogger");
require("dotenv").config();
const app = express();
app.use(express.json());
app.use(express.urlencoded({ extended: true }));
app.get("/", (req, res) => {
res.send("Custom Transcriber Service is running");
});
const server = http.createServer(app);
const config = {
DEEPGRAM_API_KEY: process.env.DEEPGRAM_API_KEY,
PORT: process.env.PORT || 3001,
};
const logger = new FileLogger();
const transcriptionService = new TranscriptionService(config, logger);
transcriptionService.setupWebSocketServer = function (server) {
const WebSocketServer = require("ws").Server;
const wss = new WebSocketServer({ server, path: "/api/custom-transcriber" });
wss.on("connection", (ws) => {
logger.logDetailed(
"INFO",
"New WebSocket client connected on /api/custom-transcriber",
"Server"
);
ws.on("message", (data, isBinary) => {
if (!isBinary) {
try {
const msg = JSON.parse(data.toString());
if (msg.type === "start") {
logger.logDetailed(
"INFO",
"Received start message from client",
"Server",
{ sampleRate: msg.sampleRate, channels: msg.channels }
);
}
} catch (err) {
logger.error("JSON parse error", err, "Server");
}
} else {
transcriptionService.send(data);
}
});
ws.on("close", () => {
logger.logDetailed("INFO", "WebSocket client disconnected", "Server");
if (
transcriptionService.deepgramLive &&
transcriptionService.deepgramLive.getReadyState() === 1
) {
transcriptionService.deepgramLive.finish();
}
});
ws.on("error", (error) => {
logger.error("WebSocket error", error, "Server");
});
transcriptionService.on("transcription", (text, channel) => {
const response = {
type: "transcriber-response",
transcription: text,
channel,
transcriptType: "final",
};
ws.send(JSON.stringify(response));
logger.logDetailed("INFO", "Sent transcription to client", "Server", {
channel,
text,
});
});
transcriptionService.on("transcriptionerror", (err) => {
ws.send(
JSON.stringify({ type: "error", error: "Transcription service error" })
);
logger.error("Transcription service error", err, "Server");
});
});
};
transcriptionService.setupWebSocketServer(server);
server.listen(config.PORT, () => {
console.log(`Server is running on http://localhost:${config.PORT}`);
});
```
**Expected behavior:**
- Vapi connects via WebSocket to your custom transcriber at `/api/custom-transcriber`.
- The `"start"` message initializes the Deepgram session.
- PCM audio data is forwarded to Deepgram.
- Deepgram returns transcript events, which are processed with channel detection and debouncing.
- The transcript is sent back as a JSON message:
```json
{
"type": "transcriber-response",
"transcription": "The transcribed text",
"channel": "customer",
"transcriptType": "final"
}
```
- Streaming support requirement:
The custom transcriber must support streaming. Vapi sends continuous audio data over the WebSocket, and your server must handle this stream in real time. - Authentication:
For secure transcriber endpoints, use Custom Credentials withcredentialId. Create Custom Credentials in the dashboard to manage Bearer Token, OAuth 2.0, or HMAC authentication. For backward compatibility, the legacysecretfield is still supported and sends the value as anx-vapi-secretHTTP header. - Buffering:
The solution buffers PCM audio and performs simple validation (e.g. ensuring stereo PCM data length is a multiple of 4). If the audio data is malformed, it is trimmed to a valid length. - Channel detection:
Transcript events from Deepgram include achannel_indexarray. The service uses the first element to determine whether the transcript is from the customer (0) or the assistant (1). Ensure Deepgram's response format remains consistent with this logic. - Partial transcripts:
SettranscriptTypeto"partial"to send progressive transcription updates. Each partial supersedes the previous one until a"final"message arrives. This is useful for STT providers that emit fast, low-latency partials that get refined over time (e.g. ElevenLabs Scribe). IftranscriptTypeis omitted, Vapi treats the message as"final".
Using a custom transcriber with Vapi gives you the flexibility to integrate any transcription service into your call flows. This guide walked you through the setup, usage, and testing of a solution that streams real-time audio, processes transcripts with multi‑channel detection, and returns formatted responses back to Vapi. Follow the steps above and use the provided code examples to build your custom transcriber solution.