Summary
Add a built-in microphone capture utility to the Python SDK that handles audio device enumeration, capture configuration, and streaming to Deepgram's live transcription WebSocket — eliminating the need for developers to separately install and configure PyAudio or sounddevice.
Problem it solves
Every developer building a live transcription demo in Python faces the same 30-minute setup hurdle: install PyAudio (which requires system-level portaudio headers), configure sample rate/channels/chunk size, write a capture loop, and pipe bytes to the Deepgram WebSocket. This friction is the #1 source of GitHub issues on the SDK (issues #425, #302, #418, #440, #495). A built-in helper would make "mic to transcript" a 5-line script.
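For context, the manual setup described above tends to look roughly like the sketch below. It assumes PyAudio is installed (with working PortAudio headers) and that `send` is whatever callable forwards bytes to the open Deepgram WebSocket. The constants and the `capture_loop` and `should_stop` names are illustrative, not part of the SDK.

```python
# Illustrative sketch of the capture boilerplate, assuming `pip install pyaudio`.
SAMPLE_RATE = 16000       # Hz; assumed value, must match the options sent to the API
CHANNELS = 1              # mono
CHUNK_FRAMES = 1024       # frames read per call
BYTES_PER_CHUNK = CHUNK_FRAMES * CHANNELS * 2  # 16-bit samples are 2 bytes each


def capture_loop(send, should_stop):
    """Read 16-bit PCM chunks from the default input device and pass each to send()."""
    import pyaudio  # imported lazily so this module loads even without PortAudio

    pa = pyaudio.PyAudio()
    stream = pa.open(
        format=pyaudio.paInt16,
        channels=CHANNELS,
        rate=SAMPLE_RATE,
        input=True,
        frames_per_buffer=CHUNK_FRAMES,
    )
    try:
        while not should_stop():
            # Blocks until CHUNK_FRAMES frames are available.
            send(stream.read(CHUNK_FRAMES, exception_on_overflow=False))
    finally:
        # Release the device even if send() raises.
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

In practice this loop also has to run on its own thread or task so it does not block the transcript handler, which is part of the setup cost this proposal aims to remove.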
Proposed API
from deepgram import DeepgramClient, Microphone
dg = DeepgramClient(api_key="...")
connection = dg.listen.websocket.v("1")
connection.on("transcript", lambda result: print(result.channel.alternatives[0].transcript))
connection.start({"model": "nova-3", "language": "en"})
mic = Microphone(connection.send)
mic.start()
# ... later
mic.finish()
connection.finish()

Alternatively, an even simpler context-manager pattern:
async with dg.listen.live(model="nova-3") as session:
    async with Microphone(session) as mic:
        async for transcript in session.transcripts():
            print(transcript.text)

Acceptance criteria
- Works cross-platform (macOS, Linux, Windows) with automatic backend selection
- Installable via optional dependency: pip install deepgram-sdk[microphone]
- Handles sample rate, channels, and chunk size configuration with sensible defaults
- Provides device enumeration (Microphone.list_devices())
- Documented with usage example
- Compatible with existing API (works with current ListenWebSocketClient)
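One possible shape for such a helper is sketched below, as a starting point for discussion rather than a finished design. Everything here is hypothetical: `MicConfig`, `MicrophoneSketch`, `list_input_devices`, and the default values are illustrative names and choices. The audio reader is injected so the sketch stays backend-agnostic and can be exercised without hardware; a real implementation would presumably default to a backend such as PyAudio or sounddevice.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class MicConfig:
    """Capture settings; the defaults are assumptions, not SDK values."""
    sample_rate: int = 16000
    channels: int = 1
    chunk_frames: int = 1024


class MicrophoneSketch:
    """Hypothetical helper: reads chunks on a background thread and pushes them to a callback."""

    def __init__(
        self,
        push: Callable[[bytes], object],
        read_chunk: Callable[[int], bytes],
        config: Optional[MicConfig] = None,
    ) -> None:
        self._push = push              # e.g. connection.send
        self._read_chunk = read_chunk  # blocking read of N frames from a backend
        self._config = config or MicConfig()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        if self._thread is not None:
            raise RuntimeError("microphone already started")
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Keep forwarding chunks until finish() sets the stop flag.
        while not self._stop.is_set():
            self._push(self._read_chunk(self._config.chunk_frames))

    def finish(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None


def list_input_devices() -> List[Tuple[int, str]]:
    """Return (index, name) for each input-capable device, assuming PyAudio is installed."""
    import pyaudio  # lazy import keeps the dependency optional

    pa = pyaudio.PyAudio()
    try:
        infos = (pa.get_device_info_by_index(i) for i in range(pa.get_device_count()))
        return [(info["index"], info["name"]) for info in infos if info["maxInputChannels"] > 0]
    finally:
        pa.terminate()
```

Taking a plain `push` callable mirrors the `Microphone(connection.send)` form in the proposed API, and the lazy import is one way to keep the backend behind an optional extra.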
Raised by the DX intelligence system.