Skip to content

A lightweight SIP-to-AI voice bridge connecting telephony calls to AI voice agents (OpenAI Realtime GA, Deepgram). Built with PJSUA2 and asyncio.

License

Notifications You must be signed in to change notification settings

aicc2025/sip-to-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SIP-to-AI

Star to follow updates & roadmap

SIP-to-AI — stream RTP audio from FreeSWITCH / OpenSIPS / Asterisk directly to end-to-end realtime voice models:

  • OpenAI Realtime API (gpt-realtime GA)
  • Deepgram Voice Agent
  • 🔜 Gemini Live (coming soon)

Simple passthrough bridge: SIP (G.711 μ-law @ 8kHz)AI voice models with native codec support, no resampling needed.

Quick Start (OpenAI Realtime)

Prerequisites: Python 3.12+, UV package manager

  1. Install dependencies:

    git clone <repository-url>
    cd sip-to-ai
    uv venv && source .venv/bin/activate
    uv sync
  2. Install PJSUA2 (from local build):
    First, build and install PJSIP following the official build instructions.
    After completing make install to install headers and libraries, install the Python bindings:

    cd <pjproject>/pjsip-apps/src/swig/python
    uv pip install .
    
    # Verify installation
    python -c "import pjsua2 as pj; ep = pj.Endpoint(); ep.libCreate(); print('PJSUA2:', ep.libVersion().full); ep.libDestroy()"
  3. Configure environment:

    cp .env.example .env

    Edit .env with your OpenAI API key:

    # AI Service
    AI_VENDOR=openai
    OPENAI_API_KEY=sk-proj-your-key-here
    OPENAI_MODEL=gpt-realtime
    
    # Agent prompt
    AGENT_PROMPT_FILE=agent_prompt.yaml
    
    # SIP Settings (userless account - receive only)
    SIP_DOMAIN=192.168.1.100
    SIP_TRANSPORT_TYPE=udp
    SIP_PORT=6060

    Optional: Create agent_prompt.yaml for custom agent personality:

    instructions: |
      You are a helpful AI assistant. Be concise and friendly.
    
    greeting: "Hello! How can I help you today?"
  4. Run the server:

    uv run python -m app.main

    The server will listen on SIP_DOMAIN:SIP_PORT for incoming calls. Each call creates an independent OpenAI Realtime WebSocket connection.

  5. Make a test call:

    # From FreeSWITCH/Asterisk, dial to bridge IP:port
    # Or use a SIP softphone to call sip:192.168.1.100:6060

Project Overview

Core Architecture

graph LR
    SIP[SIP/PJSUA2<br/>PCM16 @ 8kHz] <--> AA[AudioAdapter<br/>Codec Only]
    AA <--> AI[AI WebSocket<br/>G.711 μ-law @ 8kHz]
    
Loading

Design Philosophy: Minimal client logic. The bridge is a transparent audio pipe:

  • Codec conversion only: PCM16 ↔ G.711 μ-law (same 8kHz, no resampling)
  • No client-side VAD/barge-in: AI models handle all voice activity detection
  • No jitter buffer: AI services provide pre-buffered audio
  • Connection management: WebSocket lifecycle and reconnection

Audio Flow

sequenceDiagram
    participant PJSUA2
    participant AudioAdapter
    participant AI as OpenAI/Deepgram

    Note over PJSUA2,AI: Uplink (SIP → AI)
    PJSUA2->>AudioAdapter: onFrameReceived(PCM16, 320 bytes)
    AudioAdapter->>AudioAdapter: PCM16 → G.711 μ-law (160 bytes)
    AudioAdapter->>AI: WebSocket send(G.711)

    Note over PJSUA2,AI: Downlink (AI → SIP)
    AI->>AudioAdapter: WebSocket receive(G.711 chunks)
    AudioAdapter->>AudioAdapter: Accumulate & split to 320-byte frames
    AudioAdapter->>AudioAdapter: G.711 → PCM16
    PJSUA2->>AudioAdapter: onFrameRequested()
    AudioAdapter->>PJSUA2: return PCM16 (320 bytes)
Loading

Key Points:

  • 20ms frames: 320 bytes PCM16 (8kHz) or 160 bytes G.711 μ-law
  • Thread-safe: PJSUA2 callbacks → asyncio.Queue → async AI WebSocket
  • Variable AI chunks: Accumulated in buffer, split into fixed 320-byte frames
  • No padding during streaming: Incomplete frames kept until next chunk arrives

Components

AudioAdapter (app/sip/audio_adapter.py)

  • Codec conversion: PCM16 ↔ G.711 μ-law
  • Accumulation buffer for variable-size AI chunks → fixed 320-byte frames
  • Thread-safe buffers: asyncio.Queue for uplink (SIP→AI) and downlink (AI→SIP)

CallSession (app/sip/audio_adapter.py)

  • Manages three async tasks per call:
    1. Uplink: Read from uplink stream → send to AI
    2. AI Receive: Receive AI chunks → feed to downlink stream
    3. Health: Ping AI connection, reconnect on failure
  • Uses asyncio.TaskGroup for structured concurrency

OpenAIRealtimeClient (app/ai/openai_realtime.py)

  • WebSocket: wss://api.openai.com/v1/realtime
  • Audio format: audio/pcmu (G.711 μ-law @ 8kHz)
  • Supports session config: instructions, voice, temperature
  • Optional greeting message on connect

DeepgramAgentClient (app/ai/deepgram_agent.py)

  • WebSocket: wss://agent.deepgram.com/agent
  • Audio format: mulaw (same as G.711 μ-law @ 8kHz)
  • Settings: listen model, speak model, LLM model, agent prompt

PJSIPMediaPort (app/sip/pjsua2_endpoint.py)

  • PJSUA2 callbacks: onFrameReceived(), onFrameRequested()
  • Bridges sync callbacks to async AudioAdapter

Deepgram Voice Agent Setup

Set AI_VENDOR=deepgram in .env:

AI_VENDOR=deepgram
DEEPGRAM_API_KEY=your-key-here
AGENT_PROMPT_FILE=agent_prompt.yaml  
DEEPGRAM_LISTEN_MODEL=nova-2
DEEPGRAM_SPEAK_MODEL=aura-asteria-en
DEEPGRAM_LLM_MODEL=gpt-4o-mini

Create agent_prompt.yaml (required):

instructions: |
  You are a helpful AI assistant. Be concise and friendly.

greeting: "Hello! How can I help you today?"

Get your API key from Deepgram Console.

Performance

Latency:

  • SIP → AI: <10ms (codec only)
  • AI → SIP: <10ms (codec only)
  • Total: ~100-300ms (AI processing dominates)

Why Fast?

  • No resampling (8kHz throughout)
  • No client-side VAD/barge-in
  • No jitter buffer
  • Just codec conversion

Troubleshooting

Choppy Audio: Check network to AI service. AI handles jitter buffering.

High Latency: Verify AI service response times. Client-side is <10ms.

SIP Connection Failed:

  • Check firewall/NAT for incoming SIP INVITE
  • Verify SIP_DOMAIN and SIP_PORT in .env
  • Confirm PJSUA2 installed: python -c "import pjsua2"

AI Disconnection:

  • Validate API keys
  • Check service quotas and rate limits
  • Monitor logs for reconnection attempts

License

GNU General Public License v2.0 (GPL-2.0)

This project uses PJSUA2, which is licensed under GPL v2. Therefore, this project is also distributed under GPL v2 to comply with PJSUA2's licensing requirements.

About

A lightweight SIP-to-AI voice bridge connecting telephony calls to AI voice agents (OpenAI Realtime GA, Deepgram). Built with PJSUA2 and asyncio.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages