feat(agents): Add realtime agents support #4543

Open: samwillis wants to merge 31 commits into main from codex/realtime-agents-plan

samwillis (Contributor) commented Jun 9, 2026

Summary

Adds Horton realtime voice mode to Electric Agents, with durable streams as the client/server IO path and OpenAI Realtime as the initial provider. Horton can drop into voice mode from an existing conversation or from the new-session screen while keeping its normal context and tool loop. Microphone audio streams to the runtime, assistant audio streams back to the client, transcript and audio metadata are persisted, and the desktop app exposes model, voice, and reasoning settings.

The design deliberately avoids making the browser a direct OpenAI client. Clients write/read durable streams. The agent runtime owns the OpenAI WebSocket, provider credentials, tool execution, transcript reconciliation, and session lifecycle.

Design

IO Model

Realtime sessions are represented in the entity manifest with durable stream refs:

  • audio_in: client to runtime, mono PCM16 at 24 kHz.
  • audio_out: runtime/provider to client, mono PCM16 at 24 kHz.
  • control_in: client to runtime JSON commands, such as typed text, stop, close, and truncation.
  • control_out: runtime to client JSON provider/runtime events, such as session started, response started/completed, speech started, and audio deltas.
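These formats fix the byte budget for audio_in and audio_out: 24,000 samples per second × 2 bytes per sample × 1 channel = 48,000 bytes per second, so 20 ms of audio is 960 bytes. A small sketch for sizing chunks on these streams; the constant and helper names are illustrative and not part of the runtime API:

```typescript
// Audio format used by audio_in and audio_out in this PR.
const SAMPLE_RATE_HZ = 24_000
const CHANNELS = 1
const BYTES_PER_SAMPLE = 2 // PCM16: one signed 16-bit integer per sample

// Bytes needed to carry `ms` milliseconds of audio.
function bytesForDuration(ms: number): number {
  const frames = Math.round((SAMPLE_RATE_HZ * ms) / 1000)
  return frames * CHANNELS * BYTES_PER_SAMPLE
}

// Milliseconds of audio in a PCM16 buffer; a trailing partial sample is ignored.
function durationForBytes(byteLength: number): number {
  const frames = Math.floor(byteLength / (CHANNELS * BYTES_PER_SAMPLE))
  return (frames * 1000) / SAMPLE_RATE_HZ
}
```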

The only WebSocket in the first implementation is the server/runtime connection to OpenAI Realtime. Browser and app clients use durable HTTP streams for all Electric client/server communication.

Runtime API

Adds first-class realtime runtime types and provider hooks for:

  • realtime audio formats
  • input transcription config
  • server/semantic/manual turn detection config
  • provider events
  • tool call streaming/results
  • transcript callbacks
  • active realtime session discovery through ctx.realtime.activeSession()
  • ctx.useRealtime(...) as a built-in runtime API

RealtimeTurnDetectionConfig supports:

  • server_vad
  • semantic_vad
  • manual mode with false or { type: "none" }
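The three modes map naturally onto a discriminated union. A minimal sketch of converting such a config to OpenAI's snake_case wire fields (`prefix_padding_ms`, `silence_duration_ms`, `create_response`, `interrupt_response`, `eagerness`). The camelCase names follow the handler example later in this description; the type and the `toWireTurnDetection` helper are hypothetical, and the PR's own conversion helper may differ. Manual mode maps to `null`, which OpenAI treats as turn detection switched off.

```typescript
// Hypothetical runtime-side config mirroring the modes listed above.
type RealtimeTurnDetectionConfig =
  | false
  | { type: 'none' }
  | {
      type: 'server_vad'
      threshold?: number
      prefixPaddingMs?: number
      silenceDurationMs?: number
      createResponse?: boolean
      interruptResponse?: boolean
    }
  | {
      type: 'semantic_vad'
      eagerness?: 'low' | 'medium' | 'high' | 'auto' // field name assumed
      createResponse?: boolean
      interruptResponse?: boolean
    }

// Provider wire shape; null disables provider turn detection (manual mode).
type WireTurnDetection = Record<string, unknown> | null

function toWireTurnDetection(
  config: RealtimeTurnDetectionConfig | undefined
): WireTurnDetection | undefined {
  if (config === undefined) return undefined // let the provider apply its default
  if (config === false || config.type === 'none') return null // client commits turns
  // Options shared by both VAD modes; only set fields are sent.
  const common = {
    ...(config.createResponse !== undefined ? { create_response: config.createResponse } : {}),
    ...(config.interruptResponse !== undefined
      ? { interrupt_response: config.interruptResponse }
      : {}),
  }
  if (config.type === 'server_vad') {
    return {
      type: 'server_vad',
      ...(config.threshold !== undefined ? { threshold: config.threshold } : {}),
      ...(config.prefixPaddingMs !== undefined ? { prefix_padding_ms: config.prefixPaddingMs } : {}),
      ...(config.silenceDurationMs !== undefined
        ? { silence_duration_ms: config.silenceDurationMs }
        : {}),
      ...common,
    }
  }
  return {
    type: 'semantic_vad',
    ...(config.eagerness !== undefined ? { eagerness: config.eagerness } : {}),
    ...common,
  }
}
```

Horton's defaults (threshold 0.55, 300 ms prefix padding, 500 ms silence) would pass through the `server_vad` branch.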

The runtime bridges durable streams into provider sessions and records realtime state into entity collections:

  • realtimeSessions
  • realtimeAudioSpans
  • realtimeTranscripts
  • transcript textDeltas

OpenAI Provider

Adds an OpenAI Realtime provider that:

  • defaults to gpt-realtime-2
  • supports configurable voices, defaulting to marin
  • supports configurable reasoning effort for gpt-realtime-2
  • sends GA-style nested session.audio.input / session.audio.output config
  • maps OpenAI audio, transcript, response, tool, and error events into runtime provider events
  • supports tool calls and sends tool results back into the active realtime response
  • guards stale tool results with response epochs after cancellations
  • deduplicates provider events by event_id where available
  • normalizes PCM16 append chunks before sending to OpenAI, skipping empty/half-sample chunks and splitting large batches
  • ignores known stale/inactive provider cancellation/truncation errors that can happen during interruption races
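The PCM16 append normalization can be sketched as below. This is an illustration under assumptions: the class name and the 32 KiB per-message cap are hypothetical (the PR does not state its split size), and each resulting chunk would still need base64-encoding before being sent in the `audio` field of an `input_audio_buffer.append` event.

```typescript
const DEFAULT_MAX_APPEND_BYTES = 32 * 1024 // assumed cap; not stated in the PR

// Turns arbitrary PCM16 byte batches into sample-aligned chunks of bounded size.
class Pcm16AppendNormalizer {
  private carry: Uint8Array = new Uint8Array(0) // 0 or 1 leftover byte
  private readonly maxBytes: number

  constructor(maxBytes: number = DEFAULT_MAX_APPEND_BYTES) {
    if (!Number.isInteger(maxBytes) || maxBytes < 2 || maxBytes % 2 !== 0) {
      throw new Error('maxBytes must be a positive even integer')
    }
    this.maxBytes = maxBytes
  }

  // Returns chunks no larger than maxBytes; returns [] when no full sample is available.
  push(bytes: Uint8Array): Uint8Array[] {
    // Prepend any half sample left over from the previous call.
    const joined = new Uint8Array(this.carry.length + bytes.length)
    joined.set(this.carry, 0)
    joined.set(bytes, this.carry.length)

    // Hold back a trailing odd byte instead of sending half a sample.
    const alignedLength = joined.length - (joined.length % 2)
    this.carry = joined.slice(alignedLength)

    // Split large batches into provider-sized messages.
    const chunks: Uint8Array[] = []
    for (let offset = 0; offset < alignedLength; offset += this.maxBytes) {
      chunks.push(joined.slice(offset, Math.min(offset + this.maxBytes, alignedLength)))
    }
    return chunks
  }
}
```

Design note: this sketch carries a trailing half sample into the next call rather than discarding it, which keeps later samples aligned if upstream chunks split mid-sample. The PR describes skipping half-sample chunks, so its exact behavior may differ.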

Horton Behavior

Horton detects active realtime sessions and uses ctx.useRealtime(...) instead of the normal ctx.useAgent(...) path. Context assembly and tool composition remain the same, so realtime Horton still has conversation history, repo/project context, and the existing Electric/Pi tool stack.

Current Horton realtime defaults:

  • provider: OpenAI only
  • model: gpt-realtime-2
  • voice: configured setting, default marin
  • input transcription: gpt-realtime-whisper with delay: "minimal"
  • turn detection: OpenAI server_vad
  • threshold: 0.55
  • prefix padding: 300ms
  • silence duration: 500ms
  • provider response creation: enabled
  • interruption: configurable, enabled by default

Typed text during an active realtime session routes into the realtime session via control_in instead of starting a separate text run.

Browser Audio

The UI audio path now:

  • uses AudioWorklet for microphone capture with ScriptProcessorNode fallback
  • explicitly resamples captured Float32 audio to 24 kHz before PCM16 encoding, instead of trusting AudioContext({ sampleRate: 24000 })
  • keeps OpenAI VAD as the real turn detector
  • uses a local transport gate only to reduce idle upload volume
  • keeps pre-roll and trailing silence so provider VAD does not clip speech starts/ends
  • aligns PCM16 playback chunks to avoid static/noise from odd byte boundaries
  • cancels playback/truncates output on provider speech-start events, not local mic gate noise
  • exposes a live input-level indicator
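Explicit resampling followed by PCM16 encoding might look like the minimal sketch below. It uses plain linear interpolation with no anti-aliasing filter and resamples each buffer independently (the fractional read position is not carried across chunks), so it is illustrative rather than the PR's implementation; the function names are hypothetical.

```typescript
const TARGET_RATE = 24_000

// Linear-interpolation resampler from the capture rate to 24 kHz.
// No low-pass filter is applied before downsampling, so some aliasing is possible.
function resampleLinear(input: Float32Array, inputRate: number, outputRate = TARGET_RATE): Float32Array {
  if (inputRate === outputRate) return input.slice()
  const outputLength = Math.floor((input.length * outputRate) / inputRate)
  const output = new Float32Array(outputLength)
  const step = inputRate / outputRate // input samples advanced per output sample
  for (let i = 0; i < outputLength; i++) {
    const position = i * step
    const left = Math.floor(position)
    const right = Math.min(left + 1, input.length - 1)
    const frac = position - left
    output[i] = input[left] * (1 - frac) + input[right] * frac
  }
  return output
}

// Converts [-1, 1] float samples to little-endian signed 16-bit PCM bytes.
function encodePcm16(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2)
  const view = new DataView(bytes.buffer)
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]))
    // Asymmetric scaling so -1 maps to -32768 and +1 maps to 32767.
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff
    view.setInt16(i * 2, Math.round(value), true)
  }
  return bytes
}
```

In a capture worklet, `inputRate` would be the context's actual `sampleRate` (commonly 44.1 or 48 kHz), which is exactly why the rate is not assumed to be 24 kHz.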

UI and Settings

Adds:

  • realtime controls in the normal chat composer
  • realtime start button on the new Horton screen
  • typed prompt forwarding when starting realtime from the new-session screen
  • delayed greeting for empty new-session voice starts: if OpenAI has not detected speech after session start, Horton says a short hello
  • voice start disabled when draft attachments are present
  • send button remains a send button during realtime mode, so users can type into an active voice session
  • desktop settings section for realtime model, voice, reasoning effort, and interruption
  • credential gating so realtime controls are disabled when a valid OpenAI API key is not available
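The delayed greeting amounts to a one-shot timer that a speech-start event cancels. A minimal sketch with an injectable scheduler so it can be tested without real timers; the `speech_started` event type, the `sendGreeting` callback, and the delay value are assumptions, since the PR does not specify which control message or timeout it uses.

```typescript
// Hypothetical control_out event shape; only the `type` discriminator matters here.
type ControlOutEvent = { type: string }

interface Scheduler {
  setTimeout(fn: () => void, ms: number): unknown
  clearTimeout(handle: unknown): void
}

const realScheduler: Scheduler = {
  setTimeout: (fn, ms) => globalThis.setTimeout(fn, ms),
  clearTimeout: (handle) =>
    globalThis.clearTimeout(handle as Parameters<typeof globalThis.clearTimeout>[0]),
}

function createAutoGreeting(opts: {
  delayMs: number
  sendGreeting: () => void // e.g. append a control_in message asking for a short hello (assumed)
  scheduler?: Scheduler
}) {
  const scheduler = opts.scheduler ?? realScheduler
  let handle: unknown = undefined
  let settled = false // true once we either greeted or heard the user speak

  return {
    // Call when control_out reports that the session has started.
    onSessionStarted(): void {
      if (settled || handle !== undefined) return
      handle = scheduler.setTimeout(() => {
        handle = undefined
        if (settled) return
        settled = true
        opts.sendGreeting()
      }, opts.delayMs)
    },
    // Call for every control_out event; a provider speech start cancels the greeting.
    onControlEvent(event: ControlOutEvent): void {
      if (event.type !== 'speech_started') return // event name is an assumption
      settled = true
      if (handle !== undefined) {
        scheduler.clearTimeout(handle)
        handle = undefined
      }
    },
  }
}
```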

OpenAI Realtime currently requires an OpenAI API key. ChatGPT/Codex sign-in alone is not treated as sufficient for Realtime API access.

Timeline and Replay Groundwork

Realtime output is represented in normal timeline order alongside tool calls and user turns:

  • input/output transcripts persist as realtime transcript rows
  • transcript text streams as textDeltas, avoiding repeated full-row rewrites
  • assistant transcript segments rotate around interruption/user turns so the visible timeline stays ordered
  • tool calls triggered during realtime appear in the same timeline as the realtime session
  • audio spans are persisted with byte/sample offsets and stream IDs
  • session stream refs are stored in the manifest, so a replay/scrub UI can be built later without changing the session storage model
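Because transcript text is stored as append-only deltas rather than rewritten rows, a reader has to fold the deltas back into full text. A sketch assuming each delta carries a transcript ID and a per-transcript sequence number; the `TranscriptTextDelta` shape is hypothetical and the actual textDeltas schema may differ.

```typescript
// Hypothetical delta row; the real realtimeTranscripts/textDeltas schema may differ.
interface TranscriptTextDelta {
  transcriptId: string
  seq: number // assumed to increase monotonically within one transcript
  text: string
}

// Rebuilds each transcript's full text from its deltas, tolerating
// out-of-order delivery and duplicate rows (same transcriptId + seq).
function assembleTranscripts(deltas: Iterable<TranscriptTextDelta>): Map<string, string> {
  const byTranscript = new Map<string, Map<number, string>>()
  for (const delta of deltas) {
    let parts = byTranscript.get(delta.transcriptId)
    if (!parts) {
      parts = new Map()
      byTranscript.set(delta.transcriptId, parts)
    }
    if (!parts.has(delta.seq)) parts.set(delta.seq, delta.text) // first copy wins
  }

  const result = new Map<string, string>()
  for (const [id, parts] of byTranscript) {
    const ordered = [...parts.entries()].sort((a, b) => a[0] - b[0])
    result.set(id, ordered.map(([, text]) => text).join(''))
  }
  return result
}
```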

App Builder Integration Example

Entity authors integrate realtime by adding a realtime branch to their handler. The realtime session itself is started by a client through the agents server; the entity only decides how to run when ctx.realtime.activeSession() returns a session.

import {
  createOpenAIRealtimeProvider,
  type AgentTool,
} from '@electric-ax/agents-runtime'

const tools: AgentTool[] = [
  {
    name: 'lookup_status',
    label: 'Lookup status',
    description: 'Look up the current status for a ticket.',
    parameters: {
      type: 'object',
      properties: { ticketId: { type: 'string' } },
      required: ['ticketId'],
      additionalProperties: false,
    },
    async execute({ ticketId }) {
      return { ticketId, status: 'open' }
    },
  },
]

export async function handler(ctx) {
  const systemPrompt = `You are a concise voice support agent.`
  const activeRealtimeSession = ctx.realtime?.activeSession?.()

  if (activeRealtimeSession) {
    if (activeRealtimeSession.provider !== 'openai') {
      throw new Error(`Unsupported realtime provider: ${activeRealtimeSession.provider}`)
    }

    const realtime = ctx.useRealtime({
      systemPrompt,
      provider: createOpenAIRealtimeProvider({
        apiKey: () => process.env.OPENAI_API_KEY!,
        model: activeRealtimeSession.model,
        ...(activeRealtimeSession.voice
          ? { voice: activeRealtimeSession.voice }
          : {}),
        ...(activeRealtimeSession.reasoningEffort
          ? { reasoningEffort: activeRealtimeSession.reasoningEffort }
          : {}),
      }),
      tools,
      audio: {
        inputFormat: { codec: 'pcm16', sampleRate: 24_000, channels: 1 },
        outputFormat: { codec: 'pcm16', sampleRate: 24_000, channels: 1 },
        inputTranscription: {
          model: 'gpt-realtime-whisper',
          delay: 'minimal',
        },
        turnDetection: {
          type: 'server_vad',
          threshold: 0.55,
          prefixPaddingMs: 300,
          silenceDurationMs: 500,
          createResponse: true,
          interruptResponse:
            activeRealtimeSession.interruptResponse !== false,
        },
      },
      toolPolicy: {
        // Only expose tools that are safe enough to run directly during voice.
        // Other work can still be delegated to workers by your own tools/policy.
        direct: ['lookup_status'],
      },
      onTranscript(transcript) {
        if (transcript.direction === 'input' && transcript.status === 'final') {
          console.log('user said:', transcript.text)
        }
      },
    })

    await realtime.run()
    return
  }

  ctx.useAgent({
    systemPrompt,
    modelProvider: 'openai',
    model: 'gpt-5.1',
    tools,
  })
  await ctx.agent.run()
}

A client starts a realtime session through the agents server, writes audio and control frames to the returned durable streams, and reads assistant audio back:

import { DurableStream } from '@durable-streams/client'
import {
  appendPathToUrl,
  createRuntimeServerClient,
} from '@electric-ax/agents-runtime/client'

const client = createRuntimeServerClient({ baseUrl: 'http://localhost:4437' })

const session = await client.startRealtimeSession({
  entityUrl: '/support/session-123',
  provider: 'openai',
  model: 'gpt-realtime-2',
  voice: 'marin',
  reasoningEffort: 'low',
  interruptResponse: true,
  inputAudio: { codec: 'pcm16', sampleRate: 24_000, channels: 1 },
  outputAudio: { codec: 'pcm16', sampleRate: 24_000, channels: 1 },
})

const audioIn = new DurableStream({
  url: appendPathToUrl(client.baseUrl, session.streams.audio_in),
  contentType: 'audio/pcm; rate=24000; channels=1',
  batching: true,
})

const audioOut = new DurableStream({
  url: appendPathToUrl(client.baseUrl, session.streams.audio_out),
  contentType: 'audio/pcm; rate=24000; channels=1',
})

const controlIn = new DurableStream({
  url: appendPathToUrl(client.baseUrl, session.streams.control_in),
  contentType: 'application/json',
  batching: true,
})

// Send one PCM16 microphone chunk (`pcm16Chunk` stands in for bytes from your own capture code).
await audioIn.append(pcm16Chunk)

// Send typed text into the active realtime conversation.
await controlIn.append(
  new TextEncoder().encode(
    JSON.stringify({ type: 'input_text', text: 'Can you check this ticket?' })
  )
)

// Read assistant audio back (`playPcm16Chunk` stands in for your own playback code).
const response = await audioOut.stream({ live: true })
for await (const pcm16Chunk of response.bodyStream()) {
  playPcm16Chunk(pcm16Chunk)
}
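The example leaves `playPcm16Chunk` to the application. Chunks read from audio_out can end halfway through a 16-bit sample, so the decoding half of such a function has to hold a trailing odd byte until the next chunk arrives; this is the alignment issue mentioned under Browser Audio. A minimal sketch (the class name is hypothetical, and scheduling the resulting Float32 samples into Web Audio buffers is left out):

```typescript
// Converts a stream of PCM16 byte chunks into Float32 samples for Web Audio playback.
class Pcm16PlaybackDecoder {
  private pending: number | null = null // leftover low byte from the previous chunk

  // Returns the complete samples in this chunk as Float32 values in [-1, 1).
  decode(chunk: Uint8Array): Float32Array {
    const prev = this.pending
    const offset = prev === null ? 0 : 1
    const total = offset + chunk.length
    // Byte at logical index i of the sequence [prev?, ...chunk].
    const byteAt = (i: number): number => (i < offset ? (prev as number) : chunk[i - offset])

    const sampleCount = Math.floor(total / 2)
    const out = new Float32Array(sampleCount)
    for (let s = 0; s < sampleCount; s++) {
      let value = byteAt(2 * s) | (byteAt(2 * s + 1) << 8) // little-endian int16
      if (value >= 0x8000) value -= 0x10000 // sign-extend
      out[s] = value / 0x8000
    }

    // Keep an odd trailing byte for the next call instead of emitting noise.
    this.pending = total % 2 === 1 ? byteAt(total - 1) : null
    return out
  }
}
```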

Validation

Focused checks run on this branch:

  • pnpm -C packages/agents-runtime exec vitest run test/openai-realtime.test.ts test/realtime-context.test.ts
  • pnpm -C packages/agents-server exec vitest run test/electric-agents-manager-write-validation.test.ts
  • pnpm -C packages/agents-runtime exec vitest run test/process-wake.test.ts
  • pnpm --filter @electric-ax/agents-runtime typecheck
  • pnpm --filter @electric-ax/agents-runtime build
  • pnpm --filter @electric-ax/agents-server-ui typecheck
  • pnpm --filter @electric-ax/agents-desktop typecheck
  • git diff --check

Manual desktop testing covered:

  • starting realtime from existing Horton sessions
  • starting realtime from the new Horton screen
  • microphone streaming and visible input level
  • assistant audio playback
  • typed text during realtime mode
  • realtime tool calls
  • live user transcript deltas during long turns
  • transcript/timeline ordering around user turns and tool calls
  • interruption while assistant audio is playing
  • OpenAI credential gating and realtime settings

Notes and Follow-ups

  • This PR intentionally supports OpenAI Realtime first, while keeping provider boundaries explicit for future providers.
  • Attachments are not sent into realtime starts yet; the UI disables the realtime button while draft attachments are present.
  • Replay/scrub UI is not included, but durable stream refs and audio span metadata are persisted for it.
  • Mobile/native scoped stream token refresh remains out of scope.
  • Long-running voice sessions with heavy interruption and tool use should continue to get manual testing because they stress provider cancellation, transcript reconciliation, and visible timeline ordering together.

netlify Bot commented Jun 9, 2026

Deploy Preview for electric-next ready!

  • 🔨 Latest commit: 242aca6
  • 🔍 Latest deploy log: https://app.netlify.com/projects/electric-next/deploys/6a2825e8c87469000860818d
  • 😎 Deploy Preview: https://deploy-preview-4543--electric-next.netlify.app

samwillis force-pushed the codex/realtime-agents-plan branch from 3ffd31d to d54bb62 on June 9, 2026 18:39
github-actions Bot (Contributor) commented Jun 9, 2026

Electric Agents Desktop Builds

Build artifacts for commit 90042d1.

  • macOS Apple Silicon: Passed (DMG)
  • macOS Intel: Passed (DMG)
  • Windows x64: Passed (Installer)
  • Linux x64: Passed (AppImage / deb)

codecov Bot commented Jun 9, 2026

Codecov Report

❌ Patch coverage is 63.53771% with 1344 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.69%. Comparing base (916f6cd) to head (90042d1).
✅ All tests successful. No failed tests found.

Files with missing lines (patch coverage, missing lines):

  • ...ackages/agents-server-ui/src/lib/realtime-audio.ts: 0.00%, 498 missing ⚠️
  • packages/agents-runtime/src/openai-realtime.ts: 74.81%, 200 missing ⚠️
  • packages/agents-runtime/src/context-factory.ts: 85.62%, 184 missing ⚠️
  • ...-ui/src/components/settings/pages/RealtimePage.tsx: 0.00%, 116 missing ⚠️
  • ...s/agents-server-ui/src/components/MessageInput.tsx: 0.00%, 95 missing ⚠️
  • ...-server-ui/src/components/views/NewSessionView.tsx: 0.00%, 56 missing ⚠️
  • ...nts-server-ui/src/hooks/useRealtimeAvailability.ts: 0.00%, 39 missing ⚠️
  • ...agents-server-ui/src/components/EntityTimeline.tsx: 0.00%, 38 missing ⚠️
  • packages/agents/src/agents/horton.ts: 52.63%, 34 missing and 2 partials ⚠️
  • ...agents-server-ui/src/components/views/ChatView.tsx: 0.00%, 21 missing ⚠️
  • ... and 10 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4543      +/-   ##
==========================================
+ Coverage   54.80%   57.69%   +2.88%     
==========================================
  Files         317      367      +50     
  Lines       36681    42990    +6309     
  Branches    10466    12001    +1535     
==========================================
+ Hits        20104    24802    +4698     
- Misses      16544    18115    +1571     
- Partials       33       73      +40     
Coverage by flag (coverage, <patch coverage>, Δ):

  • packages/agents: 70.05% <52.63%> (-0.48%) ⬇️
  • packages/agents-mcp: 77.54% <ø> (?)
  • packages/agents-mobile: 71.42% <ø> (ø)
  • packages/agents-runtime: 81.95% <84.77%> (+1.70%) ⬆️
  • packages/agents-server: 73.93% <69.73%> (-0.02%) ⬇️
  • packages/agents-server-ui: 5.26% <0.00%> (-0.41%) ⬇️
  • packages/electric-ax: 46.42% <ø> (ø)
  • packages/experimental: 87.73% <ø> (?)
  • packages/react-hooks: 86.48% <ø> (?)
  • packages/start: 82.83% <ø> (?)
  • packages/typescript-client: 91.83% <ø> (?)
  • packages/y-electric: 56.05% <ø> (?)
  • typescript: 57.69% <63.53%> (+2.88%) ⬆️
  • unit-tests: 57.69% <63.53%> (+2.88%) ⬆️

Flags with carried forward coverage won't be shown.

github-actions Bot (Contributor) commented Jun 9, 2026

Electric Agents Mobile Build

Local mobile checks ran for commit 90042d1.

The EAS Android preview build was skipped because the mobile-eas-build label is not present. Add the mobile-eas-build label to this PR to produce an installable preview build.

netlify Bot commented Jun 9, 2026

Deploy Preview for electric-next ready!

  • 🔨 Latest commit: d54bb62
  • 🔍 Latest deploy log: https://app.netlify.com/projects/electric-next/deploys/6a285def22d91d0008c570f8
  • 😎 Deploy Preview: https://deploy-preview-4543--electric-next.netlify.app

samwillis changed the title from [codex] Add realtime agents support to feat(agents): Add realtime agents support on Jun 10, 2026
samwillis marked this pull request as ready for review June 10, 2026 11:51
samwillis requested a review from Copilot June 10, 2026 11:51
samwillis requested a review from KyleAMathews June 10, 2026 11:52

Copilot AI left a comment

Pull request overview

Adds first-class “realtime agents” support (durable-streams IO + OpenAI Realtime provider) across runtime, server control-plane, desktop settings bridge, and UI voice-mode controls so Horton can run speech-to-speech sessions without exposing OpenAI credentials to the browser.

Changes:

  • Introduces realtime session creation via /_electric/realtime/sessions, persisting durable stream refs + session metadata (manifest + entity collections).
  • Adds an OpenAI Realtime provider + runtime types/APIs for audio IO, turn detection, tool streaming, and transcript handling.
  • Updates UI/desktop to start/stop voice sessions, gate by OpenAI API key validation, and render realtime transcripts in the timeline.

Reviewed changes

Copilot reviewed 57 out of 58 changed files in this pull request and generated 2 comments.

Summary per file:

  • pnpm-lock.yaml: Links desktop package to agents-runtime workspace dependency.
  • packages/agents/test/horton-tool-composition.test.ts: Adds coverage ensuring Horton uses realtime mode when an active session exists.
  • packages/agents/test/generate-title.test.ts: Adds coverage for title generation from finalized realtime input transcript.
  • packages/agents/src/agents/horton.ts: Routes Horton into ctx.useRealtime, adds realtime tool policy + realtime-informed title setting.
  • packages/agents-server/test/routing-hooks.test.ts: Updates CORS header expectations for producer/stream headers.
  • packages/agents-server/test/electric-agents-manager-write-validation.test.ts: Adds tests for realtime session creation + provider validation.
  • packages/agents-server/src/stream-client.ts: Extends stream append APIs to support content type, batching, and producer headers.
  • packages/agents-server/src/routing/realtime-router.ts: Adds control-plane route to create realtime sessions.
  • packages/agents-server/src/routing/internal-router.ts: Wires realtime router into internal routing tree.
  • packages/agents-server/src/routing/hooks.ts: Expands allowed CORS headers for durable-stream producer/stream metadata.
  • packages/agents-server/src/index.ts: Exports realtime session request/response and stream append option types.
  • packages/agents-server/src/entity-manager.ts: Implements createRealtimeSession (streams + manifest + state rows + wake).
  • packages/agents-server-ui/src/router.tsx: Adds “Realtime” settings route/category.
  • packages/agents-server-ui/src/lib/server-connection.ts: Adds realtime settings types + desktop bridge load/save helpers.
  • packages/agents-server-ui/src/lib/realtime-audio.ts: Implements browser durable-stream mic capture, resample/PCM16 encode, playback, control handling, and autogreet.
  • packages/agents-server-ui/src/hooks/useRealtimeAvailability.ts: Adds credential-gated realtime availability hook for UI.
  • packages/agents-server-ui/src/hooks/useDocumentTitle.ts: Adds label for realtime settings page.
  • packages/agents-server-ui/src/components/views/NewSessionView.tsx: Adds “start realtime” from new session view with viewParams forwarding.
  • packages/agents-server-ui/src/components/views/ChatView.tsx: Adds realtime autostart wiring from view params into composer.
  • packages/agents-server-ui/src/components/settings/SettingsSidebar.tsx: Adds sidebar entry for realtime settings.
  • packages/agents-server-ui/src/components/settings/pages/RealtimePage.tsx: Adds realtime settings UI (model/voice/effort/interruption + auth status).
  • packages/agents-server-ui/src/components/settings/pages/RealtimePage.module.css: Styles realtime settings page lists/selects.
  • packages/agents-server-ui/src/components/NewSessionPage.module.css: Adds styles for new-session voice start button.
  • packages/agents-server-ui/src/components/MessageInput.tsx: Adds voice mode controls in chat composer + text routing into realtime control stream.
  • packages/agents-server-ui/src/components/MessageInput.module.css: Adds styling for voice active state + input level meter.
  • packages/agents-server-ui/src/components/EntityTimeline.tsx: Renders realtime transcripts in timeline and hides realtime session wake rows.
  • packages/agents-server-ui/src/components/EntityContextDrawer.tsx: Adds safer fallbacks for manifest rendering with new manifest kinds.
  • packages/agents-server-ui/src/components/AgentResponse.tsx: Suppresses empty “live run” rendering for realtime runs.
  • packages/agents-runtime/tsdown.config.ts: Ensures stable d.ts generation for new chat entrypoints.
  • packages/agents-runtime/test/timeline-context.test.ts: Adds projection tests for realtime transcripts into timeline messages.
  • packages/agents-runtime/test/runtime-server-client-update-metadata.test.ts: Adds tests for starting realtime sessions via runtime server client.
  • packages/agents-runtime/test/openai-realtime.test.ts: Adds comprehensive OpenAI realtime provider unit tests.
  • packages/agents-runtime/test/helpers/context-test-helpers.ts: Extends handler context test helpers with realtime stream config.
  • packages/agents-runtime/test/entity-timeline.test.ts: Extends timeline query coverage for realtime transcripts + run ordering changes.
  • packages/agents-runtime/test/electric-agents-client.test.ts: Adds agents client surface for startRealtimeSession.
  • packages/agents-runtime/src/types.ts: Introduces realtime runtime/provider/session/transcript types and context hooks.
  • packages/agents-runtime/src/timeline-context.ts: Adds realtime transcript projection and filters realtime session wakes.
  • packages/agents-runtime/src/runtime-server-client.ts: Adds startRealtimeSession client method + request/response types.
  • packages/agents-runtime/src/realtime.ts: Adds test realtime provider helper.
  • packages/agents-runtime/src/realtime-options.ts: Adds OpenAI realtime model/voice/effort choices + validators + defaults.
  • packages/agents-runtime/src/process-wake.ts: Passes realtime stream connection info into handler context.
  • packages/agents-runtime/src/openai-realtime.ts: Implements OpenAI Realtime WebSocket provider and event mapping/tool bridging.
  • packages/agents-runtime/src/index.ts: Exposes realtime APIs/providers/options from runtime package.
  • packages/agents-runtime/src/entity-timeline.ts: Adds realtime transcript rows to timeline query/data + improves run ordering anchor.
  • packages/agents-runtime/src/entity-stream-db.ts: Adds optional collection indexes to support timeline query performance (incl. transcript deltas).
  • packages/agents-runtime/src/entity-schema.ts: Adds realtime session/audio span/transcript collections + schema updates for transcript deltas.
  • packages/agents-runtime/src/client.ts: Re-exports realtime options and session start types for client consumers.
  • packages/agents-runtime/src/agents-client.ts: Adds startRealtimeSession to high-level agents client API.
  • packages/agents-desktop/src/shared/types.ts: Adds desktop realtime settings + credential status/types.
  • packages/agents-desktop/src/settings/store.ts: Bumps settings version and persists normalized realtime settings.
  • packages/agents-desktop/src/settings/realtime.ts: Adds realtime settings normalization + OpenAI key validation with TTL cache.
  • packages/agents-desktop/src/preload.ts: Exposes realtime settings IPC bridge to renderer.
  • packages/agents-desktop/src/ipc/preferences.ts: Registers IPC handlers for realtime settings get/set.
  • packages/agents-desktop/src/app/controller.ts: Implements realtime settings status + persistence in desktop controller.
  • packages/agents-desktop/package.json: Adds agents-runtime dependency for realtime options/types.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment thread packages/agents-server/src/entity-manager.ts
Comment on lines +377 to +412
session: {
  type: `realtime`,
  model,
  instructions: input.systemPrompt,
  output_modalities: outputFormat ? [`audio`] : [`text`],
  tool_choice: input.tools.length > 0 ? `auto` : `none`,
  ...(reasoningEffort ? { reasoning: { effort: reasoningEffort } } : {}),
  ...(input.tools.length > 0
    ? { tools: input.tools.map((tool) => toOpenAITool(tool)) }
    : {}),
  ...(inputFormat || outputFormat || opts.voice
    ? {
        audio: {
          ...(inputFormat
            ? {
                input: {
                  format: inputFormat,
                  ...(transcription ? { transcription } : {}),
                  turn_detection: realtimeTurnDetection(
                    input.audio?.turnDetection
                  ),
                },
              }
            : {}),
          ...(outputFormat || opts.voice
            ? {
                output: {
                  ...(outputFormat ? { format: outputFormat } : {}),
                  ...(opts.voice ? { voice: opts.voice } : {}),
                },
              }
            : {}),
        },
      }
    : {}),
},

Labels

None yet

Projects

None yet

2 participants