feat(agents): Add realtime agents support#4543
✅ Deploy Preview for electric-next ready!
3ffd31d to d54bb62
Codecov Report ❌

Coverage diff:

| | main | #4543 | +/- |
|---|---|---|---|
| Coverage | 54.80% | 57.69% | +2.88% |
| Files | 317 | 367 | +50 |
| Lines | 36681 | 42990 | +6309 |
| Branches | 10466 | 12001 | +1535 |
| Hits | 20104 | 24802 | +4698 |
| Misses | 16544 | 18115 | +1571 |
| Partials | 33 | 73 | +40 |
Electric Agents Mobile Build
Local mobile checks ran for commit […]. The EAS Android preview build was skipped because the […]
Pull request overview
Adds first-class “realtime agents” support (durable-streams IO + OpenAI Realtime provider) across runtime, server control-plane, desktop settings bridge, and UI voice-mode controls so Horton can run speech-to-speech sessions without exposing OpenAI credentials to the browser.
Changes:
- Introduces realtime session creation via `/_electric/realtime/sessions`, persisting durable stream refs + session metadata (manifest + entity collections).
- Adds an OpenAI Realtime provider + runtime types/APIs for audio IO, turn detection, tool streaming, and transcript handling.
- Updates UI/desktop to start/stop voice sessions, gate by OpenAI API key validation, and render realtime transcripts in the timeline.
Reviewed changes
Copilot reviewed 57 out of 58 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pnpm-lock.yaml | Links desktop package to agents-runtime workspace dependency. |
| packages/agents/test/horton-tool-composition.test.ts | Adds coverage ensuring Horton uses realtime mode when an active session exists. |
| packages/agents/test/generate-title.test.ts | Adds coverage for title generation from finalized realtime input transcript. |
| packages/agents/src/agents/horton.ts | Routes Horton into ctx.useRealtime, adds realtime tool policy + realtime-informed title setting. |
| packages/agents-server/test/routing-hooks.test.ts | Updates CORS header expectations for producer/stream headers. |
| packages/agents-server/test/electric-agents-manager-write-validation.test.ts | Adds tests for realtime session creation + provider validation. |
| packages/agents-server/src/stream-client.ts | Extends stream append APIs to support content type, batching, and producer headers. |
| packages/agents-server/src/routing/realtime-router.ts | Adds control-plane route to create realtime sessions. |
| packages/agents-server/src/routing/internal-router.ts | Wires realtime router into internal routing tree. |
| packages/agents-server/src/routing/hooks.ts | Expands allowed CORS headers for durable-stream producer/stream metadata. |
| packages/agents-server/src/index.ts | Exports realtime session request/response and stream append option types. |
| packages/agents-server/src/entity-manager.ts | Implements createRealtimeSession (streams + manifest + state rows + wake). |
| packages/agents-server-ui/src/router.tsx | Adds “Realtime” settings route/category. |
| packages/agents-server-ui/src/lib/server-connection.ts | Adds realtime settings types + desktop bridge load/save helpers. |
| packages/agents-server-ui/src/lib/realtime-audio.ts | Implements browser durable-stream mic capture, resample/PCM16 encode, playback, control handling, and autogreet. |
| packages/agents-server-ui/src/hooks/useRealtimeAvailability.ts | Adds credential-gated realtime availability hook for UI. |
| packages/agents-server-ui/src/hooks/useDocumentTitle.ts | Adds label for realtime settings page. |
| packages/agents-server-ui/src/components/views/NewSessionView.tsx | Adds “start realtime” from new session view with viewParams forwarding. |
| packages/agents-server-ui/src/components/views/ChatView.tsx | Adds realtime autostart wiring from view params into composer. |
| packages/agents-server-ui/src/components/settings/SettingsSidebar.tsx | Adds sidebar entry for realtime settings. |
| packages/agents-server-ui/src/components/settings/pages/RealtimePage.tsx | Adds realtime settings UI (model/voice/effort/interruption + auth status). |
| packages/agents-server-ui/src/components/settings/pages/RealtimePage.module.css | Styles realtime settings page lists/selects. |
| packages/agents-server-ui/src/components/NewSessionPage.module.css | Adds styles for new-session voice start button. |
| packages/agents-server-ui/src/components/MessageInput.tsx | Adds voice mode controls in chat composer + text routing into realtime control stream. |
| packages/agents-server-ui/src/components/MessageInput.module.css | Adds styling for voice active state + input level meter. |
| packages/agents-server-ui/src/components/EntityTimeline.tsx | Renders realtime transcripts in timeline and hides realtime session wake rows. |
| packages/agents-server-ui/src/components/EntityContextDrawer.tsx | Adds safer fallbacks for manifest rendering with new manifest kinds. |
| packages/agents-server-ui/src/components/AgentResponse.tsx | Suppresses empty “live run” rendering for realtime runs. |
| packages/agents-runtime/tsdown.config.ts | Ensures stable d.ts generation for new chat entrypoints. |
| packages/agents-runtime/test/timeline-context.test.ts | Adds projection tests for realtime transcripts into timeline messages. |
| packages/agents-runtime/test/runtime-server-client-update-metadata.test.ts | Adds tests for starting realtime sessions via runtime server client. |
| packages/agents-runtime/test/openai-realtime.test.ts | Adds comprehensive OpenAI realtime provider unit tests. |
| packages/agents-runtime/test/helpers/context-test-helpers.ts | Extends handler context test helpers with realtime stream config. |
| packages/agents-runtime/test/entity-timeline.test.ts | Extends timeline query coverage for realtime transcripts + run ordering changes. |
| packages/agents-runtime/test/electric-agents-client.test.ts | Adds agents client surface for startRealtimeSession. |
| packages/agents-runtime/src/types.ts | Introduces realtime runtime/provider/session/transcript types and context hooks. |
| packages/agents-runtime/src/timeline-context.ts | Adds realtime transcript projection and filters realtime session wakes. |
| packages/agents-runtime/src/runtime-server-client.ts | Adds startRealtimeSession client method + request/response types. |
| packages/agents-runtime/src/realtime.ts | Adds test realtime provider helper. |
| packages/agents-runtime/src/realtime-options.ts | Adds OpenAI realtime model/voice/effort choices + validators + defaults. |
| packages/agents-runtime/src/process-wake.ts | Passes realtime stream connection info into handler context. |
| packages/agents-runtime/src/openai-realtime.ts | Implements OpenAI Realtime WebSocket provider and event mapping/tool bridging. |
| packages/agents-runtime/src/index.ts | Exposes realtime APIs/providers/options from runtime package. |
| packages/agents-runtime/src/entity-timeline.ts | Adds realtime transcript rows to timeline query/data + improves run ordering anchor. |
| packages/agents-runtime/src/entity-stream-db.ts | Adds optional collection indexes to support timeline query performance (incl. transcript deltas). |
| packages/agents-runtime/src/entity-schema.ts | Adds realtime session/audio span/transcript collections + schema updates for transcript deltas. |
| packages/agents-runtime/src/client.ts | Re-exports realtime options and session start types for client consumers. |
| packages/agents-runtime/src/agents-client.ts | Adds startRealtimeSession to high-level agents client API. |
| packages/agents-desktop/src/shared/types.ts | Adds desktop realtime settings + credential status/types. |
| packages/agents-desktop/src/settings/store.ts | Bumps settings version and persists normalized realtime settings. |
| packages/agents-desktop/src/settings/realtime.ts | Adds realtime settings normalization + OpenAI key validation with TTL cache. |
| packages/agents-desktop/src/preload.ts | Exposes realtime settings IPC bridge to renderer. |
| packages/agents-desktop/src/ipc/preferences.ts | Registers IPC handlers for realtime settings get/set. |
| packages/agents-desktop/src/app/controller.ts | Implements realtime settings status + persistence in desktop controller. |
| packages/agents-desktop/package.json | Adds agents-runtime dependency for realtime options/types. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
    session: {
      type: `realtime`,
      model,
      instructions: input.systemPrompt,
      output_modalities: outputFormat ? [`audio`] : [`text`],
      tool_choice: input.tools.length > 0 ? `auto` : `none`,
      ...(reasoningEffort ? { reasoning: { effort: reasoningEffort } } : {}),
      ...(input.tools.length > 0
        ? { tools: input.tools.map((tool) => toOpenAITool(tool)) }
        : {}),
      ...(inputFormat || outputFormat || opts.voice
        ? {
            audio: {
              ...(inputFormat
                ? {
                    input: {
                      format: inputFormat,
                      ...(transcription ? { transcription } : {}),
                      turn_detection: realtimeTurnDetection(
                        input.audio?.turnDetection
                      ),
                    },
                  }
                : {}),
              ...(outputFormat || opts.voice
                ? {
                    output: {
                      ...(outputFormat ? { format: outputFormat } : {}),
                      ...(opts.voice ? { voice: opts.voice } : {}),
                    },
                  }
                : {}),
            },
          }
        : {}),
    },
Summary
Adds Horton realtime voice mode to Electric Agents with durable streams as the client/server IO path and OpenAI Realtime as the initial provider. Horton can drop into voice mode from an existing conversation or from the new-session screen, keep its normal context and tool loop, stream microphone audio to the runtime, stream assistant audio back to the client, persist transcript/audio metadata, and expose model/voice/reasoning settings in the desktop app.
The design deliberately avoids making the browser a direct OpenAI client. Clients write/read durable streams. The agent runtime owns the OpenAI WebSocket, provider credentials, tool execution, transcript reconciliation, and session lifecycle.
Design
IO Model
Realtime sessions are represented in the entity manifest with durable stream refs:
- `audio_in`: client to runtime, mono PCM16 at 24 kHz.
- `audio_out`: runtime/provider to client, mono PCM16 at 24 kHz.
- `control_in`: client to runtime JSON commands, such as typed text, stop, close, and truncation.
- `control_out`: runtime to client JSON provider/runtime events, such as session started, response started/completed, speech started, and audio deltas.

The only WebSocket in the first implementation is server/runtime to OpenAI Realtime. Browser and app clients use durable HTTP streams for all Electric client/server communication.
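To make the stream formats concrete, here is a small sketch of the data that would travel over these streams: a float-to-PCM16 encoder for `audio_in`, a byte-size helper for 24 kHz mono PCM16, and a JSON encoder for `control_in` commands. The four stream names and the audio format come from the description above; the `ControlInCommand` field names and all function names are hypothetical and may not match the PR's wire format.

```typescript
// Stream kinds as named in the manifest description above.
type RealtimeStreamKind = "audio_in" | "audio_out" | "control_in" | "control_out";

// Audio format stated in the PR: mono PCM16 at 24 kHz.
const SAMPLE_RATE_HZ = 24_000;
const BYTES_PER_SAMPLE = 2;

// Hypothetical control_in command shapes; field names are assumptions.
type ControlInCommand =
  | { type: "text"; text: string }
  | { type: "stop" }
  | { type: "close" }
  | { type: "truncate"; audioEndMs: number };

// Number of bytes needed for `durationMs` of mono PCM16 audio at 24 kHz.
function pcm16ByteLength(durationMs: number): number {
  const samples = Math.round((SAMPLE_RATE_HZ * durationMs) / 1000);
  return samples * BYTES_PER_SAMPLE;
}

// Convert Web Audio style float samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range input is clamped rather than wrapped.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Serialize a control command as a JSON string for appending to control_in.
function encodeControl(command: ControlInCommand): string {
  return JSON.stringify(command);
}
```

At this format, a 20 ms audio chunk is 480 samples, or 960 bytes, which gives a feel for the append sizes a client would write to `audio_in`.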
Runtime API
Adds first-class realtime runtime types and provider hooks for:
- `ctx.realtime.activeSession()`
- `ctx.useRealtime(...)` as a built-in runtime API

`RealtimeTurnDetectionConfig` supports:
- `server_vad`
- `semantic_vad`
- `false` or `{ type: "none" }`

The runtime bridges durable streams into provider sessions and records realtime state into entity collections:
- `realtimeSessions`
- `realtimeAudioSpans`
- `realtimeTranscripts`
- `textDeltas`

OpenAI Provider
Adds an OpenAI Realtime provider that:
- `gpt-realtime-2`
- `marin`
- `gpt-realtime-2`
- `session.audio.input` / `session.audio.output` config
- `event_id` where available

Horton Behavior
Horton detects active realtime sessions and uses `ctx.useRealtime(...)` instead of the normal `ctx.useAgent(...)` path. Context assembly and tool composition remain the same, so realtime Horton still has conversation history, repo/project context, and the existing Electric/Pi tool stack.

Current Horton realtime defaults:
- `gpt-realtime-2`
- `marin`
- `gpt-realtime-whisper` with `delay: "minimal"`
- `server_vad`
- `0.55`
- `300ms`
- `500ms`

Typed text during an active realtime session routes into the realtime session via `control_in` instead of starting a separate text run.

Browser Audio
The UI audio path now:
- `AudioWorklet` for microphone capture with `ScriptProcessorNode` fallback
- `AudioContext({ sampleRate: 24000 })`

UI and Settings
Adds:
OpenAI Realtime currently requires an OpenAI API key. ChatGPT/Codex sign-in alone is not treated as sufficient for Realtime API access.
Timeline and Replay Groundwork
Realtime output is represented in normal timeline order alongside tool calls and user turns:
- `textDeltas`, avoiding repeated full-row rewrites

App Builder Integration Example
Entity authors integrate realtime by adding a realtime branch to their handler. The runtime session is created by the server/client; the entity decides how to run when `ctx.realtime.activeSession()` is present.

A client starts a realtime session through the agents server, then writes audio/control frames to the returned durable streams:
Validation
Focused checks run during this branch:
- `pnpm -C packages/agents-runtime exec vitest run test/openai-realtime.test.ts test/realtime-context.test.ts`
- `pnpm -C packages/agents-server exec vitest run test/electric-agents-manager-write-validation.test.ts`
- `pnpm -C packages/agents-runtime exec vitest run test/process-wake.test.ts`
- `pnpm --filter @electric-ax/agents-runtime typecheck`
- `pnpm --filter @electric-ax/agents-runtime build`
- `pnpm --filter @electric-ax/agents-server-ui typecheck`
- `pnpm --filter @electric-ax/agents-desktop typecheck`
- `git diff --check`

Manual desktop testing covered:
Notes and Follow-ups