feat(agents): Add streaming tool call argument support#4555

Open: samwillis wants to merge 2 commits into main from samwillis/agents-streaming-tool-calls

Conversation

@samwillis

@samwillis samwillis commented Jun 10, 2026

Summary

Adds generic runtime support for tools that want to observe partial tool-call arguments while Pi is still streaming them. This PR is intentionally limited to the runtime protocol and persistence layer; it does not add any markdown-document behavior.

What changes:

  • Adds AgentTool.onArgsDelta(context, signal) as an optional hook on runtime tools.
  • Wires Pi toolcall_start, toolcall_delta, and toolcall_end assistant-message events through the runtime adapter.
  • Persists streaming argument lifecycle on tool_call rows using args_streaming and args_complete statuses.
  • Adds a tool_arg_delta built-in collection so streamed argument chunks are durable and inspectable.
  • Preserves the existing non-streaming tool-call behavior: ordinary tool execution still inserts a tool_call row with status: "started".

Example

import { Type } from '@sinclair/typebox'
import type { AgentTool } from '@electric-ax/agents-runtime'
// appendPartialContent and finishDraft are assumed to be application-level
// helpers supplied by the tool author; they are not defined in this example.

export const streamingDraftTool: AgentTool = {
  name: 'streaming_draft',
  label: 'Streaming Draft',
  description: 'Apply text as the model streams the tool arguments.',
  parameters: Type.Object({
    documentId: Type.String(),
    content: Type.String(),
  }),

  async onArgsDelta({ toolCallId, delta, argsPreview }, signal) {
    // Called for each Pi `toolcall_delta` before final tool execution.
    // `delta` is the raw streamed argument chunk.
    // `argsPreview` is Pi's best current parsed argument object.
    const args = argsPreview as
      | { documentId?: string; content?: string }
      | undefined

    if (signal?.aborted || !args?.documentId) return

    await appendPartialContent({
      toolCallId,
      documentId: args.documentId,
      delta,
      previewContent: args.content ?? '',
    })
  },

  async execute(_toolCallId, args) {
    // Final execution still runs after Pi completes the tool call args.
    await finishDraft(args.documentId, args.content)
    return {
      content: [{ type: 'text', text: 'Draft complete.' }],
      details: { documentId: args.documentId },
    }
  },
}

Validation

  • pnpm --filter @electric-ax/agents-runtime typecheck
  • pnpm --filter @electric-ax/agents-runtime exec vitest run test/outbound-bridge.test.ts test/pi-adapter.test.ts
  • pnpm --filter @electric-ax/agents-runtime exec vitest run test/runtime-dsl.test.ts

@github-actions

github-actions Bot commented Jun 10, 2026

Electric Agents Desktop Builds

Build artifacts for commit 636d716.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Passed | DMG |
| macOS Intel | Passed | DMG |
| Windows x64 | Passed | Installer |
| Linux x64 | Passed | AppImage / deb |

Workflow run

@codecov

codecov Bot commented Jun 10, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.05%. Comparing base (146f238) to head (636d716).
⚠️ Report is 21 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/agents-runtime/src/pi-adapter.ts | 87.27% | 7 Missing ⚠️ |
| packages/agents-runtime/src/outbound-bridge.ts | 94.33% | 6 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4555       +/-   ##
===========================================
+ Coverage   31.76%   55.05%   +23.28%     
===========================================
  Files         213      318      +105     
  Lines       18202    37016    +18814     
  Branches     6418    10547     +4129     
===========================================
+ Hits         5782    20379    +14597     
- Misses      12402    16604     +4202     
- Partials       18       33       +15     
| Flag | Coverage Δ |
| --- | --- |
| packages/agents | 70.53% <ø> (?) |
| packages/agents-mobile | 71.42% <ø> (-12.67%) ⬇️ |
| packages/agents-runtime | 80.41% <92.85%> (?) |
| packages/agents-server | 73.98% <ø> (+1.02%) ⬆️ |
| packages/agents-server-ui | 5.66% <ø> (-0.43%) ⬇️ |
| packages/electric-ax | 46.42% <ø> (?) |
| typescript | 55.05% <92.85%> (+23.28%) ⬆️ |
| unit-tests | 55.05% <92.85%> (+23.28%) ⬆️ |

@github-actions

github-actions Bot commented Jun 10, 2026

Electric Agents Mobile Build

Local mobile checks ran for commit 636d716.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

Workflow run

@netlify

netlify Bot commented Jun 10, 2026

Deploy Preview for electric-next ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | de96826 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/electric-next/deploys/6a299c462f9f160008fb7065 |
| 😎 Deploy Preview | https://deploy-preview-4555--electric-next.netlify.app |

@samwillis samwillis changed the title Add streaming tool call argument support feat(agents): Add streaming tool call argument support Jun 10, 2026
@samwillis samwillis marked this pull request as ready for review June 10, 2026 19:44
@kevin-dp

I had Claude Fable review this PR, and it seems to have found some problems that need to be addressed. Here's the full review:

Summary

Adds runtime plumbing for streaming tool-call arguments: Pi toolcall_start/delta/end events flow through the adapter into a new optional AgentTool.onArgsDelta hook, with persistence via a new tool_arg_delta collection and an args_streaming status on tool_call rows. The design is sound and follows the existing text_delta precedent well, but there are two significant issues: the persisted tool-call state machine silently changes for all streaming providers (contradicting the PR description and breaking a conformance-test invariant), and the fire-and-forget hook dispatch has no ordering or completion guarantees, which undermines the API's headline use case.

What's Working Well

  • ensureToolCall is a clean consolidation — making onToolCallStart idempotent over an already-streamed call (transitioning to executing) is the right shape, and it deduplicates the old insert logic.
  • Deterministic delta keys (${toolCall.key}:args-${seq}) follow the established text_delta pattern and stay replay-safe via outboundIdSeed.
  • Extending AgentTool as PiAgentTool & { onArgsDelta?: ... } in types.ts rather than depending on an upstream pi-agent-core change is the right call; the extra property passes harmlessly through to the Pi Agent constructor.
  • Hook errors are caught and logged rather than crashing the run.
  • Changeset present with appropriate minor bump; both new tests are well-targeted (bridge persistence and adapter dispatch are each tested at their own layer).

Issues Found

Critical (Must Fix)

1. Persisted tool-call state machine changes for every tool call on streaming providers — breaks documented invariant and downstream consumers

Files: packages/agents-runtime/src/pi-adapter.ts, packages/agents-server-conformance-tests/src/electric-agents-dsl.ts:2454

The PR description says "ordinary tool execution still inserts a tool_call row with status: 'started'" — but that's only true when the provider doesn't emit toolcall_start events. Real streaming providers (Anthropic included) emit them for every tool call; previously they fell into the debug-log else branch and were ignored. After this PR, every tool call from a streaming provider inserts with status: 'args_streaming' and never passes through started.

This breaks the conformance suite's E4 invariant, which asserts the tool_call insert must have status: 'started' (electric-agents-dsl.ts:2454), and affects UI consumers gating on started — e.g. packages/electric-ax/src/observe-ui.tsx:115 (ToolCallView shows the "pending" icon only for started; streamed calls will jump straight to the spinner during arg streaming, before execution begins).

Suggested fix: Either (a) update the conformance spec/tests and audit status consumers as part of this PR, treating the state-machine change as intentional and documented, or (b) have onToolCallArgsStart insert with status: 'started' and only move to args_streaming on the first delta, preserving the insert invariant. Either way, the PR description's "preserves existing behavior" claim needs correcting.
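
Option (b) can be sketched roughly as follows. This is a minimal illustration, not the bridge's actual code: the class, its method names, and the in-memory `writes` log are hypothetical stand-ins for the persistence layer, and only the four status strings come from this PR and review. The point it shows is that every insert carries `started`, and `args_streaming` is only ever reached through an update on the first delta.

```typescript
// Status names taken from the PR and review; the full runtime union may contain more.
type ToolCallStatus = 'started' | 'args_streaming' | 'args_complete' | 'executing'
type RowWrite = { op: 'insert' | 'update'; key: string; status: ToolCallStatus }

// Hypothetical in-memory stand-in for the tool_call collection.
class ToolCallRows {
  private status = new Map<string, ToolCallStatus>()
  readonly writes: RowWrite[] = []

  private write(key: string, next: ToolCallStatus): void {
    const op = this.status.has(key) ? 'update' : 'insert'
    this.status.set(key, next)
    this.writes.push({ op, key, status: next })
  }

  // toolcall_start: insert as 'started' so the insert invariant holds.
  onArgsStart(key: string): void {
    if (!this.status.has(key)) this.write(key, 'started')
  }

  // toolcall_delta: only the first delta moves the row to 'args_streaming'.
  onArgsDelta(key: string): void {
    this.onArgsStart(key) // tolerate a delta that arrives with no prior start
    if (this.status.get(key) === 'started') this.write(key, 'args_streaming')
  }

  // toolcall_end: arguments are complete, execution has not begun yet.
  onArgsEnd(key: string): void {
    this.onArgsStart(key)
    this.write(key, 'args_complete')
  }

  // Execution start: streamed calls move to 'executing';
  // calls that never streamed are inserted as 'started', as before.
  onExecuteStart(key: string): void {
    this.write(key, this.status.has(key) ? 'executing' : 'started')
  }
}
```

With this shape, a consumer that gates on the insert status sees `started` for both streamed and non-streamed calls, at the cost of one extra row update per streamed call.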

2. onArgsDelta dispatch has no ordering or completion guarantees — the API's primary use case is exposed to corruption

File: packages/agents-runtime/src/pi-adapter.ts (the void Promise.resolve(tool.onArgsDelta(...)).catch(...) dispatch)

Two related problems:

  • Out-of-order execution: each delta's hook invocation is fire-and-forget. If the hook is async and a call is slow (the PR's own example does await appendPartialContent(...)), invocations for delta N and N+1 run concurrently and can complete out of order — append-style consumers will interleave/corrupt content.
  • No barrier before execute: nothing awaits in-flight onArgsDelta promises before final tool execution. A partial write from a late delta can race with (or land after) the execute finalization, e.g. a stale appendPartialContent overwriting finishDraft.

Suggested fix: serialize hook invocations per toolCallId (chain each call onto the previous promise), and await the chain's settlement before invoking execute for that tool call. If you'd rather keep dispatch fully detached, the contract ("invocations may interleave and may still be in flight at execute time; rely on argsPreview snapshots, not delta ordering") must be documented prominently — but serialization is cheap and makes the example code actually correct.
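
A minimal sketch of the serialization suggested above, assuming a per-call promise chain. The `HookChains` class and its `dispatch`/`settle` methods are hypothetical names for illustration and do not exist in the runtime; the hook type is simplified to take only the delta string.

```typescript
// Simplified hook shape for illustration; the real hook takes (context, signal).
type DeltaHook = (delta: string) => Promise<void> | void

class HookChains {
  // One promise chain per tool call; each invocation waits for the previous one.
  private chains = new Map<string, Promise<void>>()

  dispatch(toolCallId: string, hook: DeltaHook, delta: string): void {
    const previous = this.chains.get(toolCallId) ?? Promise.resolve()
    const next = previous
      .then(() => hook(delta))
      // Log and swallow hook errors so one failure does not block later deltas.
      .catch((err: unknown) => {
        console.warn(`onArgsDelta failed for ${toolCallId}`, err)
      })
    this.chains.set(toolCallId, next)
  }

  // Await this before invoking execute so no delta hook is still in flight.
  // Assumes no further deltas are dispatched for this call once settle begins.
  async settle(toolCallId: string): Promise<void> {
    await (this.chains.get(toolCallId) ?? Promise.resolve())
    this.chains.delete(toolCallId)
  }
}
```

In this sketch the adapter would call `dispatch` for each `toolcall_delta` and `await chains.settle(toolCallId)` immediately before running `execute`, so an append-style hook sees deltas strictly in arrival order.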

Important (Should Fix)

3. The signal parameter on onArgsDelta is declared but never passed

Files: packages/agents-runtime/src/types.ts (hook signature), packages/agents-runtime/src/pi-adapter.ts

onArgsDelta?: (context, signal?: AbortSignal) => ... — the adapter never supplies a signal, yet the PR description's example checks signal?.aborted, implying it works. The run's abortSignal is a run() parameter (pi-adapter.ts:505) and isn't currently in scope at the message_update handler, so wiring it needs minor plumbing (stash the active run's signal in adapter scope). Either wire it or remove the parameter; shipping a documented-but-dead parameter is worse than either.
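
One way the "stash the active run's signal" plumbing could look, as a hedged sketch. `AdapterSketch`, `activeSignal`, and `dispatchArgsDelta` are hypothetical names; only the `(context, signal)` hook signature is taken from the PR, and the context type is reduced to two fields.

```typescript
// Hypothetical shapes: only the (context, signal) hook signature comes from the PR.
type ArgsDeltaContext = { toolCallId: string; delta: string }
type ArgsDeltaHook = (
  context: ArgsDeltaContext,
  signal?: AbortSignal
) => void | Promise<void>

class AdapterSketch {
  // The signal of the run currently in progress, if any.
  private activeSignal: AbortSignal | undefined

  async run(body: () => Promise<void>, abortSignal?: AbortSignal): Promise<void> {
    this.activeSignal = abortSignal
    try {
      await body()
    } finally {
      // Clear it so a later run never sees a stale signal.
      this.activeSignal = undefined
    }
  }

  // Called from the event handler, where run()'s parameters are not in scope.
  dispatchArgsDelta(
    hook: ArgsDeltaHook,
    context: ArgsDeltaContext
  ): void | Promise<void> {
    return hook(context, this.activeSignal)
  }
}
```

This assumes one active run per adapter instance at a time; if runs can overlap, the signal would need to be keyed by run rather than stored in a single field.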

4. args_preview on the tool_call row is written once at start and then goes permanently stale

File: packages/agents-runtime/src/outbound-bridge.ts (onToolCallArgsDelta)

onToolCallArgsDelta only inserts a tool_arg_delta row — it never updates the tool_call row's args_preview. Since durable-streams updates merge field-wise, the row keeps whatever preview existed at toolcall_start (typically empty/near-empty) for its entire lifetime, including after completion. A subscriber reading args_preview gets a misleading snapshot. Either update args_preview on each delta (accepting the row churn — the adapter already has the fresh preview in opts.argsPreview and passes it, but the bridge drops it for existing calls), or stop persisting args_preview entirely and let consumers fold tool_arg_delta rows, which exist precisely for this. Persisting a value that's stale by design is the worst of both options.
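
If args_preview were dropped in favor of folding, a consumer-side fold might look like the sketch below. The row shape (`key`, `delta`) and the `foldArgs` helper are assumptions for illustration; only the `${toolCall.key}:args-${seq}` key format comes from the PR. It also assumes the raw deltas concatenate into JSON text, which may not hold for every provider.

```typescript
// Hypothetical row shape; the real tool_arg_delta schema may differ.
interface ToolArgDeltaRow {
  key: string // assumed to follow `${toolCallKey}:args-${seq}`
  delta: string
}

function foldArgs(
  rows: ToolArgDeltaRow[],
  toolCallKey: string
): { raw: string; parsed: unknown } {
  const prefix = `${toolCallKey}:args-`
  const raw = rows
    .filter((row) => row.key.startsWith(prefix))
    .map((row) => ({ seq: Number(row.key.slice(prefix.length)), delta: row.delta }))
    // Sort numerically so args-10 follows args-9 rather than args-1.
    .sort((a, b) => a.seq - b.seq)
    .map((row) => row.delta)
    .join('')

  // Mid-stream the text is usually incomplete JSON, so parsing may fail.
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    parsed = undefined
  }
  return { raw, parsed }
}
```

A subscriber would call `foldArgs(rows, toolCall.key)` on each update and treat `parsed === undefined` as "arguments still streaming", falling back to `raw` for display.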

Suggestions (Nice to Have)

5. Latent key collision between provider tool-call IDs and legacy synthetic IDs: onToolCallStart's legacy path computes toolCallId = `tc-${counters.tc}` without incrementing, relying on ensureToolCall to allocate the same value. If a provider ever supplies a tool-call ID of the form tc-N, toolCallsById lookups can cross-wire two distinct calls (the legacy call would "find" the streamed call's entry and clobber its row with executing). Unlikely with call_/toolu_ prefixed IDs, but cheap to defend against (e.g. prefix legacy IDs distinctly, like legacy-tc-N).

6. Tool-call events missing id/name are now silently dropped — the old code's debug-log else branch caught all non-text events; now toolcall_* events that fail the toolCallId && toolName guard vanish without a trace. Add a runtimeLog.debug in that path to keep parity with the existing unhandled-event logging.

7. The status union is now hand-duplicated in five places: entity-schema.ts, entity-timeline.ts (×3), types.ts. This was pre-existing, but the PR just paid the cost of touching all five; a shared ToolCallStatus type exported from entity-schema.ts would make the next status addition a one-line change.

8. Storage amplification for large streamed args: tool_arg_delta rows accumulate forever (consistent with text_delta, which has no pruning), but tool args can be much larger than chat text (e.g. streaming whole documents), and the final args is stored on the row too, roughly 2× the payload, durably. Worth a code comment or doc note acknowledging this is accepted, so it's a decision rather than an accident.

9. Test gaps — no test covers: (a) the streamed call's transition to executing when onToolCallStart fires after onToolCallArgsEnd (the existing ? 'executing' : 'started' branch); (b) an onArgsDelta hook that throws/rejects (the catch-and-warn path); (c) toolcall_delta arriving without a prior toolcall_start (the ?? ensureToolCall fallback in onToolCallArgsDelta).
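
For suggestion 7, a shared status type could be derived from a single list, roughly as sketched here. The constant name and the guard function are hypothetical, and the list contains only the four statuses named in this PR and review; the runtime's real union may include others.

```typescript
// Would live in (and be exported from) a shared module such as entity-schema.ts.
// Only statuses named in this PR are listed; the real union may be larger.
const TOOL_CALL_STATUSES = [
  'started',
  'args_streaming',
  'args_complete',
  'executing',
] as const

// The union type is derived from the list, so adding a status is a one-line change.
type ToolCallStatus = (typeof TOOL_CALL_STATUSES)[number]

// Runtime guard for values read back from persisted rows.
function isToolCallStatus(value: unknown): value is ToolCallStatus {
  return (
    typeof value === 'string' &&
    TOOL_CALL_STATUSES.some((status) => status === value)
  )
}
```

Deriving the type from a value also gives schema and timeline code a single list to validate against, instead of repeating string literals.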

Issue Conformance

No linked issue — worth adding a reference if one exists. The PR description is clear and the example is helpful, but it makes two claims the code doesn't currently honor: that non-streaming behavior is preserved (see Critical #1 — it's only preserved when the provider doesn't stream, which real providers do), and that the hook receives an abort signal (see Important #3). Both the description and the example should be updated alongside the fixes.

Monorepo / Cross-Package Impact

  • Changeset present for @electric-ax/agents-runtime ✓.
  • The new args_streaming status flows into types consumed by agents-server-ui, electric-ax, and the conformance tests. No exhaustive switches break (all consumers use field-level guards), but the semantics of started change for streamed calls; see Critical #1 for the two consumers that need attention.

Review iteration: 1 | 2026-06-11
