feat(streaming): stream tool call argument deltas in TemporalStreamingModel #355
vkalmathscale wants to merge 2 commits
Conversation
…gModel

Wire `ResponseFunctionCallArgumentsDeltaEvent` into the streaming layer introduced in #333, so write-heavy tools (`write_file`, `apply_patch`) no longer freeze the UI for the duration of argument generation. The model now opens a per-function-call streaming context with a `ToolRequestContent` placeholder, emits `ToolRequestDelta` updates for each argument delta, and finalizes with a `StreamTaskMessageFull` containing the parsed arguments on `ResponseOutputItemDoneEvent`. Coalescing and mode dispatch are inherited from the existing streaming infrastructure -- no new flags or surface area. `ModelResponse` output is unchanged; activity determinism is unaffected. End-of-loop cleanup defensively closes any function-call contexts that didn't see a Done event (truncated stream or mid-stream exception). Adds two tests covering the happy path (well-formed JSON args -> deltas + parsed Full) and the malformed-args fallback (invalid JSON -> empty dict + WARNING log).
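The per-call lifecycle described above can be sketched roughly as follows. This is a simplified, hypothetical stand-in — dict-shaped events and a `send_delta`/`send_full`/`close` context API invented for illustration — not the actual `TemporalStreamingModel` code:

```python
import json
import logging

logger = logging.getLogger(__name__)

# Hypothetical per-call state, mirroring function_calls_in_progress in the PR.
function_calls_in_progress: dict[int, dict] = {}

async def handle_event(event: dict, open_context) -> None:
    """Dispatch one stream event through the three handlers the PR describes."""
    if event["type"] == "output_item.added" and event["item_type"] == "function_call":
        # Open a per-call streaming context with a placeholder (empty args).
        ctx = await open_context(initial_args={})
        function_calls_in_progress[event["output_index"]] = {
            "context": ctx, "arguments": "", "name": event["name"],
        }
    elif event["type"] == "function_call_arguments.delta":
        call = function_calls_in_progress[event["output_index"]]
        call["arguments"] += event["delta"]
        await call["context"].send_delta(event["delta"])  # ToolRequestDelta analogue
    elif event["type"] == "output_item.done" and event["item_type"] == "function_call":
        call = function_calls_in_progress[event["output_index"]]
        try:
            parsed = json.loads(call["arguments"])
        except json.JSONDecodeError:
            logger.warning("Failed to parse tool call arguments")
            parsed = {}  # malformed-args fallback
        await call["context"].send_full(parsed)  # StreamTaskMessageFull analogue
        await call["context"].close()
        call["context"] = None  # lets end-of-loop cleanup skip this entry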
```python
try:
    await call_context.close()
except Exception as e:
    logger.warning(f"Failed to close tool request context: {e}")

elif isinstance(event, ResponseReasoningSummaryPartAddedEvent):
```
**Double-close of streaming context on every successful function call** — after `ResponseOutputItemDoneEvent` closes the context (line 894), `call_data['context']` is left non-`None`. The orphan-cleanup loop (lines 933–940) then iterates `function_calls_in_progress.values()` and closes every entry whose `'context'` is not `None` — which includes every context that was already closed normally. `close()` will be called twice for every function call that finished cleanly, potentially publishing a duplicate "stream ended" event to Redis for each tool call in the response.
```diff
  try:
      await call_context.close()
  except Exception as e:
      logger.warning(f"Failed to close tool request context: {e}")
+ finally:
+     call_data['context'] = None
  elif isinstance(event, ResponseReasoningSummaryPartAddedEvent):
```
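To make the fix's effect concrete, here is a minimal runnable sketch (a hypothetical `FakeContext` and a loop shape mirroring the review's description, not the real cleanup code): with the slot nulled after a normal close, the orphan-cleanup loop only touches truly orphaned contexts.

```python
import asyncio

class FakeContext:
    """Counts close() calls so a double-close would be observable."""
    def __init__(self) -> None:
        self.close_count = 0
    async def close(self) -> None:
        self.close_count += 1

async def orphan_cleanup(function_calls_in_progress: dict) -> None:
    # Mirrors the end-of-loop cleanup: close anything still holding a context.
    for call_data in function_calls_in_progress.values():
        ctx = call_data.get("context")
        if ctx is not None:
            await ctx.close()
            call_data["context"] = None

async def demo() -> tuple[int, int]:
    done_ctx, orphan_ctx = FakeContext(), FakeContext()
    calls = {
        0: {"context": done_ctx},    # saw ResponseOutputItemDoneEvent
        1: {"context": orphan_ctx},  # stream truncated before Done
    }
    # Done handler with the suggested fix applied: close, then null the slot.
    await done_ctx.close()
    calls[0]["context"] = None
    await orphan_cleanup(calls)
    return done_ctx.close_count, orphan_ctx.close_count

print(asyncio.run(demo()))  # (1, 1): each context closed exactly once
```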
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agentex/lib/core/temporal/plugins/openai_agents/models/temporal_streaming_model.py
Line: 893-898
Comment:
**Double-close of streaming context on every successful function call** — after `ResponseOutputItemDoneEvent` closes the context (line 894), `call_data['context']` is left non-`None`. The orphan-cleanup loop (lines 933–940) then iterates `function_calls_in_progress.values()` and closes every entry whose `'context'` is not `None` — which includes every context that was already closed normally. `close()` will be called twice for every function call that finished cleanly, potentially publishing a duplicate "stream ended" event to Redis for each tool call in the response.
```suggestion
try:
await call_context.close()
except Exception as e:
logger.warning(f"Failed to close tool request context: {e}")
finally:
call_data['context'] = None
elif isinstance(event, ResponseReasoningSummaryPartAddedEvent):
```
How can I resolve this? If you propose a fix, please make it concise.

Logging `raw_args[:200]` could leak partial file contents, PII, or secrets from `write_file` / `apply_patch` arguments into production log pipelines. Switch to logging only bounded metadata (tool name + raw arg byte count). The existing malformed-args test still passes since it asserts on the "Failed to parse tool call arguments" prefix, which is preserved.
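A sketch of what the bounded-metadata version could look like (a hypothetical `parse_tool_args` helper, not the PR's actual code; it preserves the asserted message prefix and logs size instead of content):

```python
import json
import logging

logger = logging.getLogger(__name__)

def parse_tool_args(tool_name: str, raw_args: str) -> dict:
    """Parse tool-call argument JSON, logging only bounded metadata on failure."""
    try:
        return json.loads(raw_args)
    except json.JSONDecodeError:
        # Keep the prefix the existing test asserts on; log byte count, never content.
        logger.warning(
            "Failed to parse tool call arguments for tool %s (%d bytes)",
            tool_name,
            len(raw_args.encode("utf-8")),
        )
        return {}
```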
Summary
`TemporalStreamingModel` already streams text deltas and reasoning summary deltas to Redis via `StreamingTaskMessageContext`, but `ResponseFunctionCallArgumentsDeltaEvent` was being silently buffered into `function_calls_in_progress[...]['arguments']` with no per-delta publish. Consumers only saw the completed tool call surface later (after the activity returned, via downstream hooks if any).

For write-heavy tools — `write_file`, `apply_patch`, anything that puts a 2–20KB string into a single argument — the model spends multiple seconds generating the argument body, and the UI sees nothing until the entire activity finishes. The result is a frozen UI followed by an abrupt jump when the activity returns.

This PR threads tool-call argument deltas through the same streaming machinery used for text and reasoning, riding on the `CoalescingBuffer` + `StreamingMode` infrastructure added in #333. The buffer's merge helpers already key on `tool_call_id` for `ToolRequestDelta`, so coalescing, mode dispatch, and opt-out are inherited from existing infra.

Design

`TemporalStreamingModel` now opens a `streaming_task_message_context` per function call (keyed off the call's `output_index`), with `initial_content=ToolRequestContent(...)` and the model's configured `streaming_mode`. Three event handlers participate:

- `ResponseOutputItemAddedEvent` (type=function_call): opens the per-call context and stores it in `function_calls_in_progress[output_index]['context']`.
- `ResponseFunctionCallArgumentsDeltaEvent`: emits a `StreamTaskMessageDelta(delta=ToolRequestDelta(arguments_delta=..., tool_call_id=..., name=...))` into the per-call context. The coalescing buffer merges consecutive deltas with the same `tool_call_id`.
- `ResponseOutputItemDoneEvent` (type=function_call): parses the accumulated arguments (falling back to an empty dict on `JSONDecodeError`), emits a final `StreamTaskMessageFull(content=ToolRequestContent(...))`, and closes the context.

End-of-loop cleanup defensively closes any function-call contexts that didn't see a `Done` event (truncated stream or mid-stream exception). `ModelResponse` output is unchanged: `output_items` still receives the same complete `ResponseFunctionToolCall`. Activity determinism is unaffected — streaming is a side effect.

What this does NOT change

- `StreamingMode` is already the on/off knob. No new flag. `streaming_mode="off"` suppresses tool-arg deltas the same way it suppresses text deltas. `"per_token"` publishes immediately; `"coalesced"` (default) batches at 50ms / 128 chars.
- `TemporalStreamingHooks.on_tool_start` is unchanged. It still fires after the activity returns and still emits a `ToolRequestContent` Full message via the `stream_lifecycle_content` activity. See Caveats.

Caveats
Overlap with `TemporalStreamingHooks.on_tool_start`. Users who pass `TemporalStreamingHooks` to `Runner.run` will now see two persisted `task_message`s per tool call: one created by the model (delta stream + final Full) and one created by the hook (Full only). Both land on the same Redis topic `task:{task_id}` with different `parent_task_message.id`s, so a default UI will render two cards for the same logical tool call.

This needs a follow-up to decide which path owns the canonical `ToolRequest` emission. Options for review discussion:

- Suppress `on_tool_start`'s Full emit when the model is also emitting (auto-detect via a workflow-instance flag, mirroring how `_task_id`/`_trace_id` are threaded today).
- Drop `on_tool_start`'s Full emit entirely in a follow-up major bump (the model becomes the single source of truth for `ToolRequest` events).

Until that follow-up, users who want streamed tool args without duplicate emits should subclass `TemporalStreamingHooks` and override `on_tool_start` to a no-op.

Coalescing windows still apply. With the default 50ms / 128-char window, tool args render in ~50ms-granularity chunks rather than per-token. This is the same tradeoff already made for text streaming in perf(streaming): coalesce per-token publishes to Redis (50ms / 128-char window) #333, and the right default for write-heavy tools (UX value is "watch the artifact appear", not "see each token").
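As a rough illustration of the window semantics, here is a hypothetical, synchronous `CoalescingSketch` — not the actual #333 `CoalescingBuffer`, which is async and keys merges on `tool_call_id`:

```python
import time

class CoalescingSketch:
    """Batches string deltas, flushing on a time window or size threshold."""
    def __init__(self, flush, window_s: float = 0.05, max_chars: int = 128):
        self.flush, self.window_s, self.max_chars = flush, window_s, max_chars
        self.buf = ""
        self.opened_at: float | None = None

    def push(self, delta: str) -> None:
        if not self.buf:
            self.opened_at = time.monotonic()
        self.buf += delta
        age = time.monotonic() - self.opened_at
        if len(self.buf) >= self.max_chars or age >= self.window_s:
            self.drain()

    def drain(self) -> None:
        if self.buf:
            self.flush(self.buf)  # one publish per flushed batch
            self.buf = ""
            self.opened_at = None

published = []
# Tiny thresholds so the batching is visible; real defaults are 50ms / 128 chars.
buf = CoalescingSketch(published.append, window_s=9999, max_chars=10)
for chunk in ["{'pa", "th': ", "'a.txt'", "}"]:
    buf.push(chunk)
buf.drain()  # final flush on close
print(published)  # two publishes instead of four per-token publishes
```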
Malformed argument JSON. If the model produces invalid JSON for the args (truncated stream, hallucinated structure), the path logs a WARNING and emits `arguments={}` in the final `ToolRequestContent`. The raw delta stream is preserved on the consumer side regardless — only the structured final view falls back.

Test plan

`test_streaming_model.py::TestStreamingModelFunctionCallArgsStreaming`:

- Happy path: one opened context with an initial `ToolRequestContent`, one `StreamTaskMessageDelta(ToolRequestDelta)` per `ArgumentsDelta` event preserving the delta text, and one final `StreamTaskMessageFull(ToolRequestContent)` with parsed args.
- Malformed args: `arguments={}` in the final Full and a WARNING log.
- Full `test_streaming_model.py` suite passes (42/42).
- `ruff check` clean on both modified files.
- `streaming_mode="off"` suppresses tool-arg deltas (only the final persisted message exists on close).

cc reviewers familiar with #333's `CoalescingBuffer` design.

Greptile Summary
- Streams `ResponseFunctionCallArgumentsDeltaEvent` through the existing `CoalescingBuffer` + `StreamingMode` infrastructure by opening a `streaming_task_message_context` per function call (keyed on `output_index`), emitting `ToolRequestDelta` updates per delta, and a final `StreamTaskMessageFull` with parsed JSON on `ResponseOutputItemDoneEvent`. Text and reasoning paths are unchanged.
- Users of `TemporalStreamingHooks` with this change will see two `ToolRequest` task messages per tool call (one from the model stream, one from `on_tool_start`) until a follow-up removes the hook's duplicate emit.
- `call_data['context']` is not set to `None` after `ResponseOutputItemDoneEvent` calls `close()` — this was flagged in a prior review thread.

Confidence Score: 3/5

Not safe to merge without addressing the double-close of streaming contexts on every successful function call (flagged in a prior review thread but unresolved in this revision).
A pre-existing P1 (double-close of already-closed streaming contexts in the orphan-cleanup loop) remains unaddressed — `call_data['context']` is never nulled out after the `ResponseOutputItemDoneEvent` handler closes it, so every successfully-completed function call context is closed twice, potentially publishing a duplicate stream-ended event per tool call. Score is pulled below the P1 ceiling of 4 because this affects all function calls in every response, not an isolated edge case.

src/agentex/lib/core/temporal/plugins/openai_agents/models/temporal_streaming_model.py — specifically the `ResponseOutputItemDoneEvent` handler (around line 894–897), where `call_data['context']` must be set to `None` after `close()` to prevent the orphan-cleanup loop from double-closing it.

Important Files Changed

- `temporal_streaming_model.py`: streams tool-call argument deltas via `streaming_task_message_context`; has a pre-existing double-close bug (flagged in previous review) where successfully-closed contexts are re-closed by the orphan-cleanup loop because `call_data['context']` is never set to `None` after the `ResponseOutputItemDoneEvent` handler calls `close()`.
- `test_streaming_model.py`: tests rely on all `streaming_task_message_context` calls sharing a single `return_value` mock, which is a fragile assumption when multiple context types coexist.

Sequence Diagram
```mermaid
sequenceDiagram
    participant OAI as OpenAI Stream
    participant TSM as TemporalStreamingModel
    participant CTX as StreamingTaskMessageContext
    participant Redis as Redis (task:{task_id})
    OAI->>TSM: "ResponseOutputItemAddedEvent(type=function_call)"
    TSM->>CTX: "__aenter__(initial_content=ToolRequestContent(args={}))"
    CTX-->>Redis: persist streaming task_message (IN_PROGRESS)
    loop Per argument token
        OAI->>TSM: "ResponseFunctionCallArgumentsDeltaEvent(delta=chunk)"
        TSM->>TSM: "call_data['arguments'] += chunk"
        TSM->>CTX: stream_update(StreamTaskMessageDelta(ToolRequestDelta))
        CTX-->>Redis: "publish delta (CoalescingBuffer @ 50ms/128ch)"
    end
    OAI->>TSM: "ResponseFunctionCallArgumentsDoneEvent(arguments=full_str)"
    TSM->>TSM: "call_data['arguments'] = full_str (authoritative)"
    OAI->>TSM: "ResponseOutputItemDoneEvent(type=function_call)"
    TSM->>TSM: "json.loads(call_data['arguments']) => parsed_args"
    TSM->>CTX: "stream_update(StreamTaskMessageFull(ToolRequestContent(args=parsed_args)))"
    CTX-->>Redis: publish full message
    TSM->>CTX: close()
    Note over TSM,CTX: call_data['context'] NOT set to None here
    OAI->>TSM: ResponseCompletedEvent
    TSM->>TSM: "output_items = response.output"
    Note over TSM: Orphan-cleanup loop
    TSM->>CTX: close() again (double-close - call_data['context'] still non-None)
```
Reviews (2): Last reviewed commit: "fix(streaming): drop raw tool args from ..."