fix(streaming): skip duplicate delta.tool_calls already emitted during streaming #9286

Closed
walcz-de wants to merge 1 commit into mudler:master from walcz-de:fix/streaming-tool-calls-duplicate-emit

Conversation

@walcz-de (Contributor) commented Apr 9, 2026

Problem

When using tool calls with streaming enabled, the processTools function in core/http/endpoints/openai/chat.go emits duplicate delta.tool_calls events for the same tool call. This causes streaming clients to accumulate argument strings as {args}{args}, which is invalid JSON and breaks tool call execution.

Observed error in cogito:

WARN Attempt to parse streamed tool arguments failed attempt=1 error=invalid character '{' after top-level value
WARN Attempt to parse streamed tool arguments failed attempt=2 error=invalid character '{' after top-level value
WARN Attempt to parse streamed tool arguments failed attempt=3 error=invalid character '{' after top-level value
ERROR Error executing cogito action

Root Cause

processTools has two code paths that both emit delta.tool_calls for the same tool call:

  1. During streaming (inside the ComputeChoices callback): ParseJSONIterative / ParseXMLIterative detects complete tool calls and emits them immediately. lastEmittedCount is incremented for each emitted tool call.

  2. After streaming (the default: case): iterates functionResults from index 0 and emits name-only + args-only chunks — but functionResults contains the same tool calls that were already emitted during step 1.

Both paths execute for the same request, producing 2× the tool call events.

Fix

In the default: case, skip indices i < lastEmittedCount. Those tool calls were already fully emitted during streaming. Only emit tool calls that the incremental parser did not handle (e.g. tool calls discovered only after the full response was available).

Testing

Verified with curl against a Qwen3.5-35B-A3B model with tool_choice: required:

  • Before fix: ToolArgs emitted twice with identical complete JSON → {args}{args} → parse error
  • After fix: single clean emission → correct tool call execution

No changes to non-streaming (stream: false) code path.

Commit message:

…g streaming

When the incremental JSON/XML parser emits tool calls during ComputeChoices
(tracked by lastEmittedCount), the post-streaming default: case was iterating
functionResults from index 0 and re-emitting all of them as separate
name-only + args-only chunks.

This produced duplicate delta.tool_calls events for the same tool call,
causing streaming clients (e.g. cogito) to accumulate arguments as
"{args}{args}" — invalid JSON that fails parsing on every attempt.

Fix: skip indices i < lastEmittedCount in the default: loop so that
only tool calls NOT already emitted during streaming are sent post-stream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
// lastEmittedCount). Re-emitting them would produce duplicate
// delta.tool_calls events, causing clients to accumulate
// "{args}{args}" which is invalid JSON (cogito streaming bug).
if i < lastEmittedCount {
@mudler (Owner) commented Apr 9, 2026
The real problem is the Go-side incremental JSON parser at lines 270-316. It runs on every streaming token, re-parses the full accumulated text, and re-emits the same tool call every time the JSON is valid. This produces most of the duplicate emissions, while the default: case only adds one more on top.

But you should only hit this when backends don't emit chat deltas. That tells me you are running an old version of the llama.cpp backend - try to upgrade it at least - and that should fix this issue.

While the problem is indeed there, the PR only partially fixes it, and some robust testing needs to be added first to fix it properly.

walcz-de (Contributor, Author) commented:
It was a glitch in my local build setup for building localai with rocm-7.12-preview, which did not use the proper llama.cpp version. Now it works, so I will close this PR.

@walcz-de (Contributor, Author) commented:

Using the right version of llama.cpp fixed it
