fix(streaming): skip duplicate delta.tool_calls already emitted during streaming#9286
walcz-de wants to merge 1 commit into mudler:master
Conversation
When the incremental JSON/XML parser emits tool calls during ComputeChoices
(tracked by lastEmittedCount), the post-streaming default: case was iterating
functionResults from index 0 and re-emitting all of them as separate
name-only + args-only chunks.
This produced duplicate delta.tool_calls events for the same tool call,
causing streaming clients (e.g. cogito) to accumulate arguments as
"{args}{args}" — invalid JSON that fails parsing on every attempt.
Fix: skip indices i < lastEmittedCount in the default: loop so that
only tool calls NOT already emitted during streaming are sent post-stream.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    // lastEmittedCount). Re-emitting them would produce duplicate
    // delta.tool_calls events, causing clients to accumulate
    // "{args}{args}" which is invalid JSON (cogito streaming bug).
    if i < lastEmittedCount {
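Expanded into a self-contained, runnable form (the helper and data below are simplified stand-ins for illustration, not the actual chat.go code), the guard behaves like this:

```go
package main

import "fmt"

// emitPostStream is a simplified stand-in for the post-streaming default:
// case. functionResults holds every parsed tool call; lastEmittedCount is
// how many of them the incremental parser already emitted during streaming.
func emitPostStream(functionResults []string, lastEmittedCount int) []string {
	var emitted []string
	for i, call := range functionResults {
		// The fix: indices below lastEmittedCount were already sent as
		// delta.tool_calls chunks during streaming, so skip them here.
		if i < lastEmittedCount {
			continue
		}
		emitted = append(emitted, call)
	}
	return emitted
}

func main() {
	calls := []string{"get_weather", "get_time"}

	// Both calls already streamed: nothing more is emitted post-stream.
	fmt.Println(emitPostStream(calls, 2))

	// Only the first was streamed: just the second goes out post-stream.
	fmt.Println(emitPostStream(calls, 1))
}
```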
The real problem is the Go-side incremental JSON parser at lines 270-316. It runs on every streaming token, re-parses the full accumulated text, and re-emits the same tool call every time the JSON parses as valid. That path produces more duplicate emissions; the `default:` case only adds one more on top.
But you should only hit this when the backend doesn't emit chat deltas, which tells me you are running an old version of the llama.cpp backend. Try upgrading it at least; that should fix this issue.
While the problem is indeed there, the PR only partially fixes it and needs some robust tests added first to fix it properly.
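The re-parse-and-re-emit behaviour described here, and the high-water-mark guard against it, can be sketched as follows (the `;`-based parser is a toy stand-in for `ParseJSONIterative`, not the real implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// streamEmit re-parses the full accumulated text on every token, so each
// already-complete tool call shows up again in every parse. Tracking a
// high-water mark (lastEmittedCount, the name used in the PR) ensures each
// call is still emitted exactly once.
func streamEmit(tokens []string) []string {
	parse := func(s string) []string {
		// Toy parser: every ';'-terminated segment is one complete tool
		// call; the unterminated tail is not a complete call yet.
		parts := strings.Split(s, ";")
		return parts[:len(parts)-1]
	}

	var emitted []string
	accumulated := ""
	lastEmittedCount := 0
	for _, tok := range tokens {
		accumulated += tok
		calls := parse(accumulated)
		// Only emit calls that earlier parses have not emitted yet.
		emitted = append(emitted, calls[lastEmittedCount:]...)
		lastEmittedCount = len(calls)
	}
	return emitted
}

func main() {
	// Two tool calls arriving over four tokens; each is emitted once.
	fmt.Println(streamEmit([]string{"call", "A;", "call", "B;"}))
}
```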
It was a glitch in my local build setup: building LocalAI with rocm-7.12-preview did not use the proper llama.cpp version. Now it works, so I will close this PR.
Using the right version of llama.cpp fixed it
Problem

When using tool calls with streaming enabled, the `processTools` function in `core/http/endpoints/openai/chat.go` emits duplicate `delta.tool_calls` events for the same tool call. This causes streaming clients to accumulate argument strings as `{args}{args}`, which is invalid JSON, breaking tool call execution.

Observed error in cogito:
Root Cause

`processTools` has two code paths that both emit `delta.tool_calls` for the same tool call:

1. During streaming (inside the `ComputeChoices` callback): `ParseJSONIterative`/`ParseXMLIterative` detects complete tool calls and emits them immediately. `lastEmittedCount` is incremented for each emitted tool call.
2. After streaming (the `default:` case): iterates `functionResults` from index `0` and emits name-only + args-only chunks, but `functionResults` contains the same tool calls that were already emitted during step 1.

Both paths execute for the same request, producing 2× the tool call events.
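Stripped of the HTTP plumbing, the double emission can be sketched like this (simplified stand-ins, assuming one event per tool call rather than the real name/args chunking):

```go
package main

import "fmt"

// emitAll models the two emission paths described above, pre-fix.
func emitAll(functionResults []string) []string {
	var events []string

	// Path 1: during streaming, the iterative parser emits every
	// complete tool call and bumps lastEmittedCount.
	lastEmittedCount := 0
	for _, call := range functionResults {
		events = append(events, call)
		lastEmittedCount++
	}

	// Path 2: the post-stream default: case iterates from index 0 again.
	_ = lastEmittedCount // tracked but, before the fix, never consulted
	for _, call := range functionResults {
		events = append(events, call)
	}
	return events
}

func main() {
	// A single tool call yields two delta.tool_calls events.
	fmt.Println(len(emitAll([]string{"get_weather"})))
}
```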
Fix

In the `default:` case, skip indices `i < lastEmittedCount`. Those tool calls were already fully emitted during streaming. Only emit tool calls that the incremental parser did not handle (e.g. tool calls discovered only after the full response was available).

Testing
Verified with `curl` against a Qwen3.5-35B-A3B model with `tool_choice: required`:

- `ToolArgs` emitted twice with identical complete JSON → `{args}{args}` → parse error
- No changes to the non-streaming (`stream: false`) code path