fix(streaming): skip duplicate delta.tool_calls already emitted during streaming#9286
walcz-de wants to merge 1 commit into mudler:master
Conversation
When the incremental JSON/XML parser emits tool calls during ComputeChoices
(tracked by lastEmittedCount), the post-streaming default: case was iterating
functionResults from index 0 and re-emitting all of them as separate
name-only + args-only chunks.
This produced duplicate delta.tool_calls events for the same tool call,
causing streaming clients (e.g. cogito) to accumulate arguments as
"{args}{args}" — invalid JSON that fails parsing on every attempt.
Fix: skip indices i < lastEmittedCount in the default: loop so that
only tool calls NOT already emitted during streaming are sent post-stream.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    // lastEmittedCount). Re-emitting them would produce duplicate
    // delta.tool_calls events, causing clients to accumulate
    // "{args}{args}" which is invalid JSON (cogito streaming bug).
    if i < lastEmittedCount {
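Expanded into a self-contained, runnable form (the helper and data below are simplified stand-ins for illustration, not the actual chat.go code), the guard behaves like this:

```go
package main

import "fmt"

// emitPostStream is a simplified stand-in for the post-streaming default:
// case. functionResults holds every parsed tool call; lastEmittedCount is
// how many of them the incremental parser already emitted during streaming.
func emitPostStream(functionResults []string, lastEmittedCount int) []string {
	var emitted []string
	for i, call := range functionResults {
		// The fix: indices below lastEmittedCount were already sent as
		// delta.tool_calls chunks during streaming, so skip them here.
		if i < lastEmittedCount {
			continue
		}
		emitted = append(emitted, call)
	}
	return emitted
}

func main() {
	calls := []string{"get_weather", "get_time"}

	// Both calls already streamed: nothing more is emitted post-stream.
	fmt.Println(emitPostStream(calls, 2))

	// Only the first was streamed: just the second goes out post-stream.
	fmt.Println(emitPostStream(calls, 1))
}
```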
The real problem is the Go-side incremental JSON parser at lines 270-316. It runs on every streaming token, re-parses the full accumulated text, and re-emits the same tool call every time the JSON parses as valid. That path produces more duplicate emissions; the `default:` case only adds one more on top.
But you should only hit this when the backend doesn't emit chat deltas, which tells me you are running an old version of the llama.cpp backend. Try upgrading it at least; that should fix this issue.
While the problem is indeed there, the PR only partially fixes it and needs some robust tests added first to fix it properly.
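The re-parse-and-re-emit behaviour described here, and the high-water-mark guard against it, can be sketched as follows (the `;`-based parser is a toy stand-in for `ParseJSONIterative`, not the real implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// streamEmit re-parses the full accumulated text on every token, so each
// already-complete tool call shows up again in every parse. Tracking a
// high-water mark (lastEmittedCount, the name used in the PR) ensures each
// call is still emitted exactly once.
func streamEmit(tokens []string) []string {
	parse := func(s string) []string {
		// Toy parser: every ';'-terminated segment is one complete tool
		// call; the unterminated tail is not a complete call yet.
		parts := strings.Split(s, ";")
		return parts[:len(parts)-1]
	}

	var emitted []string
	accumulated := ""
	lastEmittedCount := 0
	for _, tok := range tokens {
		accumulated += tok
		calls := parse(accumulated)
		// Only emit calls that earlier parses have not emitted yet.
		emitted = append(emitted, calls[lastEmittedCount:]...)
		lastEmittedCount = len(calls)
	}
	return emitted
}

func main() {
	// Two tool calls arriving over four tokens; each is emitted once.
	fmt.Println(streamEmit([]string{"call", "A;", "call", "B;"}))
}
```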
It was a glitch in my local build setup: building LocalAI with rocm-7.12-preview did not use the proper llama.cpp version. Now it works, so I will close this PR.
Using the right version of llama.cpp fixed it
Problem

When using tool calls with streaming enabled, the `processTools` function in `core/http/endpoints/openai/chat.go` emits duplicate `delta.tool_calls` events for the same tool call. This causes streaming clients to accumulate argument strings as `{args}{args}`, which is invalid JSON, breaking tool call execution.

Observed error in cogito:
Root Cause

`processTools` has two code paths that both emit `delta.tool_calls` for the same tool call:

1. During streaming (inside the `ComputeChoices` callback): `ParseJSONIterative`/`ParseXMLIterative` detects complete tool calls and emits them immediately. `lastEmittedCount` is incremented for each emitted tool call.
2. After streaming (the `default:` case): iterates `functionResults` from index `0` and emits name-only + args-only chunks, but `functionResults` contains the same tool calls that were already emitted during step 1.

Both paths execute for the same request, producing 2× the tool call events.
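Stripped of the HTTP plumbing, the double emission can be sketched like this (simplified stand-ins, assuming one event per tool call rather than the real name/args chunking):

```go
package main

import "fmt"

// emitAll models the two emission paths described above, pre-fix.
func emitAll(functionResults []string) []string {
	var events []string

	// Path 1: during streaming, the iterative parser emits every
	// complete tool call and bumps lastEmittedCount.
	lastEmittedCount := 0
	for _, call := range functionResults {
		events = append(events, call)
		lastEmittedCount++
	}

	// Path 2: the post-stream default: case iterates from index 0 again.
	_ = lastEmittedCount // tracked but, before the fix, never consulted
	for _, call := range functionResults {
		events = append(events, call)
	}
	return events
}

func main() {
	// A single tool call yields two delta.tool_calls events.
	fmt.Println(len(emitAll([]string{"get_weather"})))
}
```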
Fix

In the `default:` case, skip indices `i < lastEmittedCount`. Those tool calls were already fully emitted during streaming. Only emit tool calls that the incremental parser did not handle (e.g. tool calls discovered only after the full response was available).

Testing
Verified with `curl` against a Qwen3.5-35B-A3B model with `tool_choice: required`:

- `ToolArgs` emitted twice with identical complete JSON → `{args}{args}` → parse error
- No changes to the non-streaming (`stream: false`) code path