fix(streaming): skip chat deltas for role-init elements to prevent first token duplication by mudler · Pull Request #9299 · mudler/LocalAI

mudler · 2026-04-09T20:07:27Z

When TASK_RESPONSE_TYPE_OAI_CHAT is used, the first streaming token produces a JSON array with two elements: a role-init chunk and the actual content chunk. The grpc-server loop called attach_chat_deltas for both elements with the same raw_result pointer, stamping the first token's ChatDelta.Content on both replies. The Go side accumulated both, emitting the first content token twice to SSE clients.

Fix: in the array iteration loops in PredictStream, detect role-init elements (delta has "role" key) and skip attach_chat_deltas for them. Only content/reasoning elements get chat deltas attached.

Reasoning models are unaffected because their first token goes into reasoning_content, not content.

Fixes: #9298

…rst token duplication When TASK_RESPONSE_TYPE_OAI_CHAT is used, the first streaming token produces a JSON array with two elements: a role-init chunk and the actual content chunk. The grpc-server loop called attach_chat_deltas for both elements with the same raw_result pointer, stamping the first token's ChatDelta.Content on both replies. The Go side accumulated both, emitting the first content token twice to SSE clients. Fix: in the array iteration loops in PredictStream, detect role-init elements (delta has "role" key) and skip attach_chat_deltas for them. Only content/reasoning elements get chat deltas attached. Reasoning models are unaffected because their first token goes into reasoning_content, not content.

mudler force-pushed the fix/first-token-dup branch from 3eafab5 to a8ad30d Compare April 9, 2026 20:12

mudler mentioned this pull request Apr 9, 2026

Regression: first streaming token duplicated in /v1/chat/completions #9298

Open

mudler force-pushed the fix/first-token-dup branch from a8ad30d to bcd0d32 Compare April 9, 2026 20:20

mudler added the bug Something isn't working label Apr 9, 2026

mudler force-pushed the fix/first-token-dup branch from bcd0d32 to 95c0a5d Compare April 9, 2026 20:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(streaming): skip chat deltas for role-init elements to prevent first token duplication#9299

fix(streaming): skip chat deltas for role-init elements to prevent first token duplication#9299
mudler wants to merge 1 commit intomasterfrom
fix/first-token-dup

mudler commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

mudler commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant