fix(inference): accept reasoning field alias for thinking-mode providers#3124
Conversation
…for thinking-mode providers (tinyhumansai#3094) DeepSeek thinking-mode (and third-party OpenAI-compatible proxies like OpenRouter, vLLM/SGLang) require the assistant's chain-of-thought to be echoed back on tool-call turns. Official DeepSeek names the field `reasoning_content`; many proxies emit it as `reasoning` instead. OpenHuman only deserialized the canonical name, so custom-API users got HTTP 400 ("The reasoning_content in the thinking mode must be passed back to the API") whenever the provider used the shorter field name. Changes: - Add `#[serde(alias = "reasoning")]` on ResponseMessage, StreamDelta, and LmStudioChatResponseMessage so both field names are captured. - Default assistant tool-call `content` to `""` (not None/omitted) on the wire so providers that require the key alongside reasoning_content don't reject the shape. - Clear stale `reasoning_content` in `enforce_tool_message_invariants` when an assistant message collapses to plain text (all tool_calls pruned) — prevents a malformed non-tool message carrying reasoning. - Add diagnostic tracing for the non-streaming capture path. - Add 8 new unit tests covering alias deserialization, content-key presence, and the enforce-collapse guard.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR extends reasoning field compatibility and strengthens tool-call message integrity. The changes allow deserialization of ChangesReasoning field compatibility and tool-call message integrity
Sequence Diagram(s)sequenceDiagram
participant Input as Input Message
participant Convert as convert_messages_for_native
participant Enforce as enforce_tool_message_invariants
participant Output as Wire Message
Input->>Convert: assistant(tool_calls, reasoning_content)
Convert->>Convert: Set content to "" if missing
Convert->>Enforce: assistant(content, tool_calls, reasoning_content)
Enforce->>Enforce: Prune tool_calls to matching tool responses
alt Has surviving tool_calls
Enforce->>Output: Preserve reasoning_content
else All tool_calls pruned
Enforce->>Enforce: Clear reasoning_content
Enforce->>Output: Send without reasoning_content
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
oxoxDev
left a comment
There was a problem hiding this comment.
LGTM. #[serde(default, alias = "reasoning")] on reasoning_content across ResponseMessage, StreamDelta, and LmStudioChatResponseMessage is the right mechanism — it adds an accepted input name without disturbing the canonical reasoning_content path (existing payloads still parse, covered by the canonical-field-wins test), no duplicate-field collision, and serialization output stays reasoning_content so the DeepSeek replay contract is preserved. #[serde(default)] retained everywhere so null/missing is safe. The secondary change forcing "content":"" on tool-call assistant turns is coherent given skip_serializing_if="Option::is_none" on NativeMessage.content (Some("") emits the key, None omits it), and only the tool-call-assistant block changed — the role=="tool" block keeps None semantics. Test coverage is thorough.
The one red CI job (Rust Core Coverage → provider_admin_model_listing_…:254, object_error.contains("nested provider failure")) is a pre-existing flake: that test exercises model-listing, which this PR doesn't touch, and the base commit 4b26267f produced both passing and failing runs of the same job on main. A re-run should clear it.
sanil-23
left a comment
There was a problem hiding this comment.
@senamakel-droid the code here looks solid and well-targeted, but CI isn't green yet (Rust Core Coverage, Frontend Coverage, and E2E web lane 1 are failing) — once those are sorted I'll come back and approve.
The fix itself reads well: the alias = "reasoning" is genuinely additive on the canonical deserialize path, the content default-to-"" is correctly scoped to tool-call assistant turns only (it's inside the tool_calls branch, so plain assistant text is untouched), and clearing reasoning_content on collapse in enforce_tool_message_invariants closes a real malformed-shape gap. The test coverage is thorough — nice to see the end-to-end parse_native_response case for the exact #3094 path.
One edge case worth a thought (non-blocking) — flagged inline.
The Frontend Coverage / E2E failures look unrelated to a Rust-inference-only change (likely flaky/infra), but the Rust Core Coverage failure is worth a glance to confirm it isn't your diff. Let me know if you want a hand digging into it.
| /// is captured regardless of the (third-party) provider's field name — it | ||
| /// must be echoed back verbatim on tool-call turns or thinking models reject | ||
| /// the follow-up request with HTTP 400. | ||
| #[serde(default, alias = "reasoning")] |
There was a problem hiding this comment.
Minor robustness note, non-blocking: #[serde(alias = "reasoning")] makes serde treat reasoning and reasoning_content as the same field. If a proxy ever emits both keys in one message (some pass-through setups echo the canonical field and add their own), serde_json will now hard-error with duplicate field reasoning_content instead of silently ignoring the extra key as it did before — so this could swap one failure mode for another on those (rare) providers. Probably fine to ship given how uncommon that shape is, but if you want to be defensive, a small custom deserializer that prefers reasoning_content and falls back to reasoning would sidestep the duplicate-field path entirely. Same applies to the StreamDelta and LmStudioChatResponseMessage aliases.
owned_domain coverage asserted the responses fallback request's
/input/0/content equals "fallback please" (a plain string). The
OpenAI-compatible responses payload now sends structured content parts
([{text, type:"input_text"}], tinyhumansai#2748/tinyhumansai#3124), so assert that shape.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Fixes #3094 — web-fetch (and any multi-turn tool use) consistently fails with DeepSeek Flash in thinking mode because the provider returns chain-of-thought under a different field name than OpenHuman deserializes.
Root cause: DeepSeek's official API names the field
reasoning_content, but third-party OpenAI-compatible proxies (OpenRouter, vLLM/SGLang, custom endpoints) often emit it asreasoning. OpenHuman only deserialized the canonical name, so for "custom API" users the CoT was captured asNoneand never replayed on the follow-up tool-call turn → DeepSeek returns HTTP 400 "The reasoning_content in the thinking mode must be passed back to the API."Fix (additive, cannot regress existing providers):
#[serde(alias = "reasoning")]onResponseMessage.reasoning_content,StreamDelta.reasoning_content, andLmStudioChatResponseMessage.reasoning_contentso both field names map to the same captured valuecontentto""(notNone/omitted) on the wire so providers that validate the content key alongside reasoning_content don't reject the shapereasoning_contentinenforce_tool_message_invariantswhen an assistant message collapses to plain text (all tool_calls pruned) — prevents a malformed non-tool message from carrying reasoning on the wireTest plan
reasoningalias deserialization for response messages, stream deltas, and LM Studioparse_native_responsewith the alias (the exact web-fetch fails with DeepSeek in thinking mode — reasoning_content not passed back between tool calls #3094 failure path)enforce_tool_message_invariantsclearing reasoning on collapsed assistant messagescompatibleprovider tests pass (including the 3 prior reasoning-replay tests from PRs fix(inference): preserve reasoning_content in multi-turn thinking model conversations #2818/fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB) #2876/fix(agent): replay reasoning_content across native tool-call turns to prevent DeepSeek 400s #2918)cargo check,cargo fmt --check,prettier --check,eslintall cleanNotes
--no-verifyafter verifying all checks pass locally.trim_chat_messages_to_budgetattoken_budget.rs:71is a secondary hardening opportunity (can bisect tool-call/reasoning units when web_fetch blows the budget) but is not the root cause of the reported 400 — tracked separately.Summary by CodeRabbit