fix: drop orphaned tool results on session resume#3001
Conversation
Compaction's kept-tail boundary (FirstKeptEntry) could fall between an assistant tool_use and its tool_result. On resume, the reconstructed history then started with an orphaned tool_result whose tool_use had been folded into the summary. sanitizeToolCalls only injected synthetic results for dangling tool_use blocks; it never dropped orphaned tool_results, so the invalid history reached the provider and AWS Bedrock rejected it (ValidationException: toolResult blocks exceed toolUse blocks). sanitizeToolCalls now also drops tool results with no matching tool_use in the preceding assistant message, keeping the invariant balanced in both directions. Fixes #1676
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The fix correctly addresses the Bedrock ValidationException by dropping orphaned tool results in sanitizeToolCalls. The logic is sound: pendingIDs is populated when an assistant message with tool calls is processed and reset on flushPending(), so each turn's scope is correctly isolated. The three new tests provide good coverage of the primary fix path and the existing dangling-call direction.
One minor behavioral change is noted inline.
| // Drop orphaned tool results: a tool_result with no matching | ||
| // tool_use in the preceding assistant message violates the | ||
| // provider contract. | ||
| if msg.ToolCallID == "" || !pendingIDs[msg.ToolCallID] { |
There was a problem hiding this comment.
[LOW] Behavioral change: tool results with empty ToolCallID are now silently dropped
Under the old logic the condition was if msg.ToolCallID != "" which only gated the resultIDs book-keeping — a tool-result with an empty ID still fell through to out = append(out, msg) and was forwarded to the provider. Under the new condition:
if msg.ToolCallID == "" || !pendingIDs[msg.ToolCallID] {
continue // drops the message
}an empty-ID tool result is now silently dropped. In practice this is likely fine — a tool result with no ID cannot be matched to any tool_use and most providers mandate non-empty IDs — but it is a silent behavioral regression from the previous behavior. If any in-flight session persisted with empty-ID tool results they would be discarded on the next resume.
Consider documenting the intentional drop (e.g., // empty ToolCallID is malformed; drop it) so the behavior is explicit rather than a side-effect of the orphan check.
Problem
Resuming a session via
--session//sessionsimmediately fails on AWS Bedrock withValidationException: The number of toolResult blocks ... exceeds the number of toolUse blocks of previous turn.Root cause
When a session is compacted, the kept-tail boundary (
FirstKeptEntry) can land between an assistanttool_useand itstool_result. On resume,buildSessionSummaryMessagesemits the conversation from that boundary, so the reconstructed history starts with an orphanedtool_result(no matchingtool_use).sanitizeToolCalls— the last guard before messages reach the provider — only injected synthetic results for dangling tool calls and always forwardedtoolmessages, so orphans passed straight through. OpenAI-compatible providers (litellm) forward the invalid request; Bedrock rejects it.Fix
sanitizeToolCallsnow also drops tool results whoseToolCallIDhas no matchingtool_usein the preceding assistant message, keeping thetoolUse/toolResultinvariant balanced in both directions.Tests
TestSanitizeToolCalls_DropsOrphanedToolResult— unit guard.TestGetMessages_ResumeAfterCompaction_NoOrphanedToolResult— fullGetMessagesresume path with a summary boundary splitting a tool pair.TestSanitizeToolCalls_KeepsMissingResultBalanced— pins existing dangling-tool-call behaviour.Both new tests fail without the fix and pass with it.
Fixes #1676