fix: drop message items orphaned by handoff function calls consuming their reasoning item#3574
Conversation
…their reasoning item When a model turn during a handoff emits [reasoning, function_call, message], providers such as Azure OpenAI treat the reasoning item as consumed by the function_call. The trailing message item then has no paired reasoning and is rejected with HTTP 400: Item 'msg_...' of type 'message' was provided without its required 'reasoning' item Add drop_orphaned_messages_after_consumed_reasoning() and call it from prepare_model_input_items() alongside the existing drop_orphan_function_calls() pass. The new function tracks whether the most-recent reasoning item has been consumed by a function_call and drops any subsequent message item that would be left without a partner. This is the inverse of drop_orphan_function_calls(), which removes function calls without outputs and their preceding reasoning items.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 737335ccc5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
The previous state machine carried fresh_reasoning=False across all subsequent turns, incorrectly dropping valid assistant messages from later agents that legitimately emit responses without a reasoning item. Replace had_any_reasoning + fresh_reasoning with a single consumed_by_call flag that is reset to False as soon as the first orphaned message is dropped. This limits pruning to the one trailing message inside the same handoff turn and leaves all subsequent turns unaffected. Add clarifying comments to the test showing that the delegate agent response (no reasoning) must survive and reach final_output.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4be2bbeb25
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…turn bleed When the handoff turn emits [reasoning, function_call] with no trailing message, consumed_by_call stayed True and leaked into the next accumulated turn, silently dropping the delegate agent response. The SDK appends HandoffOutputItem (function_call_output) after all model output items, so any orphaned trailing message is dropped before we reach fc_out. Resetting consumed_by_call at function_call_output therefore scopes the drop to the current handoff sequence only and keeps all subsequent turns clean. Add test_handoff_without_trailing_message_keeps_delegate_response to cover this path explicitly.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b3dcb805cd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…on_call_output computer_call_output, shell_call_output, and other output types were not resetting consumed_by_call, so a reasoning-backed computer_call with no trailing message would leak the flag into the next turn and silently drop the following assistant message. Extract _CALL_OUTPUT_TYPES = frozenset(_TOOL_CALL_TO_OUTPUT_TYPE.values()) and use it as the reset condition so every call output type is covered.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ddb96218e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
save_result_to_session() persists raw run items including any orphaned trailing message from a reasoning handoff turn. On the next Runner.run() with the same session, prepare_input_with_session() rebuilt history using only drop_orphan_function_calls(), so the orphaned message was re-sent to the provider and triggered the same HTTP 400. Import drop_orphaned_messages_after_consumed_reasoning into session_persistence.py and call it immediately after drop_orphan_function_calls() in prepare_input_with_session(), mirroring the existing pattern for function-call orphan pruning. Add test_session_history_drops_orphaned_message_on_next_run to verify the session replay path explicitly.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 702595022f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…ion_call/computer_call custom_tool_call, shell_call, apply_patch_call, local_shell_call, and tool_search_call were not setting consumed_by_call, so a reasoning item followed by any of those call types and then a message would still be sent as [reasoning, <tool_call>, message] and trigger the same provider 400 this PR is fixing. Replace the hardcoded (function_call, computer_call) tuple with _TOOL_CALL_TO_OUTPUT_TYPE, which already enumerates every call type that the runtime tracks and matches to outputs.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04a4072db9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…ion tracker Map of every path that assembles input[] for the model, cross-checked against which ones already call drop_orphan_function_calls (all of them should also call drop_orphaned_messages_after_consumed_reasoning): Path Before After prepare_model_input_items done done prepare_input_with_session done done normalize_resumed_input (RunState resume) miss fixed OpenAIServerConversationTracker.prepare miss fixed Changes: - items.py: normalize_resumed_input chains drop_orphaned_messages_after_consumed_reasoning after drop_orphan_function_calls (same pattern as the other call sites) - oai_conversation.py: import + one-line call after drop_orphan_function_calls in OpenAIServerConversationTracker.prepare_input; id() tracking is safe because the function returns items from the input list without copying Tests added: - test_normalize_resumed_input_drops_orphaned_message_after_consumed_reasoning - test_server_conversation_tracker_drops_orphaned_message_after_consumed_reasoning All 155 tests pass.
Resetting consumed_by_call after the first dropped message meant a second orphaned message in the same turn — e.g. [reasoning, function_call, msg1, msg2, function_call_output] — would slip through unchecked. Remove the reset from the message branch entirely. The only correct reset point is _CALL_OUTPUT_TYPES (the call output item), which marks the actual turn boundary. Messages that arrive before any call output while consumed_by_call is True are all orphaned and are all dropped; messages that arrive after the call output (delegate agent, next turn) are unaffected because the flag has already been cleared. Add test_normalize_resumed_input_drops_multiple_orphaned_messages_in_same_turn to cover this case explicitly.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 20da2dce0a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
drop_orphan_function_calls() strips [reasoning, function_call] pairs with no output before message-pruning runs, so a trailing message in [reasoning, function_call, message] (no output) lost its consumed-call context and survived. Swap the order at all four call sites so message pruning sees the full reasoning context, then call-pair removal cleans up what remains. Add a regression test covering the no-output case. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 678ce4cfec
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| elif item_type == "message": | ||
| if not consumed_by_call: | ||
| result.append(item) |
There was a problem hiding this comment.
Preserve user messages after consumed tool calls
When the consumed-call state is still set, this branch drops every type == "message" item without checking role. In a resumed/session history that ends with an unanswered reasoning-backed tool call, the next turn's user input is represented as a message immediately after that stale call; this new pass removes the user's follow-up before drop_orphan_function_calls() later cleans up the stale call/reasoning pair, so the next model request can silently lose the user's input. The pruning should be limited to assistant/model messages that can actually be orphaned by consumed reasoning.
Useful? React with 👍 / 👎.
drop_orphaned_messages_after_consumed_reasoning was dropping any type:message item when consumed_by_call was set, including user messages. If a session was interrupted after [reasoning, function_call] (no output) and the user resumed with a new user message appended, that message was silently discarded. Only assistant messages can be orphaned in this context; check role == assistant before dropping. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1f4c4a0ebb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
drop_orphaned_messages_after_consumed_reasoning may remove items and shift positions in the list, making prune_history_indexes (built from pre-filter offsets) misaligned for the subsequent drop_orphan_function_calls call. Fix by snapshotting history item identities before filtering, then rebuilding the index set from the surviving items so positional indexes always correspond to the filtered list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem
When a reasoning-enabled agent hands off to another agent, the model turn can emit
[reasoning, function_call, message]in a single response. Providers such as Azure OpenAItreat the
reasoningitem as consumed by thefunction_call. The trailingmessageitemthen has no paired reasoning and is rejected with HTTP 400:
Item 'msg_...' of type 'message' was provided without its required 'reasoning' item 'rs_...'
The SDK faithfully forwards all three items as
input[]to the next API call, causing everyhandoff in a reasoning-enabled multi-agent pipeline to fail.
What the invalid input looks like
[user question]
[rs_111] ← reasoning item
[fc_222] ← transfer_to_nextagent (consumes rs_111 per provider rules)
[fc_output]
[msg_333] ← orphaned — no paired reasoning → HTTP 400
Fix
Adds
drop_orphaned_messages_after_consumed_reasoning()insrc/agents/run_internal/items.py, called fromprepare_model_input_items()alongsidethe existing
drop_orphan_function_calls().The function uses a simple state machine: a reasoning item is marked fresh when emitted
and consumed when a
function_callfollows it. Any subsequentmessageitem without afresh reasoning partner is dropped before the payload reaches the API.
This is the inverse of
drop_orphan_function_calls(), which removes function callswithout outputs and their preceding reasoning items.
Test
test_handoff_drops_orphaned_message_after_consumed_reasoningintests/test_agent_runner.py:[reasoning, handoff_call, message]input[]for the delegate's turn