Skip to content

fix: drop message items orphaned by handoff function calls consuming their reasoning item#3574

Open
utkarshkr100 wants to merge 11 commits into
openai:mainfrom
utkarshkr100:fix/drop-orphaned-message-after-handoff-reasoning
Open

fix: drop message items orphaned by handoff function calls consuming their reasoning item#3574
utkarshkr100 wants to merge 11 commits into
openai:mainfrom
utkarshkr100:fix/drop-orphaned-message-after-handoff-reasoning

Conversation

@utkarshkr100
Copy link
Copy Markdown

Problem

When a reasoning-enabled agent hands off to another agent, the model turn can emit
[reasoning, function_call, message] in a single response. Providers such as Azure OpenAI
treat the reasoning item as consumed by the function_call. The trailing message item
then has no paired reasoning and is rejected with HTTP 400:

Item 'msg_...' of type 'message' was provided without its required 'reasoning' item 'rs_...'

The SDK faithfully forwards all three items as input[] to the next API call, causing every
handoff in a reasoning-enabled multi-agent pipeline to fail.

What the invalid input looks like

[user question]
[rs_111] ← reasoning item
[fc_222] ← transfer_to_nextagent (consumes rs_111 per provider rules)
[fc_output]
[msg_333] ← orphaned — no paired reasoning → HTTP 400

Fix

Adds drop_orphaned_messages_after_consumed_reasoning() in
src/agents/run_internal/items.py, called from prepare_model_input_items() alongside
the existing drop_orphan_function_calls().

The function uses a simple state machine: a reasoning item is marked fresh when emitted
and consumed when a function_call follows it. Any subsequent message item without a
fresh reasoning partner is dropped before the payload reaches the API.

This is the inverse of drop_orphan_function_calls(), which removes function calls
without outputs and their preceding reasoning items.

Test

test_handoff_drops_orphaned_message_after_consumed_reasoning in tests/test_agent_runner.py:

  • Sets up a triage → delegate handoff where the first turn emits [reasoning, handoff_call, message]
  • Asserts the orphaned message is absent from input[] for the delegate's turn
  • Fails before this fix, passes after
  • All 151 existing tests continue to pass

…their reasoning item

When a model turn during a handoff emits [reasoning, function_call, message], providers
such as Azure OpenAI treat the reasoning item as consumed by the function_call. The
trailing message item then has no paired reasoning and is rejected with HTTP 400:

  Item 'msg_...' of type 'message' was provided without its required 'reasoning' item

Add drop_orphaned_messages_after_consumed_reasoning() and call it from
prepare_model_input_items() alongside the existing drop_orphan_function_calls() pass.
The new function tracks whether the most-recent reasoning item has been consumed by a
function_call and drops any subsequent message item that would be left without a partner.

This is the inverse of drop_orphan_function_calls(), which removes function calls
without outputs and their preceding reasoning items.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 737335ccc5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py
The previous state machine carried fresh_reasoning=False across all subsequent
turns, incorrectly dropping valid assistant messages from later agents that
legitimately emit responses without a reasoning item.

Replace had_any_reasoning + fresh_reasoning with a single consumed_by_call flag
that is reset to False as soon as the first orphaned message is dropped. This
limits pruning to the one trailing message inside the same handoff turn and
leaves all subsequent turns unaffected.

Add clarifying comments to the test showing that the delegate agent response
(no reasoning) must survive and reach final_output.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4be2bbeb25

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
…turn bleed

When the handoff turn emits [reasoning, function_call] with no trailing message,
consumed_by_call stayed True and leaked into the next accumulated turn, silently
dropping the delegate agent response.

The SDK appends HandoffOutputItem (function_call_output) after all model output
items, so any orphaned trailing message is dropped before we reach fc_out.
Resetting consumed_by_call at function_call_output therefore scopes the drop to
the current handoff sequence only and keeps all subsequent turns clean.

Add test_handoff_without_trailing_message_keeps_delegate_response to cover
this path explicitly.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3dcb805cd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
…on_call_output

computer_call_output, shell_call_output, and other output types were not
resetting consumed_by_call, so a reasoning-backed computer_call with no
trailing message would leak the flag into the next turn and silently drop
the following assistant message.

Extract _CALL_OUTPUT_TYPES = frozenset(_TOOL_CALL_TO_OUTPUT_TYPE.values())
and use it as the reset condition so every call output type is covered.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ddb96218e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
save_result_to_session() persists raw run items including any orphaned
trailing message from a reasoning handoff turn. On the next Runner.run()
with the same session, prepare_input_with_session() rebuilt history using
only drop_orphan_function_calls(), so the orphaned message was re-sent
to the provider and triggered the same HTTP 400.

Import drop_orphaned_messages_after_consumed_reasoning into
session_persistence.py and call it immediately after drop_orphan_function_calls()
in prepare_input_with_session(), mirroring the existing pattern for
function-call orphan pruning.

Add test_session_history_drops_orphaned_message_on_next_run to verify the
session replay path explicitly.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 702595022f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
…ion_call/computer_call

custom_tool_call, shell_call, apply_patch_call, local_shell_call, and
tool_search_call were not setting consumed_by_call, so a reasoning item
followed by any of those call types and then a message would still be
sent as [reasoning, <tool_call>, message] and trigger the same provider
400 this PR is fixing.

Replace the hardcoded (function_call, computer_call) tuple with
_TOOL_CALL_TO_OUTPUT_TYPE, which already enumerates every call type
that the runtime tracks and matches to outputs.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 04a4072db9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
…ion tracker

Map of every path that assembles input[] for the model, cross-checked against
which ones already call drop_orphan_function_calls (all of them should also
call drop_orphaned_messages_after_consumed_reasoning):

  Path                                      Before  After
  prepare_model_input_items                  done    done
  prepare_input_with_session                 done    done
  normalize_resumed_input (RunState resume)  miss    fixed
  OpenAIServerConversationTracker.prepare    miss    fixed

Changes:
- items.py: normalize_resumed_input chains drop_orphaned_messages_after_consumed_reasoning
  after drop_orphan_function_calls (same pattern as the other call sites)
- oai_conversation.py: import + one-line call after drop_orphan_function_calls in
  OpenAIServerConversationTracker.prepare_input; id() tracking is safe because the
  function returns items from the input list without copying

Tests added:
- test_normalize_resumed_input_drops_orphaned_message_after_consumed_reasoning
- test_server_conversation_tracker_drops_orphaned_message_after_consumed_reasoning

All 155 tests pass.
Resetting consumed_by_call after the first dropped message meant a second
orphaned message in the same turn — e.g. [reasoning, function_call, msg1,
msg2, function_call_output] — would slip through unchecked.

Remove the reset from the message branch entirely. The only correct reset
point is _CALL_OUTPUT_TYPES (the call output item), which marks the actual
turn boundary. Messages that arrive before any call output while
consumed_by_call is True are all orphaned and are all dropped; messages
that arrive after the call output (delegate agent, next turn) are
unaffected because the flag has already been cleared.

Add test_normalize_resumed_input_drops_multiple_orphaned_messages_in_same_turn
to cover this case explicitly.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 20da2dce0a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/items.py Outdated
drop_orphan_function_calls() strips [reasoning, function_call] pairs
with no output before message-pruning runs, so a trailing message in
[reasoning, function_call, message] (no output) lost its consumed-call
context and survived. Swap the order at all four call sites so message
pruning sees the full reasoning context, then call-pair removal cleans
up what remains.

Add a regression test covering the no-output case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 678ce4cfec

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +229 to +231
elif item_type == "message":
if not consumed_by_call:
result.append(item)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve user messages after consumed tool calls

When the consumed-call state is still set, this branch drops every type == "message" item without checking role. In a resumed/session history that ends with an unanswered reasoning-backed tool call, the next turn's user input is represented as a message immediately after that stale call; this new pass removes the user's follow-up before drop_orphan_function_calls() later cleans up the stale call/reasoning pair, so the next model request can silently lose the user's input. The pruning should be limited to assistant/model messages that can actually be orphaned by consumed reasoning.

Useful? React with 👍 / 👎.

drop_orphaned_messages_after_consumed_reasoning was dropping any
type:message item when consumed_by_call was set, including user messages.

If a session was interrupted after [reasoning, function_call] (no output)
and the user resumed with a new user message appended, that message was
silently discarded. Only assistant messages can be orphaned in this
context; check role == assistant before dropping.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1f4c4a0ebb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/run_internal/session_persistence.py Outdated
drop_orphaned_messages_after_consumed_reasoning may remove items and shift
positions in the list, making prune_history_indexes (built from pre-filter
offsets) misaligned for the subsequent drop_orphan_function_calls call.

Fix by snapshotting history item identities before filtering, then
rebuilding the index set from the surviving items so positional indexes
always correspond to the filtered list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants