
fix(realtime): coalesce response.create across parallel tool calls #3405

Open
adityasingh2400 wants to merge 1 commit into openai:main from adityasingh2400:fix-realtime-parallel-tool-response


@adityasingh2400
Contributor

Summary

Fixes the "no voice output" failure reported in #1168, where a Realtime turn containing more than one function_call item completes without the model ever speaking. The root cause is a race in RealtimeSession: when _async_tool_calls=True (the default since #1984), each tool runs in its own background task, and each task finishes by sending RealtimeModelSendToolOutput(start_response=True). The SDK therefore forwards more than one response.create event to the OpenAI Realtime API for the same turn. These events collide: with two tool calls, the second one comes back with conversation_already_has_active_response. Because that error carries event_id=None, the existing recovery path in openai_realtime.py does not clear it. The user hears nothing until the next user turn, which matches the reporter's trace and the follow-up report on #1912.
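The collision described above can be illustrated with a toy simulation. This is only a sketch: `FakeRealtimeServer`, `run_tool`, and `simulate` are hypothetical names invented for illustration, not SDK or API code, and the fake server models nothing beyond "one active response at a time". It shows the rejected second request, not the downstream recovery failure.

```python
import asyncio


class FakeRealtimeServer:
    """Hypothetical stand-in for the API: allows one active response at a time."""

    def __init__(self) -> None:
        self.active = False
        self.started = 0
        self.errors: list[str] = []

    def response_create(self) -> None:
        if self.active:
            # A response is already running, so this request is rejected.
            self.errors.append("conversation_already_has_active_response")
            return
        self.active = True
        self.started += 1


async def run_tool(server: FakeRealtimeServer, delay: float) -> None:
    """Simulate one background tool task that asks for a response when done."""
    await asyncio.sleep(delay)
    # Pre-fix behavior: every tool output is sent with start_response=True.
    server.response_create()


async def simulate(num_tools: int = 2) -> FakeRealtimeServer:
    server = FakeRealtimeServer()
    tasks = (run_tool(server, 0.01 * i) for i in range(num_tools))
    await asyncio.gather(*tasks)
    return server
```

Running `asyncio.run(simulate(2))` leaves one accepted request and one rejected request on the fake server, which is the shape of the race this PR removes.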

This change makes the session coalesce response.create per model response_id. RealtimeModelToolCallEvent now carries the response_id it was emitted under (propagated from response.output_item.added/done), and RealtimeModelTurnEndedEvent carries the response_id from response.done. The session tracks pending tool call ids per response. While other outputs for the same response are still in flight, each tool output is sent with start_response=False. Once turn_ended has been observed, the last tool to complete sends its output with start_response=True. If every tool finishes before turn_ended arrives, the session itself sends a single raw response.create message when turn_ended is observed. Sessions whose models still omit response_id keep the historical "always start a response" behavior.
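The bookkeeping described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this PR: `ResponseCoalescer` and its method names are hypothetical, and the real session works with SDK event objects rather than bare ids.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResponseCoalescer:
    """Hypothetical helper that decides which event may start the next response."""

    # response_id -> tool call ids whose outputs have not been sent yet
    pending: dict[str, set[str]] = field(default_factory=dict)
    # response_ids for which turn_ended has already been observed
    ended: set[str] = field(default_factory=set)

    def on_tool_call(self, response_id: str | None, call_id: str) -> None:
        if response_id is not None:
            self.pending.setdefault(response_id, set()).add(call_id)

    def on_tool_output(self, response_id: str | None, call_id: str) -> bool:
        """Return the start_response flag to send with this tool output."""
        if response_id is None:
            return True  # model omits response_id: keep the historical behavior
        remaining = self.pending.get(response_id, set())
        remaining.discard(call_id)
        if remaining:
            return False  # sibling tool calls are still in flight
        if response_id in self.ended:
            # Last tool to finish, and the turn is over: start the response.
            self.pending.pop(response_id, None)
            self.ended.discard(response_id)
            return True
        return False  # all tools done early; wait for turn_ended

    def on_turn_ended(self, response_id: str | None) -> bool:
        """Return True if the session should send response.create itself."""
        if response_id is None or response_id not in self.pending:
            return False  # no tracked tool calls for this response
        if self.pending[response_id]:
            self.ended.add(response_id)
            return False  # the last completing tool will start the response
        del self.pending[response_id]
        return True  # every tool finished before turn_ended
```

In this sketch exactly one of the two paths returns True for a given response_id, so only one response.create is requested per turn regardless of the order in which tool outputs and turn_ended arrive.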

Test plan

  • make format
  • make lint
  • make typecheck (no new findings in changed files)
  • uv run pytest tests/realtime/ -x (248 passed, including the new TestParallelToolCallCoalescing regression suite)

When the Realtime model emits multiple function_call items in a single
response, each completing tool task previously fired its own
RealtimeModelSendToolOutput(start_response=True). The resulting
response.create events race at the API, and the second one is rejected
with conversation_already_has_active_response, so the model never
speaks for the rest of the turn. Track tool calls per response_id and
drive a single response.create from the last completing call (or
directly from turn_ended if all tools finish before the response is
done).

Refs openai#1168
