Skip to content

fix: #3334 await realtime background tasks during cleanup#3335

Open
Aphroq wants to merge 1 commit into
openai:mainfrom
Aphroq:fix/realtime-await-cancelled-tasks
Open

fix: #3334 await realtime background tasks during cleanup#3335
Aphroq wants to merge 1 commit into
openai:mainfrom
Aphroq:fix/realtime-await-cancelled-tasks

Conversation

@Aphroq
Copy link
Copy Markdown
Contributor

@Aphroq Aphroq commented May 10, 2026

Summary

  • Await cancelled Realtime guardrail and tool-call background tasks during session cleanup before clearing their tracking sets.
  • Keep cleanup exception handling exercised with real asyncio tasks, and add a regression test that verifies cancelled tasks finish their finally blocks before _cleanup() returns.

Related: #1976 previously identified the missing await for Realtime guardrail task cleanup. This PR keeps that direction, also covers Realtime tool-call task cleanup, and adds regression coverage.

Test plan

  • uv run pytest tests/realtime/test_session.py::test_cleanup_awaits_cancelled_background_tasks tests/realtime/test_session_exceptions.py::TestSessionExceptions::test_exception_during_guardrail_processing
  • bash .agents/skills/code-change-verification/scripts/run.sh

Issue number

Closes #3334

Checks

  • I've added new tests (if relevant)
  • I've added/updated the relevant documentation
  • I've run make lint and make format
  • I've made sure tests pass

@Aphroq Aphroq changed the title Await realtime background tasks during cleanup fix: #3334 Await realtime background tasks during cleanup May 10, 2026
@Aphroq Aphroq changed the title fix: #3334 Await realtime background tasks during cleanup fix: #3334 await realtime background tasks during cleanup May 10, 2026
@Aphroq
Copy link
Copy Markdown
Contributor Author

Aphroq commented May 10, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f50cdfb6b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/realtime/session.py Outdated
task.cancel()
self._tool_call_tasks.clear()

await asyncio.gather(*tasks, return_exceptions=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Bound cleanup waits for cancelled user tasks

When a realtime tool call is running user code that suppresses or delays CancelledError, close()/__aexit__ now waits indefinitely here before closing the model connection or waking event iterators. Tool functions are arbitrary async callables, so a cancellation-resistant tool can make an async with RealtimeSession(...) hang during shutdown; the guardrail cleanup uses the same unbounded gather pattern. Consider bounding the wait or otherwise ensuring cleanup can finish even when a background task does not promptly exit after cancellation.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed Realtime background task cleanup to use a bounded wait: cleanup still cancels guardrail and tool-call background tasks and gives them a short grace period to run finally / cancellation cleanup, but if a task still does not exit within that window, we log a warning and continue session shutdown instead of letting close() / __aexit__ hang indefinitely on user code.

This applies to both guardrail tasks and tool-call tasks, while preserving the existing behavior for already-finished tasks and normally cancellable tasks:

  • Finished tasks are not cancelled again.
  • Tasks that respond to cancellation are still awaited, so their finally blocks can finish before _cleanup() returns.
  • Tasks that suppress or delay CancelledError no longer make cleanup wait forever.
  • After the bounded wait, tracking sets are cleared so model close and event iterator wakeup can continue.

@seratch seratch marked this pull request as draft May 10, 2026 22:07
@Aphroq Aphroq force-pushed the fix/realtime-await-cancelled-tasks branch from 4f50cdf to f04563e Compare May 11, 2026 04:53
@github-actions github-actions Bot added the bug Something isn't working label May 11, 2026
@Aphroq
Copy link
Copy Markdown
Contributor Author

Aphroq commented May 11, 2026

@codex review

@Aphroq Aphroq marked this pull request as ready for review May 11, 2026 06:12
@Aphroq Aphroq force-pushed the fix/realtime-await-cancelled-tasks branch from f04563e to ac75d32 Compare May 11, 2026 14:25
@Aphroq
Copy link
Copy Markdown
Contributor Author

Aphroq commented May 11, 2026

Updated the branch with one more shutdown hardening pass :

  • Guardrail and tool-call background task cleanup now waits concurrently, so the worst case does not stack two cleanup grace periods sequentially.
  • Added an internal _closing guard so late background task callbacks after a cleanup timeout do not write _stored_exception or enqueue RealtimeError events while the session is closing/closed.
  • Added regression coverage for:
    • guardrail/tool-call cleanup waiting concurrently
    • background tasks that fail after the timeout not polluting a closed session

About the timeout: _BACKGROUND_TASK_CLEANUP_TIMEOUT_SECONDS remains a private best-effort cleanup grace period, not public configuration or a completion guarantee. It is only meant to give normally cancellable tasks a chance to run finally / cancellation cleanup while keeping close() / __aexit__ bounded if user code suppresses or delays CancelledError. The previous behavior cancelled tasks and immediately cleared the tracking sets; this keeps shutdown bounded while giving cancellation cleanup a better chance to run.

@seratch seratch removed the bug Something isn't working label May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Realtime cleanup should await cancelled background tasks

2 participants