fix: sendAndWait checks backgroundTasks before resolving on session.idle#991
fix: sendAndWait checks backgroundTasks before resolving on session.idle#991PureWeen wants to merge 3 commits intogithub:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a cross-SDK protocol handling bug where sendAndWait/SendAndWaitAsync/send_and_wait would resolve on the first session.idle event even when the CLI indicated active backgroundTasks, causing premature “turn complete” resolution in multi-agent/background-shell workflows.
Changes:
- Update
sendAndWaitimplementations in Node.js, Go, Python, and .NET to only resolve onsession.idlewhenbackgroundTasksis absent or empty. - Add regression tests across SDKs (Node/Go/Python unit tests + .NET E2E-style replay test) covering active/empty/absent background task scenarios and timeout behavior.
- Update method documentation to clarify “fully idle” semantics with respect to background tasks.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
nodejs/src/session.ts |
Gate sendAndWait completion on session.idle having no active backgroundTasks; update JSDoc. |
nodejs/test/client.test.ts |
Add unit regression tests for backgroundTasks-aware sendAndWait resolution + timeout. |
python/copilot/session.py |
Gate send_and_wait completion on session.idle having no active background_tasks; update docstring. |
python/test_session_background_tasks.py |
Add async unit regression tests for Python send_and_wait background tasks behavior. |
go/session.go |
Gate SendAndWait completion on session.idle having no active BackgroundTasks; update docs. |
go/session_test.go |
Add unit regression tests using an in-process fake JSON-RPC server to drive session.idle cases. |
dotnet/src/Session.cs |
Gate SendAndWaitAsync completion on session.idle having no active background tasks; update XML docs. |
dotnet/test/SessionTests.cs |
Add replay-backed test ensuring SendAndWaitAsync resolves on clean session.idle. |
test/snapshots/session/sendandwait_resolves_after_clean_sessionidle.yaml |
Add replay snapshot to support the new .NET session test prompt/response. |
|
|
||
| import asyncio | ||
| import threading | ||
| import time |
There was a problem hiding this comment.
Unused import: time is imported but never referenced in this test module. Removing it will keep the test file lint/formatter-friendly and avoid future unused-import failures if linting is enabled.
| import time |
go/session_test.go
Outdated
| go func() { | ||
| defer close(done) | ||
| session.SendAndWait(ctx, MessageOptions{Prompt: "test"}) //nolint:errcheck | ||
| resolved.Store(true) | ||
| }() |
There was a problem hiding this comment.
These subtests ignore the error returned by SendAndWait and still treat the call as “resolved”. If SendAndWait returns early due to an unexpected error (e.g., client stopped / context cancellation), the test could pass/fail for the wrong reason. Capture the returned error in the goroutine and assert it is nil for the success-path cases.
go/session.go
Outdated
| // Parameters: | ||
| // - options: The message options including the prompt and optional attachments. | ||
| // - timeout: How long to wait for completion. Defaults to 60 seconds if zero. | ||
| // Controls how long to wait; does not abort in-flight agent work. | ||
| // Controls how long to wait; does not abort in-flight agent work. If | ||
| // background tasks are stuck the context deadline will fire. |
There was a problem hiding this comment.
The doc comment lists a timeout parameter, but SendAndWait takes (ctx, options) and the timeout behavior is driven by the context deadline (or the default 60s when no deadline is set). Consider updating this parameter description to refer to ctx/context deadlines to avoid misleading API docs.
go/session_test.go
Outdated
| go func() { | ||
| defer close(done) | ||
| session.SendAndWait(ctx, MessageOptions{Prompt: "test"}) //nolint:errcheck | ||
| resolved.Store(true) | ||
| }() |
There was a problem hiding this comment.
This goroutine ignores the error returned by SendAndWait. If SendAndWait exits early due to an unexpected error, the test may not report the real failure cause. Capture the error and assert it is nil for this success-path case.
go/session_test.go
Outdated
| go func() { | ||
| defer close(done) | ||
| session.SendAndWait(ctx, MessageOptions{Prompt: "test"}) //nolint:errcheck | ||
| }() |
There was a problem hiding this comment.
This goroutine ignores the error returned by SendAndWait. Even in a “simple resolve” test, asserting the error is nil makes failures easier to diagnose and prevents false positives if SendAndWait returns early due to an error.
go/session_test.go
Outdated
| go func() { | ||
| defer close(done) | ||
| session.SendAndWait(ctx, MessageOptions{Prompt: "test"}) //nolint:errcheck | ||
| }() |
There was a problem hiding this comment.
This goroutine ignores the error returned by SendAndWait. Capturing and asserting the error is nil will make the test robust against unexpected early returns (e.g., context cancellation / client stop) and provide clearer failure output.
…kground tasks (PolyPilot github#299) Before this fix all four SDKs' sendAndWait/SendAndWaitAsync/send_and_wait/SendAndWait resolved immediately on ANY session.idle event, even when backgroundTasks.agents[] or backgroundTasks.shells[] was non-empty — indicating background agents/shells are still running and the session is not truly idle. Fix: only resolve when backgroundTasks is absent or both arrays are empty. Changed files: - nodejs/src/session.ts: check bgTasks.agents.length / bgTasks.shells.length - dotnet/src/Session.cs: check bgTasks.Agents.Length / bgTasks.Shells.Length - python/copilot/session.py: check len(bg_tasks.agents) / len(bg_tasks.shells) - go/session.go: check len(bgTasks.Agents) / len(bgTasks.Shells) Tests: added 4 Node.js unit tests in test/client.test.ts that reproduce the bug (tests 2+3 fail before fix, pass after): - resolves immediately when session.idle has no backgroundTasks (baseline) - does NOT resolve when session.idle reports active background agents [bug repro] - does NOT resolve when session.idle reports active background shells - resolves when session.idle has empty backgroundTasks arrays Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add timeout test: sendAndWait times out when backgroundTasks never clears (all 3 review models flagged this gap; test uses 300ms timeout for speed) - Add Go backgroundTasks unit tests (TestSendAndWait_BackgroundTasks, 4 subtests) using io.Pipe fake jsonrpc2 client - Add .NET SendAndWait_Resolves_After_Clean_SessionIdle test in SessionTests.cs - Add Python async backgroundTasks tests in test_session_background_tasks.py - Add YAML snapshot for .NET clean-idle test - Update sendAndWait docs in all 4 SDKs: clarify that it waits for ALL background tasks to complete (not just first session.idle), document timeout behavior for stuck background agents Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
563c216 to
1589fc7
Compare
- Remove unused 'time' import from python/test_session_background_tasks.py - Update go/session.go SendAndWait doc to reference ctx/context.Context instead of a 'timeout' parameter (the method signature takes ctx, not timeout) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
All four SDK
sendAndWaitmethods (sendAndWaitin Node.js/Go,SendAndWaitAsyncin .NET,send_and_waitin Python) resolve as soon as anysession.idleevent is received — ignoring thebackgroundTasksfield entirely.When background agents or shell commands are still running, the CLI emits
session.idlewith a populatedbackgroundTasksfield. The SDK incorrectly treats this as "turn complete," causing premature resolution while work is still in flight.Impact: Any consumer using
sendAndWaitwith multi-agent or background-shell workflows receives a truncated response — the promise resolves before background agents have finished.Root Cause
In every SDK, the
session.idlehandler was unconditional:Node.js / Go (before):
The
backgroundTasksfield onSessionIdleDatais documented in the schema as:This field was designed precisely to distinguish "foreground idle but background work still running" from "truly done." The old code ignored this distinction entirely.
Evidence
backgroundTasksonSessionIdleDataexplicitly documents agents and shells still runningsession.idlewith activebackgroundTasksis never a terminal event — 100% of cases are followed by more workbackgroundTaskscheck at the app layer as a workaround — proving the approach is correct and battle-tested in productionFix
Each SDK now checks
backgroundTasksbefore resolving. Only resolves whenbackgroundTasksis absent or bothagents[]andshells[]are empty:nodejs/src/session.tsdotnet/src/Session.cspython/copilot/session.pygo/session.goTests
5 Node.js unit tests (
nodejs/test/client.test.ts):session.idlehas nobackgroundTaskssession.idlereports active background agents → bug repro (fails before fix, passes after)session.idlereports active background shells → bug repro (fails before fix, passes after)session.idlehas emptybackgroundTasksarraysbackgroundTasksnever clears — proves the timeout path works for stuck background agents4 Go unit tests (
go/session_test.go,TestSendAndWait_BackgroundTasks):io.Pipe4 Python unit tests (
python/test_session_background_tasks.py):1 .NET E2E test (
dotnet/test/SessionTests.cs,SendAndWait_Resolves_After_Clean_SessionIdle):Documentation
Updated
sendAndWaitdocs in all 4 SDKs to reflect the new contract:Scope & Limitations
This PR fixes the SDK-layer premature resolution bug. Two related issues are out of scope:
session.idleis ephemeral by design — it carriesephemeral: trueand is intentionally never written toevents.jsonl. This is not a bug. A previously filed CLI issue (Headless server: session.idle not written to events.jsonl after subscriber disconnect copilot-cli#2485) based on this assumption has been closed.ResumeSessionAsync. This is a separate CLI feature gap tracked in IDLE-DEFER should re-arm IsProcessing when background tasks are active PureWeen/PolyPilot#403.session.idlesometimes not delivered over live stream — open CLI/SDK investigation in Zero-idle sessions: SDK never emits session.idle event PureWeen/PolyPilot#299.References