Skip to content

fix: sendAndWait checks backgroundTasks before resolving on session.idle#991

Open
PureWeen wants to merge 3 commits intogithub:mainfrom
PureWeen:fix/session-idle-background-tasks
Open

fix: sendAndWait checks backgroundTasks before resolving on session.idle#991
PureWeen wants to merge 3 commits intogithub:mainfrom
PureWeen:fix/session-idle-background-tasks

Conversation

@PureWeen
Copy link
Copy Markdown
Contributor

@PureWeen PureWeen commented Apr 2, 2026

Problem

All four SDK sendAndWait methods (sendAndWait in Node.js/Go, SendAndWaitAsync in .NET, send_and_wait in Python) resolve as soon as any session.idle event is received — ignoring the backgroundTasks field entirely.

When background agents or shell commands are still running, the CLI emits session.idle with a populated backgroundTasks field. The SDK incorrectly treats this as "turn complete," causing premature resolution while work is still in flight.

Impact: Any consumer using sendAndWait with multi-agent or background-shell workflows receives a truncated response — the promise resolves before background agents have finished.

Root Cause

In every SDK, the session.idle handler was unconditional:

Node.js / Go (before):

} else if (event.type === 'session.idle') {
    resolveIdle(); // fires even with active background agents!
}

The backgroundTasks field on SessionIdleData is documented in the schema as:

"Background tasks still running when the agent became idle"

This field was designed precisely to distinguish "foreground idle but background work still running" from "truly done." The old code ignored this distinction entirely.

Evidence

  • SDK schema: backgroundTasks on SessionIdleData explicitly documents agents and shells still running
  • 208 real-world cases from PolyPilot production data: session.idle with active backgroundTasks is never a terminal event — 100% of cases are followed by more work
  • PolyPilot IDLE-DEFER (PR feat(python): add cli_args support to CopilotClientOptions for parity with Node.js SDK #399) implemented the same backgroundTasks check at the app layer as a workaround — proving the approach is correct and battle-tested in production
  • 3 independent model reviews unanimously agreed this is the correct protocol-level fix, not a workaround
  • Mutation testing: reverting the fix causes 2 of 5 new Node.js tests to fail (exit 1)

Fix

Each SDK now checks backgroundTasks before resolving. Only resolves when backgroundTasks is absent or both agents[] and shells[] are empty:

} else if (event.type === 'session.idle') {
    const bgTasks = event.data?.backgroundTasks;
    const hasActiveTasks = bgTasks !== undefined &&
        ((bgTasks.agents?.length ?? 0) > 0 || (bgTasks.shells?.length ?? 0) > 0);
    if (!hasActiveTasks) {
        resolveIdle();
    }
}
SDK File
Node.js nodejs/src/session.ts
.NET dotnet/src/Session.cs
Python python/copilot/session.py
Go go/session.go

Tests

5 Node.js unit tests (nodejs/test/client.test.ts):

  1. ✅ Resolves immediately when session.idle has no backgroundTasks
  2. ✅ Does not resolve when session.idle reports active background agents → bug repro (fails before fix, passes after)
  3. ✅ Does not resolve when session.idle reports active background shells → bug repro (fails before fix, passes after)
  4. ✅ Resolves when session.idle has empty backgroundTasks arrays
  5. ✅ Times out when backgroundTasks never clears — proves the timeout path works for stuck background agents

4 Go unit tests (go/session_test.go, TestSendAndWait_BackgroundTasks):

  • Same 4 behavioral cases, using an in-process fake jsonrpc2 server via io.Pipe

4 Python unit tests (python/test_session_background_tasks.py):

  • Async pytest tests covering the same cases using mock session dispatch

1 .NET E2E test (dotnet/test/SessionTests.cs, SendAndWait_Resolves_After_Clean_SessionIdle):

  • Verifies the clean-idle path resolves correctly against the actual CLI

Documentation

Updated sendAndWait docs in all 4 SDKs to reflect the new contract:

"Waits until the session is fully idle — i.e., session.idle arrives with no active background tasks. If background tasks are stuck, the timeout fires."

Scope & Limitations

This PR fixes the SDK-layer premature resolution bug. Two related issues are out of scope:

References

@PureWeen PureWeen requested a review from a team as a code owner April 2, 2026 20:25
Copilot AI review requested due to automatic review settings April 2, 2026 20:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a cross-SDK protocol handling bug where sendAndWait/SendAndWaitAsync/send_and_wait would resolve on the first session.idle event even when the CLI indicated active backgroundTasks, causing premature “turn complete” resolution in multi-agent/background-shell workflows.

Changes:

  • Update sendAndWait implementations in Node.js, Go, Python, and .NET to only resolve on session.idle when backgroundTasks is absent or empty.
  • Add regression tests across SDKs (Node/Go/Python unit tests + .NET E2E-style replay test) covering active/empty/absent background task scenarios and timeout behavior.
  • Update method documentation to clarify “fully idle” semantics with respect to background tasks.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
nodejs/src/session.ts Gate sendAndWait completion on session.idle having no active backgroundTasks; update JSDoc.
nodejs/test/client.test.ts Add unit regression tests for backgroundTasks-aware sendAndWait resolution + timeout.
python/copilot/session.py Gate send_and_wait completion on session.idle having no active background_tasks; update docstring.
python/test_session_background_tasks.py Add async unit regression tests for Python send_and_wait background tasks behavior.
go/session.go Gate SendAndWait completion on session.idle having no active BackgroundTasks; update docs.
go/session_test.go Add unit regression tests using an in-process fake JSON-RPC server to drive session.idle cases.
dotnet/src/Session.cs Gate SendAndWaitAsync completion on session.idle having no active background tasks; update XML docs.
dotnet/test/SessionTests.cs Add replay-backed test ensuring SendAndWaitAsync resolves on clean session.idle.
test/snapshots/session/sendandwait_resolves_after_clean_sessionidle.yaml Add replay snapshot to support the new .NET session test prompt/response.


import asyncio
import threading
import time
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import: time is imported but never referenced in this test module. Removing it will keep the test file lint/formatter-friendly and avoid future unused-import failures if linting is enabled.

Suggested change
import time

Copilot uses AI. Check for mistakes.
Comment on lines +306 to +310
go func() {
defer close(done)
session.SendAndWait(ctx, MessageOptions{Prompt: "test"}) //nolint:errcheck
resolved.Store(true)
}()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These subtests ignore the error returned by SendAndWait and still treat the call as “resolved”. If SendAndWait returns early due to an unexpected error (e.g., client stopped / context cancellation), the test could pass/fail for the wrong reason. Capture the returned error in the goroutine and assert it is nil for the success-path cases.

Copilot uses AI. Check for mistakes.
go/session.go Outdated
Comment on lines +160 to +164
// Parameters:
// - options: The message options including the prompt and optional attachments.
// - timeout: How long to wait for completion. Defaults to 60 seconds if zero.
// Controls how long to wait; does not abort in-flight agent work.
// Controls how long to wait; does not abort in-flight agent work. If
// background tasks are stuck the context deadline will fire.
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment lists a timeout parameter, but SendAndWait takes (ctx, options) and the timeout behavior is driven by the context deadline (or the default 60s when no deadline is set). Consider updating this parameter description to refer to ctx/context deadlines to avoid misleading API docs.

Copilot uses AI. Check for mistakes.
Comment on lines +360 to +364
go func() {
defer close(done)
session.SendAndWait(ctx, MessageOptions{Prompt: "test"}) //nolint:errcheck
resolved.Store(true)
}()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goroutine ignores the error returned by SendAndWait. If SendAndWait exits early due to an unexpected error, the test may not report the real failure cause. Capture the error and assert it is nil for this success-path case.

Copilot uses AI. Check for mistakes.
Comment on lines +409 to +412
go func() {
defer close(done)
session.SendAndWait(ctx, MessageOptions{Prompt: "test"}) //nolint:errcheck
}()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goroutine ignores the error returned by SendAndWait. Even in a “simple resolve” test, asserting the error is nil makes failures easier to diagnose and prevents false positives if SendAndWait returns early due to an error.

Copilot uses AI. Check for mistakes.
Comment on lines +440 to +443
go func() {
defer close(done)
session.SendAndWait(ctx, MessageOptions{Prompt: "test"}) //nolint:errcheck
}()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goroutine ignores the error returned by SendAndWait. Capturing and asserting the error is nil will make the test robust against unexpected early returns (e.g., context cancellation / client stop) and provide clearer failure output.

Copilot uses AI. Check for mistakes.
PureWeen and others added 2 commits April 2, 2026 15:57
…kground tasks (PolyPilot github#299)

Before this fix all four SDKs' sendAndWait/SendAndWaitAsync/send_and_wait/SendAndWait
resolved immediately on ANY session.idle event, even when backgroundTasks.agents[]
or backgroundTasks.shells[] was non-empty — indicating background agents/shells are
still running and the session is not truly idle.

Fix: only resolve when backgroundTasks is absent or both arrays are empty.

Changed files:
- nodejs/src/session.ts: check bgTasks.agents.length / bgTasks.shells.length
- dotnet/src/Session.cs: check bgTasks.Agents.Length / bgTasks.Shells.Length
- python/copilot/session.py: check len(bg_tasks.agents) / len(bg_tasks.shells)
- go/session.go: check len(bgTasks.Agents) / len(bgTasks.Shells)

Tests: added 4 Node.js unit tests in test/client.test.ts that reproduce the bug
(tests 2+3 fail before fix, pass after):
  - resolves immediately when session.idle has no backgroundTasks (baseline)
  - does NOT resolve when session.idle reports active background agents [bug repro]
  - does NOT resolve when session.idle reports active background shells
  - resolves when session.idle has empty backgroundTasks arrays

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add timeout test: sendAndWait times out when backgroundTasks never clears
  (all 3 review models flagged this gap; test uses 300ms timeout for speed)
- Add Go backgroundTasks unit tests (TestSendAndWait_BackgroundTasks, 4 subtests)
  using io.Pipe fake jsonrpc2 client
- Add .NET SendAndWait_Resolves_After_Clean_SessionIdle test in SessionTests.cs
- Add Python async backgroundTasks tests in test_session_background_tasks.py
- Add YAML snapshot for .NET clean-idle test
- Update sendAndWait docs in all 4 SDKs: clarify that it waits for ALL background
  tasks to complete (not just first session.idle), document timeout behavior for
  stuck background agents

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen force-pushed the fix/session-idle-background-tasks branch from 563c216 to 1589fc7 Compare April 2, 2026 21:01
- Remove unused 'time' import from python/test_session_background_tasks.py
- Update go/session.go SendAndWait doc to reference ctx/context.Context
  instead of a 'timeout' parameter (the method signature takes ctx, not timeout)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants