
fix: Recover prompt loop after internal errors #660

Open
duclvz wants to merge 1 commit into agentclientprotocol:main from duclvz:fix/graceful-internal-error

duclvz commented May 14, 2026

Reproduction

Trigger an upstream HTTP 529 (overloaded) from Anthropic during a turn.
Send a prompt:

→ {"id":3,"method":"session/prompt",…}
← {"id":3,"error":{"code":-32603,"message":"…HTTP 529 overloaded…","data":{"errorKind":"overloaded"}}}

The error is reported correctly.
Send a second prompt (any content):

→ {"id":4,"method":"session/prompt",…}
← {"id":4,"result":{"stopReason":"end_turn","usage":{"inputTokens":0,…,"totalTokens":0}}}

It returns stopReason "end_turn" immediately, with zero usage, instead of running the turn.

Root cause: the failed turn left a trailing session_state_changed: idle message in the SDK query stream. The next prompt's first query.next() consumes that stale idle and short-circuits through the case "session_state_changed" branch, returning stopReason at its default value, "end_turn".
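To make the failure mode concrete, here is a minimal TypeScript sketch. FakeQuery, runPrompt and the StreamMessage shapes are hypothetical stand-ins, not the SDK's or the agent's real types; the only behaviour carried over from the description is that the loop returns on the first session_state_changed it sees, with stopReason still at its default.

```typescript
// Hypothetical message shapes, loosely modelled on the PR description;
// the real SDK message types differ.
type StreamMessage =
  | { type: "session_state_changed"; state: "idle" }
  | { type: "result"; totalTokens: number };

// Stand-in for the SDK query: one message queue shared by every prompt in the session.
class FakeQuery {
  constructor(private readonly queue: StreamMessage[]) {}

  async next(): Promise<IteratorResult<StreamMessage, void>> {
    const value = this.queue.shift();
    return value === undefined ? { done: true, value: undefined } : { done: false, value };
  }
}

// Simplified prompt loop: stopReason keeps its default and the loop
// returns as soon as the session reports idle.
async function runPrompt(query: FakeQuery): Promise<{ stopReason: string; totalTokens: number }> {
  const stopReason = "end_turn"; // default, never updated in this sketch
  let totalTokens = 0;
  while (true) {
    const step = await query.next();
    if (step.done) break;
    const msg = step.value;
    if (msg.type === "result") totalTokens = msg.totalTokens;
    if (msg.type === "session_state_changed") return { stopReason, totalTokens };
  }
  return { stopReason, totalTokens };
}

// The failed turn left a stale idle at the head of the stream; the second
// prompt's real output is queued behind it.
const stale = new FakeQuery([
  { type: "session_state_changed", state: "idle" },
  { type: "result", totalTokens: 42 },
  { type: "session_state_changed", state: "idle" },
]);
runPrompt(stale).then((r) => console.log(r)); // { stopReason: 'end_turn', totalTokens: 0 }
```

The second prompt's result message is never read: the loop exits on the stale idle, which is exactly the instant, zero-usage end_turn shown in the reproduction.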

A secondary issue: prompts queued behind the failed turn were resolved with false (handoff semantics), so they tried to continue on the broken stream. Worse, only the first prompt in the queue was resolved at all; the rest were left pending forever.
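The queue leak can be sketched the same way. PromptQueue, enqueue and failedTurnCleanup are hypothetical names; the sketch assumes each waiting prompt() call parks on a promise whose resolver is stored in a map, and it reproduces the pre-fix cleanup as described (wake only the first waiter, with false).

```typescript
// Hypothetical encoding: true = cancelled, false = "your turn, continue on the stream".
type Resolver = (cancelled: boolean) => void;

class PromptQueue {
  readonly pending = new Map<number, Resolver>();
  readonly outcomes: Array<[number, boolean]> = [];

  // A prompt() arriving during an active turn parks on a promise until resolved.
  enqueue(id: number): void {
    new Promise<boolean>((resolve) => {
      this.pending.set(id, resolve);
    }).then((cancelled) => {
      this.outcomes.push([id, cancelled]);
    });
  }

  // Pre-fix behaviour as described: only the first waiter is woken, with a
  // handoff (false); every later waiter stays parked forever.
  failedTurnCleanup(): void {
    const first = this.pending.entries().next();
    if (first.done) return;
    const [id, resolve] = first.value;
    this.pending.delete(id);
    resolve(false);
  }
}
```

Enqueuing prompts 1, 2 and 3 and then calling failedTurnCleanup() leaves outcomes as [[1, false]] once microtasks run, while prompts 2 and 3 stay in pending and their promises never settle.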

Approach

  • In the catch block, before re-throwing, call query.interrupt() and then drain query.next() until the trailing session_state_changed: idle is consumed, so the next prompt starts against a clean stream.
  • In the finally block, resolve every queued pending prompt with true (cancelled) and clear the map. Each waiting prompt() call returns { stopReason: "cancelled" } so the client can decide what to do next, instead of inheriting a poisoned stream.
  • The error itself is still re-thrown unchanged, so the ACP client receives the same structured RequestError as before.
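The catch/finally changes above can be sketched as follows, again against hypothetical types and helpers (QueryLike, drainToIdle, runTurn, FakeQuery). Two details are assumptions of this sketch rather than something the PR states: the queue is only cancelled when the turn actually failed (tracked with a failed flag), and errors thrown by the cleanup itself are swallowed so they cannot replace the original error.

```typescript
// Hypothetical message shapes; the real SDK types differ.
type StreamMessage =
  | { type: "session_state_changed"; state: "idle" }
  | { type: "assistant"; text: string }
  | { type: "result"; totalTokens: number };

// The two query methods the fix relies on, per the PR description.
interface QueryLike {
  next(): Promise<IteratorResult<StreamMessage, void>>;
  interrupt(): Promise<void>;
}

// Hypothetical encoding, as in the PR text: true = cancelled, false = handoff.
type Resolver = (cancelled: boolean) => void;

// Consume messages up to and including the trailing idle (or end of stream).
async function drainToIdle(query: QueryLike): Promise<void> {
  while (true) {
    const step = await query.next();
    if (step.done) return;
    if (step.value.type === "session_state_changed" && step.value.state === "idle") return;
  }
}

async function runTurn<T>(
  query: QueryLike,
  pending: Map<number, Resolver>,
  turn: () => Promise<T>,
): Promise<T> {
  let failed = false;
  try {
    return await turn();
  } catch (err) {
    failed = true;
    try {
      await query.interrupt();
      await drainToIdle(query);
    } catch {
      // Assumption: a failed cleanup must not mask the original error.
    }
    throw err; // re-thrown unchanged, so the client sees the same error
  } finally {
    if (failed) {
      // Cancel every waiter, not just the first, and leave no stale entries.
      pending.forEach((resolve) => resolve(true));
      pending.clear();
    }
  }
}

// In-memory stand-in for the SDK query, used to exercise the sketch.
class FakeQuery implements QueryLike {
  interrupted = false;
  constructor(readonly queue: StreamMessage[]) {}

  async next(): Promise<IteratorResult<StreamMessage, void>> {
    const value = this.queue.shift();
    return value === undefined ? { done: true, value: undefined } : { done: false, value };
  }

  async interrupt(): Promise<void> {
    this.interrupted = true;
  }
}
```

Draining stops at the idle marker rather than waiting for the iterator to finish, presumably because the session's query stream outlives a single turn: reading past the idle would either block or start consuming the next turn's output.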

Tests added under describe("post-error recovery", …):

  • Asserts that the next prompt after a thrown error runs to a real result instead of short-circuiting on the stale idle.
  • Asserts that all queued prompts resolve as cancelled and that the map is cleared.

