PreToolUse "defer" does not end the query

## Summary

When a `PreToolUse` hook returns `permissionDecision: "defer"` for a model-issued MCP tool call, the SDK does the right thing at the iterator level — the user-visible tool never executes, the iterator terminates, `ResultMessage.stop_reason == "tool_deferred"`, and `ResultMessage.deferred_tool_use` is populated.

However, **before** that `ResultMessage` is emitted, the subprocess sometimes gives the model one additional opportunity to respond. On those runs the model receives what appears to be a synthetic tool-result error for the deferred call and produces a wrap-up `AssistantMessage` with `stop_reason: "end_turn"` such as `"I called the ping tool, but it appears there was an internal error executing it..."`. This wrap-up turn is persisted in the session transcript and consumes input/output tokens.

The hooks docs state that returning `"defer"` "ends the query so you can resume it later". The iterator does end, but the model is allowed one more invocation first, and that invocation produces a misleading "the tool failed" text claim that is now part of the session history visible on resume.

The behavior is **non-deterministic across runs** with the same prompt, the same scoped matcher, and the same hook. In a 3-run sample the wrap-up turn appeared in 1 run; an earlier single run also produced it, for 2 of 4 runs overall. The behavior reproduces identically on both `query()` and `ClaudeSDKClient`.

## Environment

- Package: `claude-agent-sdk==0.2.102`
- Python: `3.12.10`
- OS: macOS 26.5.1, `arm64` (Apple Silicon)
- Python SDK bundled Claude Code binary: `2.1.178 (Claude Code)` at `.venv/lib/python3.12/site-packages/claude_agent_sdk/_bundled/claude`
- Model: `claude-sonnet-4-5`

## Relevant API Surface

```text
query(*, prompt: str | AsyncIterable[dict[str, Any]], options: ClaudeAgentOptions | None = None, transport: Transport | None = None)
ClaudeSDKClient.connect(prompt: str | AsyncIterable[dict[str, Any]] | None = None)
ClaudeSDKClient.query(prompt: str | AsyncIterable[dict[str, Any]], session_id: str = "default")
```

Both interfaces show the same behavior because both consume the same underlying subprocess output stream.

## Expected Behavior

For a deferred tool call:

1. Model emits a `tool_use` block for the deferred tool.
2. `PreToolUse` hook returns `permissionDecision: "defer"`.
3. Subprocess emits `ResultMessage` immediately, with `stop_reason: "tool_deferred"` and `deferred_tool_use` populated.
4. No further model invocations occur after the defer: the caller, not the model, decides what happens next (resume, surface to a user, etc.).
5. Session transcript ends at the deferred `tool_use` / `hook_deferred_tool` attachment.

Rationale: the documented purpose of defer is "end the query so you can resume it later". A wrap-up text turn after the defer:

- Has no audience (the iterator is about to terminate).
- Spends tokens.
- Persists misleading "the tool failed" / "internal error" claims into the session transcript that will be visible to the model on resume.
- Is non-deterministic across runs, which makes the iterator output stream unpredictable for downstream consumers.
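
The expected ordering can be written down as a small check. This is only a sketch: `Event` and `violates_clean_halt` are hypothetical names introduced here, and `Event` is a simplified stand-in for the SDK message types, not part of `claude_agent_sdk`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified stand-in for one message yielded by the iterator."""
    kind: str         # "tool_use", "text", or "result"
    detail: str = ""  # tool name, text body, or stop_reason


def violates_clean_halt(events, deferred_tool):
    """Return True if a text event appears between the deferred tool_use and the result."""
    seen_defer = False
    for event in events:
        if event.kind == "tool_use" and event.detail == deferred_tool:
            seen_defer = True
        elif event.kind == "result":
            # Reached the result without seeing a wrap-up text turn.
            return False
        elif event.kind == "text" and seen_defer:
            return True
    return False


clean = [Event("tool_use", "mcp__ops__ping"), Event("result", "tool_deferred")]
wrap_up = [
    Event("tool_use", "mcp__ops__ping"),
    Event("text", "I called the ping tool, but ..."),
    Event("result", "tool_deferred"),
]
print(violates_clean_halt(clean, "mcp__ops__ping"))
print(violates_clean_halt(wrap_up, "mcp__ops__ping"))
```

Under the expected behavior, every run would look like `clean`; the `wrap_up` sequence is the one this issue reports.
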

## Actual Behavior

Across 4 runs with the same prompt, same single-MCP-tool config, same `HookMatcher` scoped to that tool, same defer hook:

- 2 runs: clean halt. Iterator yields `AssistantMessage[ToolUseBlock]` → `ResultMessage`. Transcript ends at the `hook_deferred_tool` attachment.
- 2 runs: wrap-up turn. Iterator yields `AssistantMessage[ToolUseBlock]` → `AssistantMessage[TextBlock("I called the ping tool, but it appears there was an internal error executing it...")]` → `ResultMessage`. The text turn has `stop_reason: "end_turn"` in the persisted transcript. The deferred-tool guarantee still holds (`EXECUTED == []`, `stop_reason == "tool_deferred"`), but the extra model turn was made and persisted.

Example persisted transcript lines for one of the wrap-up-turn runs (the `hook_deferred_tool` attachment is the defer; the subsequent assistant message is the unwanted extra turn):


Transcript lines:
{"type":"attachment","attachment":{"type":"hook_deferred_tool","toolUseID":"toolu_014HVswqAuCjrSsZHX3Y57Bq","toolName":"mcp__ops__ping","toolInput":{},"hookName":"settings","hookEvent":"PreToolUse","permissionMode":"default"}, ...}
{"type":"assistant","message":{"role":"assistant","content":[{"type":"text","text":"I called the ping tool as requested. The tool executed but encountered an internal error on the backend side. The ping tool is a no-op tool (no operation) that takes no arguments, which I called correctly."}],"stop_reason":"end_turn", ...}

The model never received a real tool result (the user-visible tool never executed — that's the deferred-tool guarantee), so the "encountered an internal error" claim is a confabulation driven by whatever synthetic signal the subprocess sends back to the model after defer.
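
To check a saved transcript for this pattern, something like the following sketch could be used. It assumes the transcript is JSONL with the field names shown in the lines above; `wrap_up_after_defer` is a hypothetical helper, and `sample` is a trimmed version of the quoted lines with the elided fields left out.

```python
import json


def wrap_up_after_defer(jsonl_lines):
    """Collect text from assistant messages that follow a hook_deferred_tool attachment."""
    texts = []
    deferred = False
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        if entry.get("type") == "attachment":
            if entry.get("attachment", {}).get("type") == "hook_deferred_tool":
                deferred = True
        elif entry.get("type") == "assistant" and deferred:
            for block in entry.get("message", {}).get("content", []):
                if block.get("type") == "text":
                    texts.append(block["text"])
    return texts


# Trimmed sample based on the transcript lines quoted above.
sample = [
    '{"type":"attachment","attachment":{"type":"hook_deferred_tool","toolName":"mcp__ops__ping"}}',
    '{"type":"assistant","message":{"role":"assistant","content":'
    '[{"type":"text","text":"I called the ping tool as requested."}],"stop_reason":"end_turn"}}',
]
print(wrap_up_after_defer(sample))
```

An empty list would correspond to a clean halt; any returned text is a wrap-up turn that was persisted after the defer.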


## Code for Minimal Reproduction

``` python
"""Run this 4-6 times. Roughly half the runs end with an `AssistantMessage`
containing a TextBlock between the deferred ToolUseBlock and the ResultMessage."""
import asyncio
import os
from pathlib import Path

from claude_agent_sdk import (
    ClaudeAgentOptions,
    ClaudeSDKClient,
    create_sdk_mcp_server,
    tool,
)
from claude_agent_sdk.types import (
    AssistantMessage,
    HookMatcher,
    ResultMessage,
    TextBlock,
    ToolUseBlock,
)


PROMPT = "Call the ping tool. It takes no arguments."
EXECUTED: list[str] = []


@tool("ping", "A no-op tool — takes no arguments", {})
async def ping(args):
    EXECUTED.append("ping")
    return {"content": [{"type": "text", "text": "pong"}]}


async def defer_hook(input_data, tool_use_id, context):
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "defer",
            "permissionDecisionReason": "defer repro",
        }
    }


async def main():
    EXECUTED.clear()
    opts = ClaudeAgentOptions(
        model="claude-sonnet-4-5",
        max_turns=3,
        mcp_servers={"ops": create_sdk_mcp_server("ops", tools=[ping])},
        # Matcher scoped to our tool only — leaving matcher=None also fires the
        # hook on internal ToolSearch calls where defer is silently ignored.
        hooks={
            "PreToolUse": [HookMatcher(matcher="mcp__ops__ping", hooks=[defer_hook])],
        },
    )

    saw_text_after_defer = False
    saw_defer_tool_use = False
    deferred = None
    stop_reason = None

    async with ClaudeSDKClient(options=opts) as client:
        await client.query(PROMPT)
        async for msg in client.receive_response():
            if isinstance(msg, AssistantMessage):
                for block in msg.content:
                    if isinstance(block, ToolUseBlock) and block.name == "mcp__ops__ping":
                        saw_defer_tool_use = True
                    elif isinstance(block, TextBlock) and saw_defer_tool_use:
                        saw_text_after_defer = True
                        print(f"  wrap-up turn after defer: {block.text!r}")
            elif isinstance(msg, ResultMessage):
                deferred = msg.deferred_tool_use
                stop_reason = msg.stop_reason

    print(f"  executed: {EXECUTED}")
    print(f"  stop_reason: {stop_reason!r}")
    print(f"  deferred_tool_use: {deferred}")
    print(f"  wrap-up text turn after defer: {saw_text_after_defer}")


if __name__ == "__main__":
    asyncio.run(main())
```
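
Because the outcome varies between runs, it may help to tally several runs instead of re-running by hand. The sketch below assumes a hypothetical `run_once` coroutine that returns `True` when a wrap-up turn was seen (for example, `main()` above adapted to return `saw_text_after_defer`; as written it returns `None`). `_fake_run_once` is a placeholder so the sketch runs without the SDK.

```python
import asyncio
from collections import Counter


async def tally_runs(run_once, runs=6):
    """Await run_once() sequentially `runs` times and count the two outcomes."""
    counts = Counter()
    for _ in range(runs):
        saw_wrap_up = await run_once()
        counts["wrap_up" if saw_wrap_up else "clean_halt"] += 1
    return counts


async def _fake_run_once():
    # Placeholder only: a real run_once would drive the SDK as in main() above.
    return False


result = asyncio.run(tally_runs(_fake_run_once, runs=3))
print(dict(result))
```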


