PreToolUse "defer" does not end the query #1060

Description

@varunpandya2004

Summary

When a PreToolUse hook returns permissionDecision: "defer" for a model-issued MCP tool call, the SDK does the right thing at the iterator level — the user-visible tool never executes, the iterator terminates, ResultMessage.stop_reason == "tool_deferred", and ResultMessage.deferred_tool_use is populated.

However, before that ResultMessage is emitted, the subprocess sometimes gives the model one additional opportunity to respond. On those runs the model receives what appears to be a synthetic tool-result error for the deferred call and produces a wrap-up AssistantMessage with stop_reason: "end_turn" such as "I called the ping tool, but it appears there was an internal error executing it...". This wrap-up turn is persisted in the session transcript and consumes input/output tokens.

The hooks docs state that returning "defer" "ends the query so you can resume it later". The iterator does end, but the model is allowed one more invocation first, and that invocation produces a misleading "the tool failed" text claim that is now part of the session history visible on resume.

The behavior is non-deterministic across runs with the same prompt, the same scoped matcher, and the same hook. In a 3-run sample the wrap-up turn appeared in 1 run, and an earlier single run also showed it, for 2 of 4 runs overall. The issue reproduces identically with both query() and ClaudeSDKClient.

Environment

  • Package: claude-agent-sdk==0.2.102
  • Python: 3.12.10
  • OS: macOS 26.5.1, arm64 (Apple Silicon)
  • Python SDK bundled Claude Code binary: 2.1.178 (Claude Code) at .venv/lib/python3.12/site-packages/claude_agent_sdk/_bundled/claude
  • Model: claude-sonnet-4-5

Relevant API Surface

query(*, prompt: str | AsyncIterable[dict[str, Any]], options: ClaudeAgentOptions | None = None, transport: Transport | None = None)
ClaudeSDKClient.connect(prompt: str | AsyncIterable[dict[str, Any]] | None = None)
ClaudeSDKClient.query(prompt: str | AsyncIterable[dict[str, Any]], session_id: str = "default")

Both interfaces show the same behavior because both consume the same underlying subprocess output stream.
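
Because both entry points yield the same message stream, one helper can label a run regardless of which interface produced it. The following is a minimal sketch, not SDK code: classify_run and its three return labels are hypothetical, and the four dataclasses are local stand-ins that mirror only the attributes the reproduction script reads (content, name, text, stop_reason). With the SDK installed, the real classes from claude_agent_sdk.types would take their place.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


# Stand-ins (assumption): minimal look-alikes so the sketch runs without the
# SDK. Swap them for the imports from claude_agent_sdk.types in real use.
@dataclass
class ToolUseBlock:
    name: str


@dataclass
class TextBlock:
    text: str


@dataclass
class AssistantMessage:
    content: list[Any] = field(default_factory=list)


@dataclass
class ResultMessage:
    stop_reason: Optional[str] = None


def classify_run(messages: Iterable[Any], tool_name: str) -> str:
    """Label one collected run using hypothetical labels.

    "clean_halt":   deferred, and no text block followed the tool_use
    "wrap_up_turn": deferred, but a text block followed the tool_use
    "not_deferred": the final stop_reason was not "tool_deferred"
    """
    saw_tool_use = False
    text_after_tool_use = False
    stop_reason = None
    for msg in messages:
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if isinstance(block, ToolUseBlock) and block.name == tool_name:
                    saw_tool_use = True
                elif isinstance(block, TextBlock) and saw_tool_use:
                    text_after_tool_use = True
        elif isinstance(msg, ResultMessage):
            stop_reason = msg.stop_reason
    if stop_reason != "tool_deferred":
        return "not_deferred"
    return "wrap_up_turn" if text_after_tool_use else "clean_halt"
```

Collecting the messages from either query() or ClaudeSDKClient.receive_response() into a list and passing that list to classify_run should give the same label for the same run, which makes it easy to tally outcomes over repeated runs.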

Expected Behavior

For a deferred tool call:

  1. Model emits a tool_use block for the deferred tool.
  2. PreToolUse hook returns permissionDecision: "defer".
  3. Subprocess emits ResultMessage immediately, with stop_reason: "tool_deferred" and deferred_tool_use populated.
  4. No further model invocations occur after the defer; the caller, not the model, decides what happens next (resume, surface to a user, etc.).
  5. Session transcript ends at the deferred tool_use / hook_deferred_tool attachment.

Rationale: the documented purpose of defer is that it "ends the query so you can resume it later". A wrap-up text turn after the defer:

  • Has no audience (the iterator is about to terminate).
  • Spends tokens.
  • Persists misleading "the tool failed" / "internal error" claims into the session transcript that will be visible to the model on resume.
  • Is non-deterministic across runs, which makes the iterator output stream unpredictable for downstream consumers.

Actual Behavior

Across 4 runs with the same prompt, the same single-MCP-tool config, the same HookMatcher scoped to that tool, and the same defer hook:

  • 2 runs: clean halt. Iterator yields AssistantMessage[ToolUseBlock] → ResultMessage. Transcript ends at the hook_deferred_tool attachment.
  • 2 runs: wrap-up turn. Iterator yields AssistantMessage[ToolUseBlock] → AssistantMessage[TextBlock("I called the ping tool, but it appears there was an internal error executing it...")] → ResultMessage. The text turn has stop_reason: "end_turn" in the persisted transcript. The deferred-tool guarantee still holds (EXECUTED == [], stop_reason == "tool_deferred"), but the extra model turn was made and persisted.

Example persisted transcript lines from one of the wrap-up-turn runs (the hook_deferred_tool attachment is the defer; the subsequent assistant message is the unwanted extra turn):

Transcript lines:
{"type":"attachment","attachment":{"type":"hook_deferred_tool","toolUseID":"toolu_014HVswqAuCjrSsZHX3Y57Bq","toolName":"mcp__ops__ping","toolInput":{},"hookName":"settings","hookEvent":"PreToolUse","permissionMode":"default"}, ...}
{"type":"assistant","message":{"role":"assistant","content":[{"type":"text","text":"I called the ping tool as requested. The tool executed but encountered an internal error on the backend side. The ping tool is a no-op tool (no operation) that takes no arguments, which I called correctly."}],"stop_reason":"end_turn", ...}

The model never received a real tool result (the user-visible tool never executed — that's the deferred-tool guarantee), so the "encountered an internal error" claim is a confabulation driven by whatever synthetic signal the subprocess sends back to the model after defer.
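
The same condition can be checked offline against a persisted transcript. The sketch below assumes the transcript is stored as one JSON object per line and relies only on the type and attachment.type fields visible in the two lines above. The helper name find_post_defer_turns is hypothetical, and anything else about the file layout is an assumption.

```python
import json
from typing import Any, Iterable


def find_post_defer_turns(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Return assistant entries that appear after a hook_deferred_tool attachment.

    Assumes one JSON object per line; blank or unparsable lines are skipped.
    """
    seen_defer = False
    extra_turns: list[dict[str, Any]] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate partial or non-JSON lines
        if not isinstance(entry, dict):
            continue
        if entry.get("type") == "attachment":
            attachment = entry.get("attachment")
            if isinstance(attachment, dict) and attachment.get("type") == "hook_deferred_tool":
                seen_defer = True
        elif entry.get("type") == "assistant" and seen_defer:
            # Any assistant entry recorded after the defer is an extra model turn.
            extra_turns.append(entry)
    return extra_turns
```

An empty result corresponds to a clean halt. A non-empty result means the transcript contains at least one model turn recorded after the defer, which is the entry a resumed session would see.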

Code for Minimal Reproduction

"""Run this 4-6 times. Roughly half the runs end with an `AssistantMessage`
containing a TextBlock between the deferred ToolUseBlock and the ResultMessage."""
import asyncio

from claude_agent_sdk import (
    ClaudeAgentOptions,
    ClaudeSDKClient,
    create_sdk_mcp_server,
    tool,
)
from claude_agent_sdk.types import (
    AssistantMessage,
    HookMatcher,
    ResultMessage,
    TextBlock,
    ToolUseBlock,
)


PROMPT = "Call the ping tool. It takes no arguments."
EXECUTED: list[str] = []


@tool("ping", "A no-op tool — takes no arguments", {})
async def ping(args):
    EXECUTED.append("ping")
    return {"content": [{"type": "text", "text": "pong"}]}


async def defer_hook(input_data, tool_use_id, context):
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "defer",
            "permissionDecisionReason": "defer repro",
        }
    }


async def main():
    EXECUTED.clear()
    opts = ClaudeAgentOptions(
        model="claude-sonnet-4-5",
        max_turns=3,
        mcp_servers={"ops": create_sdk_mcp_server("ops", tools=[ping])},
        # Matcher scoped to our tool only — leaving matcher=None also fires the
        # hook on internal ToolSearch calls where defer is silently ignored.
        hooks={
            "PreToolUse": [HookMatcher(matcher="mcp__ops__ping", hooks=[defer_hook])],
        },
    )

    saw_text_after_defer = False
    saw_defer_tool_use = False
    deferred = None
    stop_reason = None

    async with ClaudeSDKClient(options=opts) as client:
        await client.query(PROMPT)
        async for msg in client.receive_response():
            if isinstance(msg, AssistantMessage):
                for block in msg.content:
                    if isinstance(block, ToolUseBlock) and block.name == "mcp__ops__ping":
                        saw_defer_tool_use = True
                    elif isinstance(block, TextBlock) and saw_defer_tool_use:
                        saw_text_after_defer = True
                        print(f"  wrap-up turn after defer: {block.text!r}")
            elif isinstance(msg, ResultMessage):
                deferred = msg.deferred_tool_use
                stop_reason = msg.stop_reason

    print(f"  executed: {EXECUTED}")
    print(f"  stop_reason: {stop_reason!r}")
    print(f"  deferred_tool_use: {deferred}")
    print(f"  wrap-up text turn after defer: {saw_text_after_defer}")


if __name__ == "__main__":
    asyncio.run(main())
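
Until the subprocess stops issuing the extra turn, a consumer that needs a predictable stream could hold back everything that arrives after the deferred tool_use and decide what to release once the ResultMessage is known. This is a rough sketch of that idea, not a fix: drop_post_defer_text is a hypothetical wrapper, and the dataclasses are local stand-ins for the SDK types so the block runs on its own. It only hides the wrap-up turn from the caller; the tokens are still spent and the turn is still persisted in the session transcript.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterable, AsyncIterator, Optional


# Stand-ins (assumption): replace with claude_agent_sdk.types in real use.
@dataclass
class ToolUseBlock:
    name: str


@dataclass
class TextBlock:
    text: str


@dataclass
class AssistantMessage:
    content: list[Any] = field(default_factory=list)


@dataclass
class ResultMessage:
    stop_reason: Optional[str] = None


async def drop_post_defer_text(
    messages: AsyncIterable[Any], tool_name: str
) -> AsyncIterator[Any]:
    """Yield messages, discarding assistant turns that follow a deferred call."""
    held: list[Any] = []
    holding = False
    async for msg in messages:
        if isinstance(msg, ResultMessage):
            deferred = msg.stop_reason == "tool_deferred"
            for item in held:
                # On a defer, discard held assistant turns; release the rest.
                if not (deferred and isinstance(item, AssistantMessage)):
                    yield item
            held.clear()
            holding = False
            yield msg
        elif holding:
            held.append(msg)
        else:
            yield msg
            if isinstance(msg, AssistantMessage) and any(
                isinstance(b, ToolUseBlock) and b.name == tool_name
                for b in msg.content
            ):
                holding = True  # buffer until the ResultMessage arrives
    # Stream ended without a ResultMessage: release whatever is still held.
    for item in held:
        yield item


async def _demo() -> None:
    async def fake_stream() -> AsyncIterator[Any]:
        yield AssistantMessage([ToolUseBlock("mcp__ops__ping")])
        yield AssistantMessage([TextBlock("there was an internal error")])
        yield ResultMessage("tool_deferred")

    async for msg in drop_post_defer_text(fake_stream(), "mcp__ops__ping"):
        print(type(msg).__name__)


if __name__ == "__main__":
    asyncio.run(_demo())
```

The demo prints AssistantMessage followed by ResultMessage, since the text-only turn is discarded. The cost of this design is latency: when the hook does not defer, every message after the matched tool_use is delayed until the ResultMessage arrives.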
