Feature request: clear() and runtime resume for warm subprocess reuse #1035

Description

@scdeng

Summary

This issue requests two new control protocol subtypes for the subprocess behind an already-connected ClaudeSDKClient:

  1. clear — reset in-process conversation state without tearing down the subprocess (MCP, permissions, model, env preserved)
  2. Runtime resume — load an existing session's history into the running subprocess (equivalent to CLI --resume, but without disconnect() / connect()), from a SessionStore or local .jsonl
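For concreteness, here is a sketch of how the two subtypes might travel over the SDK's existing stdin/stdout control channel. It assumes the JSON envelope the Python SDK uses for its current control requests such as interrupt: `type: "control_request"`, a `request_id`, and a `request` object carrying a `subtype`. The `"clear"` and `"resume"` subtypes and their fields (`resume`, `session_id`) are hypothetical proposals, not an existing CLI protocol.

```python
import json
import os
from typing import Any, Optional


def make_control_request(request: dict[str, Any], counter: int) -> dict[str, Any]:
    """Wrap a request body in a control_request envelope (assumed shape)."""
    # Request IDs only need to be unique per connection; a counter plus a
    # random suffix is one simple way to get that.
    request_id = f"req_{counter}_{os.urandom(4).hex()}"
    return {"type": "control_request", "request_id": request_id, "request": request}


def clear_request(resume: Optional[str] = None) -> dict[str, Any]:
    # HYPOTHETICAL subtype: reset conversation state, optionally loading a session.
    body: dict[str, Any] = {"subtype": "clear"}
    if resume is not None:
        body["resume"] = resume
    return body


def resume_request(session_id: str) -> dict[str, Any]:
    # HYPOTHETICAL subtype: load an existing session's history in-process.
    return {"subtype": "resume", "session_id": session_id}


if __name__ == "__main__":
    bodies = [clear_request(), resume_request("uuid-of-existing-session")]
    for i, body in enumerate(bodies, start=1):
        # Each envelope would be written to the CLI's stdin as one JSON line.
        print(json.dumps(make_control_request(body, i)))
```

Keeping the new operations as ordinary control-request subtypes means they reuse the existing request/response correlation by `request_id` rather than requiring a new channel.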

Together these enable a generic subprocess slot pool instead of today's one-subprocess-per-session binding in multi-tenant server deployments.

Distinct from existing work: This is not a request to restore the /clear slash command (#493), nor is it satisfied by SessionStore resume at connect() time (#837) or external history seeding into a fresh client (#848). We need a runtime control plane on a warm subprocess.

Motivation

We run a multi-tenant FastAPI + uvicorn service on Kubernetes. Each pod maintains a SessionPool of long-lived ClaudeSDKClient instances backed by a Postgres SessionStore and distributed leases for cross-pod coordination.

Today's architecture forces one subprocess per active session because:

  1. There is no supported way to clear conversation state on a running subprocess. The interactive CLI's /clear maps to internal clearConversation logic, but SDK mode does not expose it (#493 closed as expected behavior; #120 confirms /clear via query() does not actually reset context).
  2. CLI resume only works at connect() time (ClaudeAgentOptions(resume=...) → --resume on subprocess spawn). There is no way to resume a different session on an already-running CLI subprocess. SessionStore (#837) materializes history for a new connection; it does not resume on a warm subprocess.
  3. query(session_id=...) does not isolate context in a single long-lived client (#560), so we cannot multiplex sessions on one subprocess without a real reset primitive.

Idle subprocesses stay locked to their original session and cannot serve new users, causing:

  • Higher memory usage (many idle subprocesses, each holding session state and MCP servers)
  • Repeated cold-start cost (subprocess spawn, CLI init, MCP handshake) for new sessions
  • Poor pool utilization in multi-tenant deployments

FastAPI / asyncio note

We are aware that ClaudeSDKClient cannot be used across different asyncio tasks (#576, #925). We already run each pooled client inside a dedicated owner task (queue bridge pattern from #576). This issue does not ask to remove TaskContextError — only for clear / runtime resume operations callable from that owner task on a warm subprocess.
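The owner-task arrangement can be sketched as follows: one asyncio task owns the client, and other tasks (for example FastAPI request handlers) submit jobs through a queue and await a future for the result. `FakeClient` and `OwnedClient` are hypothetical names used so the sketch runs without the SDK; with a real ClaudeSDKClient, the proposed clear() / resume() calls would be submitted the same way so they always execute inside the owner task.

```python
import asyncio
from typing import Any, Awaitable, Callable

Job = Callable[[Any], Awaitable[Any]]


class FakeClient:
    """Hypothetical stand-in for a connected client; records calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def clear(self) -> None:
        self.calls.append("clear")

    async def query(self, prompt: str) -> str:
        self.calls.append(f"query:{prompt}")
        return f"echo:{prompt}"


class OwnedClient:
    """Runs every client operation on one dedicated owner task."""

    def __init__(self, client: Any) -> None:
        self._client = client
        self._jobs: asyncio.Queue[tuple[Job, asyncio.Future[Any]] | None] = asyncio.Queue()
        self._task = asyncio.create_task(self._owner_loop())

    async def _owner_loop(self) -> None:
        while (item := await self._jobs.get()) is not None:
            job, fut = item
            try:
                # The job runs here, inside the owner task, never in the caller's task.
                result = await job(self._client)
            except Exception as exc:
                if not fut.done():  # caller may have been cancelled meanwhile
                    fut.set_exception(exc)
            else:
                if not fut.done():
                    fut.set_result(result)

    async def submit(self, job: Job) -> Any:
        fut: asyncio.Future[Any] = asyncio.get_running_loop().create_future()
        await self._jobs.put((job, fut))
        return await fut

    async def close(self) -> None:
        await self._jobs.put(None)  # sentinel stops the owner loop
        await self._task


async def main() -> None:
    owned = OwnedClient(FakeClient())
    await owned.submit(lambda c: c.clear())
    print(await owned.submit(lambda c: c.query("hello")))
    await owned.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Exceptions raised by a job are forwarded to the submitting task, so callers see failures as if they had called the client directly.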

Proposed API

1. ClaudeSDKClient.clear()

Clear conversation history while preserving process-level state (MCP connections, permission mode, model, settings, process env).

# After a conversation ends, reuse the subprocess slot for a new session
await client.clear()
await client.query("New conversation starts here")

Implementation sketch: a new clear control request subtype invoking the same clearConversation path as the interactive CLI /clear command. Based on CLI behavior, this should:

  • Reset in-memory messages to []
  • Generate a new session ID
  • Clear per-session caches (file state, web fetch cache, prompt cache, etc.)
  • Preserve MCP server connections, permission mode, model selection, and process-level env
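The split between per-session state (reset) and process-level state (preserved) that clear() would need to respect can be modeled in a few lines. This is a hypothetical in-memory model: the class and field names are illustrative and do not describe the CLI's actual internals. clear() replaces only the session half and mints a new session ID, leaving the process half untouched.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ProcessState:
    """Preserved across clear() (illustrative fields only)."""
    mcp_servers: dict[str, Any]
    permission_mode: str
    model: str
    env: dict[str, str]


@dataclass
class SessionState:
    """Reset by clear() (illustrative fields only)."""
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    messages: list[dict[str, Any]] = field(default_factory=list)
    caches: dict[str, dict[str, Any]] = field(
        default_factory=lambda: {"file_state": {}, "web_fetch": {}}
    )


@dataclass
class Subprocess:
    process: ProcessState
    session: SessionState = field(default_factory=SessionState)

    def clear(self) -> str:
        # Swap in a brand-new SessionState: empty messages, fresh caches,
        # and a newly generated session ID. `self.process` is never touched.
        self.session = SessionState()
        return self.session.session_id
```

Replacing the session object wholesale, rather than resetting fields one by one, makes it harder to forget a per-session cache when new ones are added.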

Optional sugar:

await client.clear(resume="uuid-of-existing-session")  # clear + load in one step

2. Runtime resume (a.k.a. load_session())

Expose CLI resume semantics on a warm subprocess — the same behavior as connect(options=ClaudeAgentOptions(resume=session_id)), but callable mid-flight:

# Option A: explicit resume after clear (slot reuse → restore prior session)
await client.clear()
await client.resume(session_id="uuid-of-existing-session")
await client.query("Continue from where we left off")

# Option B: alias name (either is fine)
await client.load_session(session_id="uuid-of-existing-session")

This is the critical half of the request: clear() alone only starts a blank session; runtime resume restores history on the same subprocess so a pool slot can switch between sessions without cold start.
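One way to pin down the proposed surface, including the optional clear(resume=...) sugar and the load_session() alias, is as a `typing.Protocol`. Everything here is hypothetical: neither clear() nor resume() exists on ClaudeSDKClient today, and `PoolableClient`, `FakePoolableClient`, and `switch_session` are illustrative names. The fake shows the intended contract: the sugar behaves exactly like clear() followed by resume().

```python
import asyncio
import uuid
from typing import Optional, Protocol


class PoolableClient(Protocol):
    """HYPOTHETICAL surface proposed in this issue; not an existing SDK API."""

    async def clear(self, resume: Optional[str] = None) -> None: ...
    async def resume(self, session_id: str) -> None: ...
    async def load_session(self, session_id: str) -> None: ...


class FakePoolableClient:
    """Stand-in that tracks the active session ID and call order."""

    def __init__(self) -> None:
        self.session_id = str(uuid.uuid4())
        self.calls: list[str] = []

    async def clear(self, resume: Optional[str] = None) -> None:
        self.calls.append("clear")
        self.session_id = str(uuid.uuid4())  # blank session gets a fresh ID
        if resume is not None:
            # Sugar: identical to clear() followed by resume(resume).
            await self.resume(resume)

    async def resume(self, session_id: str) -> None:
        self.calls.append(f"resume:{session_id}")
        self.session_id = session_id

    async def load_session(self, session_id: str) -> None:
        # Alias: same behavior as resume().
        await self.resume(session_id)


async def switch_session(client: PoolableClient, session_id: str) -> None:
    """Move a warm slot to another session without a reconnect."""
    await client.clear(resume=session_id)
```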

resume() / load_session() should accept the same sources as connect-time --resume:

  • Local ~/.claude/projects/.../*.jsonl (default)
  • ClaudeAgentOptions.session_store when configured (materialize + inject in-process, no full reconnect)
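Source selection for resume() could mirror the connect-time rule in the list above: use the configured session store when there is one, otherwise read the local .jsonl transcript. In this sketch, `SessionStoreLike.load()` is an assumed minimal interface, not the SDK's actual SessionStore signature, and the transcript path is passed in explicitly instead of being derived, since the on-disk layout is only given here as ~/.claude/projects/.../*.jsonl. Each non-blank JSONL line is parsed as one JSON record.

```python
import json
from pathlib import Path
from typing import Any, Optional, Protocol


class SessionStoreLike(Protocol):
    """ASSUMED minimal store interface for this sketch only."""

    def load(self, session_id: str) -> list[dict[str, Any]]: ...


def read_jsonl_transcript(path: Path) -> list[dict[str, Any]]:
    """Parse one JSON record per non-blank line."""
    records: list[dict[str, Any]] = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                records.append(json.loads(line))
    return records


def load_history(
    session_id: str,
    *,
    store: Optional[SessionStoreLike] = None,
    transcript_path: Optional[Path] = None,
) -> list[dict[str, Any]]:
    # Prefer the configured store (e.g. Postgres-backed); fall back to local disk.
    if store is not None:
        return store.load(session_id)
    if transcript_path is None:
        raise ValueError("no session store configured and no transcript path given")
    return read_jsonl_transcript(transcript_path)
```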

Emit a SystemMessage(subtype="init", ...) (or equivalent) with the resumed session_id so callers can confirm the active session after load.
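On the caller side, confirming a switch could mean draining messages until an init system message reports the expected session. The sketch uses a local `SystemMessageStub` dataclass with `subtype` and `data` fields as a stand-in, and assumes the session ID would appear under `data["session_id"]`; both are assumptions about how the proposed event would be shaped.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator


@dataclass
class SystemMessageStub:
    """Stand-in for an SDK system message (assumed shape)."""
    subtype: str
    data: dict[str, Any] = field(default_factory=dict)


async def confirm_active_session(
    messages: AsyncIterator[Any], expected_session_id: str, timeout: float = 10.0
) -> None:
    """Consume messages until an init event names the expected session."""

    async def _wait() -> None:
        async for msg in messages:
            if getattr(msg, "subtype", None) == "init":
                actual = getattr(msg, "data", {}).get("session_id")
                if actual != expected_session_id:
                    raise RuntimeError(
                        f"resumed {actual!r}, expected {expected_session_id!r}"
                    )
                return
        raise RuntimeError("stream ended before an init message arrived")

    # Bound the wait so a slot cannot hang forever on a missing event.
    await asyncio.wait_for(_wait(), timeout)
```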

Use case: generic subprocess slot pool

Current (per-session binding):
  Session A → Subprocess 1 (locked)
  Session B → Subprocess 2 (locked)
  Session C → cold start (new subprocess)

With clear + runtime resume (generic slot pool):
  Slot 1: serve A → clear() → serve C → clear() → resume(B) → serve B → ...
  Slot 2: serve D → clear() → ...

Pool size becomes bounded by concurrency, not by distinct session count.
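The slot pool above can be sketched in a minimal form, assuming the hypothetical clear() / resume() methods exist. Slots sit in an `asyncio.Queue`; a request borrows one, switches it to the requested session (or a blank one), and returns it when done. `FakeSlotClient` and `SlotPool` are illustrative names. A production pool would also need the owner-task bridge, cross-pod leases, and health checks, which are omitted here.

```python
import asyncio
import uuid
from contextlib import asynccontextmanager
from typing import AsyncIterator, Optional


class FakeSlotClient:
    """Hypothetical warm client exposing the proposed clear()/resume()."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.session_id = str(uuid.uuid4())

    async def clear(self) -> None:
        self.session_id = str(uuid.uuid4())  # fresh, blank session

    async def resume(self, session_id: str) -> None:
        self.session_id = session_id  # stands in for loading prior history


class SlotPool:
    """Pool size is bounded by concurrency, not by distinct sessions."""

    def __init__(self, clients: list[FakeSlotClient]) -> None:
        self._idle: asyncio.Queue[FakeSlotClient] = asyncio.Queue()
        for c in clients:
            self._idle.put_nowait(c)

    @asynccontextmanager
    async def session(self, session_id: Optional[str] = None) -> AsyncIterator[FakeSlotClient]:
        client = await self._idle.get()  # waits while every slot is busy
        try:
            await client.clear()
            if session_id is not None:
                await client.resume(session_id)
            yield client
        finally:
            self._idle.put_nowait(client)  # slot goes back, still warm


async def demo() -> None:
    pool = SlotPool([FakeSlotClient("slot-1")])
    async with pool.session() as c:     # serve a new session
        print(c.name, "served a blank session")
    async with pool.session("B") as c:  # same slot, resumed session B
        print(c.name, "now serving", c.session_id)


if __name__ == "__main__":
    asyncio.run(demo())
```

With one slot, two concurrent requests are served one after the other on the same warm client instead of spawning a second subprocess.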

What existing issues do NOT solve

| Issue | What it addresses | What it leaves open |
| --- | --- | --- |
| #560 | Context leak in a long-lived client; asks for clear_context | No API design, no SessionStore, no pool architecture |
| #493 | /clear unavailable in SDK mode | Official answer is to use a new session ID, which still implies reconnect / cold start |
| #837 | SessionStore + resume at connect() | No runtime switch on a warm subprocess |
| #848 | Seeding external history into a fresh client | No in-process load; no slot reuse |
| #576 | Cross-task hang → TaskContextError | No session lifecycle on a warm process |

Environment

  • claude-agent-sdk version: 0.2.87
  • Claude CLI version: 2.1.153
  • Python: 3.12
  • Deployment: FastAPI + uvicorn, multi-pod (Kubernetes), Postgres SessionStore

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
