diff --git a/sdk/agentserver/PREVIEW-SHARE.md b/sdk/agentserver/PREVIEW-SHARE.md
new file mode 100644
index 000000000000..8a022dd72f14
--- /dev/null
+++ b/sdk/agentserver/PREVIEW-SHARE.md
@@ -0,0 +1,87 @@
+# Agentserver durable preview — share bundle
+
+This branch is a **self-contained preview distribution** of the
+`azure-ai-agentserver-*` durable + Responses/Invocations primitives,
+assembled for internal teams to experiment with. It bundles
+pre-built wheels, runnable **durable** samples, developer guides, and
+copy-into-your-project Copilot skills — no PyPI publish or source build
+required.
+
+> Built off `main`. The package **source** under
+> `azure-ai-agentserver-*/azure/...` is still `main`'s, so consume the
+> **wheels** below rather than the in-tree source.
+
+## What's here
+
+| Path | Contents |
+|------|----------|
+| [`wheels/`](wheels/) | Pre-built `core` / `invocations` / `responses` wheels. Install these. |
+| [`skills/`](skills/) | 4 standalone Copilot skills (durable-task, streaming, invocations, responses). Drop next to your code. |
+| [`azure-ai-agentserver-core/docs/`](azure-ai-agentserver-core/docs/) | Durable-task + streaming developer guides + the `task-and-streaming-spec.md` source-of-truth spec. |
+| [`azure-ai-agentserver-responses/docs/`](azure-ai-agentserver-responses/docs/) | Responses durability + handler-implementation guides + the `responses-durability-spec.md` SOT spec and `durability-contract.md` contract matrix. |
+| `azure-ai-agentserver-responses/samples/` | Durable Responses samples + the `durable-responses-agent-demo`. |
+| `azure-ai-agentserver-invocations/samples/` | Durable Invocations samples + the `durable-agent-demo`. |
+
+Only **durable** samples are included.
+
+## Latest refresh
+
+The bundled wheels carry the current durable + responses fixes:
+
+- **core** — steering fixes: `ctx.pending_input_count` now reflects the live
+  queued-input count (was always `0`); the steering drain transitions the
+  record `suspended→in_progress` so the steered turn runs on hosted (was
+  failing with "lease renewal is only supported for in_progress tasks");
+  plus write-serialization hardening (read-inside-lock, lock-held update
+  primitive, no blind writes). Validated end-to-end on a hosted deployment.
+- **responses** — durable stored streams are created under SSE keep-alive
+  (hosted responses no longer hang `in_progress`).
+- **invocations sample** — `durable-agent-demo` uses `gpt-4o`; `demo-client.sh`
+  auto-resolves the endpoint from your azd env after `azd deploy`.
+
+## Install
+
+```bash
+pip install wheels/*.whl
+```
+
+## Run the crash → recover demo locally
+
+The durable demos run end-to-end against a **hosted** Foundry deployment
+(`azd deploy` the sample, then drive it with `demo-client.sh`, which
+auto-resolves the endpoint from your azd env). For offline
+experimentation there is an equivalent **local**, file-backed kit (task
+store + response store on disk, no hosted dependency). The ready-to-run,
+verified kit for the Responses demo lives at
+**[`azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/`](azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/README.md)**:
+
+```bash
+cd azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local
+./setup.sh                                  # venv from ../../../../wheels
+az login
+export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o
+./run.sh                                    # stream -> crash -> recover -> verify
+```
+
+Switching to the local backend takes two env vars (the kit sets them for you):
+
+```bash
+export AGENTSERVER_TASKS_BACKEND=local
+export AGENTSERVER_DURABLE_ROOT=/tmp/durable   # task + response store
+```
+
+There is an equivalent verified kit for the **invocations** durable demo at
+[`azure-ai-agentserver-invocations/samples/durable-agent-demo/local/`](azure-ai-agentserver-invocations/samples/durable-agent-demo/local/README.md)
+(same `./setup.sh` → `./run.sh` flow).
+
+## Versions
+
+| Wheel | Version |
+|-------|---------|
+| `azure-ai-agentserver-core` | `2.0.0b7` |
+| `azure-ai-agentserver-invocations` | `1.0.0b6` |
+| `azure-ai-agentserver-responses` | `1.0.0b8` |
+
+These are unreleased preview (`bN`) builds. To rebuild the wheels from
+updated source, see [`wheels/build-wheels.sh`](wheels/build-wheels.sh).
diff --git a/sdk/agentserver/azure-ai-agentserver-core/docs/durable-task-guide.md b/sdk/agentserver/azure-ai-agentserver-core/docs/durable-task-guide.md
new file mode 100644
index 000000000000..485a3911e084
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-core/docs/durable-task-guide.md
@@ -0,0 +1,868 @@
+# Durable Tasks — Developer Guide
+
+This is the developer guide for `azure.ai.agentserver.core.durable` —
+the durable-task primitive that turns an `async def` function into a
+crash-resilient unit of agent work.
+
+If your agent needs to survive container crashes, OOM kills, or
+redeployments without losing its place, you want this. If your turn
+of work could plausibly outlive the request that started it (long
+LLM calls, multi-step tool chains, multi-message conversations), you
+want this.
+
+---
+
+## 1. Why
+
+There is **one primitive in two flavours**:
+
+- **`@task`** — *one-shot*. A single durable run of a function.
+  Returns its `Output`, then the record is gone. Use for "do this
+  one thing durably".
+
+- **`@multi_turn_task`** — *chain*. A series of turns sharing a
+  conversation identity (a `task_id`). Each `return X` is one turn;
+  the chain stays alive in between turns and can accept more inputs.
+  Use for chat sessions, agents that work across multiple user
+  messages, durable orchestrations.
+
+Both run the same way under the hood: lease-based crash recovery, a
+single typed input per turn, a `TaskContext` handle, optional retry,
+optional steering (for `multi_turn_task`).
+
+What this primitive solves:
+
+- **Crash survival.** If the process dies mid-call, the next
+  process picks up the same task with the same input and runs the
+  handler again (or, for a chain in `suspended`, the next caller
+  resumes the chain).
+- **Identity.** A `task_id` is the durable name of the work. Two
+  callers naming the same `task_id` don't double-execute — they
+  attach to the same run.
+- **Typed inputs and outputs.** Generic in `Input` and `Output`;
+  the framework persists the input and surfaces the output through
+  a typed handle.
+- **Cooperative cancellation.** The caller can ask the handler to
+  stop; the handler decides how to wind down.
+- **Lightweight, small surface.** A few decorators, a few classes,
+  a handful of exceptions.
+
+What this primitive deliberately does **not** do:
+
+- Deterministic replay. The handler is re-invoked from the top on
+  recovery; side effects are your responsibility (use `ctx.metadata`
+  watermarks for at-most-once patterns; see §6.2).
+- Workflow orchestration (fan-out / fan-in / child workflows). If
+  you want Temporal-style orchestration, use Temporal; you can
+  still wrap durable tasks inside it.
+- A bulk data store. `ctx.metadata` is small and JSON-only;
+  conversation history and big blobs belong in your own storage.
+- A queue. One `task_id` is one logical job — not a competing-consumer
+  pull queue.
+
+---
+
+## 2. Mental model
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                         Your code                               │
+│                                                                 │
+│  @task                              @multi_turn_task            │
+│  async def summarize(ctx):          async def chat(ctx):        │
+│      return work(ctx.input)             return reply(ctx.input) │
+│                                                                 │
+│  await summarize.run(input=X)       await chat.run(             │
+│                                         task_id="c1", input=X)  │
+└─────────────────────────────────────────────────────────────────┘
+                              ▲
+                              │   (your async caller)
+                              │
+┌─────────────────────────────────────────────────────────────────┐
+│                      Durable task framework                     │
+│                                                                 │
+│   - persists input + metadata + lease                           │
+│   - invokes your handler with TaskContext                       │
+│   - watches for crashes, reclaims abandoned leases              │
+│   - delivers output via TaskRun.result() / await run            │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│           Task store (hosted or local file-backed)              │
+│                                                                 │
+│   PATCH-with-ETag store of task records:                        │
+│     id, status, lease_owner, payload, attachments, etag         │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### One-shot vs multi-turn — at a glance
+
+|                          | `@task` (one-shot) | `@multi_turn_task` (chain) |
+|--------------------------|--------------------|-----------------------------|
+| Lifetime                 | One run            | Multiple turns, chain stays alive between turns |
+| `task_id` on `.start`    | Optional (auto-gen GUID) | Mandatory |
+| `input_id`               | Defaults to `task_id` (1:1) | Per turn (auto-gen GUID per turn) |
+| Terminal status          | `completed` / `failed` / `cancelled` → record deleted | `suspended` between turns; deleted only via `.delete(task_id)` |
+| `.delete(task_id)`       | Not available (auto-cleans on terminal) | Available — chain-level delete |
+| Handler `return X`       | Finishes the run; `await run.result()` resolves to `X` | Finishes the **turn**; chain goes to `suspended`; caller receives `X` |
+| Steering queue           | n/a                | `steerable=True` opt-in    |
+| Concurrent `.start` on same `task_id` while in-flight | `TaskConflictError` | If `steerable=True`: queued; else `TaskConflictError` |
+
+---
+
+## 3. Hello world
+
+### One-shot
+
+```python
+import asyncio
+from azure.ai.agentserver.core.durable import task, TaskContext
+
+@task(name="summarize")
+async def summarize(ctx: TaskContext[str]) -> str:
+    # ctx.input is typed as str; the framework persisted it before invoking us.
+    return ctx.input.upper()
+
+async def main():
+    # Lifecycle-aware: creates fresh, attaches to in-flight, recovers a
+    # crashed prior lifetime — all automatic. task_id is optional.
+    output: str = await summarize.run(input="hello")
+    print(output)  # 'HELLO'
+
+asyncio.run(main())
+```
+
+### Multi-turn chain
+
+```python
+import asyncio
+from azure.ai.agentserver.core.durable import multi_turn_task, TaskContext
+
+@multi_turn_task(name="chat")
+async def chat(ctx: TaskContext[dict]) -> dict:
+    return {"reply": f"Echo: {ctx.input['msg']}",
+            "input_id": ctx.input_id}
+
+async def main():
+    # Turn 1 — fresh chain.
+    r1 = await chat.run(task_id="conv-7", input={"msg": "hi"})
+    print(r1)  # {"reply": "Echo: hi", "input_id": "<turn-1-guid>"}
+
+    # Turn 2 — same task_id resumes the persisted chain; same handler
+    # is invoked with the new ctx.input.
+    r2 = await chat.run(task_id="conv-7", input={"msg": "what's up?"})
+    print(r2)  # {"reply": "Echo: what's up?", "input_id": "<turn-2-guid>"}
+
+asyncio.run(main())
+```
+
+---
+
+## 4. Concepts
+
+### 4.1 Identifiers
+
+- **`task_id`** — the durable name of the work.
+  - One-shot: optional; the framework generates a GUID when omitted.
+    Two callers passing the same `task_id` for a one-shot **converge**
+    (the second caller either attaches to the first's in-flight run
+    or sees `TaskConflictError` if it has already terminated).
+  - Multi-turn: mandatory; identifies the chain.
+
+- **`input_id`** — the durable name of one input within the chain.
+  - One-shot: defaults to `task_id` (one run, one input — the 1:1
+    invariant).
+  - Multi-turn: per turn; the framework generates a GUID per turn
+    unless the caller supplies one (callers managing their own per-
+    message ids — e.g. chat clients — pass them through).
+
+- **`if_last_input_id="<prev>"`** — an optional precondition on
+  `.start` / `.run`. The framework verifies that the chain's
+  currently-stored last-accepted `input_id` equals `<prev>` before
+  accepting the new input. If a concurrent caller advanced the
+  chain first, the call raises `LastInputIdPreconditionFailed`.
+  Use this when your caller is reasoning about message ordering
+  (HTTP `If-Match`-style optimistic concurrency on the input
+  queue).
+
+### 4.2 Entry mode
+
+The handler can branch on `ctx.entry_mode`:
+
+| Value         | Means                                                      |
+|---------------|------------------------------------------------------------|
+| `"fresh"`     | First invocation for this `(task_id, input_id)`            |
+| `"resumed"`   | This is a subsequent turn of an existing chain (multi-turn)|
+| `"recovered"` | A previous lifetime ran this same `(task_id, input_id)` and didn't finish (lease was abandoned); the framework is re-invoking with the persisted input |
+
+```python
+@multi_turn_task(name="checkpointer")
+async def step(ctx: TaskContext[dict]) -> dict:
+    if ctx.entry_mode == "recovered":
+        # Skip any work we already wrote to ctx.metadata; pick up where we left off.
+        last_done = ctx.metadata.get("last_done_step")
+    ...
+```
+
+### 4.3 Inputs and outputs
+
+The handler signature is `async def fn(ctx: TaskContext[Input]) -> Output`.
+The framework infers `Input` and `Output` from the annotations, and the
+types flow through to the decorated task: `fn.run(input=X) -> Output`.
+
+- **Inputs are persisted before the handler runs.** That is the
+  guarantee crash recovery rests on: a recovered handler is invoked
+  with the same `ctx.input` it would have seen in the lost lifetime.
+- **Outputs are not persisted.** When the handler returns, the
+  value resolves the caller's `await run.result()` — that is the
+  only place it appears. There is no `payload["output"]` and no
+  output attachment to inspect later. If you want to keep a
+  per-turn artifact across crashes, write it through your handler
+  (LangGraph checkpoint, your own DB, etc.) before you return.
+- **Per-input size limit** ≈ 2 MB (after JSON serialization).
+  Larger inputs raise `InputTooLarge` at the caller before any
+  network round-trip. Externalize (blob store + reference) for
+  bigger payloads.
+
+### 4.4 The handler's context (`TaskContext`)
+
+```python
+class TaskContext:
+    input: Input                   # the value the caller passed
+    task_id: str
+    input_id: str                  # per-turn id
+    entry_mode: Literal["fresh", "resumed", "recovered"]
+    metadata: TaskMetadata         # callable namespace facade (see §4.5)
+    retry_attempt: int             # 0 on the first try
+    is_steered_turn: bool          # True iff this turn was promoted from the queue
+    pending_input_count: int       # how many newer turns are queued
+
+    # Cancellation signals — all cooperative.
+    cancel: asyncio.Event          # any-cause cancel
+    cancel_requested: bool         # cause: TaskRun.cancel() was called
+    timeout_exceeded: bool         # cause: per-turn timeout watchdog fired
+    shutdown: asyncio.Event        # container is shutting down
+
+    async def exit_for_recovery(self) -> None: ...
+```
+
+The first parameter MUST be named `ctx`. The framework binds
+positionally, but it validates the name at decoration time so the
+guide examples and your code stay consistent.
+
+### 4.5 Metadata
+
+`ctx.metadata` is a **callable namespace facade**: small key-value
+state that survives crashes and is visible across turns of a chain.
+Values must be JSON-serializable (the framework exposes the
+`JSONValue` type alias).
+
+```python
+@multi_turn_task(name="agent")
+async def agent(ctx: TaskContext[dict]) -> dict:
+    # Default namespace.
+    ctx.metadata["score"] = 42
+    # Named namespace — auto-vivified.
+    ctx.metadata("billing")["tokens_in"] = 130
+    return {"ok": True}
+```
+
+Names starting with `_` are reserved for the framework and raise
+`ValueError` at write time. Call `await ctx.metadata.flush()` when you
+need an explicit at-most-once fence before a side effect (see §6.2).
+
+### 4.6 The result handle (`TaskRun`)
+
+`.start(...)` returns a `TaskRun[Output]`:
+
+```python
+class TaskRun(Generic[Output]):
+    task_id: str
+    input_id: str
+    metadata: TaskMetadata                # live ref while the run is in-flight
+
+    async def result(self) -> Output: ...
+    async def cancel(self) -> None: ...
+    def __await__(self) -> Generator[Any, None, Output]: ...  # so `output = await run` works
+```
+
+That is the entire `TaskRun` surface. The framework intentionally
+does **not** expose `.delete`, `.refresh`, `.status`, or
+`.lease_expiry_count` on the handle — for chain-level deletion use
+`MultiTurnTask.delete(task_id)`, and for status inspection consult
+the store directly via the task manager.
+
+### 4.7 Steering (multi-turn only)
+
+Pass `steerable=True` to `@multi_turn_task` to opt into the steering
+queue. With steering on, a `.start` against an in-flight chain
+**queues** the new input rather than raising — the framework
+delivers it as the next turn after the current turn ends.
+
+```python
+@multi_turn_task(name="conv", steerable=True)
+async def conv(ctx: TaskContext[dict]) -> dict:
+    return await llm(ctx.input)
+
+# Mid-conversation steering: user changes their mind 50 ms into turn 1.
+r1 = asyncio.create_task(conv.start(task_id="c1", input={"msg": "Plan a trip to Rome"}))
+await asyncio.sleep(0.05)
+r2 = asyncio.create_task(conv.start(task_id="c1", input={"msg": "Actually, Paris"}))
+# `await r1` / `await r2` yield TaskRun handles; awaiting those gives each turn's outcome.
+```
+
+When an input is queued behind turn 1, the handler for turn 1 observes
+`ctx.cancel.is_set()`; it can wind down early and let the queued turn
+take over (see §6.3).
+
+### 4.8 Retry
+
+Per-turn (multi-turn) or per-run (one-shot). Configure via the
+decorator:
+
+```python
+from datetime import timedelta
+from azure.ai.agentserver.core.durable import RetryPolicy
+
+@task(
+    name="fetch",
+    retry=RetryPolicy(
+        max_attempts=3,
+        initial_delay=timedelta(seconds=1),
+        max_delay=timedelta(seconds=10),
+        backoff_coefficient=2.0,
+        jitter=True,
+    ),
+)
+async def fetch(ctx: TaskContext[str]) -> bytes: ...
+```
+
+`ctx.retry_attempt` (0-based) is exposed if your handler wants to
+branch. The retry counter resets at every new turn boundary
+(multi-turn) so a new turn starts with a fresh budget.
+
+When the budget is exhausted, the caller sees
+`TaskFailed(error=TaskExhaustedRetriesErrorDict(...))` (vs the
+normal `TaskFailed(error=TaskErrorDict(...))` for a non-retryable
+raise).
+
+`ctx.retry_attempt` is persisted: **crash recovery does NOT consume
+retry budget**. If attempt 2 of 3 crashes mid-flight, the recovered
+handler sees `ctx.retry_attempt == 1` (the counter is 0-based) and still
+has its third attempt available; recovery is not counted as a retry.
+
+### 4.9 Cancellation
+
+Cancellation is **cooperative**. The framework never force-stops a
+running handler. The handler observes `ctx.cancel` (an
+`asyncio.Event`) and chooses how to wind down:
+
+- Raise `asyncio.CancelledError` → caller sees `TaskCancelled`.
+- `return X` → caller sees `X` (treated as a normal completion;
+  for multi-turn that's an implicit suspend of the chain).
+- Call `await ctx.exit_for_recovery()` (only valid when
+  `ctx.shutdown` is set) → caller sees `TaskDeferred`; the task
+  stays `in_progress`; the recovery scanner re-invokes the
+  handler in a future process lifetime.
+
+When the handler sees `ctx.cancel.is_set()`, it can branch on
+the cause via the cause-discriminator booleans:
+
+| Trigger                              | `ctx.cancel_requested` | `ctx.timeout_exceeded` | `ctx.shutdown.is_set()` |
+|--------------------------------------|------------------------|------------------------|-------------------------|
+| `await run.cancel()` (caller-cancel) | `True`                 | `False`                | `False`                 |
+| Per-turn `timeout=` watchdog fires   | `False`                | `True`                 | `False`                 |
+| Container graceful shutdown          | `False`                | `False`                | `True`                  |
+
+`ctx.is_steered_turn` and `ctx.pending_input_count` round out the
+steering-observability surface: a steerable handler that sees
+`ctx.cancel.is_set()` AND `ctx.pending_input_count > 0` knows the
+cancel was triggered by a newer turn being queued behind it and
+can choose to wind down early so the next turn gets the lane.
+
+### 4.10 Timeout
+
+Each task can specify a `timeout` on its decorator. The watchdog
+is **per-turn**, **wall-clock**, and **durable**:
+
+- **Per-turn** — the budget resets at every turn boundary
+  (multi-turn) or at the start of each fresh run (one-shot). It is
+  NOT a per-invocation budget; if a recovered handler is re-invoked
+  with the same `ctx.input` after a crash, the timeout starts from
+  the persisted turn-start timestamp — not from the new lifetime's
+  re-invocation.
+- **Wall-clock** — the watchdog uses the persisted turn-start
+  timestamp (UTC) and "now" wall-clock. It survives crashes: a
+  recovered handler that started its turn one minute before a
+  process death and has a 90-second budget gets ~30 seconds before
+  the watchdog fires.
+- **Durable** — the persisted turn-start timestamp means the
+  watchdog's view of "time elapsed" is the same across crashes,
+  so a long-running turn cannot game the budget by triggering
+  recovery to reset its clock.
+
+When the watchdog fires it sets `ctx.cancel` and flips
+`ctx.timeout_exceeded`. The handler decides what to do (see §4.9).
+
+### 4.11 Shutdown
+
+Container shutdown sets `ctx.shutdown` (an `asyncio.Event`) AND
+`ctx.cancel`. The intended handler response is to call
+`await ctx.exit_for_recovery()`, which:
+
+1. Releases the lease without writing a terminal status.
+2. Raises `TaskDeferred` to the caller of `.result()`.
+3. Leaves the task `in_progress` so the next process lifetime's
+   recovery scanner picks it up and re-invokes the handler with
+   the persisted `ctx.input`.
+
+`exit_for_recovery()` is only meaningful during shutdown; calling
+it outside that context is a programming error.
+
+### 4.12 Multi-turn chain deletion
+
+```python
+await chat.delete("conv-7")
+```
+
+Force-removes the chain: cancels any in-flight turn, resolves all
+queued steerer callers with `TaskCancelled`, and deletes the
+record. Idempotent (no-op if the chain is already gone).
+
+---
+
+## 5. Reference
+
+### 5.1 Decorators
+
+```python
+def task(
+    *,
+    name: str,                          # required — used for registration / recovery
+    title: str | None = None,           # static label for telemetry
+    timeout: timedelta | None = None,   # cooperative watchdog
+    retry: RetryPolicy | None = None,   # None = no retry
+) -> Callable[[Handler], Task[Input, Output]]: ...
+
+def multi_turn_task(
+    *,
+    name: str,
+    title: str | None = None,
+    timeout: timedelta | None = None,
+    retry: RetryPolicy | None = None,
+    steerable: bool = False,
+) -> Callable[[Handler], MultiTurnTask[Input, Output]]: ...
+```
+
+Each decorator produces a **distinct class** (`Task` vs
+`MultiTurnTask`) — the type checker enforces "no `.delete()` on
+one-shot" and "multi-turn `get_active_run` takes `(task_id,
+input_id)`" statically.
+
+### 5.2 `Task` (one-shot)
+
+```python
+class Task(Generic[Input, Output]):
+    name: str
+
+    async def run(
+        self, *,
+        input: Input,
+        task_id: str | None = None,
+        input_id: str | None = None,
+        if_last_input_id: str | None = None,
+    ) -> Output: ...
+
+    async def start(
+        self, *,
+        input: Input,
+        task_id: str | None = None,
+        input_id: str | None = None,
+        if_last_input_id: str | None = None,
+    ) -> TaskRun[Output]: ...
+
+    async def get_active_run(self, task_id: str) -> TaskRun[Output] | None: ...
+```
+
+### 5.3 `MultiTurnTask`
+
+```python
+class MultiTurnTask(Generic[Input, Output]):
+    name: str
+
+    async def run(
+        self, *,
+        task_id: str,
+        input: Input,
+        input_id: str | None = None,
+        if_last_input_id: str | None = None,
+    ) -> Output: ...
+
+    async def start(
+        self, *,
+        task_id: str,
+        input: Input,
+        input_id: str | None = None,
+        if_last_input_id: str | None = None,
+    ) -> TaskRun[Output]: ...
+
+    async def get_active_run(
+        self, task_id: str, input_id: str,
+    ) -> TaskRun[Output] | None: ...
+
+    async def delete(self, task_id: str) -> None: ...
+```
+
+### 5.4 `TaskRun[Output]`
+
+```python
+class TaskRun(Generic[Output]):
+    task_id: str
+    input_id: str
+    metadata: TaskMetadata
+
+    async def result(self) -> Output: ...
+    async def cancel(self) -> None: ...
+    def __await__(self) -> Generator[Any, None, Output]: ...
+```
+
+### 5.5 `TaskContext[Input]`
+
+```python
+class TaskContext(Generic[Input]):
+    # Identifiers (read-only).
+    input: Input
+    task_id: str
+    input_id: str
+    entry_mode: EntryMode             # "fresh" | "resumed" | "recovered"
+    retry_attempt: int                # 0 on the first try; survives crash recovery
+
+    # Steering observability (multi-turn).
+    is_steered_turn: bool             # True iff this turn was promoted from the queue
+    pending_input_count: int          # how many newer turns are queued behind this one
+
+    # Cancellation — all cooperative.
+    cancel: asyncio.Event             # any-cause cancel
+    cancel_requested: bool            # cause: TaskRun.cancel() was called
+    timeout_exceeded: bool            # cause: per-turn timeout watchdog fired
+    shutdown: asyncio.Event           # container is shutting down
+
+    # Cross-turn / cross-attempt state.
+    metadata: TaskMetadata
+
+    # Control.
+    async def exit_for_recovery(self) -> None: ...
+```
+
+The handler's first parameter MUST be named `ctx`. The framework
+binds positionally, but it validates the name at decoration time
+so the guide examples and your handler stay consistent.
+
+Read-only enumeration:
+
+- `ctx.input`, `ctx.task_id`, `ctx.input_id`, `ctx.entry_mode`,
+  `ctx.retry_attempt`
+- `ctx.is_steered_turn`, `ctx.pending_input_count`
+- `ctx.cancel`, `ctx.cancel_requested`, `ctx.timeout_exceeded`,
+  `ctx.shutdown`
+- `ctx.metadata`
+- `ctx.exit_for_recovery()`
+
+### 5.6 Exceptions
+
+Public exception taxonomy. Each carries only **new** information the
+caller doesn't already have (the caller already has `task_id` /
+`input_id` from the call site or `TaskRun`).
+
+| Exception | Shape | When it is raised |
+|-----------|-------|-------------------|
+| `TaskFailed` | `error: TaskErrorDict \| TaskExhaustedRetriesErrorDict` | Handler raised; caller of `.result()` / `.run()` sees this. |
+| `TaskCancelled` | bare | Cooperative cancel honoured (handler raised `CancelledError`); per-turn timeout watchdog honoured; `MultiTurnTask.delete()` invalidating an in-flight run; queued steerer cancelled before promotion. |
+| `TaskDeferred` | bare | Handler called `ctx.exit_for_recovery()` — the task continues durably; the recovery scanner re-invokes in a future lifetime. |
+| `TaskConflictError` | `current_status: str` | `.start` / `.run` against an in-flight or terminal task that can't accept the call (one-shot in-progress / completed; multi-turn non-steerable in-progress). |
+| `LastInputIdPreconditionFailed` | `actual_last_input_id: str \| None` | `if_last_input_id=` precondition didn't match. |
+| `SteeringQueueFull` | bare | Steering queue at capacity (multi-turn `steerable=True` only). |
+| `InputTooLarge` | bare | Input value exceeds the per-input cap. |
+
+`TaskFailed.error` is one of two `TypedDict`s:
+
+```python
+class TaskErrorDict(TypedDict):
+    type: str            # exception class name, e.g. "ValueError"
+    message: str         # str(exc)
+    traceback: str       # traceback.format_exc()
+
+class TaskExhaustedRetriesErrorDict(TypedDict):
+    type: Literal["exhausted_retries"]
+    attempts: int        # number of attempts made (>= max_attempts)
+    last_error: str
+    last_error_type: str
+    traceback: str
+```
+
+### 5.7 `RetryPolicy`
+
+```python
+class RetryPolicy:
+    initial_delay: timedelta
+    backoff_coefficient: float
+    max_delay: timedelta
+    max_attempts: int
+    retry_on: tuple[type[BaseException], ...] | None
+    jitter: bool
+
+    def __init__(
+        self, *,
+        initial_delay: timedelta = timedelta(seconds=1),
+        backoff_coefficient: float = 2.0,
+        max_delay: timedelta = timedelta(seconds=60),
+        max_attempts: int = 3,
+        retry_on: tuple[type[BaseException], ...] | None = None,
+        jitter: bool = True,
+    ) -> None: ...
+```
+
+Presets: `exponential_backoff(...)`, `fixed_delay(delay, ...)`,
+`linear_backoff(...)`, `no_retry()`.
+
+### 5.8 `TaskMetadata` and `JSONValue`
+
+```python
+JSONValue = Union[
+    str, int, float, bool, None,
+    list["JSONValue"],
+    dict[str, "JSONValue"],
+]
+
+class TaskMetadata:
+    def __getitem__(self, key: str) -> JSONValue: ...
+    def __setitem__(self, key: str, value: JSONValue) -> None: ...
+    def __delitem__(self, key: str) -> None: ...
+    def __contains__(self, key: str) -> bool: ...
+    def __iter__(self) -> Iterator[str]: ...
+    def get(self, key: str, default: JSONValue = None) -> JSONValue: ...
+    def __call__(self, namespace: str) -> TaskMetadata: ...   # sibling ns
+    async def flush(self) -> None: ...                        # at-most-once fence
+```
+
+### 5.9 `EntryMode`
+
+```python
+EntryMode = Literal["fresh", "resumed", "recovered"]
+```
+
+---
+
+## 6. Patterns
+
+### 6.1 Multi-turn agent (the common case)
+
+```python
+@multi_turn_task(name="session_agent")
+async def session_agent(ctx: TaskContext[dict]) -> dict:
+    # ctx.entry_mode is "fresh" on the first turn, "resumed" on later
+    # turns. Metadata suits a short demo history; keep long ones elsewhere.
+    history = ctx.metadata.get("history", [])
+    history.append({"role": "user", "content": ctx.input["message"]})
+    reply = await llm.chat(history)  # `llm` stands in for your model client
+    history.append({"role": "assistant", "content": reply})
+    # Persist the turn counter too, so later turns count up from it.
+    turn = ctx.metadata.get("turn", 0) + 1
+    ctx.metadata["history"] = history
+    ctx.metadata["turn"] = turn
+    return {"reply": reply, "turn": turn}
+
+# Turn 1.
+r1 = await session_agent.run(task_id="conv-A", input={"message": "hi"})
+
+# Turn 2 — same task_id resumes the chain; history is preserved.
+r2 = await session_agent.run(task_id="conv-A", input={"message": "what time is it?"})
+```
+
+### 6.2 At-most-once side effects across crashes
+
+```python
+@task(name="charge_card")
+async def charge_card(ctx: TaskContext[dict]) -> str:
+    # Survive recovery: if we already charged in a prior lifetime,
+    # don't double-charge.
+    if ctx.metadata.get("charge_done"):
+        return ctx.metadata["charge_receipt"]
+
+    # Reserve a dedup token once (a recovered run reuses it), flush, then act.
+    token = ctx.metadata.get("pending_charge_token") or generate_uuid()
+    ctx.metadata["pending_charge_token"] = token
+    await ctx.metadata.flush()
+
+    receipt = await payment_gateway.charge(
+        ctx.input["card"], ctx.input["amount"],
+        idempotency_key=token,
+    )
+
+    ctx.metadata["charge_done"] = True
+    ctx.metadata["charge_receipt"] = receipt
+    return receipt
+```
+
+### 6.3 Steering — interruptible long turn
+
+```python
+@multi_turn_task(name="thinker", steerable=True)
+async def thinker(ctx: TaskContext[dict]) -> dict:
+    partial = []
+    async for chunk in slow_llm_stream(ctx.input):
+        if ctx.cancel.is_set():
+            # User changed their mind — surface what we have and bow out.
+            return {"interrupted": True, "partial": "".join(partial)}
+        partial.append(chunk)
+    return {"reply": "".join(partial)}
+
+# Turn 1 starts a slow generation; .start() returns a TaskRun handle.
+run1 = await thinker.start(task_id="t1", input={"msg": "long question"})
+# 50 ms later the user pivots.
+await asyncio.sleep(0.05)
+run2 = await thinker.start(task_id="t1", input={"msg": "shorter question"})
+# await run1.result() resolves with {"interrupted": True, ...};
+# await run2.result() resolves with the answer.
+```
+
+### 6.4 Graceful shutdown — `exit_for_recovery`
+
+```python
+@multi_turn_task(name="long_runner")
+async def long_runner(ctx: TaskContext[dict]) -> dict:
+    for step in plan(ctx.input):
+        if ctx.shutdown.is_set():
+            # Container is going down; defer to the next lifetime.
+            await ctx.exit_for_recovery()      # raises TaskDeferred upstream
+        await do(step)
+    return {"done": True}
+```
+
+A caller awaiting `run.result()` sees `TaskDeferred`. The
+task record stays `in_progress`; the next lifetime's recovery
+scanner re-invokes the handler with the same `ctx.input` and
+`entry_mode="recovered"`.
+
+### 6.5 Late-join an in-flight run
+
+```python
+# Caller A launched the work…
+run_a = await chat.start(task_id="conv-9", input_id="i1", input={"msg": "hi"})
+
+# … but caller B (different coroutine / different request) wants to
+# attach to the same in-flight turn:
+run_b = await chat.get_active_run("conv-9", "i1")
+if run_b is not None:
+    output = await run_b.result()     # same Output that A sees
+```
+
+`get_active_run` returns `None` when the chain isn't in-flight for
+that exact `(task_id, input_id)` — no retrospective attach to a
+terminated turn.
+
+### 6.6 Optimistic concurrency on the input queue
+
+```python
+prev_input_id = "msg-7"   # what the caller thinks the chain last accepted
+
+try:
+    await chat.run(
+        task_id="conv-2",
+        input_id="msg-8",
+        input={"msg": "next"},
+        if_last_input_id=prev_input_id,
+    )
+except LastInputIdPreconditionFailed as exc:
+    # Concurrent caller advanced the chain to exc.actual_last_input_id;
+    # re-fetch UI state and try again.
+    ...
+```
+
+---
+
+## 7. Operational notes
+
+- **Heartbeats / lease.** The framework holds a lease on the
+  task record while the handler runs and renews it automatically.
+  If the process dies, the lease expires and the recovery scanner
+  reclaims the record on a future process startup.
+- **Recovery is from the persisted input.** A recovered handler is
+  invoked with the same `ctx.input` the lost lifetime saw — not
+  with any new input the caller may now be passing. (A caller's
+  new `.start` against an in-flight record with an expired lease
+  follows the normal lifecycle: rejected for one-shot /
+  non-steerable, queued for `steerable=True` multi-turn.)
+- **Structured failure logs.** Every handler raise emits an
+  ERROR-level event named `durable_task_handler_failure` with
+  `task_id`, `input_id`, `error_type`, `error_message` fields —
+  visible in your observability pipeline whether or not your caller
+  awaited the failed `.result()`.
+- **Storage backends.** The same primitive runs against the hosted
+  task store and against a local file-backed store for development
+  and tests.
+- **Streaming** is a separate primitive in
+  `azure.ai.agentserver.core.streaming` — `await streams.get_or_create(invocation_id)`
+  gives the handler a stream handle. `TaskRun` itself is not
+  iterable.
+
+---
+
+## 8. What This Is NOT
+
+- **Not a deterministic-replay framework.** The handler is re-invoked
+  from the top on recovery; the framework does not record and
+  replay every effect. Determinism across re-invocations is the
+  handler's responsibility — use `ctx.metadata` watermarks for
+  at-most-once patterns (see §6.2).
+- **Not a workflow engine.** No fan-out / fan-in, no child-workflow
+  orchestration, no first-class signals or timers. If you need
+  those, use Temporal / Durable Functions and wrap durable tasks
+  inside them.
+- **Not a bulk data store.** `ctx.metadata` is intentionally small
+  and JSON-only. Persist conversation history, LLM outputs, and
+  big checkpoints through your own storage (LangGraph SqliteSaver,
+  your own DB). Use metadata only for small watermarks and dedup
+  tokens.
+- **Not a queue.** A `task_id` identifies one logical unit of
+  work. If you want competing consumers off a shared queue, use a
+  different primitive.
+
+---
+
+## Quick FAQ
+
+**Q. How do I do "fire and forget"?**
+A. `await task_fn.start(input=...)` — the call returns a `TaskRun`
+handle as soon as the work is registered. You can drop the handle
+and the task runs durably; a later caller can attach via
+`get_active_run(task_id, input_id)` if they care about the outcome.
+
+**Q. Can two callers run the same `task_id` concurrently?**
+A. No — `task_id` is the identity. The second caller either attaches
+to the first's in-flight run (one-shot via the lifecycle merge),
+gets queued (multi-turn `steerable=True`), or sees `TaskConflictError`.
+
+**Q. Does the framework retry by default?**
+A. No. Pass `retry=RetryPolicy(...)` to opt in.
+
+**Q. Can I store conversation history in `ctx.metadata`?**
+A. Small histories fit, but `metadata` is intentionally small and
+JSON-only. Use a dedicated checkpointer (LangGraph SqliteSaver,
+your own DB, etc.) for large multi-turn state, and keep `metadata`
+to small watermarks and dedup tokens.
+
+**Q. What if my handler ignores `ctx.cancel`?**
+A. Cooperative cancel is a request; nothing forces the handler to
+stop. If your handler must be interruptible, check
+`ctx.cancel.is_set()` in your loop. `MultiTurnTask.delete(task_id)`
+is the only call that force-cancels: it sets the cancel event AND
+hard-cancels the underlying asyncio task so a non-cooperating
+handler still exits.
+
+**Q. How do I inspect a task's persisted state from outside the handler?**
+A. Consult the task manager's provider directly:
+`await manager.provider.get(task_id)` returns a `TaskInfo` snapshot.
+The decorator's public surface intentionally doesn't expose a
+`.get()` method — read paths go through the provider so the public
+decorator surface stays small and write-shaped.
diff --git a/sdk/agentserver/azure-ai-agentserver-core/docs/streaming-guide.md b/sdk/agentserver/azure-ai-agentserver-core/docs/streaming-guide.md
new file mode 100644
index 000000000000..6d8cd0adf99e
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-core/docs/streaming-guide.md
@@ -0,0 +1,520 @@
+# Streaming guide — `azure.ai.agentserver.core.streaming`
+
+This package gives you one way to **emit events from one coroutine
+and receive them from one or more other coroutines** — typically:
+your `@task` handler produces events, and your HTTP layer fans them
+out to a Server-Sent-Events / WebSocket / long-poll endpoint.
+
+You pick a backing once at app startup, then everywhere else you
+look streams up by id and call `emit` / `subscribe`.
+
+---
+
+## 5-minute getting started
+
+```python
+from azure.ai.agentserver.core.streaming import streams
+
+# 1. At app startup — pick a backing.
+streams.use_in_memory_replay(cursor_fn=lambda ev: ev["n"], ttl_seconds=600)
+
+# 2. The producer (e.g. your @task handler):
+async def produce(stream_id: str) -> None:
+    stream = await streams.get_or_create(stream_id)
+    try:
+        for n in range(5):
+            await stream.emit({"n": n, "msg": f"hello {n}"})
+    finally:
+        await stream.close()
+
+# 3. The subscriber (e.g. your HTTP handler) — attach BEFORE the
+# producer starts (see §Subscribing for why):
+async def consume(stream_id: str) -> None:
+    stream = await streams.get_or_create(stream_id)
+    async for event in stream.subscribe():
+        print(event)
+    # Loop terminates cleanly when the producer calls close().
+```
+
+`streams.get_or_create(id)` is idempotent: the producer and the
+subscriber both call it with the same id and get the **same**
+`EventStream` instance back.
+
+---
+
+## Public surface
+
+Five exports, total:
+
+```python
+from azure.ai.agentserver.core.streaming import (
+    streams,                    # the process-level registry singleton
+    EventStream,                # @runtime_checkable Protocol
+    EventStreamError,           # base exception (catch-all)
+    EventStreamClosedError,     # emit on a closed stream
+    EventStreamNotFoundError,   # any op on an id that isn't currently a live stream
+)
+```
+
+That's it. Obtain stream instances from the registry and program
+against the `EventStream` Protocol.
+
+---
+
+## Choosing a backing
+
+| Backing | Use when | Reconnect / replay? | Survives process restart? | Notes |
+|---|---|---|---|---|
+| `use_in_memory_live()` (default) | Single subscriber that attaches before the producer; lowest memory; you don't need late subscribers to catch up. | No — late subscribers miss earlier events. | No. | Constant memory: only the subscriber list, no event buffer. |
+| `use_in_memory_replay(...)` | Multiple subscribers that may attach at different times; client may reconnect within `ttl_seconds`. | Yes (within the per-event TTL window). | No. | Each event is retained until its TTL elapses (or `delete` runs). |
+| `use_file_backed_replay(...)` | Long-running turns where you need to survive a process crash and a fresh worker resuming the same turn. | Yes. | Yes — events are persisted to `storage_dir / f"{id}.jsonl"` and rehydrated on the next `get_or_create(id)`. | Single-writer-per-file enforced. |
+
+**Call a configurator before you create any streams** (typically
+once at app startup). Later calls only affect streams created
+after the call — streams already in the registry keep their original
+backing. Switching mid-process is supported but discouraged.
+
+### Configurator signatures
+
+```python
+streams.use_in_memory_live() -> None
+
+streams.use_in_memory_replay(
+    *,
+    cursor_fn:    Callable[[Any], int] | None = None,
+    ttl_seconds:  float | None             = None,
+) -> None
+
+streams.use_file_backed_replay(
+    *,
+    storage_dir:  Path,
+    cursor_fn:    Callable[[Any], int] | None       = None,
+    ttl_seconds:  float | None                      = None,
+    serializer:   Callable[[Any], bytes] | None     = None,
+    deserializer: Callable[[bytes], Any] | None     = None,
+) -> None
+```
+
+- **`cursor_fn`** — pass this if you want cursored re-subscription
+  (`subscribe(after=N)`) and a usable `last_cursor()`. It receives
+  each payload and returns an `int` you choose as its cursor (a
+  monotonically increasing sequence number is typical). Without it,
+  `subscribe(after=...)` is silently ignored and `last_cursor()`
+  always returns `None`.
+- **`ttl_seconds`** — per-event retention. Each emitted event becomes
+  evictable `ttl_seconds` after its emit time, regardless of whether
+  the stream is still active. Use this to bound memory / disk usage.
+  Once the stream is closed AND its last retained event has expired
+  AND at least one event was ever emitted, the stream itself
+  transitions to "destroyed" (see §Lifecycle). A stream that was
+  created and closed without ever emitting stays in CLOSED forever
+  (or until `streams.delete(id)`).
+- **`storage_dir`** (file-backed only) — directory that holds one
+  `<id>.jsonl` file per stream. Created if it doesn't exist.
+- **`serializer` / `deserializer`** (file-backed only) — bring your
+  own codec for non-JSON-serializable payloads. Defaults assume the
+  payload is JSON-serializable.
+
+---
+
+## The stream id
+
+A stream id is the identity of a single producer/consumer
+conversation. Pick the per-turn identifier from your framework:
+
+| Context | Use as id |
+|---|---|
+| Inside `azure-ai-agentserver-invocations` | `request.state.invocation_id` (HTTP layer); `ctx.input["invocation_id"]` (handler) |
+| Inside `azure-ai-agentserver-responses` | `response_id` |
+| Bare-Python / custom | Any per-turn `str` you control end-to-end |
+
+**Do NOT use a durable `task_id` as the stream id.** A durable task
+can span multiple turns (steering, recovery). Reusing the id across
+turns means the second turn finds the previous turn's already-closed
+stream and `emit` raises `EventStreamClosedError`. Always scope the
+id to one logical request/turn/invocation.
+
+**File-backed backing only:** because the file-backed backing maps
+the id directly to `<storage_dir>/<id>.jsonl`, the id must be safe
+for use as a single filename — no path separators, no characters
+your filesystem rejects, ideally short. The framework-provided
+`invocation_id` / `response_id` values already satisfy this; if you
+mint your own id, sanitize it.
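+
+If you do mint your own ids, a conservative allow-list check is one
+way to enforce the single-filename rule. The sketch below is an
+illustration, not part of the SDK: `is_safe_stream_id` and the
+128-character cap are assumptions chosen for the example.
+
+```python
+import re
+
+# Hypothetical allow-list: starts with an alphanumeric character, then
+# up to 127 more characters drawn from letters, digits, '.', '_', '-'.
+_SAFE_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,127}")
+
+
+def is_safe_stream_id(stream_id: str) -> bool:
+    """Return True if the id can be used as a single filename component."""
+    # fullmatch rejects path separators, empty strings, leading dots,
+    # and trailing newlines in one step.
+    return _SAFE_ID.fullmatch(stream_id) is not None
+```
+
+Rejecting an unsafe id outright, rather than rewriting it, avoids two
+different ids collapsing onto the same file.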
+
+---
+
+## The `EventStream` Protocol
+
+Every stream — regardless of backing — exposes the same four
+methods:
+
+```python
+class EventStream(Protocol):
+    async def emit(self, payload: Any, *, close: bool = False) -> None: ...
+    async def close(self) -> None: ...
+    def     subscribe(self, *, after: int | None = None) -> AsyncIterator[Any]: ...
+    async def last_cursor(self) -> int | None: ...
+```
+
+### `emit(payload, *, close=False)`
+
+Publishes one event to every currently-attached subscriber.
+
+- `payload` is yours — pass any value compatible with your
+  serializer. For file-backed replay the default expects JSON-
+  serializable values.
+- `close=True` is an **atomic emit-and-close**: the payload is
+  delivered and the stream is closed in one step, with no opportunity
+  to emit again in between. For replay backings, the payload is
+  still retained in history and a late subscriber can see it; for
+  the live backing, late subscribers see neither the payload nor any
+  earlier events.
+- Raises `EventStreamClosedError` if you call `emit` after `close`.
+  This indicates a producer bug (the producer should have stopped
+  emitting); HTTP layers should treat it as `5xx`, not a client error.
+- Raises `EventStreamNotFoundError` if the stream has been destroyed.
+
+### `close()`
+
+Marks the stream done. Idempotent — calling it twice (or on a
+destroyed stream) is a no-op, never raises. After `close()`:
+
+- New `emit` calls raise `EventStreamClosedError`.
+- Existing subscriber iterators drain any in-flight events, then
+  exit cleanly with `StopAsyncIteration`.
+- New `subscribe` calls still work as long as the stream hasn't yet
+  been destroyed (for replay backings, they will see the retained
+  history).
+
+### `subscribe(*, after=None)`
+
+Returns an **async iterator** over emitted payloads. **Not** a
+coroutine — call it WITHOUT `await`, use directly in `async for`:
+
+```python
+async for event in stream.subscribe():
+    handle(event)
+```
+
+The iterator terminates cleanly with `StopAsyncIteration` when the
+stream is closed (after draining any in-flight events) **or** when
+the stream is destroyed while you are iterating (whether by
+`streams.delete(id)` or by the auto-transition described in
+§Lifecycle). `subscribe()` itself raises `EventStreamNotFoundError`
+synchronously only if the stream is already destroyed at the time
+you call it.
+
+`after=N` is the **reconnection primitive** — only yield events
+whose cursor is strictly greater than `N`. Requires the active
+backing to have a `cursor_fn`; silently ignored otherwise. See
+§Recovery & resumption.
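+
+To make the `after=N` rule concrete, here is a stand-alone model of
+the filter. It is a sketch, not the SDK's implementation:
+`replay_after` is a hypothetical helper, and it assumes the retained
+history is a plain list and that `cursor_fn` returns one `int` per
+payload.
+
+```python
+from typing import Any, Callable, Iterable, Iterator, Optional
+
+
+def replay_after(
+    history: Iterable[Any],
+    cursor_fn: Optional[Callable[[Any], int]],
+    after: Optional[int],
+) -> Iterator[Any]:
+    """Yield retained payloads whose cursor is strictly greater than `after`."""
+    for payload in history:
+        # With no cursor_fn (or no `after`), nothing is filtered out,
+        # which models "silently ignored otherwise".
+        if cursor_fn is None or after is None or cursor_fn(payload) > after:
+            yield payload
+
+
+history = [{"n": n} for n in range(5)]
+resumed = list(replay_after(history, lambda ev: ev["n"], after=2))
+# resumed == [{"n": 3}, {"n": 4}]
+```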
+
+Multiple subscribers are supported; each gets its own independent
+queue.
+
+### `last_cursor()`
+
+Returns the highest cursor value seen so far, or `None` if no
+events were emitted or the active backing has no
+`cursor_fn`. After the stream is closed, this is the last cursor
+the backing saw — even if that event has since expired from
+replay. Raises `EventStreamNotFoundError` if the stream is destroyed.
+
+`last_cursor()` is the producer's recovery primitive: a recovering
+handler reads it to learn "what cursor should I assign to my next
+emit?".
+
+---
+
+## Lifecycle: ACTIVE → CLOSED → (destroyed)
+
+Each stream is **ACTIVE** or **CLOSED**. After CLOSED, the id may
+be destroyed; once destroyed, every operation against it raises
+`EventStreamNotFoundError`.
+
+| State | What it means | How you reach it |
+|---|---|---|
+| **ACTIVE** | Open to `emit`. Subscribable. | Construction (first `get_or_create(id)`). |
+| **CLOSED** | No new emits (`emit` raises `EventStreamClosedError`). Existing subscribers drain. New subscribers can still attach (replay backings) but no new events arrive. | `close()` from ACTIVE. |
+
+Three independent paths into destroyed:
+
+- the id was **never registered** (no `get_or_create(id)` for it ever ran);
+- the id was **explicitly deleted** via `streams.delete(id)`;
+- the id's stream was **CLOSED** and its close-clock TTL
+  (`close_time + ttl_seconds`) **elapsed** — only applies to replay
+  backings constructed with `ttl_seconds`.
+
+A few practical implications:
+
+- The live backing (`use_in_memory_live`) never auto-destroys — it
+  has no TTL machinery. Call `streams.delete(id)` explicitly if you
+  need to release the id.
+- After `close_time + ttl_seconds`, the id is destroyed — regardless
+  of whether anyone is still subscribed or any retained events are
+  still in the buffer.
+- `last_cursor()` is safe to call during the close window — a
+  recovering handler can always read the last cursor it had seen
+  before close.
+
+---
+
+## The registry
+
+```python
+await streams.get(id)            -> EventStream      # raises NotFound for any id that is not currently live
+await streams.get_or_create(id)  -> EventStream      # idempotent
+await streams.delete(id)         -> None             # idempotent
+```
+
+- `get(id)` returns the registered stream, or raises
+  `EventStreamNotFoundError`. Treat any `NotFound` uniformly:
+  "this id is not a live stream; subscribe to a new id or treat as
+  missing".
+- `get_or_create(id)` is idempotent — every caller using the same
+  id gets the same `EventStream` instance, even from concurrent
+  coroutines. If the id was previously destroyed, a fresh stream is
+  created.
+- `delete(id)` removes the stream and any backing resources (including
+  the on-disk log for file-backed replay). Idempotent — safe to call
+  on an unknown or already-deleted id.
+
+You typically do not need to call `delete(id)` for replay backings
+with `ttl_seconds` configured — the close-clock auto-destroy
+cleans up for you. Call `delete(id)` explicitly when you want
+immediate cleanup (end-of-request hook, test teardown) or for
+backings without `ttl_seconds`.
+
+---
+
+## Exceptions → wire mapping
+
+```text
+EventStreamError                  (base — catch-all)
+├── EventStreamClosedError        producer bug — wire-map to HTTP 5xx
+└── EventStreamNotFoundError      id is not currently a live stream — HTTP 404
+```
+
+Every "this id is not currently a live stream" condition raises
+`EventStreamNotFoundError` (HTTP 404). Treat it uniformly:
+subscribe to a new id, or render the id as missing.
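+
+The mapping above can be sketched as a small helper in the HTTP
+layer. This is an illustration under stated assumptions: the three
+classes below are stand-ins that mirror the SDK's hierarchy so the
+snippet runs on its own, `http_status_for` is a hypothetical name, and
+`500` is just one reasonable choice of `5xx`.
+
+```python
+# Stand-ins mirroring the hierarchy above; in real code, import these
+# from azure.ai.agentserver.core.streaming instead.
+class EventStreamError(Exception):
+    pass
+
+
+class EventStreamClosedError(EventStreamError):
+    pass
+
+
+class EventStreamNotFoundError(EventStreamError):
+    pass
+
+
+def http_status_for(exc: EventStreamError) -> int:
+    """Map a streaming exception to an HTTP status code."""
+    if isinstance(exc, EventStreamNotFoundError):
+        return 404  # id is not currently a live stream
+    # EventStreamClosedError and any other EventStreamError are
+    # server-side faults, never the client's.
+    return 500
+```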
+
+---
+
+## Subscribing — the subscribe-before-start rule
+
+For the **default live backing** (`use_in_memory_live`), subscribers
+only see events emitted after they attach. With the live backing
+"attach" means **`async for` over the iterator has begun (i.e.
+`__aiter__` has run)** — not merely that you've called
+`get_or_create` or `subscribe`. So just calling
+`asyncio.create_task(_serve_sse(stream))` does not guarantee the SSE
+task has actually begun iterating before your producer starts
+emitting — there is a race.
+
+Safe options:
+
+1. **Use a replay backing** (`use_in_memory_replay` or
+   `use_file_backed_replay`). Late subscribers catch up via the
+   retained history, so the race doesn't matter. This is the
+   recommended default for HTTP layers.
+2. **Drive iteration before starting the producer.** Spawn the SSE
+   task, then `await asyncio.sleep(0)` (or any explicit signal from
+   the SSE task that it has started its `async for`) before calling
+   `task.start(...)`. This is harder to get right than option 1; we
+   recommend option 1 unless you have a strong reason to avoid
+   buffering.
+
+Once you've picked your strategy, the canonical pattern is:
+
+1. HTTP layer reads the per-turn id from the request.
+2. HTTP layer calls `await streams.get_or_create(id)` and arranges
+   for a subscriber to be attached (per the strategy above).
+3. HTTP layer starts the producer (e.g. `await task.start(...)`)
+   with the id propagated via input.
+4. Producer also calls `await streams.get_or_create(id)` and gets
+   the same instance.
+
+```python
+# At startup (option 1 — recommended):
+streams.use_in_memory_replay(cursor_fn=lambda ev: ev["n"], ttl_seconds=600)
+
+# HTTP layer
+async def handle_request(request):
+    inv_id = request.state.invocation_id
+
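+    stream = await streams.get_or_create(inv_id)          # 1 + 2
+
+    await my_task.start(
+        task_id=...,
+        input={"invocation_id": inv_id, ...},             # 3
+    )
+    # _serve_sse is an async generator (see §HTTP / SSE bridging
+    # pattern), so pass it as the response body rather than wrapping
+    # it in asyncio.create_task(). Attaching late is safe here because
+    # the replay backing retains history.
+    return StreamingResponse(_serve_sse(stream), ...)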
+
+# Handler
+@task
+async def my_task(ctx):
+    inv_id = ctx.input["invocation_id"]
+    stream = await streams.get_or_create(inv_id)          # 4 — same instance
+    await stream.emit({"event": "hello"})
+```
+
+---
+
+## Recovery & resumption
+
+### Cursored reconnect (client side)
+
+If your subscriber drops (network blip, client refresh) and your
+backing has a `cursor_fn`, the client reconnects with the last
+cursor it saw and the SDK only re-delivers later events:
+
+```python
+# Client reconnects with Last-Event-ID: 42
+stream = await streams.get_or_create(stream_id)
+async for event in stream.subscribe(after=42):
+    push_to_client(event)
+```
+
+Events with cursor ≤ 42 are skipped from the retained history;
+delivery resumes at 43.
+
+### Crash-recoverable producer (file-backed)
+
+With `use_file_backed_replay`, a fresh process resuming the same
+turn rehydrates the stream automatically:
+
+```python
+from azure.ai.agentserver.core.streaming import (
+    streams, EventStreamNotFoundError,
+)
+
+streams.use_file_backed_replay(
+    storage_dir=Path("/var/lib/myapp/streams"),
+    cursor_fn=lambda ev: ev["n"],
+    ttl_seconds=3600,
+)
+
+@task
+async def producer(ctx):
+    inv_id = ctx.input["invocation_id"]
+    stream = await streams.get_or_create(inv_id)
+    try:
+        # On crash recovery this is the highest n that made it to disk.
+        last = await stream.last_cursor()
+    except EventStreamNotFoundError:
+        # The previous run closed the stream AND every persisted event
+        # has since expired. The on-disk log is stale; drop it and start
+        # fresh. delete() removes the file and records the deletion;
+        # the next get_or_create() then mints a brand-new stream.
+        await streams.delete(inv_id)
+        stream = await streams.get_or_create(inv_id)
+        last = None
+
+    next_n = (last + 1) if last is not None else 0
+    for n in range(next_n, total):
+        await stream.emit({"n": n, "msg": ...})
+    await stream.close()
+```
+
+The typical recovery scenario — process crashed mid-stream, no
+terminal marker on disk — is handled by the first branch:
+rehydration loads the persisted events, `last_cursor()` returns the
+highest cursor, and the handler resumes emitting from the next
+cursor.
+
+The `EventStreamNotFoundError` branch handles the edge case where the
+previous run completed cleanly (wrote a close marker to disk) AND
+every persisted event has since expired AND your application policy
+is "start over with a fresh stream". Without the explicit
+`delete(id)`, the next `get_or_create(id)` would hand back the
+same expired stream. `delete(id)` lets you mint a fresh one.
+
+### Don't double-track in `@task` metadata
+
+Anti-pattern:
+
+```python
+# Don't do this.
+await stream.emit({"n": n, ...})
+ctx.metadata["last_event_n"] = n
+await ctx.metadata.flush()
+```
+
+The stream already persisted the event; `last_cursor()` will return
+`n` for you. `ctx.metadata` is for **workflow** watermarks — which
+units of side-effecting work (LLM calls, tool invocations) you've
+already completed — not for mirroring stream state.
+
+---
+
+## HTTP / SSE bridging pattern
+
+Typical helper for serving a stream over Server-Sent-Events:
+
+```python
+import json
+
+from azure.ai.agentserver.core.streaming import EventStreamNotFoundError
+
+async def _serve_sse(stream):
+    """Bridge an EventStream to an SSE wire format."""
+    try:
+        async for event in stream.subscribe():
+            # Assumes payloads carry their cursor under "n", matching
+            # cursor_fn=lambda ev: ev["n"] in the earlier examples.
+            cursor = event.get("n")
+            yield f"id: {cursor}\ndata: {json.dumps(event)}\n\n".encode()
+    except EventStreamNotFoundError:
+        # The stream was already destroyed when we tried to attach
+        # (destruction mid-iteration ends the loop cleanly instead);
+        # tell the client we're done.
+        yield b"event: gone\ndata: {}\n\n"
+
+If your client sends `Last-Event-ID`, pass it through to
+`stream.subscribe(after=int(last_event_id))` to skip already-delivered
+events.
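+
+Header values come from the client, so it is worth parsing them
+defensively before passing them to `subscribe`. A minimal sketch,
+assuming your framework hands you the raw header as `str | None`;
+`parse_last_event_id` is a hypothetical helper, not an SDK function:
+
+```python
+from typing import Optional
+
+
+def parse_last_event_id(header_value: Optional[str]) -> Optional[int]:
+    """Turn a Last-Event-ID header into a cursor for subscribe(after=...).
+
+    Returns None (replay from the start of retained history) when the
+    header is absent or is not an integer.
+    """
+    if header_value is None:
+        return None
+    try:
+        return int(header_value.strip())
+    except ValueError:
+        return None
+```
+
+The result can be passed straight through, for example
+`stream.subscribe(after=parse_last_event_id(last_event_id))`, since
+`after=None` is the default.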
+
+---
+
+## Bringing your own `EventStream` implementation
+
+You can write your own `EventStream` Protocol impl (e.g. a Redis-
+backed stream). It will be accepted anywhere the Protocol is — the
+`@runtime_checkable` decorator on the Protocol means
+`isinstance(s, EventStream)` works.
+
+**But** don't register your custom impl with the SDK `streams`
+registry — its cleanup is wired to the bundled backings only. Ship
+your own peer registry instead, and let consumers pick which one
+to call:
+
+```python
+class _MyRedisStreams:
+    """Peer namespace to the SDK ``streams`` registry."""
+    def __init__(self, *, redis_url, **opts): ...
+    async def get(self, id: str) -> EventStream: ...
+    async def get_or_create(self, id: str) -> EventStream: ...
+    async def delete(self, id: str) -> None: ...
+
+my_redis_streams = _MyRedisStreams(redis_url="...")
+```
+
+Consumers explicitly choose which registry they want:
+`await my_redis_streams.get_or_create(id)` vs
+`await streams.get_or_create(id)`. The shared interface is the
+`EventStream` Protocol; lifecycle is each registry's own concern.
+
+---
+
+## See also
+
+- [`durable-task-guide.md`](./durable-task-guide.md) — `@task` developer
+  guide; Pattern E shows the streaming integration end-to-end.
+- `samples/durable_streaming/durable_streaming.py` (in this package)
+  — minimal standalone sample.
+- `azure-ai-agentserver-invocations/samples/durable_research/`,
+  `durable_langgraph/`, `durable_copilot/` — HTTP-server samples
+  exercising the registry + per-turn `invocation_id` +
+  subscribe-before-start pattern end-to-end.
diff --git a/sdk/agentserver/azure-ai-agentserver-core/docs/task-and-streaming-spec.md b/sdk/agentserver/azure-ai-agentserver-core/docs/task-and-streaming-spec.md
new file mode 100644
index 000000000000..6314e2124ab3
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-core/docs/task-and-streaming-spec.md
@@ -0,0 +1,4355 @@
+# Durable Task & Streaming Primitives — Design Specification
+
+**Status:** Authoritative, source-of-truth specification.
+**Scope:** The **`@task` durable-task primitive** and the **`streams`
+streaming primitive** in `azure-ai-agentserver-core` — i.e.
+everything that ships under `azure.ai.agentserver.core.durable.*`
+and `azure.ai.agentserver.core.streaming.*`. NOT a spec for the
+rest of the core package (the hosting foundation, middleware,
+logging, tracing, server-side ASGI plumbing, etc. are outside
+this document's scope).
+**Audience:** Implementers building or maintaining these two
+primitives in any language (Python, .NET, …), and contributors
+modifying the canonical Python implementation. Treat this document
+as the only doc a re-implementer needs.
+**Out of scope:** Everything else in `azure-ai-agentserver-core`
+beyond the two named primitives. The `azure-ai-agentserver-responses`
+and `azure-ai-agentserver-invocations` packages. Response-event-stream
+wire shapes. HTTP route plumbing for response APIs. The platform
+itself.
+
+This document is the authoritative single source of truth for the
+two primitives in scope.
+
+It **references** the *Foundry Task Storage Protocol Specification*
+as the authoritative description of the hosted task store's HTTP
+contract (routes, request/response envelopes, server-side merge
+rules, authentication, activation, ETag/CAS, error codes). Where
+this spec talks about wire shape, the framework MUST conform to
+that protocol spec; this spec only describes **how the framework
+uses** the store, plus the framework-reserved keys / conventions
+it layers on top.
+
+---
+
+## Table of contents
+
+### Part I — Orientation
+- §1. Purpose and design goals
+- §2. Non-goals
+- §3. Architecture overview
+- §4. Glossary (forward-reference)
+
+### Part II — Programming model (developer-facing concepts)
+- §5. The durable task primitive
+- §6. Lifecycle and entry mode
+- §7. Identity (`task_id`, `agent_name`, `session_id`, lease owner)
+- §8. Inputs, outputs, and per-input size limit
+- §9. Persistence ownership (framework vs developer)
+- §10. Crash recovery
+- §11. Suspend, resume, and multi-turn
+- §12. Steering primitive
+- §13. Cancellation and cause booleans
+- §14. Timeout (per-turn, cooperative)
+- §15. Retry
+- §16. Shutdown and `exit_for_recovery`
+- §17. Metadata namespaces
+
+### Part III — Storage contract (wire-level)
+- §18. Reference to the Foundry Task Storage Protocol
+- §19. The framework's view of the task record
+- §20. Framework-reserved payload keys
+- §21. Framework-reserved tag and source values
+- §22. Lease structure and ownership semantics (+ §22.1 lease write rules)
+- §23. Attachments and input promotion (+ §23.9 key validation, §23.10 clear-all)
+- §24. Status state machine (+ §24.1 transition matrix, §24.2 terminal immutability, §24.3 delete force semantics)
+- §25. ETag (optimistic concurrency) usage
+- §26. Recovery — internal lifecycle (no public HTTP endpoint)
+
+### Part IV — Provider abstraction (storage backends)
+- §27. `TaskProvider` interface
+- §28. Hosted provider (HTTP)
+- §28a. Field validation (shared between providers)
+- §29. Local provider (file-backed)
+- §30. Provider auto-selection
+- §31. Background loops
+- §31a. List filter parity (internal `list()`)
+
+### Part V — Public API surface (language-agnostic)
+- §32. `task` and `multi_turn_task` decorators
+- §33. `Task` (one-shot) and `MultiTurnTask` (multi-turn) handles
+- §34. `TaskContext`
+- §35. `TaskRun`
+- §35a. Read-only inspection (internal — via the task manager's provider)
+- §36. `TaskRun.result()` returns `Output` directly
+- §37. `TaskMetadata`
+- §38. `RetryPolicy`
+- §39. Error taxonomy
+
+### Part VI — Streaming primitive (peer subpackage)
+- §40. Why streaming is decoupled from `@task`
+- §41. `EventStream` protocol
+- §42. The `streams` registry
+- §43. Stream lifecycle states (Active ↔ Closed; registry tombstones)
+- §44. Concrete backings (live, replay, file-backed)
+- §45. Cursor and `subscribe(after=...)`
+- §46. TTL eviction and the close-clock (replay backings)
+- §47. Streaming error taxonomy
+- §48. Third-party stream-impl pattern
+
+### Part VII — Implementation guidance (algorithms)
+- §49. Cold-start sequence
+- §50. `.start()` lifecycle resolution
+- §51. Steering append (atomic)
+- §52. Steering drain (two-phase)
+- §53. Suspend write
+- §54. Recovery + reclaim
+- §55. Periodic recovery loop
+- §56. Lease renewal loop
+- §57. Per-turn watchdog
+- §58. Orphan attachment cleanup
+
+### Part VIII — Conformance items
+- §59. Conformance items (C-1 … C-N)
+
+### Part IX — References
+- §60. References
+
+### Part X — Appendices (informative)
+- §A. Language-mapping cheat sheet
+- §B. Representative full task record
+- §C. Steering sequence (append → cancel → drain → result)
+- §D. Cold-start recovery sequence
+
+---
+
+## Part I — Orientation
+
+### §1. Purpose and design goals
+
+The durable-task primitive turns a single async agent function into a
+**crash-resilient, steerable, long-running** unit of work backed by a
+durable task store. It exists to close the gap between:
+
+- **What the platform sees.** A unit of work it can place, restart,
+  liveness-check, and reclaim.
+- **What the application owns.** A plain function the developer writes
+  once, that survives container crashes, OOM kills, redeployments, and
+  cooperative cancellation without hand-rolling lease, heartbeat,
+  checkpoint, recovery, or steering plumbing.
+
+The streaming primitive (`azure.ai.agentserver.core.streaming`) is a
+**peer** to the durable primitive — it does *not* nest under
+`@task`. It exists to give every async producer/consumer pair in the
+agentserver family a single Protocol to program against (in-memory live
+fan-out, in-memory replay with cursor, file-backed crash-recoverable
+replay), independent of whether the producer happens to be a `@task`.
+
+Five design goals constrain every decision in this document:
+
+1. **Single invariant for the durable primitive.** For any given
+   `task_id`, at most one handler runs at a time. Every other behavior
+   falls out of this invariant.
+2. **Crash-recovery is first-class, not a feature.** Every API
+   decision is evaluated against the question "what does this look
+   like after a crash?" A primitive that disappears at the crash
+   boundary (a per-call kwarg, an in-memory listener, a closure-only
+   state) is not acceptable; it must be reified into the durable
+   record or it must be on the developer.
+3. **Cooperative everywhere.** The framework signals; it does not
+   preempt. Cancellation, timeout, and steering all reduce to "set
+   `ctx.cancel`; let the handler decide the terminal shape." Forced
+   teardown belongs to the platform layer, not the primitive.
+4. **Storage shape is the public contract.** The framework writes a
+   structured task record. The shape of that record (which
+   payload keys are reserved, what attachments look like, what tags
+   are stamped) is part of the spec — implementers in other languages
+   MUST produce byte-compatible records so a recovery scan from one
+   process can pick up a task created by another.
+5. **Pay only for what you use.** Streaming is decoupled because
+   handlers that do not stream pay nothing. Attachments are
+   thresholded because small inputs pay only the inline cost.
+   Steering is opt-in because non-steerable tasks pay no queue
+   overhead.
+
+### §2. Non-goals
+
+The primitive is intentionally narrow. The following are explicit
+non-goals — they will NOT be added to the spec without explicit
+re-scoping:
+
+1. **Not deterministic replay.** No record-and-replay of effects.
+   After a crash the handler is re-invoked from the top; only
+   durable state (`ctx.input`, `ctx.metadata`, framework counters)
+   survives. Determinism inside the handler is the developer's
+   responsibility — the standard at-most-once side-effect pattern in
+   §10 covers the common case.
+2. **Not a workflow engine.** No fan-out/fan-in, no child workflows,
+   no signals or timers as first-class primitives. Use Temporal /
+   Durable Functions / Orleans for that — `@task` can live inside
+   such an engine but does not replace it.
+3. **Not a bulk-data store.** `ctx.metadata` is small (tens of KB
+   per namespace; the whole task payload caps at 1 MB). It is a
+   watermark / dedup-token store, not a chat-log store. Per-input
+   payloads up to 2 MB are accepted via the attachments mechanism
+   (§23) but anything larger MUST be externalized by the caller.
+4. **Not a competing-consumer queue.** A `task_id` identifies one
+   logical unit of work owned by one current lifetime. N workers
+   pulling jobs off a shared queue is the wrong fit; use a queue.
+5. **Not multi-process streaming.** The streaming primitive's bundled
+   backings are single-process. A future remote-backed implementation
+   could plug into the same protocol but is out of scope here.
+6. **No exactly-once side-effect guarantee.** The framework provides
+   at-most-once via a developer-issued dedup token (the at-most-once
+   pattern). Anything stronger requires external transactionality.
+7. **Single wire shape.** The framework reads and writes exactly
+   the shapes documented in this spec. The primitive is in private
+   preview; there is no version-skew compatibility to maintain.
+
+### §3. Architecture overview
+
+The framework's runtime decomposes into the following components.
+Boxes are types/objects; arrows show the dominant call direction.
+
+```
+                    ┌──────────────────────────────┐
+                    │       application code        │
+                    │   (user-written @task funcs)  │
+                    └──────────────┬───────────────┘
+                                   │  decorator registration
+                                   ▼
+   ┌─────────────┐    .start /   ┌─────────────────┐    create / get /
+   │   caller    │ ─ .run ────▶  │  Task (handle)  │ ─  update / list  ──▶ ┌──────────────┐
+   │ (HTTP,etc.) │ ◀─ TaskRun ─  │                 │                       │ TaskProvider │
+   └─────────────┘    Output     └─────────┬───────┘                       └──────┬───────┘
+                                            │                                     │
+                                  invokes user fn                          ┌──────┴──────┐
+                                            │                              │ Hosted via  │
+                                            ▼                              │ HTTP +      │
+                                   ┌─────────────────┐                     │ classifier  │
+                                   │   TaskContext   │                     └──────┬──────┘
+                                   │  (ctx.input,    │                            │
+                                   │   ctx.metadata, │                            │
+                                   │   ctx.cancel,…) │                            ▼
+                                   └────────┬────────┘                  ┌──────────────────┐
+                                            │ flush / suspend /         │   Foundry Task   │
+                                            │ exit_for_recovery         │  Storage (HTTP)  │
+                                            ▼                            └──────────────────┘
+                                   ┌─────────────────┐                                ▲
+                                   │   TaskManager   │ ──── lease_renewal_loop ──────┤
+                                   │  (singleton)    │ ──── periodic_recovery_loop ─┤
+                                   │                 │ ──── timeout_watchdog ───────┤
+                                   └─────────────────┘                              │
+                                                                                    │
+                                  ┌────────────────────────────────────────┐        │
+                                  │  Local file provider (dev/test only)   │ ◀──────┘
+                                  │  (~/.durable-tasks/<agent>/<sess>/…)   │
+                                  └────────────────────────────────────────┘
+
+   ┌──────────────────────────────────────────────────────────────────┐
+   │ Streaming subpackage (PEER — not nested under @task)              │
+   │                                                                   │
+   │   ┌───────────────────┐    get_or_create(id)   ┌──────────────┐  │
+   │   │  streams registry │ ──────────────────────▶│  EventStream │  │
+   │   │  (process-level)  │ ◀───────────────────── │  (3 backings)│  │
+   │   └───────────────────┘     delete(id)          └──────┬───────┘  │
+   │            │                                            │         │
+   │            │                              emit / subscribe        │
+   │            ▼                                            ▼         │
+   │  use_in_memory_live() /                       producers /         │
+   │  use_in_memory_replay() /                     consumers           │
+   │  use_file_backed_replay()                                         │
+   └──────────────────────────────────────────────────────────────────┘
+```
+
+**Key relationships:**
+
+- The `Task` handle is the developer-facing object created by the
+  `@task` decorator; the singleton `TaskManager` is the *runtime*
+  that owns the active-task table, the periodic recovery loop, and
+  the provider.
+- The `TaskProvider` is an abstraction over the durable store. Two
+  concrete providers ship: `HostedTaskProvider` (HTTP-backed, used
+  when the platform is detected) and `LocalFileTaskProvider`
+  (JSON-on-disk under `~/.durable-tasks/<agent>/<session>/<task>.json`
+  by default; used otherwise). The framework auto-selects.
+- The `TaskContext` is what the handler receives; it is wired by the
+  manager and exposes both inputs (`input`, `metadata`, `entry_mode`)
+  and signals (`cancel`, `shutdown`, cause booleans).
+- Three background loops run while the manager is up: the periodic
+  recovery scan (default 300s), one lease-renewal loop per active
+  task (half the lease duration), and one timeout watchdog per
+  active execution (when the task declares a timeout).
+- The streaming subpackage is independent. Handlers that want to
+  stream do `await streams.get_or_create(id)` and `emit` / `close`
+  on the returned object; the HTTP layer attaches `subscribe(after=…)`
+  consumers. The framework never touches a stream from the durable
+  path.
+
+### §4. Glossary (forward-referenced)
+
+| Term | Meaning |
+|---|---|
+| **Task** | A unit of durable work, identified by `task_id`, persisted in the task store. |
+| **Lifetime** | One contiguous in-memory execution of a task by a particular process. A task can have multiple lifetimes over its life (each crash starts a new lifetime). |
+| **Turn** | One handler invocation. A fresh task with no resume/recover is one turn. A suspend/resume cycle is two turns. A steering-driven re-entry is the next turn. |
+| **Generation / sequence number** | Monotonic counter inside the steering queue used to derive attachment keys; never reused (see §23). |
+| **Lease** | The fenced ownership record on the task. While a process holds the lease, no other lifetime is allowed to run the task. |
+| **Entry mode** | The framework's signal to the handler about WHY this turn started: `fresh` (first), `resumed` (after suspend or steering drain), `recovered` (previous lifetime crashed). |
+| **Steering** | A new caller `.start()` against an already-running steerable task: the new input is queued, the current turn is cancelled cooperatively, and on the next turn the queued input is consumed. |
+| **Attachment** | Per-task secondary storage slot for values larger than a payload-friendly inline threshold (§23). |
+| **Ref / attachment ref** | The sentinel value the framework writes into `payload` to indicate "this slot has been promoted to `attachments[<key>]`" (§23.3). |
+| **Cause boolean** | A read-only field on `TaskContext` (`timeout_exceeded`, `cancel_requested`) or counter (`pending_input_count`) that explains why `ctx.cancel` was set. |
+| **Promotion** | The framework's act of moving an oversized input from inline `payload` into `attachments`, replacing the inline value with a ref (§23). |
+| **Drain** | Popping a single steering input off the queue and re-entering the handler with it (§52). |
+| **Reclaim** | A different lifetime taking over a task whose lease has expired (§54). |
+
+---
+
+
+## Part II — Programming model
+
+This part is the developer-facing mental model. It is normative for
+behavior visible to handler code, but the *wire-level realization* of
+each concept lives in Part III.
+
+### §5. The durable task primitive
+
+A durable task is created by decorating a single async function:
+
+```
+@task(name="my_task")              # decorator
+async def my_task(ctx) -> Out:     # exactly one parameter: TaskContext[Input]
+    return ...
+```
+
+The decoration registers the function with the process-wide
+descriptor table (consulted at recovery time). The returned object —
+the *task handle* — is what callers invoke (`.run()` / `.start()`).
+
+The framework guarantees one invariant: **for a given `task_id`, at
+most one handler runs at a time in any process owning the active
+lease.** Every higher-level behavior in this spec is derived from
+that invariant.
+
+### §6. Lifecycle and entry mode
+
+The task store records each task in one of four statuses:
+
+| Status | Meaning |
+|---|---|
+| `pending` | Created, not yet picked up by a handler. (Rarely observed by handler code — the framework moves through it atomically.) |
+| `in_progress` | A handler is currently executing this task (or claims to be — a stale lease may need to be reclaimed). |
+| `suspended` | (Multi-turn only.) Handler's turn ended with `return X`; the chain is parked between turns awaiting the next `.run()` / `.start()` to drive the next turn. |
+| `completed` | Terminal. The handler is finished (success, raise, cancel) and will not run again. The *outcome* (success / failure / cancelled) is communicated via the typed exceptions (§39) — **NOT encoded in the status field**. |
+
+Every time the framework invokes the handler, it computes an entry
+mode from the persisted state and exposes it as `ctx.entry_mode`:
+
+| Persisted state at entry | `entry_mode` | What it means |
+|---|---|---|
+| No task / status `pending` | `"fresh"` | First invocation. No prior state. |
+| `suspended` | `"resumed"` | Caller provided new input; resume from where we suspended. |
+| `in_progress` (previous lifetime died) | `"recovered"` | We are the new lifetime; check your watermark. |
+| `in_progress` (steerable, mid-flight, steering drain) | `"resumed"` (with `ctx.is_steered_turn = True`) | Another input was queued; we are the next-turn re-entry. |
+
+The handler is REQUIRED to be safe to enter in any of these modes.
+Branching on `ctx.entry_mode` at the top is the canonical pattern.
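+
+A minimal sketch of that branching follows. `FakeContext` is a
+hypothetical stand-in that carries only the fields named in this
+section (it is not the real `TaskContext`), and the `last_step`
+metadata key is invented for illustration.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class FakeContext:
+    """Hypothetical stand-in for TaskContext; only the fields used below."""
+    entry_mode: str
+    input: dict
+    metadata: dict = field(default_factory=dict)
+    is_steered_turn: bool = False
+
+
+def plan_turn(ctx: FakeContext) -> str:
+    """Decide how a turn should begin, based on why it was started."""
+    if ctx.entry_mode == "fresh":
+        return "start from scratch"
+    if ctx.entry_mode == "resumed":
+        # New input arrived, either after a suspend or via a steering drain.
+        if ctx.is_steered_turn:
+            return "continue with steered input"
+        return "continue with new input"
+    if ctx.entry_mode == "recovered":
+        # A previous lifetime died; consult the persisted watermark.
+        return f"resume after step {ctx.metadata.get('last_step', 0)}"
+    raise ValueError(f"unknown entry_mode: {ctx.entry_mode!r}")
+```
+
+A real handler would do its work inside each branch; the point is only
+that every entry mode has an explicit, safe starting path.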
+
+`entry_mode` and `is_steered_turn` are orthogonal. The combination
+`(entry_mode="recovered", is_steered_turn=True)` is legal: a previous
+process crashed mid-drain and the recovered handler is taking over.
+
+### §7. Identity
+
+A task is identified by three independent strings:
+
+| Field | Source | Lifetime | Purpose |
+|---|---|---|---|
+| `task_id` | Caller-supplied at `.start()` / `.run()`. | Identical across resume / recovery / steering. | The conversation / unit-of-work key. |
+| `agent_name` | Platform-supplied (env `FOUNDRY_AGENT_NAME`); fallback `"unknown-agent"`. | Fixed per process. | Scoping; multiple agents share a store. |
+| `session_id` | Platform-supplied (env `FOUNDRY_AGENT_SESSION_ID`). | Fixed per process. | Scoping; multiple sessions share an agent. |
+
+The framework derives the **lease owner** string from both
+`agent_name` AND `session_id`:
+
+```
+lease_owner = "<agent_name>|session:<session_id>"
+```
+
+Deriving the owner from BOTH components (not session alone) prevents
+silent cross-agent ownership collisions in topologies where two
+different agents happen to share a session identifier.
+
+Each *process* generates a fresh **instance id** at startup:
+
+```
+lease_instance_id = "worker-<pid>-<rand8hex>-<unix_seconds>"
+```
+
+The `(owner, instance_id)` pair lets recovery distinguish:
+
+- **Same-owner same-instance** = my own running task (renew, do not reclaim).
+- **Same-owner different-instance** = a previous lifetime of mine that
+  is gone (reclaim immediately on cold start; no expiry wait).
+- **Different-owner** = someone else's task; do not touch.
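+
+The identity strings and the three-way comparison above can be
+sketched as follows. The function names and the returned labels are
+hypothetical, and `secrets.token_hex(4)` is just one assumed way to
+produce the eight random hex digits.
+
+```python
+import os
+import secrets
+import time
+
+
+def make_lease_owner(agent_name: str, session_id: str) -> str:
+    """Build the owner string from BOTH identity components."""
+    return f"{agent_name}|session:{session_id}"
+
+
+def make_instance_id() -> str:
+    """Build a per-process instance id: worker-<pid>-<rand8hex>-<unix_seconds>."""
+    return f"worker-{os.getpid()}-{secrets.token_hex(4)}-{int(time.time())}"
+
+
+def classify_lease(my_owner: str, my_instance: str,
+                   rec_owner: str, rec_instance: str) -> str:
+    """Classify a stored lease relative to the current process."""
+    if rec_owner != my_owner:
+        return "foreign"   # someone else's task; do not touch
+    if rec_instance == my_instance:
+        return "own"       # my running task; renew, do not reclaim
+    return "reclaim"       # a dead previous lifetime of mine
+```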
+
+#### `task_id` validation
+
+Implementers MUST reject `task_id` values that:
+
+- Are empty.
+- Exceed 256 characters.
+- Contain characters outside `[a-zA-Z0-9\-_.:]`.
+
+Rejection is at the call site (`.start()` / `.run()` raise) before
+any network is touched.
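+
+A sketch of a pre-network validator for these three rules. The helper
+name and the choice of `ValueError` are assumptions; this section does
+not name the exception type an implementation raises.
+
+```python
+import re
+
+# 1 to 256 characters drawn from the allowed set.
+_TASK_ID_RE = re.compile(r"[a-zA-Z0-9\-_.:]{1,256}")
+
+
+def validate_task_id(task_id: str) -> str:
+    """Return task_id unchanged if valid, else raise before any I/O."""
+    # fullmatch (not match) so a trailing newline or stray suffix is rejected.
+    if not isinstance(task_id, str) or _TASK_ID_RE.fullmatch(task_id) is None:
+        raise ValueError(f"invalid task_id: {task_id!r}")
+    return task_id
+```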
+
+### §8. Inputs, outputs, and the per-input size limit
+
+A task carries exactly one **input** value at any time — the value
+passed to `.start(input=...)` or `.run(input=...)`. The input is
+JSON-serialized for persistence and is re-hydrated into `ctx.input` on
+every handler entry (fresh, resumed, recovered).
+
+The handler's return value (its `return X`) is the **output**, also
+JSON-serialized.
+
+| Bound | Limit | Raised as |
+|---|---|---|
+| Per-input maximum size | **2 MB** after JSON serialization, for the function input AND each individual queued steering input. | `InputTooLarge` from `.start()` / `.run()` — pre-network, at the call site. |
+| Concurrent queued steering inputs | **9** | `SteeringQueueFull` from `.start()` against a steerable task whose queue is full. |
+
+Inputs and outputs that fit easily in the inline payload budget stay
+inline. Inputs whose JSON size exceeds a per-channel threshold are
+**promoted** into the task's `attachments` slot transparently —
+developers do not configure or opt in. See §23 for the wire
+mechanism; the per-input ceiling above is the only developer-visible
+limit.
+
+The framework uses JSON canonicalization rules (`sort_keys=True`,
+separators `(",", ":")`) when computing serialized sizes and content
+hashes (§23.6). Implementers MUST use the same canonicalization for
+both, or hashes will not match across implementations.
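+
+To illustrate the canonicalization, here is a small sketch of a size
+check built on it. Several details are assumptions not stated in this
+section: the size is measured in UTF-8 bytes, "2 MB" is read as
+`2 * 1024 * 1024`, `json.dumps` keeps its default `ensure_ascii`, and
+`ValueError` stands in for `InputTooLarge`.
+
+```python
+import json
+
+MAX_INPUT_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as 2 MiB
+
+
+def canonical_bytes(value) -> bytes:
+    """Serialize with the canonical rules: sorted keys, no whitespace."""
+    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
+    return text.encode("utf-8")
+
+
+def check_input_size(value) -> int:
+    """Return the canonical size, or raise if it exceeds the ceiling."""
+    size = len(canonical_bytes(value))
+    if size > MAX_INPUT_BYTES:
+        # Stand-in for InputTooLarge, raised before any network call.
+        raise ValueError(f"input is {size} bytes; limit is {MAX_INPUT_BYTES}")
+    return size
+```
+
+Because keys are sorted and separators are fixed, two dicts with the
+same content but different insertion order produce identical bytes,
+which is what lets sizes and hashes agree across implementations.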
+
+If the handler's input or output cannot be JSON-serialized (e.g. it
+contains non-JSON-native types), the framework raises before the
+HTTP call. Implementations using a richer model (Pydantic-style)
+SHOULD attempt model-aware serialization (`model_dump`) first.
+
+### §9. Persistence ownership
+
+The framework persists:
+
+- The current `ctx.input` value (inline or as an attachment ref).
+- A snapshot of every touched `ctx.metadata` namespace at every
+  terminal-of-turn boundary (suspend, complete, cancel, raise,
+  steering drain, `exit_for_recovery`) and at every explicit
+  `metadata.flush()` call.
+- Lifecycle counters: `retry_attempt`, `recovery_count` (the
+  `expiry_count` of the lease record), `_last_input_id` (the
+  optional caller-provided chain head — see §11).
+- A per-turn `_turn_started_at` ISO-8601 UTC timestamp used by the
+  watchdog (§14) to compute remaining budget across crashes.
+- Steering state (`pending_inputs` queue, `cancel_requested`,
+  `drain_in_progress`, `active_input`, `next_input_seq`) for
+  steerable tasks (§12).
+- The handler's terminal outcome: a structured `error` dict on
+  failure (when persisted by the layer above the primitive),
+  `suspension_reason` on suspend. The handler's `return X` value
+  is NOT persisted in the record — it resolves the in-process
+  caller's `TaskRun.result()` future and is then no longer
+  reachable from the persisted record.
+
+The framework does NOT persist:
+
+- Handler-local variables.
+- In-memory closures over the handler's body.
+- Caller-provided callbacks or futures (those are bound to a single
+  lifetime; a crash discards them).
+- Streaming events (those live in the streaming subpackage, which has
+  its own backings; see Part VI).
+- Any bulk data the developer chooses to compute. The developer is
+  responsible for that — typically through a sibling framework
+  (LangGraph checkpoint, custom DB, blob storage) with only a small
+  reference token in `ctx.metadata`.
+
+The dividing line is "what does the framework need to decide
+`entry_mode` and reproduce `ctx`?" — that is what it persists; nothing
+more.
+
+### §10. Crash recovery
+
+Recovery is **framework-managed**. There is no developer-tunable
+threshold and no opt-in.
+
+**When recovery happens:**
+
+1. **Cold start** of a new process. The manager's `startup()` scans
+   the task store for tasks owned by `(agent_name, session_id)`
+   whose lease has expired OR whose lease is owned by a different
+   instance of the same owner (a previous dead lifetime). Each is
+   reclaimed inline.
+2. **Periodic scan.** While the manager is up, a background loop
+   re-runs the same scan every 300 seconds (default; see §31). This
+   catches tasks that became reclaimable AFTER cold start — typically
+   leases that expired during this process's lifetime because a sibling
+   process died.
+3. **Inline reclaim.** When a caller `.start()`s a `task_id` whose
+   current record shows an `in_progress` status with an expired or
+   foreign-instance lease, the lifecycle resolver reclaims it inline
+   (no waiting for the periodic scan).
+
+**What recovery does:**
+
+The reclaiming process:
+
+1. Issues a PATCH that re-takes the lease atomically: new
+   `lease_owner` (always self), new `lease_instance_id` (always
+   self), new `lease_expires_at`, bumps the lease's `expiry_count` IF the
+   previous lease had actually expired (not bumped for same-owner
+   dead-instance handoff). This PATCH MUST be guarded by the read
+   `etag` for CAS safety.
+2. Reads the (now self-owned) record, looks up the registered
+   resume callback by `source.name` (§21), invokes the handler
+   with `ctx.entry_mode="recovered"` and the persisted `ctx.input`
+   re-hydrated.
+3. From the handler's perspective, the recovery looks identical to
+   a fresh entry except that `entry_mode == "recovered"` and any
+   `ctx.metadata` writes from the previous lifetime are already
+   present.
+
+**Crash-recovery does NOT consume the retry budget** (§15). A
+lifetime that died before the handler raised does not advance
+`retry_attempt`.
+
+**Pattern — at-most-once side effect across recovery:**
+
+```python
+from uuid import uuid4
+
+token = ctx.metadata.get("dedup_token")
+if token is None:
+    token = uuid4().hex
+    ctx.metadata["dedup_token"] = token
+    await ctx.metadata.flush()      # fence: persist the token before the effect
+# Fresh and crash-recovered lifetimes both reach this call; a recovered
+# lifetime re-issues it with the SAME token, letting the downstream
+# system de-dupe.
+await do_side_effect(idempotency_key=token)
+```
+
+This pattern is the standard answer to "I crashed mid-effect; how
+do I avoid duplicate effects?" The framework does NOT provide
+exactly-once semantics — the developer issues the dedup token and
+fences it before the effect.
+
+### §11. Suspend, resume, and multi-turn
+
+Multi-turn chains end every turn with a bare `return X` from the
+handler. Under the **return-is-implicit-suspend** rule, the framework:
+
+1. Transitions the stored status from `in_progress` to `suspended`
+   with `suspension_reason="run_completion"`.
+2. Persists a snapshot of every touched metadata namespace.
+3. Does NOT persist `X` anywhere in the task record. `X` resolves
+   the caller's `await run.result()` in-process and is then gone.
+4. Clears `payload["input"]` (and the corresponding attachment if
+   the input was promoted) — the consumed input is no longer needed
+   and would inflate the next payload write.
+5. Clears `_steering["active_input"]` (the steering mechanism state
+   is kept, but the consumed input value is dropped).
+6. Clears `payload["_retry_attempt"]` so the next turn starts with
+   a fresh retry budget.
+7. Preserves `payload["_last_input_id"]` so the next
+   `if_last_input_id` precondition can be evaluated.
+
+The caller's `await run.result()` resolves to `X` directly (typed
+as the handler's `Output`). No wrapper class.
+
+The next `.run(task_id=same, input=new)` or
+`.start(task_id=same, input=new)` transitions the status back to
+`in_progress` and re-invokes the handler with
+`ctx.entry_mode="resumed"`, `ctx.input=new`, and `ctx.metadata`
+re-hydrated.
+
+The same machinery is what multi-turn conversations and
+human-in-the-loop approval flows ride.
+
+One-shot tasks do NOT use this mechanism. A one-shot `@task`
+handler's `return X` is a terminal completion: the framework
+resolves the caller's `.result()` with `X` and then deletes the
+record (one-shot is always ephemeral).
+
+#### Multi-turn raise semantics
+
+If a multi-turn handler RAISES (an unhandled exception other than
+`asyncio.CancelledError`), the chain still transitions to
+`suspended` (NOT `completed` / `failed`) so subsequent turns can
+continue:
+
+1. Transitions to `suspended` with
+   `suspension_reason="run_completion"`.
+2. NO `payload["error"]` is written — the chain record does not
+   carry the per-turn failure diagnostic.
+3. The framework emits a structured ERROR log named
+   `durable_task_handler_failure` with `task_id`, `input_id`,
+   `error_type`, `error_message`.
+4. The caller's `await run.result()` raises
+   `TaskFailed(error=TaskErrorDict(...))`.
+5. Queued steerers (multi-turn `steerable=True`) promote per §12:
+   the next queued input becomes the next turn's input, and the
+   handler re-invokes with `ctx.entry_mode="resumed"`,
+   `ctx.is_steered_turn=True`.
+
+#### Chain identity: `input_id` and `if_last_input_id`
+
+Both `.run()` and `.start()` accept two optional keyword arguments
+that thread caller-supplied chain identity through the persisted
+record:
+
+- **`input_id`** — record-only. The framework writes
+  `payload["_last_input_id"] = input_id` after accepting the input;
+  no precondition is checked.
+- **`if_last_input_id`** — precondition. The framework requires the
+  stored `_last_input_id` to equal `if_last_input_id` (the
+  predecessor the caller claims to be extending). Mismatch raises
+  `LastInputIdPreconditionFailed(actual_last_input_id=<stored>)`.
+
+For multi-turn, `input_id` is the per-turn identity. For one-shot,
+`input_id` defaults to `task_id` (the 1:1 invariant `task_id ==
+input_id`).
+
+Implementations MUST reject `if_last_input_id` provided without
+`input_id` (`TypeError` at the call site). The two arguments play
+distinct roles: `input_id` alone is idempotency / chain-head
+tracking; `(input_id, if_last_input_id)` together is
+HTTP-`If-Match`-style chain extension.
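+
+The acceptance rules for the two arguments can be sketched against an
+in-memory `payload` dict. The `accept_input` helper is hypothetical,
+and the exception class below is a local stand-in that mirrors only
+the name and keyword used in this section; a real implementation would
+perform the check and the write atomically against the task store.
+
+```python
+class LastInputIdPreconditionFailed(Exception):
+    """Local stand-in for the framework exception of the same name."""
+
+    def __init__(self, actual_last_input_id):
+        super().__init__(f"stored _last_input_id is {actual_last_input_id!r}")
+        self.actual_last_input_id = actual_last_input_id
+
+
+def accept_input(payload: dict, *, input_id=None, if_last_input_id=None) -> None:
+    """Apply the chain-identity rules to an in-memory payload."""
+    if if_last_input_id is not None and input_id is None:
+        raise TypeError("if_last_input_id requires input_id")
+    if if_last_input_id is not None:
+        stored = payload.get("_last_input_id")
+        if stored != if_last_input_id:
+            raise LastInputIdPreconditionFailed(actual_last_input_id=stored)
+    if input_id is not None:
+        # Record-only: written after the input has been accepted.
+        payload["_last_input_id"] = input_id
+```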
+
+### §12. Steering primitive
+
+`@multi_turn_task(steerable=True)` upgrades a multi-turn chain from
+"one turn at a time" to "callers can queue a new input while a turn
+is mid-flight."
+
+Steering is exclusive to multi-turn chains. One-shot `@task` does
+not support steering (the one-shot lifecycle is one input, one run);
+`@multi_turn_task` without `steerable=True` rejects a concurrent
+`.start()` call with `TaskConflictError`.
+
+#### What `.start()` does on an in-flight steerable chain
+
+`.start(task_id=<chain-id>, input=NEW)` against an in-flight
+steerable chain:
+
+1. The new input is **queued** at the tail of an internal
+   pending-inputs FIFO.
+2. The cancel signal is raised on the currently-executing turn —
+   `ctx.cancel.is_set()` becomes True for the handler that is
+   running right now. `ctx.pending_input_count` flips from 0 to
+   the live backlog size.
+3. A new `TaskRun` handle is returned to the caller. Its
+   `.result()` resolves with **whatever the next turn emits** —
+   the caller is the *steerer* of the next turn.
+
+If the steering queue is at its cap (9), `.start()` raises
+`SteeringQueueFull`.
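+
+Steps 1 and 2 and the queue cap can be sketched in-process as below.
+`SteeringState` and its attribute names are hypothetical, and the
+exception is a local stand-in for the framework's `SteeringQueueFull`.
+The sketch leaves out persistence and the `TaskRun` handle from
+step 3.
+
+```python
+import asyncio
+from collections import deque
+
+STEERING_QUEUE_CAP = 9
+
+
+class SteeringQueueFull(Exception):
+    """Local stand-in for the framework exception of the same name."""
+
+
+class SteeringState:
+    """Hypothetical in-process view of one steerable chain."""
+
+    def __init__(self) -> None:
+        self.pending = deque()          # FIFO of queued inputs
+        self.pending_input_count = 0    # cause counter seen by the handler
+        self.cancel = asyncio.Event()   # advisory cancel signal
+
+    def enqueue(self, new_input) -> None:
+        if len(self.pending) >= STEERING_QUEUE_CAP:
+            raise SteeringQueueFull("steering queue is at its cap")
+        self.pending.append(new_input)                # step 1: queue at the tail
+        self.pending_input_count = len(self.pending)  # record the cause first
+        self.cancel.set()                             # step 2: then signal
+```
+
+Updating the counter before setting the event means a handler that
+sees the cancel signal for a steering cause also sees a non-zero
+`pending_input_count`.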
+
+#### What the first turn's caller sees
+
+The first turn's caller observes the natural multi-turn outcome of
+the in-flight turn:
+
+| Handler ends turn 1 with... | First caller's `await run.result()` |
+|---|---|
+| `return X` (clean return) | Resolves with `X` (typed as `Output`). The chain transitions to `suspended` (return-is-implicit-suspend). The framework then promotes the queued steering input as the next turn. |
+| `raise SomeError` (non-CancelledError) | Raises `TaskFailed(error=...)`. The chain stays alive in `suspended` with no `payload["error"]` written; the queued steerer is promoted as the next turn. |
+| `raise asyncio.CancelledError()` | Raises `TaskCancelled()`. The chain stays alive in `suspended`; the queued steerer is promoted as the next turn. |
+| Handler calls `ctx.exit_for_recovery()` (shutdown only) | Raises `TaskDeferred()`. The chain stays `in_progress`; the recovery scanner re-invokes the handler in a future lifetime. The queued steerer remains queued. |
+
+The handler's `return X` value is delivered **unconditionally** to
+the first caller; it is never replaced by what a later turn
+produces.
+
+#### Cooperative cancellation in steering
+
+`ctx.cancel` is advisory. The framework sets it when a steering
+input arrives (alongside the cause counter
+`ctx.pending_input_count`), but does not preempt the handler. The
+handler decides:
+
+- **A — Yield immediately.** Check `ctx.cancel.is_set()` (or
+  `ctx.pending_input_count > 0`) at the next boundary and `return`
+  with whatever you have.
+- **B — Wind down to a safe checkpoint.** Finish the current tool
+  call / token batch, persist a clean checkpoint, then `return`
+  with the final value.
+- **C — Ignore cancel and finish.** Do not read `ctx.cancel`; let
+  the handler complete. The chain still transitions to
+  `suspended` and the queued steerer is promoted as the next
+  turn.
+
+#### Steering observability fields
+
+On a steering-driven re-entry, `TaskContext` exposes:
+
+- `ctx.is_steered_turn: bool` — `True` iff this turn was
+  constructed by the steering-drain code path. False for every
+  other entry path. Orthogonal to `entry_mode`:
+  `(entry_mode="recovered", is_steered_turn=True)` is legal.
+- `ctx.pending_input_count: int` — live count of currently queued
+  steering inputs. Reads as 0 for non-steerable chains. Useful for
+  "I am three turns behind, I should short-circuit even harder"
+  decisions. It is derived from the **in-process observed** steering
+  state (the property is synchronous — it does NOT issue a store read
+  per access), and is **failure-tolerant** (any compute failure reads
+  as 0). It is recorded *before* `ctx.cancel` is set (see §13 ordering
+  invariant) by both the same-process enqueue and the cross-process
+  steering poll, and is decremented as the drain consumes inputs, so a
+  handler that observes `ctx.cancel.is_set()` for a steering cause
+  already sees `pending_input_count >= 1`. It must be backed by a
+  settable runtime field (historically it was read from an attribute
+  that was never storable, so it was stuck at 0).
+
+#### Force delete
+
+`MultiTurnTask.delete(task_id)` is the only API that force-removes
+a chain. It cancels the in-flight turn (active caller's
+`.result()` resolves with `TaskCancelled`), resolves all queued
+steerer callers' `.result()` futures with `TaskCancelled`, and
+force-deletes the record. Idempotent (no-op on a missing chain).
+
+### §13. Cancellation and cause booleans
+
+`ctx.cancel` is a bare event (e.g. `asyncio.Event` in Python). The
+framework sets it from multiple causes; a handler observing the bare
+event does NOT know *why* it was set. Three independent cause fields
+(two **cause booleans** and one counter) answer the why:
+
+| Cause | Set when | Reset? |
+|---|---|---|
+| `ctx.timeout_exceeded: bool` | Per-turn timeout watchdog has fired for this turn. | Never within a turn. |
+| `ctx.cancel_requested: bool` | `TaskRun.cancel()` was invoked against this run from external caller code. | Never within a turn. |
+| `ctx.pending_input_count: int` (read as a count, not boolean) | Live count of queued steering inputs >= 1. | Decrements as drains consume inputs. |
+
+**Causes accumulate.** Multiple cause booleans can be `True`
+simultaneously (e.g., timeout AND external cancel AND steering).
+
+**Ordering invariant.** Each cause is set BEFORE the framework sets
+`ctx.cancel`. A handler observing `ctx.cancel.is_set() == True` is
+guaranteed to see at least one cause already set (cause booleans
+or pending_input_count > 0).
+
+Canonical reaction pattern:
+
+```python
+while not ctx.cancel.is_set():
+    await do_a_unit_of_work()
+# Branch on cause:
+if ctx.timeout_exceeded:
+    return "(timed out — partial result)"
+if ctx.cancel_requested:
+    raise asyncio.CancelledError()           # caller observes TaskCancelled
+if ctx.pending_input_count > 0:
+    return "(pre-empted by queued steering input)"
+raise RuntimeError("ctx.cancel set with no recognised cause")
+```
+
+The handler's choice of terminal shape (`return X` / `raise`)
+controls what the caller observes. The framework does NOT pick
+the terminal shape on the handler's behalf. For multi-turn,
+`return X` is the implicit-suspend boundary (chain stays alive,
+caller's `.result()` resolves to `X`); for one-shot, `return X`
+ends the run (record is deleted).
+
+### §14. Timeout (per-turn, cooperative)
+
+`@task(timeout=...)` is **cooperative-only**. When the budget elapses,
+the framework:
+
+1. Sets `ctx.timeout_exceeded = True`.
+2. Sets `ctx.cancel`.
+3. Exits the watchdog.
+
+It does **NOT** force-stop the handler, end the task, or cancel
+the lease renewal. A handler that ignores `ctx.cancel` keeps running
+until the process exits or an external `TaskRun.cancel()` arrives.
+
+The budget is **per-turn** and **wall-clock**:
+
+- Each handler turn (fresh entry, suspended-to-resume) gets a
+  fresh budget.
+- A process crash mid-turn does NOT reset the budget. When the
+  recovered handler enters, the watchdog computes
+  `remaining = max(0, timeout - (now - turn_started_at))` from the
+  persisted `_turn_started_at` and fires immediately if elapsed.
+- Clock skew is clamped to `[0, timeout]` in both directions.
+- **Known gap on steering drain re-entry:** the canonical Python
+  implementation spawns the watchdog ONCE per `_execute_task`
+  invocation; steering drain re-enters in-place inside
+  `_execute_task_loop` without spawning a fresh watchdog. The
+  steered turn inherits whatever budget remained on the original
+  watchdog. The persisted `_turn_started_at` IS stamped per drain
+  (§52 Phase 1), so a CRASH-then-recover from a drained turn
+  correctly honors the new turn's budget; the in-process drain
+  path itself does not. Other-language implementers SHOULD spawn
+  a fresh watchdog per drain to honor the design intent.
+
+The framework MUST persist `payload["_turn_started_at"]` (ISO-8601
+UTC) at every turn-start boundary: fresh entry, suspended -> in_progress
+resume, steering drain re-entry. It is NOT re-stamped on crash
+recovery — that is precisely what allows the watchdog to honor the
+original budget across crashes.
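+
+The recovery computation above can be sketched as follows. This is
+a minimal illustration, not the framework's code: `remaining_budget`
+is a hypothetical helper, and applying `min(timeout, ...)` is one
+reading of the "clamped to `[0, timeout]`" rule.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+
+def remaining_budget(turn_started_at: str, timeout: float, now: datetime) -> float:
+    """Seconds left in the per-turn budget, clamped to [0, timeout]."""
+    started = datetime.fromisoformat(turn_started_at)
+    elapsed = (now - started).total_seconds()
+    # max(0, ...) makes the watchdog fire at once when the budget is spent;
+    # min(timeout, ...) absorbs a persisted stamp that is ahead of `now`.
+    return min(timeout, max(0.0, timeout - elapsed))
+
+
+now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
+stamp = (now - timedelta(seconds=40)).isoformat()
+print(remaining_budget(stamp, 60.0, now))  # 20.0
+```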
+
+### §15. Retry
+
+`@task(retry=RetryPolicy(...))` and
+`@multi_turn_task(retry=RetryPolicy(...))` configure the framework's
+retry behavior for handler-raised exceptions.
+
+`RetryPolicy` parameters:
+
+| Field | Default | Meaning |
+|---|---|---|
+| `max_attempts` | `3` | Total failure-retry budget across all lifetimes. Counts the original try. |
+| `initial_delay` | `1 second` | Delay before the first retry. |
+| `backoff_coefficient` | `2.0` | Multiplier for exponential backoff. |
+| `max_delay` | `60 seconds` | Cap on per-retry delay. |
+| `jitter` | `True` | Add randomized jitter to delays. |
+| `retry_on` | `None` (all exceptions) | Tuple of exception types to retry; others propagate. A bare exception class is accepted as a single-element tuple. |
+
+Presets: `exponential_backoff()`, `fixed_delay(delay)`,
+`linear_backoff()`, `no_retry()`.
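+
+As a rough sketch of how the delay fields in the table could
+combine, the helper below computes an un-jittered delay. The name
+`retry_delay` and the exact formula are assumptions for
+illustration; this document does not define the jitter
+distribution, so jitter is left out.
+
+```python
+def retry_delay(retry_number: int, initial_delay: float = 1.0,
+                backoff_coefficient: float = 2.0, max_delay: float = 60.0) -> float:
+    """Un-jittered delay in seconds before a retry (1 = first retry)."""
+    raw = initial_delay * backoff_coefficient ** (retry_number - 1)
+    return min(raw, max_delay)  # max_delay caps every per-retry delay
+
+
+# With max_attempts=3, the original try plus two retries are allowed.
+print([retry_delay(n) for n in (1, 2)])  # [1.0, 2.0]
+```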
+
+Semantics:
+
+- **`retry_attempt` is the cross-lifetime counter.** Persisted as
+  `payload["_retry_attempt"]`. Re-hydrated on every handler entry
+  via `ctx.retry_attempt`. Increments only when the handler raises
+  (not on crash). Cleared on every turn-start boundary so each new
+  turn (multi-turn) or each new run (one-shot) gets a fresh budget.
+- **Crash recovery does NOT consume the budget.** A lifetime that
+  is gone before the handler raised does not advance
+  `retry_attempt`. The recovered handler sees the same
+  `ctx.retry_attempt` value the crashed lifetime saw.
+- **`return X` bypasses retry.** A handler that returns
+  (multi-turn = implicit suspend; one-shot = terminal completion)
+  is not a failure; the retry counter is unaffected.
+- When `retry_attempt >= max_attempts`, the framework gives up:
+  it stops re-invoking, and the awaiting caller observes
+  `TaskFailed(error=TaskExhaustedRetriesErrorDict(...))` carrying
+  `attempts`, `last_error`, `last_error_type`, `traceback`.
+
+#### Interim retry persistence
+
+Between every failed attempt and the next retry the framework
+PATCHes only `payload["_retry_attempt"] = <attempt + 1>`. NO
+`payload["error"]` is written between attempts — the per-turn
+failure diagnostic is not projected onto the record. The status
+stays `in_progress` throughout.
+
+When the budget is exhausted (or the exception is non-retryable),
+the failure handler runs:
+
+- **One-shot (`@task`)**: the record is DELETED entirely; nothing
+  survives on disk. The caller observes `TaskFailed` raised from
+  `.result()`.
+- **Multi-turn (`@multi_turn_task`)**: the chain transitions to
+  `suspended` with `suspension_reason="run_completion"`; NO
+  `payload["error"]` is written; queued steerers promote per §12.
+  The caller of the failing turn observes `TaskFailed` raised
+  from `.result()`. The chain stays alive — a future
+  `.run()`/`.start()` against the same `task_id` resumes the
+  chain with a fresh retry budget.
+
+The framework emits a structured ERROR log named
+`durable_task_handler_failure` on every handler raise (including
+non-final attempts). Observers learn "what just failed, which
+attempt am I on" from logs, NOT from a persisted `error` field on
+the record.
+
+`TaskFailed.error` is one of two `TypedDict` shapes:
+
+```python
+class TaskErrorDict(TypedDict):
+    type: str            # exception class name, e.g. "ValueError"
+    message: str         # str(exc)
+    traceback: str       # traceback.format_exc()
+
+class TaskExhaustedRetriesErrorDict(TypedDict):
+    type: Literal["exhausted_retries"]
+    attempts: int
+    last_error: str
+    last_error_type: str
+    traceback: str
+```
+
+Type-checkers can discriminate on the `type` literal.
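+
+A caller-side sketch of that discrimination follows. The two
+`TypedDict` classes are redeclared locally as stand-ins so the
+snippet runs on its own, and `describe` is a hypothetical helper,
+not part of the framework.
+
+```python
+from typing import Literal, TypedDict, Union
+
+
+class TaskErrorDict(TypedDict):
+    type: str
+    message: str
+    traceback: str
+
+
+class TaskExhaustedRetriesErrorDict(TypedDict):
+    type: Literal["exhausted_retries"]
+    attempts: int
+    last_error: str
+    last_error_type: str
+    traceback: str
+
+
+def describe(error: Union[TaskErrorDict, TaskExhaustedRetriesErrorDict]) -> str:
+    """Summarize either error shape by branching on the `type` field."""
+    if error["type"] == "exhausted_retries":
+        return f"gave up after {error['attempts']} attempts: {error['last_error']}"
+    return f"{error['type']}: {error['message']}"
+```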
+
+### §16. Shutdown and `exit_for_recovery`
+
+The container can be shut down at any time (deployment, rolling
+restart, eviction). The framework sets `ctx.shutdown` when it
+receives the shutdown signal. The handler has three legitimate
+responses:
+
+| Shape | When to use | Stored outcome | Caller observes |
+|---|---|---|---|
+| `await ctx.exit_for_recovery()` | Container shutting down AND you want this turn re-entered later. | `in_progress` (preserved across shutdown). | `TaskDeferred`. |
+| `return X` (multi-turn) | Handler reached a clean checkpoint AND wants to expose `X` to the caller. | `suspended` (caller can `.run()` again to drive the next turn). | `X` (typed as `Output`). |
+| `raise asyncio.CancelledError()` | Handler decided to abort. | One-shot: record deleted. Multi-turn: chain transitions to `suspended` (stays alive). | `TaskCancelled()`. |
+
+`ctx.exit_for_recovery()` is the durable-deferral primitive. The
+method:
+
+1. Flushes all touched metadata namespaces.
+2. **Releases ownership** of the persisted record so the next
+   process can take over (force-expires the lease).
+3. Leaves status as `in_progress` (NOT `suspended`).
+4. Raises `TaskDeferred()` upward — the caller of `.result()`
+   sees this. Semantically distinct from `TaskCancelled`: the
+   task is not cancelled; this lifetime is just deferring to the
+   next.
+5. Preserves any queued steering inputs — they are NOT drained
+   during shutdown; on recovery they remain queued.
+
+When the recovery scanner re-acquires the deferred task, the
+handler re-enters with `ctx.entry_mode="recovered"` and the
+persisted `payload["input"]` — exactly as if the lifetime had
+crashed.
+
+Misuse: calling `ctx.exit_for_recovery()` when
+`ctx.shutdown.is_set() == False` MUST raise `RuntimeError` at the
+call site. This makes misuse loudly visible to operators (the task
+ends in error, not silently `in_progress`).
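+
+The deferral path and the misuse guard can be sketched with
+stand-ins. `StubContext` and the local `TaskDeferred` class are
+hypothetical and model only the behavior described in this
+section; the real context object does more (flush, lease release).
+
+```python
+import asyncio
+
+
+class TaskDeferred(Exception):
+    """Stand-in for the framework's deferral exception."""
+
+
+class StubContext:
+    """Hypothetical stub exposing only what this sketch needs."""
+
+    def __init__(self) -> None:
+        self.shutdown = asyncio.Event()
+
+    async def exit_for_recovery(self) -> None:
+        if not self.shutdown.is_set():
+            raise RuntimeError("exit_for_recovery() called without shutdown")
+        raise TaskDeferred()
+
+
+async def handler(ctx: StubContext, steps: int) -> str:
+    for _ in range(steps):
+        if ctx.shutdown.is_set():
+            await ctx.exit_for_recovery()  # raises TaskDeferred
+        await asyncio.sleep(0)  # one unit of work
+    return "done"
+
+
+async def demo() -> str:
+    ctx = StubContext()
+    ctx.shutdown.set()  # simulate the container shutdown signal
+    try:
+        return await handler(ctx, steps=3)
+    except TaskDeferred:
+        return "deferred"
+
+
+print(asyncio.run(demo()))  # deferred
+```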
+
+### §17. Metadata namespaces
+
+`ctx.metadata` is a **callable namespace facade** for the small,
+durable, per-task state the handler owns:
+
+- `ctx.metadata["key"] = value` — read/write the **default**
+  namespace, persisted at `payload["metadata"]`.
+- `ctx.metadata("session")["upstream_id"] = sid` — read/write a
+  **named** sibling namespace, persisted at
+  `payload["metadata:session"]`.
+
+Each namespace is independent: a write to one does not dirty the
+other; `flush()` on one persists only that namespace's data.
+
+`metadata.flush()` is the fence the developer uses to make
+at-most-once side-effect patterns work across a crash. The framework
+**auto-flushes** all touched namespaces at every terminal-of-turn
+boundary, so writes the developer forgets to flush are still durable
+across a graceful boundary. Explicit `flush()` is for mid-handler
+fence semantics.
+
+**Naming convention:** namespaces and top-level metadata keys
+starting with `_` are RESERVED for the framework. The primitive
+treats this as a convention at the API surface; layers built on top
+(e.g. the responses framework's `_responses` namespace) MAY enforce
+it more strictly.
+
+`TaskMetadata` MUST expose dict-like semantics
+(`__getitem__`/`__setitem__`/`__contains__`/`__iter__`/`.get()`/`.to_dict()`)
+plus:
+
+- `flush()` — persist this namespace only.
+- `increment(key)` — in-memory atomic numeric increment **on the
+  metadata namespace object** (read/modify/write under an
+  in-memory lock). The change is NOT pushed to the store until the
+  next `flush()` / auto-flush. This is NOT a store-level
+  compare-and-swap; concurrent processes incrementing the same
+  key would race at the store level. Use for handler-local
+  counters that get flushed at clean boundaries; for
+  cross-process atomic counters, use the store's CAS protocol directly
+  via the provider.
+- `append(key, value)` — append to a list-valued key. Same
+  in-memory semantics as `increment`: atomic within the namespace
+  object, NOT atomic against the durable record.
+
+Flush failures are logged, not raised — a failed flush should not
+crash a handler. The framework retries on the next flush call or
+auto-flush boundary.
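+
+A toy model of the callable facade and its payload projection is
+shown below. `Namespace`, `Metadata`, and `to_payload` are
+hypothetical names chosen for the sketch; persistence, `flush()`,
+and `append()` are omitted.
+
+```python
+import threading
+from typing import Any, Dict
+
+
+class Namespace(dict):
+    """In-memory namespace with a lock-guarded increment."""
+
+    def __init__(self) -> None:
+        super().__init__()
+        self._lock = threading.Lock()
+
+    def increment(self, key: str) -> int:
+        with self._lock:  # atomic within this process only, not at the store
+            self[key] = self.get(key, 0) + 1
+            return self[key]
+
+
+class Metadata(Namespace):
+    """Default namespace; calling it returns a named sibling namespace."""
+
+    def __init__(self) -> None:
+        super().__init__()
+        self._named: Dict[str, Namespace] = {}
+
+    def __call__(self, name: str) -> Namespace:
+        return self._named.setdefault(name, Namespace())
+
+    def to_payload(self) -> Dict[str, Any]:
+        payload: Dict[str, Any] = {"metadata": dict(self)}
+        for name, ns in self._named.items():
+            payload[f"metadata:{name}"] = dict(ns)
+        return payload
+
+
+meta = Metadata()
+meta["step"] = 1
+meta("session")["upstream_id"] = "sid-123"
+meta.increment("turns")
+print(meta.to_payload())
+```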
+
+---
+
+
+## Part III — Storage contract (wire-level)
+
+This part documents how the framework projects the programming model
+onto the durable task record. The HTTP routes, request/response
+envelopes, and server-side merge rules themselves are defined by the
+*Foundry Task Storage Protocol* specification; this section names which
+fields the framework reads/writes and what the framework-reserved
+keys mean.
+
+### §18. Reference to the Foundry Task Storage Protocol
+
+The hosted task store's transport-level contract — routes
+(`POST /tasks`, `GET /tasks`, `GET /tasks/{id}`, `PATCH /tasks/{id}`,
+`DELETE /tasks/{id}`), authentication, activation, payload PATCH merge
+semantics, attachment PATCH merge semantics, ETag/CAS rules,
+classification of 409/412 responses — is specified by
+`foundrysdk_specs/specs/hosted-agents/container-spec/docs/foundry-task-storage-protocol-spec.md`.
+
+This document does **not** restate that contract. Implementers MUST
+conform to the protocol spec for any hosted-provider implementation.
+The conformance items in §59 reference both this document and the
+protocol spec.
+
+Where this spec uses terms like "PATCH" or "etag", it does so under
+the protocol spec's definitions.
+
+### §19. The framework's view of the task record
+
+The framework writes/reads the following fields on every task record.
+Field meanings beyond this table are defined in the protocol spec.
+
+| Field | Type | Owned by | Set on |
+|---|---|---|---|
+| `id` | string | caller | `create`. |
+| `agent_name` | string | framework | `create`. |
+| `session_id` | string | framework | `create`. |
+| `status` | `pending` / `in_progress` / `suspended` / `completed` | framework | `create`, status transitions (§24). |
+| `title` | string \| null | caller | `create` (optional). |
+| `description` | string \| null | caller | `create` (optional). |
+| `lease` | LeaseInfo (§22) | framework | `create`, every renewal, every reclaim. |
+| `payload` | object | framework + developer | almost every transition (§20). |
+| `tags` | map of string -> string | framework + caller | `create` (framework stamps `_task_name`); caller-set tags allowed. |
+| `error` | object \| null | framework | Not written on handler raise; failures surface via `TaskFailed` and logs (§20). |
+| `suspension_reason` | string \| null | framework | on suspend. |
+| `source` | object | framework | `create` (§21). |
+| `attachments` | object \| null | framework + developer | on input promotion / drain / suspend / orphan cleanup (§23). |
+| `etag` | string | server | every server-issued response. |
+| `created_at` | ISO-8601 string | server | `create`. |
+| `updated_at` | ISO-8601 string | server | every PATCH. |
+| `started_at` | ISO-8601 string \| null | server | **set once on first `in_progress` transition; never updated thereafter** (lease re-acquisition, recovery scanner takeover, and suspend/resume cycles do NOT reset). |
+| `completed_at` | ISO-8601 string \| null | server | terminal transition. |
+
+Caller-controlled fields (`tags` keys NOT starting with `_task_`,
+`title`, `description`) are passed through verbatim. Framework-owned
+fields MUST NOT be set by caller code.
+
+### §20. Framework-reserved payload keys
+
+`payload` is the JSON object that holds both the framework's
+runtime state and the developer's metadata. The framework reserves
+the following top-level keys, all starting with `_` or named
+`input` / `metadata` / `metadata:<ns>`:
+
+| Key | Type | Lifetime | Meaning |
+|---|---|---|---|
+| `input` | any JSON value, or a ref dict (§23) | Set on every `in_progress` transition; cleared at suspend; cleared by drain after consumption. | The current input value (or a ref to its attachment). |
+| `metadata` | object | Persisted at boundaries; auto-flushed. | The DEFAULT user metadata namespace. |
+| `metadata:<ns>` | object | Same as above. | NAMED user metadata namespace `<ns>`. |
+| `_last_input_id` | string \| null | Set when caller supplies `input_id`. | Chain-head tracking (§11). |
+| `_turn_started_at` | ISO-8601 UTC string | Set at every turn-start boundary; NEVER re-stamped on recovery. | Source of truth for the per-turn watchdog (§14). |
+| `_retry_attempt` | integer | Incremented on handler raise; reset to 0 on steering drain. (Not also reset on success in the canonical Python implementation.) | Durable retry counter (§15). |
+| `_steering` | object (see below) | Only present on steerable tasks. | Steering mechanism state (§12). |
+
+The framework does NOT persist the handler's return value in the
+task record. There is no `payload["output"]` key and no `_output`
+attachment. The handler's return value resolves the in-process
+caller's `TaskRun.result()` future and is then no longer reachable
+from the persisted record. Per-turn outputs that need to survive
+crashes are the handler's responsibility — write them through
+your own storage (e.g., LangGraph checkpoint, your own DB) before
+returning.
+
+Likewise, `error` from a handler raise is NOT persisted. The
+framework emits a structured ERROR log (named
+`durable_task_handler_failure`) on every handler raise, but the
+chain record itself does not carry the per-turn diagnostic.
+
+`_steering` object shape:
+
+| Sub-key | Type | Meaning |
+|---|---|---|
+| `pending_inputs` | array of input values OR refs (§23) | FIFO of queued steering inputs. |
+| `next_input_seq` | integer | Monotonic counter for promoted-attachment key allocation (NEVER reused). |
+| `cancel_requested` | boolean | Durable cancel signal; set on steering append; cleared after drain when pending is empty. |
+| `drain_in_progress` | boolean | True between the start of a drain PATCH and the next turn-start; protects against partial drain on crash. |
+| `active_input` | any JSON value OR ref | The single input being drained (mirror copy used by the race-recovery contract). Cleared at suspend / terminal. |
+
+Implementers in other languages MUST use these exact key names. A
+process built in language X must be able to recover a task created
+by language Y.
+
+Keys NOT in this table are caller-controlled (e.g. user metadata
+namespaces); the framework leaves them alone.
+
+### §21. Framework-reserved tag keys and `source` shape
+
+#### Reserved tag keys
+
+The framework stamps the following `tags` entries on `create`:
+
+| Tag key | Value | Purpose |
+|---|---|---|
+| `_task_name` | The decorator's `name` (or `fn.__qualname__` fallback). | Server-side `LIST` filtering by task name. |
+
+Tag keys starting with `_task_` are RESERVED. Caller-supplied tags
+using this prefix are stripped at the call site with a warning;
+the framework does not pass them to the server.
+
+#### `source` shape
+
+The framework stamps `source` on `create`:
+
+```
+{
+   "type":           "agentserver.task",
+   "name":           "<the decorator's name (or fn.__qualname__)>",
+   "server_version": "<sdk_name>/<sdk_version> (<runtime>/<version>)"
+}
+```
+
+`source.name` is the **canonical identity anchor** for recovery
+routing — the framework looks up the registered handler callback
+by matching `source.name` against the decorator-supplied names.
+`source.type` is currently a single fixed string but is reserved
+for future namespacing.
+
+### §22. Lease structure and ownership semantics
+
+`lease` is a sub-object with the following fields:
+
+| Field | Type | Meaning |
+|---|---|---|
+| `owner` | string | `<agent_name>\|session:<session_id>` (§7). Stable across process lifetimes. |
+| `instance_id` | string | `worker-<pid>-<rand8hex>-<unix_seconds>`. Fresh per process. |
+| `generation` | integer | Increments each time the lease is re-acquired with a different `instance_id`. Mirrored to `ctx.recovery_count`. The local provider AND the hosted task store both bump this. |
+| `expires_at` | ISO-8601 UTC string | When the lease expires (and another process may reclaim). |
+| `expiry_count` | integer | Number of times ownership has changed via **actual expiry** (i.e. lease was reclaimed because the prior lease's `expires_at` passed, NOT because the same owner restarted). **Server- / provider-only counter** — the framework never writes this field (it is not on `TaskPatchRequest`). The hosted task store bumps it; the local file provider also bumps it on actual-expiry reclaim for parity (so local-mode tests can assert expiry-counter behavior). Surfaced on the framework's internal `TaskInfo`; NOT projected onto the public `TaskRun` handle (lease bookkeeping is framework-internal). |
+| `heartbeat_at` | ISO-8601 UTC string | Wall time of the most recent lease write (acquisition, renewal, or force-expire). Stamped by the provider on every lease-touching PATCH. **Provider-only field** — the framework never writes this; consumers and observability tooling read it to distinguish "fresh lease" from "lease that hasn't expired yet". NOT projected onto the public `TaskRun` handle — it's a framework / operator concern, not a developer one. |
+
+The framework's interaction with the lease:
+
+- On `create`, the framework sets `lease_owner = self.owner`,
+  `lease_instance_id = self.instance_id`, and
+  `lease_duration_seconds = 60` (the framework default).
+- The lease renewal loop (§56) renews at half the lease duration
+  (every 30s by default), but its next tick is computed
+  DYNAMICALLY from the per-task last-refresh time, NOT a fixed
+  cadence. So a PATCH within the last `interval` seconds fully
+  shadows the next heartbeat.
+- **Every PATCH the framework issues** (renewal, metadata,
+  steering, suspend, drain, complete, fail, reclaim) MUST
+  piggyback (`lease_owner`, `lease_instance_id`,
+  `lease_duration_seconds`) to refresh the lease as a side effect.
+  See §25.4.
+- On reclaim (§54), the framework PATCHes the lease to itself with
+  `if_match: <last-seen etag>` for CAS. BOTH the inline reclaim
+  AND the cold-start/periodic scan reclaim use `if_match` (closes
+  the prior known gap).
+- On `ctx.exit_for_recovery()` (§16), the framework force-expires
+  the lease so the next process can reclaim immediately.
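+
+The dynamic renewal tick described above can be sketched as a
+small computation. `seconds_until_next_renewal` is a hypothetical
+helper, and times are plain floats (seconds) for simplicity.
+
+```python
+def seconds_until_next_renewal(last_refresh: float, now: float,
+                               lease_duration_seconds: float = 60.0) -> float:
+    """Delay until the renewal loop's next tick for one task."""
+    interval = lease_duration_seconds / 2  # 30s with the default lease
+    return max(0.0, last_refresh + interval - now)
+
+
+# Last renewal at t=100; a metadata PATCH at t=125 refreshes the lease,
+# so the next heartbeat moves out to a full interval after that PATCH.
+print(seconds_until_next_renewal(last_refresh=100.0, now=110.0))  # 20.0
+print(seconds_until_next_renewal(last_refresh=125.0, now=125.0))  # 30.0
+```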
+
+When it inspects a record's lease, the framework recognizes three
+states:
+
+1. **Live and same-instance** — my own running task; do nothing.
+2. **Live and different-instance, same-owner** — a previous lifetime
+   of mine. RECLAIM immediately (no expiry wait). `expiry_count` is
+   NOT bumped (the server only bumps on actual-expiry handoff, and
+   this isn't one).
+3. **Expired (any owner)** — RECLAIM. `expiry_count` IS bumped
+   (server-side, in the hosted store; AND in the local provider
+   for parity — see the table above).
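+
+One way to express the three states as a decision function is
+sketched here. `LeaseAction` and `classify` are hypothetical names,
+and returning `None` for a live lease held by a different owner is
+an assumption about how a caller would treat that case.
+
+```python
+from datetime import datetime, timezone
+from enum import Enum
+
+
+class LeaseAction(Enum):
+    NOTHING = "nothing"                  # state 1: my own running task
+    RECLAIM_NOW = "reclaim_now"          # state 2: earlier lifetime of this owner
+    RECLAIM_EXPIRED = "reclaim_expired"  # state 3: provider bumps expiry_count
+
+
+def classify(lease: dict, my_owner: str, my_instance_id: str, now: datetime):
+    """Return the action for a lease, or None when another owner holds it live."""
+    if datetime.fromisoformat(lease["expires_at"]) <= now:
+        return LeaseAction.RECLAIM_EXPIRED
+    if lease["owner"] != my_owner:
+        return None  # live lease of a different owner: not reclaimable
+    if lease["instance_id"] == my_instance_id:
+        return LeaseAction.NOTHING
+    return LeaseAction.RECLAIM_NOW
+
+
+now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
+lease = {"owner": "agent|session:s1", "instance_id": "worker-old",
+         "expires_at": "2025-01-01T12:00:30+00:00"}
+print(classify(lease, "agent|session:s1", "worker-new", now))
+```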
+
+**Important: the framework never writes `expiry_count`.** It is not
+a field on `TaskPatchRequest` (only `lease_owner`,
+`lease_instance_id`, `lease_duration_seconds` are writable). The
+hosted task store and the local file provider both increment it
+server-side / provider-side on actual-expiry ownership change; the
+framework only reads it.
+
+#### 22.1 Lease write rules (provider-enforced, identical for hosted and local)
+
+These rules MUST be enforced by **both** providers identically.
+Violations raise the internal `_HostedConflict` (§39) which the
+framework translates to public exceptions per the translation table
+(also §39). The local file provider raises the same logical
+conditions directly, with the same internal classification, so the
+framework behaves identically against either backing store.
+
+| # | Rule | When violated |
+|---|---|---|
+| LSE-W-1 | `lease_duration_seconds` MUST be `0` (force-expire) OR in the range `10..3600` (renewal). | Reject as `invalid_request` (400). |
+| LSE-W-2 | The triplet `(lease_owner, lease_instance_id, lease_duration_seconds)` is all-or-nothing. Supplying any one without all three is rejected. | Reject as `invalid_request` (400). |
+| LSE-W-3 | Lease acquisition / renewal against a record whose lease is currently held by a **different** owner AND not yet expired is rejected. | Raise `_HostedConflict(_code="lease_held_by_another")` → `TaskConflictError(current_status="in_progress")`. |
+| LSE-W-4 | When transitioning a task from `in_progress` → `pending`, the supplied `(lease_owner, lease_instance_id)` MUST match the record's current lease. | Raise `_HostedConflict(_code="lease_held_by_another")`. |
+| LSE-W-5 | Lease renewal (no status change, `lease_duration_seconds > 0`) is only valid when the current status is `in_progress`. Renewing on `pending` / `suspended` / `completed` is rejected. | Reject as `invalid_request` (400). |
+| LSE-W-6 | `lease_duration_seconds = 0` (force-expire) cannot be combined with a status transition in the same PATCH. | Reject as `invalid_request` (400). |
+| LSE-W-7 | Force-expire (`lease_duration_seconds = 0`) requires the caller's `(lease_owner, lease_instance_id)` to match the current lease UNLESS the lease is already expired (in which case any caller may force-expire). | Raise `_HostedConflict(_code="lease_held_by_another")` if mismatched and lease is still live. |
+| LSE-W-8 | `started_at` is **immutable** after the first `in_progress` transition. Lease re-acquisition (including expired-lease takeover by a different owner OR same-owner restart) MUST NOT update `started_at`. The original wall-clock time of the first turn-start is preserved across recovery, restarts, and suspend/resume cycles. | (Behavioral — observable via the task manager's provider; not on the public `TaskRun` handle.) |
+| LSE-W-9 | On lease handoff to a different owner where the prior lease was **expired**, `expiry_count` MUST be incremented. Same-owner different-instance handoff before expiry does NOT bump. | (Behavioral — observable via the task manager's provider; not on the public `TaskRun` handle.) |
+| LSE-W-10 | On every successful lease write (acquisition, renewal, force-expire), the provider MUST stamp the lease's `heartbeat_at` field to "now". This field exists on `LeaseInfo` so consumers and observability tooling can distinguish a fresh lease from one that simply hasn't expired yet. | (Behavioral — observable through `LeaseInfo.heartbeat_at` in the internal `TaskInfo`. Not on the public surface.) |
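+
+The purely structural rules (LSE-W-1, LSE-W-2, LSE-W-5, LSE-W-6)
+can be checked without reading the current lease holder. The
+sketch below is illustrative only: `check_lease_write` is a
+hypothetical helper, and it returns the violated rule id rather
+than raising the provider's real error types.
+
+```python
+from typing import Optional
+
+
+def check_lease_write(current_status: str, new_status: Optional[str],
+                      lease_owner: Optional[str],
+                      lease_instance_id: Optional[str],
+                      lease_duration_seconds: Optional[int]) -> Optional[str]:
+    """Return the violated rule id, or None if these four rules pass."""
+    triplet = (lease_owner, lease_instance_id, lease_duration_seconds)
+    supplied = [value is not None for value in triplet]
+    if not any(supplied):
+        return None  # this PATCH does not touch the lease
+    if not all(supplied):
+        return "LSE-W-2"  # triplet is all-or-nothing
+    if lease_duration_seconds != 0 and not 10 <= lease_duration_seconds <= 3600:
+        return "LSE-W-1"  # must be 0 or within 10..3600
+    if lease_duration_seconds == 0 and new_status is not None:
+        return "LSE-W-6"  # force-expire cannot carry a status transition
+    if (lease_duration_seconds > 0 and new_status is None
+            and current_status != "in_progress"):
+        return "LSE-W-5"  # renewal is only valid while in_progress
+    return None
+```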
+
+### §23. Attachments and input promotion
+
+The hosted task store provides a second per-task storage slot,
+`attachments`, alongside `payload`. The two stores have different
+budgets:
+
+| Slot | Per-task cap | Per-value cap | Entry count cap |
+|---|---|---|---|
+| `payload` | 1 MB | n/a (shared) | unlimited keys |
+| `attachments` | n/a (per-entry only) | 2 MB per attachment | 20 attachments max |
+
+`attachments` lets the framework lift the per-input ceiling from
+"however much fits in payload alongside everything else" to
+**2 MB per input** without evicting metadata budget.
+
+#### 23.1 PATCH merge semantics
+
+The hosted store's merge semantics for `attachments` mirror `tags`:
+
+- Key present with non-null value -> **upsert** (new) or **replace** (existing).
+- Key present with `null` -> **delete** that entry.
+- Key absent -> **unchanged**.
+- `attachments` field absent entirely -> no attachment changes.
+
+PATCHes that include BOTH `payload` and `attachments` are atomic
+across both stores. This is load-bearing: every promote, drain,
+suspend, and orphan-cleanup write co-PATCHes payload + attachments
+in a single round trip.
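+
+The per-key merge rules can be modeled in a few lines.
+`merge_attachments` is a hypothetical helper; in it, passing
+`patch=None` models an **absent** `attachments` field, not a JSON
+`null` sent on the wire.
+
+```python
+from typing import Any, Dict, Optional
+
+
+def merge_attachments(current: Dict[str, Any],
+                      patch: Optional[Dict[str, Any]]) -> Dict[str, Any]:
+    """Apply the per-key merge rules and return the new attachments map."""
+    merged = dict(current)
+    if patch is None:
+        return merged  # field absent entirely: no attachment changes
+    for key, value in patch.items():
+        if value is None:
+            merged.pop(key, None)  # null: delete that entry
+        else:
+            merged[key] = value    # non-null: upsert or replace
+    return merged
+
+
+before = {"_input": {"q": "hello"}, "_steering_input_3": "old"}
+after = merge_attachments(before, {"_input": None, "_steering_input_4": "new"})
+print(after)
+```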
+
+#### 23.2 Promotion thresholds (framework-owned)
+
+The framework treats the two input channels differently. Each uses
+a size threshold to choose between the inline and ref shapes; there
+is no output channel.
+
+| Channel | Promotion rule | Attachment key |
+|---|---|---|
+| Function input (`payload["input"]`) | > 200 KiB serialized → ref; otherwise inline. | `_input` |
+| Each steering input (entry in `_steering["pending_inputs"]`) | > 20 KiB serialized → ref; otherwise inline. | `_steering_input_<seq>` |
+
+Different rules because:
+
+- The function input is set once per turn-start. A 200 KiB inline
+  budget keeps small inputs cheap and only spills clearly-large ones.
+- Steering inputs may accumulate (up to 9 queued). A 20 KiB
+  threshold caps the worst-case inline payload contribution from
+  steering at ~180 KiB even when the queue is full.
+
+There is no `_output` channel and no output promotion. The
+framework does not persist handler return values; outputs resolve
+the in-process caller's `TaskRun.result()` future directly and are
+never projected onto the task record.
+
+Sizes are measured in bytes of canonical JSON
+(`sort_keys=True`, separators `(",", ":")`).
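+
+A sketch of the size measurement and threshold check follows. The
+helper names are hypothetical, and encoding the canonical text as
+UTF-8 with Python's default `ensure_ascii=True` is an assumption;
+this section fixes only `sort_keys` and the separators.
+
+```python
+import json
+
+INPUT_THRESHOLD = 200 * 1024          # function input, in bytes
+STEERING_INPUT_THRESHOLD = 20 * 1024  # each steering input, in bytes
+
+
+def canonical_size(value) -> int:
+    """Size in bytes of the canonical JSON encoding of `value`."""
+    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
+    return len(text.encode("utf-8"))
+
+
+def needs_promotion(value, threshold: int) -> bool:
+    """True when the value must be stored as a ref plus attachment."""
+    return canonical_size(value) > threshold
+
+
+print(canonical_size({"b": 1, "a": 2}))  # 13
+print(needs_promotion("x" * 30_000, STEERING_INPUT_THRESHOLD))  # True
+```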
+
+Worst-case framework attachment usage:
+`_input` (1) + `_steering_input_*` (up to 9) =
+**10 of 20** per-task attachment slots. Leaves 10 slots free for
+future use.
+
+#### 23.3 Wire shapes — two only
+
+A slot that would hold an input (`payload["input"]`, an entry in
+`_steering["pending_inputs"]`) is represented in exactly one of two
+shapes:
+
+**Inline** (size <= threshold): the raw JSON value, verbatim.
+
+**Ref** (size > threshold): a single-magic-key dict pointing at the
+attachment:
+
+```json
+{
+   "__attachment_ref__": {
+      "key":  "<attachment-key>",
+      "hash": "sha256:<64 lowercase hex chars>"
+   }
+}
+```
+
+**Detection rule** (used everywhere the framework reads a slot):
+the slot is a ref iff (1) it is a JSON object, (2) it has exactly
+one key, (3) that key is `__attachment_ref__`, (4) the value is an
+object with both `key` and `hash`. Everything else is inline.
+
+The inline + ref shapes are **disjoint**: a developer-supplied
+inline value cannot accidentally be misread as a ref because the
+detection rule's 4-step structure is too specific to occur
+incidentally.
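+
+The four-step detection rule translates directly into a predicate.
+`is_attachment_ref` is a hypothetical name for this sketch.
+
+```python
+from typing import Any
+
+REF_KEY = "__attachment_ref__"
+
+
+def is_attachment_ref(slot: Any) -> bool:
+    """Apply the four-step detection rule to a payload slot."""
+    # Steps 1-3: a JSON object with exactly one key, the magic key.
+    if not isinstance(slot, dict) or len(slot) != 1 or REF_KEY not in slot:
+        return False
+    # Step 4: the value is an object carrying both `key` and `hash`.
+    inner = slot[REF_KEY]
+    return isinstance(inner, dict) and "key" in inner and "hash" in inner
+
+
+ref = {REF_KEY: {"key": "_input", "hash": "sha256:" + "0" * 64}}
+print(is_attachment_ref(ref))                # True
+print(is_attachment_ref({"key": "_input"}))  # False
+```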
+
+#### 23.4 Single wire shape
+
+The framework reads and writes exactly the inline + ref shapes
+documented in §23.3. The primitive is in private preview; there is
+no version-skew compatibility to maintain.
+
+#### 23.5 Sequence number invariants (steering)
+
+`payload["_steering"]["next_input_seq"]` is the monotonic counter
+the framework uses to derive `_steering_input_<seq>` keys. Critical
+invariants:
+
+- **Advances ONLY on promotion.** Inline steering appends do not
+  bump `next_input_seq`.
+- **Never reused.** A drained-and-deleted key is never re-allocated;
+  the next promoted append always uses the current
+  `next_input_seq`, then `next_input_seq += 1`.
+- **Stable for surviving entries.** A drain pops the head of
+  `pending_inputs` and (if it was a ref) deletes the corresponding
+  `_steering_input_<seq>` attachment. It does NOT renumber any
+  other entry. A queue of `[ref_3, ref_4]` becomes `[ref_4]` after
+  one drain; `ref_4` keeps its key.
+
+This invariant is what allows the framework to drain without
+re-uploading attachments — a property that would be impossible if
+keys encoded queue position.
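+
+The invariants can be exercised with a toy model. `SteeringQueue`
+is not the framework's class; starting the counter at 0 and
+omitting the ref's `hash` are simplifications made for the sketch.
+
+```python
+class SteeringQueue:
+    """Toy model of the `_steering` bookkeeping."""
+
+    def __init__(self) -> None:
+        self.pending_inputs = []
+        self.next_input_seq = 0
+        self.attachments = {}
+
+    def append(self, value, promote: bool) -> None:
+        if not promote:
+            self.pending_inputs.append(value)  # inline: counter untouched
+            return
+        key = f"_steering_input_{self.next_input_seq}"
+        self.next_input_seq += 1               # advances only on promotion
+        self.attachments[key] = value
+        self.pending_inputs.append({"__attachment_ref__": {"key": key}})
+
+    def drain(self):
+        """Pop the head; delete its attachment if it was a ref."""
+        head = self.pending_inputs.pop(0)
+        if isinstance(head, dict) and "__attachment_ref__" in head:
+            return self.attachments.pop(head["__attachment_ref__"]["key"])
+        return head
+
+
+queue = SteeringQueue()
+queue.append("big-1", promote=True)
+queue.append("big-2", promote=True)
+queue.drain()                        # removes _steering_input_0 only
+queue.append("big-3", promote=True)  # gets seq 2; seq 0 is never reused
+print(sorted(queue.attachments))
+```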
+
+#### 23.6 Content hash
+
+Every ref carries `hash: "sha256:<hex>"` where the hex is the
+SHA-256 of the canonical JSON bytes
+(`sort_keys=True`, separators `(",", ":")`) of the attachment
+value. The framework writes the hash on promotion.
+
+**Hash validation (known gap).** The canonical Python
+implementation today writes the hash on every promotion but does
+NOT validate it on read — `_read_input_value()` resolves the ref
+key against `attachments` and returns the value without
+recomputing the hash. Other-language implementers SHOULD validate
+on read (recompute hash from the attachment value, compare against
+the ref's hash, raise on mismatch) to detect store-side
+corruption. Cross-implementation byte-compatibility requires using
+the SAME canonicalization rules so a write from one language can
+be validated by another.
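+
+A minimal sketch of hash-on-write plus the recommended
+validate-on-read is given below. `content_hash` and `read_ref` are
+hypothetical helpers (this is not the `_read_input_value()`
+implementation), and raising `ValueError` on mismatch is an
+assumption.
+
+```python
+import hashlib
+import json
+
+
+def content_hash(value) -> str:
+    """SHA-256 over the canonical JSON bytes, in the ref's `hash` format."""
+    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
+    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
+
+
+def read_ref(ref: dict, attachments: dict):
+    """Resolve a ref; raise if the stored value no longer matches its hash."""
+    inner = ref["__attachment_ref__"]
+    value = attachments[inner["key"]]
+    if content_hash(value) != inner["hash"]:
+        raise ValueError(f"attachment {inner['key']!r} failed hash validation")
+    return value
+
+
+attachments = {"_input": {"b": 1, "a": 2}}
+ref = {"__attachment_ref__": {"key": "_input",
+                              "hash": content_hash(attachments["_input"])}}
+print(read_ref(ref, attachments))
+```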
+
+The hash is sufficient for ref validity once validated (no separate
+write-timestamp is needed): the SHA-256 birthday-bound collision
+probability at a fleet-wide rate of one trillion hashes per second
+sustained for 100 years is < 1 in 10^33.
+
+#### 23.7 Caps and pre-network enforcement
+
+Caps:
+
+- Per-attachment value: **2 MB** serialized.
+- Per-task attachment count: **20**.
+
+The framework enforces (pre-network) and surfaces developer-facing
+exceptions based on which channel the violation occurs on:
+
+| Cap | Where enforced | Developer-facing exception |
+|---|---|---|
+| Per-value (2 MB) on `_input` | Create + PATCH, both providers | `InputTooLarge` (the framework remaps an internal `_AttachmentTooLarge` based on attachment-key prefix) |
+| Per-value (2 MB) on `_steering_input_<seq>` | Steering append site (always reads state first to count) | `InputTooLarge` |
+| Per-task count (20) on `create` | Create path | `_AttachmentLimitExceeded` (internal) — reachable only via direct provider use, which is unsupported |
+| Per-task count (20) on `patch` | Local provider (cheap count); hosted PATCH relies on server-side check | `_AttachmentLimitExceeded` (internal) |
+
+Internal exceptions `_AttachmentTooLarge` and
+`_AttachmentLimitExceeded` are **provider-internal**: they are
+NOT exported from `durable/__init__.py`. The framework catches
+`_AttachmentTooLarge` and re-raises the appropriate
+developer-facing exception based on the attachment key prefix
+(`_input` / `_steering_input_*` → `InputTooLarge`).
+`_AttachmentLimitExceeded` is unreachable in normal framework
+operation (worst case is 10 of 20 slots; see §23.2). If it ever
+propagates, that indicates a framework bug; it is caught at the
+boundary and converted to `RuntimeError`.
+
+#### 23.8 Atomic co-writes
+
+These transitions MUST be single PATCHes carrying BOTH `payload` and
+`attachments`:
+
+1. **Promote on `.start()` (fresh)**: `attachments["_input"] = <value>`
+   + `payload["input"] = {ref}` (CREATE on the hosted store).
+2. **Promote on resume**: same fields, but PATCH.
+3. **Suspend (multi-turn turn-end via `return X`)**:
+   - `payload["input"] = null`
+   - `payload["_steering"]["active_input"] = null`
+   - `payload["_retry_attempt"] = null` (fresh budget for the next turn)
+   - `attachments["_input"] = null` (delete) IF the input was a ref
+4. **Steering append (promoted)**: `payload["_steering"]["pending_inputs"]
+   += [{ref}]`, `attachments["_steering_input_<seq>"] = <value>`,
+   `payload["_steering"]["next_input_seq"] += 1`,
+   `payload["_steering"]["cancel_requested"] = true`.
+5. **Steering drain (promoted entry, Phase 1)**:
+   `payload["_steering"]["pending_inputs"]` without the popped
+   head, `attachments["_steering_input_<seq>"] = null`,
+   plus the new turn's `_turn_started_at`.
+6. **One-shot completion**: the record is deleted (one-shot is
+   always ephemeral).
+7. **Failure**: one-shot → record deleted; multi-turn → `status="suspended"`
+   with `suspension_reason="run_completion"`. No `payload["error"]`
+   is written; the per-turn failure surfaces to the caller via
+   `TaskFailed(error=...)` and via the structured log
+   `durable_task_handler_failure`.
+8. **Resume (suspended → in_progress)**: `status="in_progress"`,
+   `_turn_started_at` re-stamped, `_retry_attempt` reset to 0.
+   New input written (inline or as ref + attachment per §23.2).
+
+Splitting any of these into multiple PATCHes opens a crash window
+where the attachment exists without its ref (or vice versa). The
+framework treats this as a single-PATCH invariant.
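+
+A minimal sketch of transition 4 (steering append, promoted) built as
+one PATCH body. `build_steering_append_patch` and the
+`{"attachment": key}` ref shape are hypothetical, chosen for
+illustration only; the point is that the ref and its attachment
+leave in the same request.
+
+```python
+import copy
+from typing import Any
+
+
+def build_steering_append_patch(steering: dict[str, Any], value: Any) -> dict[str, Any]:
+    """Return ONE patch body carrying both the ref and its attachment."""
+    seq = steering["next_input_seq"]
+    key = f"_steering_input_{seq}"
+    new_steering = copy.deepcopy(steering)
+    # Hypothetical ref shape; the real ref format is not shown in this section.
+    new_steering["pending_inputs"].append({"attachment": key})
+    new_steering["next_input_seq"] = seq + 1
+    new_steering["cancel_requested"] = True
+    return {
+        "payload": {"_steering": new_steering},
+        "attachments": {key: value},
+    }
+
+
+patch = build_steering_append_patch(
+    {"pending_inputs": [], "next_input_seq": 3, "cancel_requested": False},
+    {"text": "stop and summarize"},
+)
+```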
+
+#### 23.9 Attachment key validation
+
+Attachment keys MUST match the regex `^[a-zA-Z0-9_.\-]{1,64}$` and
+MUST NOT be empty after trimming whitespace. Both providers enforce
+this on every CREATE / PATCH write. The framework's reserved keys
+(`_input`, `_steering_input_<seq>`) all conform.
+Developer-supplied attachment keys (none exist today — attachments
+are framework-owned per §23.7) would also be validated against this
+regex if the surface is ever expanded.
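+
+The key rule above can be checked with a few lines of Python. This is
+a sketch only; `is_valid_attachment_key` is a hypothetical helper
+name, not a framework function.
+
+```python
+import re
+
+# Regex copied from the rule above.
+_ATTACHMENT_KEY_RE = re.compile(r"^[a-zA-Z0-9_.\-]{1,64}$")
+
+
+def is_valid_attachment_key(key: str) -> bool:
+    # fullmatch rejects a trailing newline, which `$` with re.match would accept.
+    return bool(key.strip()) and _ATTACHMENT_KEY_RE.fullmatch(key) is not None
+```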
+
+#### 23.10 Clear-all gesture
+
+In addition to per-key null-as-delete (§23.1), the provider accepts a
+top-level "clear all attachments" gesture:
+
+- Wire form: `PATCH ... { "attachments": null }`.
+- Effect: deletes every attachment on the task, regardless of which
+  keys currently exist. Per-key entries supplied in the same PATCH
+  are NOT applied (the clear takes precedence).
+- Typed-API form: `TaskPatchRequest.clear_attachments = true`. When
+  set, the hosted provider serializes `attachments: null`; the local
+  provider clears the attachments dict directly. Mutually exclusive
+  with `attachments={...}` (per-key patch) in the same request — the
+  combination is rejected as `invalid_request`.
+- The framework today never emits this gesture; per-key delete
+  covers all current needs. It is documented for parity with the
+  service and for future internal callers (e.g. orphan-attachment
+  cleanup post-recovery).
+
+DELETE on a task removes all attachments along with the task. The
+local provider achieves this trivially (attachments live in the
+same JSON file as the task record; unlinking the file removes
+both). The hosted provider relies on the service's blob-cleanup
+hook.
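+
+The per-key and clear-all semantics can be sketched as a pure
+function over the attachments dict. `apply_attachments_patch` is a
+hypothetical name, and raising `ValueError` stands in for the
+`invalid_request` rejection.
+
+```python
+from typing import Any, Optional
+
+
+def apply_attachments_patch(
+    current: dict[str, Any],
+    patch: Optional[dict[str, Any]],
+    clear_attachments: bool = False,
+) -> dict[str, Any]:
+    if clear_attachments and patch:
+        # Typed-API rule: clear-all and a per-key patch are mutually exclusive.
+        raise ValueError("invalid_request: clear_attachments with per-key patch")
+    if clear_attachments:
+        return {}
+    merged = dict(current)
+    for key, value in (patch or {}).items():
+        if value is None:
+            merged.pop(key, None)  # null-as-delete
+        else:
+            merged[key] = value
+    return merged
+```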
+
+### §24. Status state machine
+
+The framework drives the following transitions:
+
+```
+            create()                                handler returns
+              │                                    or raises
+              ▼                                    ┌──────────────┐
+        ┌──────────┐    auto-start  ┌──────────────│  completed   │
+        │ pending  │ ──────────────▶│ in_progress  │ (terminal)   │
+        └──────────┘                │              │              │
+                                    │              └──────────────┘
+                                    │  return X (multi-turn)
+                                    ▼              ▲
+                              ┌──────────┐         │
+                              │suspended │ ────────┘
+                              └──────────┘ .run/.start with new input
+                                    ▲
+                                    │
+                                    │ reclaim (same status,
+                                    │ new lease)
+                                    │
+                                    └─── in_progress (foreign lease)
+```
+
+Notes:
+
+- The framework usually creates with `status = in_progress` directly
+  (the `pending` state is rarely externally observed).
+- `in_progress -> in_progress` is the most-traversed transition
+  (every lease renewal, every reclaim, every steering drain, every
+  successful retry).
+- `completed` is terminal; the *outcome* (success / failure /
+  cancel) is communicated through the typed exceptions, not via a
+  separate status value.
+- `ctx.exit_for_recovery()` preserves `in_progress` and force-expires
+  the lease — it is the only way to release ownership without moving
+  to a different status (§16).
+
+#### 24.1 Allowed transition matrix (provider-enforced)
+
+The provider rejects PATCHes whose declared `status` transition is
+not in this table. The rejection is classified internally as
+`_HostedConflict(_code="invalid_state_transition")` and translated
+to a generic framework error at the boundary. This condition should
+never escape to developer code, because the framework chooses
+transitions, not the developer; if it ever does escape, it is a
+framework bug per Workstream C.
+
+| From → To | `pending` | `in_progress` | `suspended` | `completed` |
+|---|---|---|---|---|
+| `pending` | n/a | ✅ | ❌ | ✅ |
+| `in_progress` | ✅ (with matching lease) | ✅ (lease renewal) | ✅ | ✅ |
+| `suspended` | ✅ | ✅ | ✅ | ✅ |
+| `completed` | ❌ (terminal) | ❌ | ❌ | ✅ (no-op only — see §24.2) |
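+
+The matrix can be encoded as a lookup table. The sketch below covers
+only the status pairs; the lease condition on `in_progress → pending`
+and the no-op restriction on `completed → completed` (§24.2) are not
+modeled. `ALLOWED_TRANSITIONS` and `is_allowed_transition` are
+hypothetical names.
+
+```python
+# Rows are the current status; values are the permitted target statuses.
+ALLOWED_TRANSITIONS = {
+    "pending": {"in_progress", "completed"},
+    "in_progress": {"pending", "in_progress", "suspended", "completed"},
+    "suspended": {"pending", "in_progress", "suspended", "completed"},
+    "completed": {"completed"},  # no-op only, see terminal immutability
+}
+
+
+def is_allowed_transition(current: str, target: str) -> bool:
+    return target in ALLOWED_TRANSITIONS.get(current, set())
+```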
+
+#### 24.2 Terminal immutability
+
+A PATCH against a task whose current status is `completed` is
+rejected UNLESS the PATCH is a no-op `completed → completed` AND
+carries no other field changes (no `payload`, no `tags`, no
+`error`, no `suspension_reason`, no lease). The no-op pass-through
+returns the existing record without modification — this lets
+idempotent retry-loops behave predictably.
+
+Any other PATCH against a completed task raises
+`_HostedConflict(_code="task_immutable")` → translated to
+`TaskConflictError(current_status="completed")`.
+
+#### 24.3 Delete force semantics
+
+DELETE on a task in any **non-terminal** status (`pending`,
+`in_progress`, `suspended`) requires `force=true`. Without it the
+provider rejects the delete as `invalid_request` (400) — note this
+is **NOT** a conflict (409); the service's PR 2135250 explicitly
+moved this from 409 → 400 with code `invalid_request`.
+
+DELETE on a **terminal** (`completed`) task always succeeds (no
+force required).
+
+DELETE additionally honors `If-Match`: when supplied, the
+provider rejects the delete with `_HostedConflict(_code="etag_mismatch")`
+→ `EtagConflict` if the supplied etag does not match the current
+record.
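+
+A sketch of the delete pre-checks as a pure function. `check_delete`
+is a hypothetical helper, and the order of the two checks (force
+first, then etag) is an assumption; the section does not state which
+takes precedence.
+
+```python
+from typing import Optional
+
+NON_TERMINAL = {"pending", "in_progress", "suspended"}
+
+
+def check_delete(
+    status: str,
+    current_etag: str,
+    *,
+    force: bool = False,
+    if_match: Optional[str] = None,
+) -> Optional[str]:
+    """Return a rejection code, or None when the delete may proceed."""
+    if status in NON_TERMINAL and not force:
+        return "invalid_request"  # surfaced as 400, not 409
+    if if_match is not None and if_match != current_etag:
+        return "etag_mismatch"
+    return None
+```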
+
+### §25. ETag (optimistic concurrency) + in-process write serialization
+
+The framework uses the hosted store's ETag/CAS protocol per the
+Foundry Task Storage Protocol spec.
+
+#### 25.1 Etag tracking — always-on after the first read/create
+
+After the first successful read/create on a `task_id`, **every
+subsequent PATCH MUST carry `If-Match` with the latest known etag**
+for that task. The framework tracks the latest etag in the
+in-memory active-task entry, updating it from every PATCH/GET
+response. `delete()` is the only operation that MUST NOT carry
+`if_match` — deletion is intentionally unconditional and tolerates
+a concurrent winner.
+
+**No blind writes.** This applies to *every* PATCH-issuing site,
+including those that hold the per-task write lock and call the
+provider directly to avoid re-entrant lock acquisition (e.g. the
+queued-steering-cancel path): such sites MUST go through the
+lock-held update helper that selects `If-Match` from the tracked
+etag, never a bare `provider.update` with no `if_match`.
+
+The service-returned `etag` value is passed verbatim as `If-Match`
+on the next PATCH. The framework does NOT strip surrounding quotes,
+normalize whitespace, or otherwise rewrite it.
+
+#### 25.2 Per-task in-process write queue
+
+Without coordination, the framework has multiple concurrent
+PATCH-issuing code paths against the same task: lease renewal
+heartbeats, metadata flushes (handler-issued AND auto-flush at
+turn boundaries), steering append, steering drain Phase-1/3,
+suspend, complete, fail, output writes, and reclaim. All of these
+race in-process for the same etag and can produce avoidable 412
+conflicts in steady state.
+
+The framework MUST serialize these writes through a **per-task
+asyncio lock** held for the read-state + compute-PATCH + apply
+cycle. Reads (e.g., `Task.get(task_id)`) do NOT take this lock —
+they're snapshot operations that don't move the etag.
+
+The read MUST happen **inside** the lock for any read-modify-write
+sequence (steering drain, queued-steering-cancel, etc.), so the
+record read and the PATCH are atomic with respect to other
+in-process writers (notably the lease-renewal heartbeat). A site
+that reads the record (or pins an etag) *before* acquiring the lock
+can have its etag invalidated by the heartbeat between the read and
+the write, which under contention starves the retry budget. Because
+the per-task lock is a **non-reentrant** `asyncio.Lock`, the
+framework provides two helpers: a lock-acquiring update (for callers
+that do not hold the lock) and a lock-held update (for callers that
+already hold it, e.g. the drain); both select `If-Match` from the
+tracked etag and refresh it on success.
+
+Lock lifecycle:
+
+- Per-`task_id` `asyncio.Lock` allocated lazily on first write.
+- Released after the PATCH response is recorded (etag updated).
+- Removed from the in-memory lock table when the local active-task
+  entry is torn down (no leaked locks).
+
+In-process contention now serializes; cross-process contention
+(another worker reclaimed the lease) still surfaces as 412 because
+the queue is in-process only.
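+
+The two-helper pattern can be sketched with plain `asyncio`. The
+class and method names (`TaskWriteQueue`, `update`, `update_locked`,
+`teardown`) are hypothetical, and `send` stands in for the provider
+PATCH call: it receives the tracked etag and returns the new one.
+
+```python
+import asyncio
+from typing import Awaitable, Callable, Optional
+
+Send = Callable[[Optional[str]], Awaitable[str]]
+
+
+class TaskWriteQueue:
+    def __init__(self) -> None:
+        self._locks: dict[str, asyncio.Lock] = {}
+        self._etags: dict[str, str] = {}
+
+    def lock_for(self, task_id: str) -> asyncio.Lock:
+        # Allocated lazily on first write.
+        lock = self._locks.get(task_id)
+        if lock is None:
+            lock = self._locks[task_id] = asyncio.Lock()
+        return lock
+
+    async def update_locked(self, task_id: str, send: Send) -> str:
+        """For callers that ALREADY hold the lock (it is non-reentrant)."""
+        new_etag = await send(self._etags.get(task_id))
+        self._etags[task_id] = new_etag  # refresh the tracked etag on success
+        return new_etag
+
+    async def update(self, task_id: str, send: Send) -> str:
+        """For callers that do not hold the lock."""
+        async with self.lock_for(task_id):
+            return await self.update_locked(task_id, send)
+
+    def teardown(self, task_id: str) -> None:
+        self._locks.pop(task_id, None)
+        self._etags.pop(task_id, None)
+
+
+async def demo() -> list:
+    queue = TaskWriteQueue()
+    seen = []
+
+    async def send(if_match):
+        seen.append(if_match)
+        await asyncio.sleep(0)  # yield, as a real network call would
+        return f"etag-{len(seen)}"
+
+    await asyncio.gather(*(queue.update("t1", send) for _ in range(3)))
+    return seen
+
+
+seen = asyncio.run(demo())
+```
+
+Each writer observes the etag produced by the previous one, so no
+in-process writer sends a stale `If-Match`.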
+
+#### 25.3 412 (etag conflict) resolution — per-operation policy
+
+When a PATCH inside the queue gets a 412, the appropriate response
+depends on the operation's INTENT. There is no single retry rule:
+
+| Operation | On 412, do what |
+|---|---|
+| Metadata flush | re-read state, overwrite the addressed namespace with local value (last-write-wins), retry (up to 5 attempts). |
+| Steering append | re-read `_steering`, append to the NEW state's `pending_inputs`, bump `next_input_seq` from the NEW state, retry (up to 5 attempts). Idempotent when `input_id` is supplied. |
+| Steering drain (Phase 1) | re-read `_steering`, drain the NEW head, retry (up to 5 attempts). |
+| Steering drain (Phase 3) | re-read, retry (up to 5 attempts). |
+| Lease renewal heartbeat | re-read lease; if still ours, retry; otherwise signal eviction. |
+| Suspend / complete / fail terminal writes | **RE-READ + decide.** A 412 here means our etag is stale — that's all we know on its own. Re-read the record, then choose: (a) if the lease is **no longer ours** (`lease.owner` differs OR `lease.instance_id` differs OR `lease.expiry_count` bumped past our cached value) → ABANDON and signal awaiters via the eviction path (C-LSE-4 / C-ERR-2); the new owner is authoritative and our terminal would clobber their in-flight recovery. (b) If `status` is already terminal (`completed`) → ABANDON; another actor already wrote the terminal. (c) Otherwise (lease still ours, status still `in_progress`) → retry the terminal PATCH against the new etag, up to 5 attempts. Steering inputs that another process appended between our read and our retry are silently superseded by the terminal write — that is correct behavior because the steerer's `.result()` MUST then raise `TaskConflictError(current_status="completed")` per C-STR-6, which is how cross-process steering-after-terminate is supposed to surface. |
+| Output write (part of suspend/complete) | inherits the parent operation's policy. |
+| Resume-clear-output (part of resume) | re-read, retry (up to 5 attempts). |
+| Recovery reclaim (inline) | ABANDON. The 412 IS the race-detection — another process beat us to the reclaim. Let the next caller / scan re-evaluate. |
+| Recovery reclaim (cold-start / periodic) | ABANDON. Same reasoning. |
+
+Default retry budget is 5 attempts unless noted. Each retry
+re-acquires the per-task lock before the re-read + re-merge + re-write
+cycle. `LastInputIdPreconditionFailed` (for `if_last_input_id`) and
+`EtagConflict` (for low-level callers) propagate as today.
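+
+As an illustration of the metadata-flush row, the sketch below
+retries against an in-memory stand-in for the store. `FakeStore`,
+`EtagMismatch`, and `flush_metadata` are hypothetical names; the
+sketch is synchronous and omits the per-task lock that each real
+retry re-acquires.
+
+```python
+class EtagMismatch(Exception):
+    """Stand-in for a 412 response."""
+
+
+class FakeStore:
+    """In-memory stand-in for the task store (illustration only)."""
+
+    def __init__(self, payload):
+        self.etag = 1
+        self.payload = dict(payload)
+
+    def get(self):
+        return self.etag, dict(self.payload)
+
+    def patch(self, payload, if_match):
+        if if_match != self.etag:
+            raise EtagMismatch(f"expected {self.etag}, got {if_match}")
+        self.payload = dict(payload)
+        self.etag += 1
+        return self.etag
+
+
+def flush_metadata(store, namespace, local_value, etag, payload, max_attempts=5):
+    for _ in range(max_attempts):
+        # Last-write-wins for our namespace; other keys come from the latest read.
+        merged = {**payload, namespace: local_value}
+        try:
+            return store.patch(merged, if_match=etag)
+        except EtagMismatch:
+            etag, payload = store.get()  # re-read, then retry against the new state
+    raise EtagMismatch("retry budget exhausted")
+
+
+store = FakeStore({"other": 1})
+store.patch({"other": 2}, if_match=1)  # a concurrent writer moves the etag to 2
+new_etag = flush_metadata(store, "meta", {"k": "v"}, etag=1, payload={"other": 1})
+```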
+
+#### 25.4 Auto-extension piggyback on every PATCH
+
+Every PATCH the framework issues — renewal, metadata, steering,
+suspend, etc. — MUST include the lease-extension trio
+(`lease_owner`, `lease_instance_id`, `lease_duration_seconds`) so
+the lease is refreshed as a side effect. The renewal loop's next
+tick is computed dynamically from the per-task last-refresh time
+(NOT a fixed cadence), so a PATCH within the last `interval`
+seconds fully shadows the next heartbeat. See §56.
+
+**Lease renewal requires `in_progress`.** The task store accepts the
+lease-extension trio as a *renewal* only when the record is already
+`in_progress`, and as a *claim* only when the same PATCH transitions
+the record INTO `in_progress` (e.g. reclaim, or the steering-drain
+Phase-1 PATCH per §52). A PATCH that carries the lease trio against a
+`suspended`/`pending`/terminal record WITHOUT a status flip to
+`in_progress` is rejected ("lease renewal is only supported for
+in_progress tasks"). Therefore any framework path that writes to a
+record left `suspended` by a prior turn (notably the steering drain)
+MUST set `status='in_progress'` in the same PATCH. The local provider
+enforces this same rule so the conflict is reproducible without a
+hosted deployment.
+
+### §26. Recovery — internal lifecycle, no public HTTP endpoint
+
+There is no HTTP route for resume. Resume is initiated from
+caller code via the normal `Task.start` / `Task.run` (one-shot)
+or `MultiTurnTask.start` / `MultiTurnTask.run` (multi-turn) entry
+points. The framework's lifecycle state machine transitions a
+`suspended` task back to `in_progress` and re-enters the handler
+without exposing a server-side endpoint.
+
+Crash recovery for tasks that died mid-`in_progress` is handled
+internally by the periodic recovery scanner described in §55:
+the scanner detects abandoned leases and re-invokes the handler
+with the persisted `payload["input"]` and
+`entry_mode="recovered"`.
+
+---
+
+## Part IV — Provider abstraction (storage backends)
+
+> **Visibility:** Everything in this part is **framework-internal**.
+> The `TaskProvider` interface and the two concrete providers
+> (`HostedTaskProvider`, `LocalFileTaskProvider`) are NOT part of
+> the public surface defined in Part V — in the canonical Python
+> implementation, all of these live in `_`-prefixed modules
+> (`_provider.py`, `_client.py`, `_local_provider.py`) and are
+> NOT re-exported from `durable/__init__.py`'s `__all__`. The
+> abstraction exists to keep the manager testable and to let the
+> framework swap hosted vs. local backends — but framework
+> consumers are not expected (and not supported) to construct or
+> consume providers directly. This part documents the contract a
+> re-implementer (in another language) MUST satisfy when writing
+> the provider layer.
+
+### §27. `TaskProvider` interface
+
+The framework abstracts over the storage backend via a single
+async interface. Two providers ship: hosted (HTTP-backed) and local
+(file-backed); a third (in-memory) is conceptually possible.
+
+```
+from __future__ import annotations  # request/response types are defined elsewhere
+
+
+class TaskProvider:
+    async def create(self, request: TaskCreateRequest) -> TaskInfo: ...
+    async def get(self, task_id: str) -> TaskInfo | None: ...
+    async def update(self, task_id: str, patch: TaskPatchRequest) -> TaskInfo: ...
+    async def delete(self, task_id: str, *, force: bool = False, cascade: bool = False) -> None: ...
+    async def list(self, *, agent_name: str | None = None,
+                   session_id: str | None = None,
+                   status: TaskStatus | None = None,
+                   tag: dict[str, str] | None = None,
+                   source_type: str | None = None) -> list[TaskInfo]: ...
+```
+
+Semantic requirements:
+
+- `get(task_id)` MUST return `None` for missing tasks (not raise).
+- `update()` MUST honor the `if_match` field on the patch for CAS.
+- `update()` payload MUST shallow-merge.
+- `update()` tags MUST null-as-delete merge.
+- `update()` attachments MUST null-as-delete merge (§23.1).
+- `delete()` MUST be idempotent at the SCHEDULING level (multiple
+  `.delete()` calls do not error). The provider's lower-level
+  `provider.delete(task_id)` MAY raise `TaskNotFound` for
+  already-deleted records; callers of the provider directly MUST
+  handle this. The canonical Python implementation's hosted provider
+  raises on 404 and the local provider raises on missing files;
+  `MultiTurnTask.delete(task_id)` shields user code from these by catching
+  "not found" substring matches and re-raising as `TaskNotFound`
+  the first time, and being a no-op only at the user-facing
+  `Task` surface.
+- `list(...)` MUST filter server-side; the framework relies on it.
+
+`TaskCreateRequest` and `TaskPatchRequest` are simple structs
+mirroring the writable subset of `TaskInfo` (plus `if_match`,
+`lease_owner`, `lease_instance_id`, `lease_duration_seconds`).
+
+### §28. Hosted provider (HTTP)
+
+The hosted provider implements `TaskProvider` over HTTP against the
+Foundry Task Storage service. Selected when the platform-supplied
+environment variable `FOUNDRY_HOSTING_ENVIRONMENT` is set.
+
+Key implementation notes:
+
+- **API version:** Pinned at framework build time. The framework
+  carries one `_API_VERSION` constant (current canonical value:
+  `"v1"`) and passes it as the `api-version` query parameter on
+  every request.
+- **Authentication:** Bearer token from a `TokenCredential`
+  resolved at request time. Scope is `https://ai.azure.com/.default`.
+- **User-Agent:** Identifies the framework + version + runtime
+  (`ai-agentserver-core/<version>`).
+- **Custom error classification:** The provider classifies every
+  non-success response into one of four labels and raises a typed
+  `TransportClassifiedError(classification=<label>)`. The full
+  classifier matrix:
+
+| Condition | Label | Notes |
+|---|---|---|
+| HTTP 409 with body `error.code == "binding_mismatch"` | `evicted` | The agent's binding does not match the platform's view (orphan sandbox). Triggers the local-cleanup sequence. |
+| HTTP 409 with any other body (or malformed body) | `conflict` | Generic lifecycle conflict. |
+| HTTP 412 | `conflict` | Precondition / ETag mismatch. |
+| HTTP 408, 429 | `transient` | Request timeout / rate limited — retryable. |
+| HTTP 5xx | `transient` | Server-side error — retryable. |
+| Network failure, socket timeout, connection reset | `transient` | Transport-level errors. |
+| Body parse error (decode/JSON) on otherwise-success response | `transient` | Treated as transport-level. |
+| HTTP 4xx other than 408/409/412/429 | `permanent` | Caller bug; do not retry. |
+
+`evicted` is the most load-bearing label: it gates the
+local-cleanup sequence that prevents split-brain when the platform
+has already evicted this sandbox in favor of another.
+
+- **Body parsing (defensive):** The provider parses response bodies
+  defensively — incomplete or non-JSON bodies do NOT crash the
+  framework. Gzip decompression is performed manually (the SDK
+  pipeline's `ContentDecodePolicy` is intentionally excluded so the
+  provider controls decode error handling). When the body cannot be
+  decoded or parsed, the provider raises a
+  `TransportClassifiedError` carrying a `body_prefix` truncated to
+  256 characters (`_BODY_PREFIX_LIMIT`) for operator triage. The
+  prefix never contains bearer tokens or full response bodies.
+- **ETag tracking on every write.** The provider remembers the
+  most recent ETag returned by the server (from any GET, POST, or
+  PATCH response) per task and includes it as `if_match` on every
+  subsequent PATCH. This is what makes per-op 412 policy (§25.3)
+  enforceable from the framework: the framework never has to ask
+  the provider to "go fetch and then PATCH"; the provider already
+  knows the current ETag. The hosted provider's local ETag cache
+  is in-memory and per-process; cross-process correctness is
+  provided by the server-side check itself (412 on mismatch).
+- **Lease-extension piggyback.** Every PATCH carries an updated
+  `lease.expires_at` (computed by the framework as `now +
+  lease_duration`). The framework computes the renewal cadence
+  dynamically by tracking when the last successful PATCH ran
+  (§22 / §31).
+- **Logging policy:** A custom `TaskApiLoggingPolicy` logs
+  request/response method + URL + status + the same 256-char body
+  prefix, with secrets redacted.
+- **Required dependency:** A `TokenCredential` factory must be
+  installed (e.g. via `azure-identity` in the Python implementation).
+  The hosted provider does not function without a credential
+  source.
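+
+The classifier matrix in this section can be sketched as a small
+function. `classify` is a hypothetical name; `status=None` is used
+here to represent a transport-level failure with no HTTP response,
+and the body-parse-error row is not modeled.
+
+```python
+from typing import Optional
+
+
+def classify(status: Optional[int], error_code: Optional[str] = None) -> str:
+    if status is None:
+        return "transient"  # network failure, socket timeout, connection reset
+    if status == 409:
+        return "evicted" if error_code == "binding_mismatch" else "conflict"
+    if status == 412:
+        return "conflict"
+    if status in (408, 429) or 500 <= status <= 599:
+        return "transient"
+    if 400 <= status <= 499:
+        return "permanent"
+    raise ValueError(f"not an error response: {status}")
+```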
+
+### §28a. Field validation (shared between providers)
+
+Every PATCH and CREATE write touches the same input-validation
+surface, enforced identically by **both** providers. These rules are
+the wire contract — the service rejects on the wire, the local
+provider rejects pre-write so a developer running locally observes
+the same failures they would observe deployed.
+
+Violations raise an `invalid_request`-coded error (the framework
+classifies these as `_HostedConflict` or a structured
+`TaskPreconditionFailed` — see §39).
+
+#### 28a.1 Field length and format
+
+| Field | Constraint | Required on CREATE? |
+|---|---|---|
+| `id` | regex `^[a-zA-Z0-9_-]{1,128}$` | optional (provider generates if absent) |
+| `agent_name` | length 1..128 after trim | yes |
+| `session_id` | length 1..128 after trim | yes |
+| `title` | length 1..256 after trim | yes |
+| `description` | length 1..1024 after trim | optional |
+| `suspension_reason` | length 1..256 after trim | only when status=suspended |
+| Tag key | regex `^[a-zA-Z0-9_.\-]{1,64}$` | n/a |
+| Tag value | length ≤ 256 chars | n/a |
+| Tag entry count | ≤ 16 total entries | n/a |
+| Attachment key | regex `^[a-zA-Z0-9_.\-]{1,64}$`, non-empty after trim | n/a (see §23.9) |
+
+#### 28a.2 JSON-byte budgets
+
+Sizes measured as UTF-8 byte length of canonical JSON
+(`sort_keys=True`, separators `(",", ":")`).
+
+| Bucket | Max bytes |
+|---|---|
+| `payload` (inline JSON) | 1 MB (1024 × 1024) |
+| `error` (JSON dict) | 64 KB (64 × 1024) |
+| `source` (JSON dict) | 4 KB (4 × 1024) |
+| `attachments` per-value | 2 MB (2 × 1024 × 1024) — see §23.7 |
+| `attachments` total entries | 20 — see §23.7 |
+
+Note: `payload` at 1 MB is intentionally narrower than the
+per-input ceiling. The framework offloads large inputs / outputs into
+`attachments` (§23) to lift each developer-observable input or
+output to the 2 MB per-attachment cap without consuming the
+payload budget. The developer never sees this offload; they
+observe an effective 2 MB limit on `ctx.input` /
+the handler's `return X` for the turn.
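+
+The byte measurement described above can be sketched as follows.
+`canonical_json_size` is a hypothetical helper, and
+`ensure_ascii=False` is an assumption: the section specifies only
+`sort_keys` and the separators, not how non-ASCII text is escaped.
+
+```python
+import json
+
+PAYLOAD_MAX_BYTES = 1024 * 1024  # 1 MB payload budget from the table above
+
+
+def canonical_json_size(value) -> int:
+    text = json.dumps(
+        value,
+        sort_keys=True,
+        separators=(",", ":"),
+        ensure_ascii=False,  # assumption: measure real UTF-8 bytes, not \u escapes
+    )
+    return len(text.encode("utf-8"))
+
+
+def payload_fits(value) -> bool:
+    return canonical_json_size(value) <= PAYLOAD_MAX_BYTES
+```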
+
+#### 28a.3 Source field validation
+
+When `source` is supplied on CREATE, it MUST be a JSON object AND
+contain a non-empty `type` field. Optional structured fields
+(`routine_name`, `routine_run_id`, `dispatch_id`,
+`action_correlation_id`, `created_at`, `updated_at`) are passed
+through verbatim. Unknown fields are preserved (extension data).
+
+`source` is immutable after CREATE (§28a.6, immutable-fields list).
+
+#### 28a.4 Error field validation
+
+When `error` is supplied (PATCH), it MUST be a JSON object. The
+provider requires `message` and `type` as non-empty strings; both
+are part of the developer-observable structured-error envelope
+(§39 — `TaskFailed.error`). The `code` field defaults to `"error"`
+if not supplied.
+
+#### 28a.5 Reserved-on-input status values
+
+- Status `"failed"` is rejected on input. Failures are represented
+  as `status="completed"` with a non-null `error` per §24 / §39.
+- Status `"done"` is a legacy alias for `"completed"` — accepted on
+  read and in list filters; the provider normalizes it to
+  `"completed"` everywhere else. New code uses `"completed"`.
+
+#### 28a.6 Immutable fields on PATCH
+
+These fields are set at CREATE and reject any PATCH that includes
+them:
+
+`id`, `agent_name`, `session_id`, `title`, `description`, `source`.
+
+PATCHes that include any of the above raise `invalid_request`. The
+framework never patches them (they're set at create-time).
+
+### §29. Local provider (file-backed)
+
+Selected when `FOUNDRY_HOSTING_ENVIRONMENT` is NOT set (i.e. local
+dev, tests). State lives under
+`~/.durable-tasks/<agent_name>/<session_id>/<task_id>.json` by
+default; override with `AGENTSERVER_DURABLE_TASKS_PATH`.
+
+Implementation MUST:
+
+- **Enforce every field-validation rule in §28a.** Local rejects on
+  write the same way the service rejects on the wire — same
+  regexes, length caps, byte budgets. A developer running locally
+  must observe the same accept / reject decisions they would
+  observe deployed.
+- **Enforce the state-transition matrix (§24.1), terminal
+  immutability (§24.2), and delete force semantics (§24.3).**
+- **Enforce all lease write rules (§22.1)** — duration bounds,
+  all-or-nothing triplet, conflict on different-owner takeover,
+  EnsureLeaseMatches on `in_progress → pending`, lease renewal only
+  on `in_progress`, force-expire mutual-exclusion with status
+  transition, force-expire ownership check, expiry_count bump on
+  expired-takeover, **`started_at` immutability across lease
+  re-acquisition (set once on first `in_progress`; never updated by
+  expired-lease reclaim, recovery takeover, or suspend/resume)**,
+  `heartbeat_at` stamp on every lease write.
+- **Enforce attachment validation (§23.9) and support the clear-all
+  gesture (§23.10).**
+- **Support list-filter parity (§31a)** — `has_error`, `lease_expired`,
+  pagination via `after` cursor (plain `task_id` for local; opaque
+  service token for hosted), `limit` (default 20, max 100), `order`
+  asc/desc by `created_at`, reject `before`, normalize "done" →
+  "completed" in the status filter, `agent_name` + `session_id`
+  optional.
+- Generate fresh ETags on every write (e.g. SHA of the JSON bytes).
+- Reject `update()` calls whose `if_match` does not match the
+  current ETag and raise `_HostedConflict(_code="etag_mismatch")` —
+  the SAME internal classification the hosted provider produces on
+  412.
+- Apply `payload` PATCH semantics per §F1: when the patch value is
+  a JSON object, shallow-merge into the current payload; for any
+  other JSON type (array, string, number), full-replace; explicit
+  `null` is a no-op (matches the service's `JsonValueKind.Null`
+  branch).
+- Apply `tags` null-as-delete merge, `attachments` null-as-delete
+  merge (per-key) plus top-level clear-all per §23.10 — identical
+  to the hosted provider's semantics.
+- Apply status-transition side effects (§24.x); specifically:
+  - `→ pending` clears the lease AND clears `suspension_reason`.
+  - `→ in_progress` sets `started_at` if null AND clears
+    `suspension_reason` AND clears `completed_at`.
+  - `→ completed` clears the lease AND clears `suspension_reason`
+    AND sets `completed_at` if null.
+  - `→ suspended` clears the lease AND sets `suspension_reason`
+    AND clears `completed_at`.
+- Validate attachment size + count BEFORE writing (raise the
+  internal `_AttachmentTooLarge` / `_AttachmentLimitExceeded` so
+  the framework can re-raise as the developer-facing
+  `InputTooLarge` per §39).
+- Treat missing/corrupt files as `get() -> None`.
+- Detect lease expiry against `expires_at` (UTC) and refuse renewal
+  when an `if_match` mismatch indicates a competing process.
+- **Bump the lease's `expiry_count` on every real lease handoff** (any
+  reclaim where the prior lease's `expires_at` was past) — parity
+  with the hosted server's behavior (§22). Without this, the
+  developer-observable `LeaseInfo.expiry_count` is permanently
+  stuck at 0 in local mode and tests asserting recovery behavior
+  cannot use the local provider. The bump is part of the reclaim
+  PATCH (it does NOT happen on a passive `get()` — `get()` is
+  read-only).
+
+The local provider does NOT issue HTTP requests; it does NOT need an
+event loop beyond the framework's; it has no network failure modes.
+It has no concurrency: single-process operation means writes are
+naturally serialized; `_HostedConflict(_code="lease_ownership_changed")`
+(the service's Cosmos-race recovery code) is not reachable in
+local mode and need not be raised by the local provider.
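+
+The `payload` PATCH rule listed above (objects shallow-merge, other
+JSON types replace, `null` is a no-op) can be sketched as a pure
+function. `apply_payload_patch` is a hypothetical name; treating a
+non-object current payload as empty when merging is an assumption.
+
+```python
+from typing import Any
+
+
+def apply_payload_patch(current: Any, patch: Any) -> Any:
+    if patch is None:
+        return current  # explicit null is a no-op
+    if isinstance(patch, dict):
+        base = current if isinstance(current, dict) else {}
+        return {**base, **patch}  # shallow merge: top-level keys only
+    return patch  # array, string, number: full replace
+```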
+
+### §30. Provider auto-selection
+
+The framework decides at TaskManager construction time:
+
+```
+import os
+
+if os.environ.get("FOUNDRY_HOSTING_ENVIRONMENT"):
+    provider = HostedTaskProvider(...)
+else:
+    provider = LocalFileTaskProvider(...)
+```
+
+No developer opt-in / opt-out flag. This is intentional — code is
+identical between local and hosted; the only thing that changes is
+the storage backend selected.
+
+### §31. Background loops
+
+The framework runs THREE classes of background loops while the
+manager is up:
+
+| Loop | Cadence | Scope | Purpose |
+|---|---|---|---|
+| `_periodic_recovery_loop` | Every 300s (framework constant `_PERIODIC_RECOVERY_INTERVAL_SECONDS`). | Process-wide (one per manager). | Reclaim tasks that became reclaimable after cold-start. The `provider.list(...)` call passes `source_type=_SOURCE_TYPE` to scope to framework-owned tasks only. |
+| `lease_renewal_loop` | Dynamic — half the lease duration (default 30s) computed against the per-task last-refresh time so a recent PATCH within the interval fully shadows the next tick. NOT a fixed cadence. | One per active task. | Renew the lease before expiry. |
+| `_timeout_watchdog` | One-shot sleep for `min(remaining, timeout)` seconds. | One per active task that declares a timeout. | Set `ctx.timeout_exceeded` then `ctx.cancel` when budget expires. |
+
+All loops are interruptible via cancel events and MUST exit cleanly
+on `TaskManager.shutdown()`. The lease renewal loop additionally:
+
+- **Computes its next tick dynamically** from the per-task
+  last-refresh time recorded after every PATCH (renewal, metadata,
+  steering, suspend, etc.). If a PATCH refreshed the lease 2s ago
+  and the interval is 30s, the next tick is due 28s from now, not
+  30s. This makes the renewal loop's heartbeat PATCH count drop to
+  0 in steady state when the task issues at least one other PATCH
+  per interval.
+- After successful renewal (or when the heartbeat is shadowed),
+  invokes an optional steering-poll callback that reads the
+  steering queue and short-circuits the current turn if a new
+  input has arrived since last drain.
+- Signals an external cancel-event on 3 consecutive failures OR
+  immediately on `evicted` classification.
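+
+The dynamic tick computation described in the first bullet reduces to
+one subtraction. `seconds_until_next_renewal` is a hypothetical
+helper; the arguments are plain seconds on any monotonic clock.
+
+```python
+def seconds_until_next_renewal(now: float, last_refresh: float, interval: float) -> float:
+    # The tick is measured from the last PATCH that refreshed the lease,
+    # not from the previous loop iteration.
+    return max(0.0, last_refresh + interval - now)
+
+
+# A PATCH refreshed the lease 2s ago, interval 30s: next tick is due in 28s.
+delay = seconds_until_next_renewal(now=102.0, last_refresh=100.0, interval=30.0)
+```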
+
+The periodic recovery loop additionally:
+
+- Passes `source_type=_SOURCE_TYPE` to `provider.list(...)` so the
+  scan returns only framework-owned tasks. Foreign-typed records
+  in the same `(agent_name, session_id)` scope are not picked up.
+- Walks `task_info.attachments` for `_steering_input_*` keys whose
+  ref slot is no longer present in `pending_inputs` and PATCHes
+  them away (orphan cleanup — defense in depth against a partial
+  crash between an attachment add and the queue append).
+
+### §31a. List filter parity (internal `list()`)
+
+`Task._list()` is internal — not exported, no developer-facing
+surface. Framework-internal callers (recovery scans, observability
+shims) use `manager.list_tasks(...)` directly. The list operation's
+filter and pagination surface MUST be identical between hosted and
+local so internal call sites compose correctly across the two
+backends.
+
+**Filters** (every implementation MUST support these):
+
+| Filter | Type | Semantics |
+|---|---|---|
+| `agent_name` | string \| None | Match exact. Optional — when null, no agent-scope filter applied. |
+| `session_id` | string \| None | Match exact. Optional — when null, no session-scope filter applied. |
+| `status` | string \| None | Match exact (after legacy `"done"` → `"completed"` normalization per §28a.5). |
+| `source_type` | string \| None | Match `source.type` exact. |
+| `tag` | list[(key, value)] \| None | Match all pairs (AND semantics). Each pair tested as exact equality. |
+| `has_error` | bool \| None | When set, filter to (`true`) tasks with non-null `error` or (`false`) tasks with null `error`. |
+| `lease_expired` | bool \| None | When set, filter to (`true`) tasks whose `lease.expires_at <= now` or (`false`) the opposite. |
+| `lease_owner` | string \| None | Match `lease.owner` exact. |
+| `omit_attachment_values` | bool | When true, returned tasks carry attachment keys with `None` values (skip per-row blob reads for paging through many tasks). Default false. |
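+
+The semantics of three of these filters can be sketched as a pure
+predicate. This is a minimal illustration, not the provider's
+implementation: the `matches` function and the record shape (a dict
+with `tags`, `error`, and `lease.expires_at`) are assumptions made for
+the example.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+
+def matches(task, *, tag=None, has_error=None, lease_expired=None, now=None):
+    """Return True when `task` passes every filter that is set (not None)."""
+    # tag: every (key, value) pair must match exactly (AND semantics).
+    if tag is not None:
+        tags = task.get("tags", {})
+        if any(tags.get(key) != value for key, value in tag):
+            return False
+    # has_error: True keeps tasks with a non-null error, False keeps the rest.
+    if has_error is not None and (task.get("error") is not None) != has_error:
+        return False
+    # lease_expired: True keeps tasks whose lease.expires_at <= now.
+    if lease_expired is not None:
+        now = now or datetime.now(timezone.utc)
+        if (task["lease"]["expires_at"] <= now) != lease_expired:
+            return False
+    return True
+
+
+# Hypothetical record used only to exercise the predicate.
+NOW = datetime(2025, 1, 1, tzinfo=timezone.utc)
+TASK = {
+    "tags": {"team": "a", "env": "dev"},
+    "error": None,
+    "lease": {"expires_at": NOW - timedelta(seconds=5)},
+}
+```
+
+Unset filters (`None`) never exclude a record, which is why a call with
+no filters matches everything.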
+
+**Pagination**:
+
+- `limit` defaults to 20, max 100 (provider clamps over-cap to 100).
+- `after` is an opaque cursor string. The local provider uses
+  plain `task_id` (no Cosmos continuation-token concept). The
+  hosted provider round-trips whatever opaque token the service
+  returns (up to 4096 chars). Internal callers treat it as opaque
+  regardless of which provider is underneath.
+- `before` is **rejected** (forward-only cursor pagination — matches
+  the service's explicit rejection per PR 2122040).
+- `order` accepts `"asc"` or `"desc"`. Default `"desc"`. Sorts by
+  `created_at`.
+
+**Response**:
+
+- `Data` — the page of tasks (or DTOs).
+- `LastId` — the opaque continuation cursor to pass back as `after`
+  on the next call; `None` when no more pages.
+- `HasMore` — `true` when more pages remain.
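+
+A minimal sketch of how an internal caller might drain pages under
+these rules. `list_page`, `Page`, and the in-memory `TASK_IDS` list are
+hypothetical stand-ins, not the framework's API. The cursor is modeled
+as a plain `task_id` (as described for the local provider), and
+`created_at` ordering is assumed to be already applied.
+
+```python
+from __future__ import annotations
+
+from typing import NamedTuple
+
+DEFAULT_LIMIT = 20   # default stated above
+MAX_LIMIT = 100      # over-cap requests are clamped to this value
+
+
+class Page(NamedTuple):
+    data: list[str]
+    last_id: str | None   # cursor for the next call; None on the last page
+    has_more: bool
+
+
+# Hypothetical store: task ids already in the requested order.
+TASK_IDS = [f"task-{i:03d}" for i in range(250)]
+
+
+def list_page(*, limit: int = DEFAULT_LIMIT, after: str | None = None) -> Page:
+    """Return one forward-only page starting just past `after`."""
+    limit = min(limit, MAX_LIMIT)
+    start = 0 if after is None else TASK_IDS.index(after) + 1
+    data = TASK_IDS[start:start + limit]
+    has_more = start + limit < len(TASK_IDS)
+    return Page(data, data[-1] if has_more else None, has_more)
+
+
+def list_all(limit: int = DEFAULT_LIMIT) -> list[str]:
+    """Drain every page by passing `last_id` back as `after`."""
+    collected: list[str] = []
+    cursor: str | None = None
+    while True:
+        page = list_page(limit=limit, after=cursor)
+        collected.extend(page.data)
+        if not page.has_more:
+            return collected
+        cursor = page.last_id
+```
+
+The caller never inspects the cursor; it only hands `last_id` back,
+which keeps the loop valid for either provider.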
+
+---
+
+## Part V — Public API surface
+
+This part defines the language-agnostic shapes every implementation
+MUST expose. Names are given in the Python style; idiomatic naming
+in other languages is acceptable but the *behavior* and *parameters*
+MUST match.
+
+### §32. `task` and `multi_turn_task` decorators
+
+The framework exposes **two decorators**. Each wraps an
+`async def fn(ctx: TaskContext[Input]) -> Output` function and
+returns a typed handle of a **distinct class**.
+
+```
+@task(
+    name:    str,                       # REQUIRED
+    title:   str | None = None,         # static; no callable factory
+    timeout: timedelta | None = None,
+    retry:   RetryPolicy | None = None,
+)
+async def one_shot(ctx: TaskContext[I]) -> O: ...
+# → Task[I, O]
+
+@multi_turn_task(
+    name:      str,                     # REQUIRED
+    title:     str | None = None,
+    timeout:   timedelta | None = None,
+    retry:     RetryPolicy | None = None,
+    steerable: bool = False,            # opt-in steering queue
+)
+async def chain(ctx: TaskContext[I]) -> O: ...
+# → MultiTurnTask[I, O]
+```
+
+Both decorators accept ONLY the kwargs listed. Unknown kwargs raise
+`TypeError` at decoration time. `title` is a static string; the
+callable-factory form is not accepted because it is rarely used and
+omitting it keeps the surface simpler and the type cleaner.
+
+Per-decorator kwarg semantics:
+
+| Kwarg | Meaning |
+|---|---|
+| `name` | Stable identity for recovery routing — written to `source.name` and the `_task_name` tag. Changing it strands existing tasks. |
+| `title` | Human-readable title written to `TaskInfo.title`. |
+| `timeout` | Per-turn cooperative wall-clock watchdog (§14). When elapsed, the framework sets `ctx.timeout_exceeded` then `ctx.cancel`. |
+| `retry` | `RetryPolicy` for handler-raised exceptions (§15). `None` (default) = no retry. |
+| `steerable` | (`@multi_turn_task` only.) Enables `.start()` against an in-flight chain to queue a steering input instead of raising `TaskConflictError` (§12). |
+
+There is no `ephemeral` kwarg. One-shot `@task` is **always**
+ephemeral — the record is deleted on terminal exit. Multi-turn
+`@multi_turn_task` is **never** ephemeral — the chain stays alive
+in `suspended` between turns and is removed only via
+`MultiTurnTask.delete(task_id)` (§33).
+
+All decorator options are recovery-safe: after a crash the framework
+only knows about the registered decorator's view. Per-call option
+overrides are deliberately not supported.
+
+The handler's first parameter MUST be named `ctx`. The framework
+binds positionally, but it validates the name at decoration time so
+guide examples and call sites stay consistent.
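+
+The decoration-time checks described in this section can be sketched
+with a stand-in decorator. This is not the framework's `@task`: it
+returns the function unchanged, and raising `TypeError` for a
+misnamed first parameter is an assumption (the text above specifies
+`TypeError` only for unknown kwargs).
+
+```python
+import inspect
+
+_ALLOWED_KWARGS = {"name", "title", "timeout", "retry"}
+
+
+def task(**options):
+    """Stand-in for `@task` that mirrors only the validation rules above."""
+    unknown = sorted(set(options) - _ALLOWED_KWARGS)
+    if unknown:
+        raise TypeError(f"unexpected kwargs: {unknown}")
+    if not isinstance(options.get("name"), str):
+        raise TypeError("name is required and must be a str")
+    title = options.get("title")
+    if title is not None and not isinstance(title, str):
+        raise TypeError("title must be a static string, not a factory")
+
+    def decorate(fn):
+        params = list(inspect.signature(fn).parameters)
+        # The name check runs once, at decoration time, not per call.
+        if not params or params[0] != "ctx":
+            raise TypeError("the handler's first parameter must be named 'ctx'")
+        return fn  # the real decorator returns a Task[I, O] handle instead
+
+    return decorate
+
+
+@task(name="summarize", title="Summarize a document")
+async def summarize(ctx):
+    return "ok"
+```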
+
+The two return classes (`Task[I, O]` and `MultiTurnTask[I, O]`)
+are deliberately distinct (NOT a subclass relationship). The type
+checker can therefore enforce "no `.delete()` on one-shot" and
+"multi-turn `get_active_run` requires `(task_id, input_id)`"
+statically.
+
+#### Framework-owned constants exposed on this surface
+
+| Constant | Value | Where it shows up |
+|---|---|---|
+| `_DEFAULT_LEASE_SECONDS` | `60` | Default lease TTL on `create`. |
+| `_DEFAULT_MAX_PENDING_STEERING` | `9` | Maximum concurrent queued steering inputs. Hard-coded; not developer-tunable. |
+| `_PERIODIC_RECOVERY_INTERVAL_SECONDS` | `300` | Cadence of the periodic recovery loop (§55). |
+| `_INPUT_THRESHOLD_BYTES` | `200 * 1024` | Function-input promotion threshold (§23.2). |
+| `_STEERING_THRESHOLD_BYTES` | `20 * 1024` | Steering-input promotion threshold (§23.2). |
+| `_MAX_ATTACHMENT_SIZE_BYTES` | `2 * 1024 * 1024` | Per-attachment serialized cap (§23.7). |
+| `_MAX_ATTACHMENTS` | `20` | Per-task attachment-entry cap (§23.7). |
+| `_MAX_TASK_ID_LENGTH` | `256` | Max characters in `task_id` (§7). |
+| `_VALID_TASK_ID_RE` | `^[a-zA-Z0-9\-_.:]+$` | Valid `task_id` regex (§7). |
+
+These are framework invariants. Implementations in other languages
+MUST use these exact values for byte-compatibility with the canonical
+Python implementation; any value change would silently change
+recovery / overflow behavior across processes that share a store.
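+
+As an illustration of how two of these constants combine, the sketch
+below validates a `task_id` against the length cap and the regex from
+the table. The helper name `is_valid_task_id` is hypothetical, and the
+use of `fullmatch` (which also rejects a trailing newline) is a choice
+of this sketch rather than something the table specifies.
+
+```python
+import re
+
+_MAX_TASK_ID_LENGTH = 256
+_VALID_TASK_ID_RE = re.compile(r"^[a-zA-Z0-9\-_.:]+$")
+
+
+def is_valid_task_id(task_id: str) -> bool:
+    """True when `task_id` fits the length cap and the allowed alphabet."""
+    if len(task_id) > _MAX_TASK_ID_LENGTH:
+        return False
+    return _VALID_TASK_ID_RE.fullmatch(task_id) is not None
+```
+
+For example, `session-1:turn_2.a` passes, while an empty string or an
+id containing a space does not.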
+
+### §33. `Task` (one-shot) and `MultiTurnTask` (multi-turn) handles
+
+The two decorators produce two distinct classes. Their entry-point
+signatures differ in identifier rules: one-shot `task_id` is
+OPTIONAL (auto-generated as a GUID when omitted, per the 1:1
+one-shot invariant `task_id == input_id`); multi-turn `task_id` is
+MANDATORY (it identifies the chain).
+
+```
+class Task(Generic[Input, Output]):
+    name: str
+
+    async def run(
+        self, *,
+        input:            Input,
+        task_id:          str | None = None,
+        input_id:         str | None = None,
+        if_last_input_id: str | None = None,
+    ) -> Output: ...
+
+    async def start(
+        self, *,
+        input:            Input,
+        task_id:          str | None = None,
+        input_id:         str | None = None,
+        if_last_input_id: str | None = None,
+    ) -> TaskRun[Output]: ...
+
+    async def get_active_run(
+        self, task_id: str,
+    ) -> TaskRun[Output] | None: ...
+
+
+class MultiTurnTask(Generic[Input, Output]):
+    name: str
+
+    async def run(
+        self, *,
+        task_id:          str,
+        input:            Input,
+        input_id:         str | None = None,
+        if_last_input_id: str | None = None,
+    ) -> Output: ...
+
+    async def start(
+        self, *,
+        task_id:          str,
+        input:            Input,
+        input_id:         str | None = None,
+        if_last_input_id: str | None = None,
+    ) -> TaskRun[Output]: ...
+
+    async def get_active_run(
+        self, task_id: str, input_id: str,
+    ) -> TaskRun[Output] | None: ...
+
+    async def delete(self, task_id: str) -> None: ...
+```
+
+`.run()` blocks until the run / turn reaches a terminal-for-this-
+caller state and returns the handler's `Output` directly, or raises
+a typed exception (§39).
+
+`.start()` returns immediately with a `TaskRun[Output]` handle that
+the caller can `await` directly (sugar for `await run.result()`),
+call `await run.result()` on, or cancel with `await run.cancel()`.
+The handle's public surface is described in §35.
+
+Both `.run` and `.start` accept the same `input_id` /
+`if_last_input_id` chain primitives (§11). Implementations MUST
+raise `TypeError` at the call site when `if_last_input_id` is
+provided without `input_id`.
+
+`get_active_run` looks up the currently-running run / turn:
+
+- One-shot (`Task.get_active_run(task_id)`): (1) checks the
+  in-process active-task table; if found, returns the bound
+  `TaskRun`. (2) Otherwise consults the store via
+  `provider.get(task_id)`. If the record exists with status
+  `in_progress` and the lease is dead (per `_lease_is_dead`,
+  §22), this method INLINE-RECLAIMS the task — same code path
+  as `.start()`'s "reclaim sub-case" — and returns a `TaskRun`
+  bound to the newly-spawned recovery execution. If the record
+  does not exist OR status is not reclaimable from this
+  process's perspective, returns `None`. Implementers SHOULD
+  make this method idempotent against a recently-completed
+  reclaim.
+- Multi-turn (`MultiTurnTask.get_active_run(task_id, input_id)`):
+  returns the in-flight handle iff the chain is currently
+  running with the **exact** `input_id`; otherwise `None`. The
+  required `input_id` argument prevents accidental cross-turn
+  attach.
+
+`MultiTurnTask.delete(task_id)` force-removes the chain: cancels
+the in-flight turn (active caller's `.result()` resolves with
+`TaskCancelled()`), resolves all queued steerer callers' futures
+with `TaskCancelled()`, and force-deletes the record. Idempotent
+(no-op if the chain is already gone).
+
+There is no per-call override for `title` / `retry` / `steerable` /
+`timeout` — all of those are decorator-configured for recovery
+safety.
+
+The `Task` class has **no** `.delete()` method. One-shot tasks
+are always ephemeral; the framework deletes the persisted record
+on terminal exit.
+
+### §34. `TaskContext`
+
+The single argument every handler receives. Read-only properties:
+
+| Property | Type | Description |
+|---|---|---|
+| `input` | `Input` | The typed input value. |
+| `task_id` | `str` | Task identity. |
+| `input_id` | `str` | Per-turn input identity. For one-shot, defaults to `task_id` (1:1 invariant). For multi-turn, the framework auto-generates a GUID per turn unless the caller supplied one. |
+| `entry_mode` | `"fresh" \| "resumed" \| "recovered"` | Why this turn started (§6). |
+| `metadata` | `TaskMetadata` | Callable namespace facade (§17). |
+| `cancel` | event-like (`asyncio.Event` in Python) | Set when cancellation is requested for any reason. |
+| `shutdown` | event-like | Set when the container is shutting down. Precondition for `exit_for_recovery()`. |
+| `timeout_exceeded` | `bool` | True once the per-turn timeout fired. Set BEFORE `cancel` (§13 ordering invariant). Never reset within a turn. |
+| `cancel_requested` | `bool` | True once external `TaskRun.cancel()` was called. Set BEFORE `cancel`. Never reset within a turn. |
+| `pending_input_count` | `int` | Live count of currently queued steering inputs (multi-turn `steerable=True` only). Reads as `0` for non-steerable tasks AND for any provider failure (failure-tolerant). Computed on every access so it reflects inputs queued mid-handler. |
+| `is_steered_turn` | `bool` | True iff this turn was constructed by the steering-drain code path. False otherwise. |
+| `retry_attempt` | `int` | Cross-lifetime retry counter (§15). |
+
+Public method:
+
+```
+async def exit_for_recovery(self) -> None: ...
+```
+
+`exit_for_recovery()` — see §16. MUST raise `RuntimeError` if
+`shutdown.is_set() == False`; otherwise releases the lease without
+writing a terminal status, leaves the task `in_progress`, and raises
+`TaskDeferred` upward to the caller of `.result()`. The recovery
+scanner re-invokes the handler with the persisted `payload["input"]`
+in a future process lifetime.
+
+`TaskContext` has NO `suspend()` method. Multi-turn handlers end a
+turn with bare `return X`; the framework treats the return as an
+implicit suspend (chain stays alive in `suspended`; caller's
+`await run.result()` resolves to `X`).
+
+As stated in §32, the handler's first parameter MUST be named `ctx`;
+the framework validates the name at decoration time.
+
+Implementations MUST NOT expose public setters for any cause boolean
+or counter. They are framework-owned read-only fields.
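+
+The ordering invariant on the cause booleans can be sketched as
+follows. `SketchContext` and `signal_cancel` are hypothetical names
+used only for illustration; the point is that the cause flag is
+written before the `cancel` event is set, and that the flags are
+exposed as properties without setters.
+
+```python
+import asyncio
+
+
+class SketchContext:
+    """Stand-in exposing only the cancellation-related members."""
+
+    def __init__(self) -> None:
+        self.cancel = asyncio.Event()
+        self._cancel_requested = False
+        self._timeout_exceeded = False
+
+    @property
+    def cancel_requested(self) -> bool:
+        return self._cancel_requested
+
+    @property
+    def timeout_exceeded(self) -> bool:
+        return self._timeout_exceeded
+
+
+def signal_cancel(ctx: SketchContext, *, timed_out: bool) -> None:
+    """Framework-side helper: record the cause first, then wake the handler."""
+    if timed_out:
+        ctx._timeout_exceeded = True
+    else:
+        ctx._cancel_requested = True
+    ctx.cancel.set()
+
+
+async def handler(ctx: SketchContext) -> str:
+    await ctx.cancel.wait()
+    # Because of the ordering, at least one cause is already visible here.
+    return "timeout" if ctx.timeout_exceeded else "cancelled"
+
+
+async def demo() -> str:
+    ctx = SketchContext()
+    waiter = asyncio.create_task(handler(ctx))
+    await asyncio.sleep(0)          # let the handler reach its wait()
+    signal_cancel(ctx, timed_out=True)
+    return await waiter
+```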
+
+### §35. `TaskRun`
+
+The handle returned by `.start()`. Slim public surface:
+
+| Member | Type | Description |
+|---|---|---|
+| `run.task_id` | `str` | Task identity. |
+| `run.input_id` | `str` | Per-turn input identity. |
+| `run.metadata` | `TaskMetadata` | Live reference to the run's metadata facade (the same instance the handler sees as `ctx.metadata`). |
+| `await run.result()` | `Output` | Block until terminal-for-this-caller; returns the handler's typed return value directly OR raises a typed exception (§39). |
+| `await run.cancel()` | `None` | Signal cooperative cancellation. MUST set `ctx.cancel_requested = True` BEFORE setting `ctx.cancel` (ordering invariant — handler observing `ctx.cancel` is guaranteed to see at least one cause boolean already True). The handler picks the terminal shape. |
+| `await run` | `Output` | Awaiting the run directly is sugar for `await run.result()`. |
+
+That is the entire surface. The handle deliberately has NO
+`status` / `delete` / `refresh` / `lease_expiry_count`:
+
+- Chain-level deletion uses `MultiTurnTask.delete(task_id)`.
+- Read-only inspection of the persisted record goes through
+  the task manager's provider (`await manager.provider.get(task_id)`
+  returns the internal `TaskInfo`).
+- Lease bookkeeping is framework-internal — developers don't
+  observe it.
+
+**`TaskRun` is NOT an async iterable.** It does not implement
+`__aiter__` / `__anext__`; there is no `async for chunk in run`
+syntax. Incremental streaming is a peer subpackage
+(`azure.ai.agentserver.core.streaming`, Part VI), NOT a property
+of the task handle. Producers emit to a `streams` registry id;
+consumers attach via `streams.get(id).subscribe(after=...)`.
+
+The two surfaces are decoupled because a stream may span multiple
+task turns, multiple functions writing to the same id, or a
+non-`@task` producer. Coupling stream iteration to `TaskRun`
+would re-couple lifetime in ways the SOT intentionally avoids. Other-
+language implementers MUST NOT add task-handle iteration as
+"syntactic sugar" — it would re-introduce the very coupling we
+removed. If a developer wants a single `await run` plus an
+incremental stream, they explicitly attach to the streaming
+registry (Part VI).
+
+
+### §35a. Read-only inspection — internal
+
+There is no `TaskSnapshot` type and no `Task.get(task_id)` method
+on the public surface. Read-only inspection of a persisted task
+record is done through the task manager's provider directly —
+`await manager.provider.get(task_id)` returns the internal
+`TaskInfo` envelope, which is the framework's own storage shape
+(see §19). The public decorator surface stays small and
+write-shaped on purpose: anything an external observer wants
+about a task record is available on `TaskInfo`, and the framework
+does not project a parallel "snapshot wrapper" onto the public
+surface.
+
+For active-execution inspection (attach to an in-flight run from
+a different coroutine or request handler), use
+`Task.get_active_run(task_id)` / `MultiTurnTask.get_active_run(task_id,
+input_id)` — both return a `TaskRun` handle bound to the live
+execution (or `None` if the task is not currently in flight in
+this process and cannot be reclaimed inline).
+
+### §36. `TaskRun.result()` returns `Output` directly
+
+`await TaskRun.result()` (and equivalently `await task_run`)
+resolves to the handler's typed return value of type `Output` —
+no wrapper class, no envelope. Failure / cancellation /
+deferral conditions surface as typed exceptions raised at the
+`await` site (see §39).
+
+There is no `TaskResult` wrapper class and no `Suspended` sentinel
+on the public surface. Multi-turn handlers use a bare `return X`
+to end a turn; the chain implicit-suspends and the caller's
+`await run.result()` resolves to `X` directly. The framework does
+not persist `X` anywhere in the task record — `X` lives only in
+the in-process future the caller is awaiting.
+
+
+### §37. `TaskMetadata`
+
+Mutable mapping-like type returned by `ctx.metadata` and
+`ctx.metadata(name)`. See §17 for semantics.
+
+Required surface:
+
+```
+metadata["key"]                # __getitem__
+metadata["key"] = value        # __setitem__
+"key" in metadata              # __contains__
+for k in metadata: ...         # __iter__
+metadata.get("key", default)   # MutableMapping behavior
+metadata.to_dict()             # plain dict snapshot
+await metadata.flush()         # persist this namespace only
+await metadata.increment(key)  # atomic numeric increment
+await metadata.append(key, v)  # append to a list-valued key
+```
+
+**Note: `_flush_all()` is framework-internal.** The framework's
+internal "persist every dirty namespace in one pass" helper is
+named with a leading underscore (`_flush_all`) everywhere it
+appears, both as a method on `TaskMetadata` and at every framework
+call site. The manager invokes `_flush_all` at suspend
+/ complete / fail / drain / `exit_for_recovery` boundaries to
+make every namespace the handler touched durable in one PATCH.
+
+The underscore prefix is the Python-canonical signal for
+"package-private; not part of the documented developer surface."
+It is NOT exported from `durable/__init__.py`, has no developer
+guide entry, and has no documented use case at the developer
+layer: per-namespace `metadata.flush()` is the only fence pattern
+developers should reach for (to commit a specific namespace before
+a side-effect operation). Other-language implementers MUST surface
+the equivalent helper at package-private visibility (or omit it
+from the public API entirely) — never as a documented developer
+API.
+
+#### Namespace facade behavior
+
+`TaskMetadata` is implemented as a **callable namespace facade**:
+
+- **Default namespace.** `ctx.metadata` itself binds to
+  `payload["metadata"]`. All dict-like operations on `ctx.metadata`
+  directly target this namespace.
+- **Named namespaces.** `ctx.metadata(name)` returns a sibling
+  `TaskMetadata` instance bound to `payload["metadata:<name>"]`.
+- **Auto-vivification.** A named namespace does NOT have to exist
+  in the persisted record before access — calling
+  `ctx.metadata("ns")` creates an in-memory empty namespace that is
+  persisted on first flush. The corresponding `payload["metadata:ns"]`
+  key materializes only when there is something to write.
+- **Sibling-independence.** A write to one namespace does NOT dirty
+  any other namespace. `metadata.flush()` on namespace `A` does NOT
+  persist namespace `B`.
+- **Restoration.** On every handler entry, the framework constructs
+  the root `TaskMetadata` instance via a restoration helper (e.g.
+  `TaskMetadata.from_payload(payload)`) that walks every
+  `metadata[:...]` key in the payload and pre-populates each
+  namespace with its persisted contents. Handler reads from any
+  named namespace see the post-restoration state without an
+  additional round-trip.
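+
+The facade behavior listed above can be sketched with an in-memory
+stand-in. `SketchMetadata` is a hypothetical class, the `store` dict
+plays the role of the persisted payload, and restoration is done
+lazily per namespace here (a simplification of the up-front walk
+described above). Lease handling and PATCH calls are left out.
+
+```python
+from __future__ import annotations
+
+import asyncio
+from typing import Any
+
+
+class SketchMetadata:
+    """Callable namespace facade over a dict that stands in for the payload."""
+
+    def __init__(self, store: dict[str, Any], key: str = "metadata",
+                 registry: dict[str, SketchMetadata] | None = None) -> None:
+        self._store = store
+        self._key = key
+        self._data = dict(store.get(key, {}))   # restore persisted contents
+        self._dirty = False
+        self._registry = {} if registry is None else registry
+        self._registry[key] = self
+
+    def __call__(self, name: str) -> SketchMetadata:
+        key = f"metadata:{name}"
+        if key not in self._registry:            # auto-vivify in memory only
+            SketchMetadata(self._store, key, self._registry)
+        return self._registry[key]
+
+    def __getitem__(self, item: str) -> Any:
+        return self._data[item]
+
+    def __setitem__(self, item: str, value: Any) -> None:
+        self._data[item] = value
+        self._dirty = True                       # dirties this namespace only
+
+    def __contains__(self, item: str) -> bool:
+        return item in self._data
+
+    async def flush(self) -> None:
+        """Persist this namespace only; a clean namespace writes nothing."""
+        if self._dirty:
+            self._store[self._key] = dict(self._data)
+            self._dirty = False
+```
+
+Writing to `root("ns")` and flushing it materializes only
+`payload["metadata:ns"]`; unflushed changes to the default namespace
+stay in memory.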
+
+#### Flush semantics
+
+- `metadata.flush()` persists the namespace it is called on, atomically
+  against the lease (the framework piggybacks lease ownership on
+  the PATCH so a flush also acts as a heartbeat).
+- **Framework-only auto-flush** at every terminal-of-turn boundary
+  walks every dirty namespace (the internal `_flush_all` helper
+  described earlier in this section). Handlers do not need explicit
+  flushes for durability across a graceful boundary; explicit
+  `flush()` is for mid-handler fence semantics across a CRASH.
+- Flush failures are logged at WARN, not raised. A failed flush
+  retries on the next flush call or the next auto-flush boundary.
+- Flush is **safe to call from a finished handler** (no-op if the
+  namespace has been auto-flushed and not subsequently dirtied).
+
+### §38. `RetryPolicy`
+
+```
+class RetryPolicy:
+    initial_delay:        timedelta = timedelta(seconds=1)
+    backoff_coefficient:  float     = 2.0
+    max_delay:            timedelta = timedelta(seconds=60)
+    max_attempts:         int       = 3
+    retry_on:             tuple[type[Exception], ...] | None = None
+    jitter:               bool      = True
+
+    # Presets:
+    @classmethod
+    def exponential_backoff(cls, ...) -> RetryPolicy: ...
+    @classmethod
+    def fixed_delay(cls, delay: timedelta, ...) -> RetryPolicy: ...
+    @classmethod
+    def linear_backoff(cls, ...) -> RetryPolicy: ...
+    @classmethod
+    def no_retry(cls) -> RetryPolicy: ...
+```
+
+`max_attempts` counts total tries including the first (so
+`max_attempts=3` means 1 original + 2 retries). `retry_on=None`
+means retry every exception type; pass a tuple to scope. The delay
+calculation is exponential by default; if `jitter=True`,
+implementations MUST add randomized fractional jitter to avoid
+synchronized retries across instances.
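+
+One way the default delay calculation could look is sketched below.
+The exact formula and the jitter shape are assumptions (the text above
+requires only exponential growth, a cap, and randomized fractional
+jitter), and `retry_delay` / `retries_allowed` are hypothetical
+helper names.
+
+```python
+import random
+from datetime import timedelta
+
+
+def retry_delay(
+    retry_number,
+    *,
+    initial_delay=timedelta(seconds=1),
+    backoff_coefficient=2.0,
+    max_delay=timedelta(seconds=60),
+    jitter=True,
+    rng=random.random,
+):
+    """Delay before retry `retry_number` (1 = the first retry)."""
+    seconds = initial_delay.total_seconds() * backoff_coefficient ** (retry_number - 1)
+    seconds = min(seconds, max_delay.total_seconds())
+    if jitter:
+        # Assumed jitter shape: scale into [50%, 100%) of the capped delay.
+        seconds *= 0.5 + 0.5 * rng()
+    return timedelta(seconds=seconds)
+
+
+def retries_allowed(max_attempts):
+    """`max_attempts` counts the first try, so retries = max_attempts - 1."""
+    return max(max_attempts - 1, 0)
+```
+
+With the defaults and jitter disabled, the delays grow as 1s, 2s, 4s
+and are capped at 60s.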
+
+### §39. Error taxonomy
+
+The public exception surface is seven types. Every developer-observable
+condition the framework can signal surfaces through one of these. Each
+carries only **new information** the caller doesn't already have (the
+caller already knows the `task_id` they passed, and has `task_id` /
+`input_id` on the `TaskRun` handle they hold); exceptions do not
+redundantly carry `task_id`.
+
+#### Outcome exceptions (raised from `.run()` / `TaskRun.result()`)
+
+| Exception | Fields | When |
+|---|---|---|
+| `TaskFailed` | `error: TaskErrorDict \| TaskExhaustedRetriesErrorDict` | Handler raised an unhandled exception (or retries were exhausted). Inspect `error` for the structured diagnostic. |
+| `TaskCancelled` | — (bare) | This run / turn was cancelled: cooperative `TaskRun.cancel()` honoured by the handler raising `CancelledError`; per-turn `timeout=` watchdog honoured the same way; queued steerer cancelled before promotion; `MultiTurnTask.delete()` invalidated an in-flight run. Multi-turn chains stay alive (queued steerers promote per §11); one-shot is gone. |
+| `TaskDeferred` | — (bare) | Handler called `ctx.exit_for_recovery()` during shutdown. This lifetime is deferring — the task stays `in_progress` and the recovery scanner re-invokes the handler in a future process lifetime. Semantically DISTINCT from `TaskCancelled`. |
+
+`TaskCancelled` MUST NOT inherit `asyncio.CancelledError` —
+generic `except CancelledError` handlers would swallow it
+silently, which is the wrong behavior for a task-level signal.
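+
+A small sketch of why the inheritance rule matters. The `TaskCancelled`
+class here is a stand-in defined for the example, and `await_result`
+simulates a `TaskRun.result()` call that ends in a task-level cancel.
+
+```python
+import asyncio
+
+
+class TaskCancelled(Exception):
+    """Stand-in; deliberately NOT a subclass of asyncio.CancelledError."""
+
+
+async def await_result() -> None:
+    raise TaskCancelled()
+
+
+async def caller() -> str:
+    try:
+        await await_result()
+    except asyncio.CancelledError:
+        # Generic coroutine-cancellation handling; must not see TaskCancelled.
+        return "swallowed"
+    except TaskCancelled:
+        return "task-level cancel observed"
+    return "completed"
+```
+
+Because the two types are unrelated, the generic handler is skipped and
+the task-level branch runs.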
+
+`TaskCancelled` and `TaskDeferred` carry **no fields**. Cancellation
+causes can compound (e.g., `cancel_requested` AND `timeout_exceeded`
+fire together) and the framework cannot deterministically pick a
+single "reason" string. Causes are observable via the structured
+failure log (§structured-logs) and via the handler-side cause
+booleans on `TaskContext` (§34). For deferral, the meaning is
+uniform — there is nothing to disambiguate.
+
+`TaskFailed.error` is a `TypedDict`. The framework constructs one
+of two shapes:
+
+```
+class TaskErrorDict(TypedDict):
+    type: str         # exception class name, e.g. "ValueError"
+    message: str      # str(exc)
+    traceback: str    # traceback.format_exc()
+
+class TaskExhaustedRetriesErrorDict(TypedDict):
+    type: Literal["exhausted_retries"]
+    attempts: int
+    last_error: str
+    last_error_type: str
+    traceback: str
+```
+
+The `TaskFailed.error` field union is `TaskErrorDict |
+TaskExhaustedRetriesErrorDict`; type-checkers can discriminate on
+the `type` literal.
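+
+A caller-side sketch of branching on the two shapes at runtime. The
+`TypedDict` definitions repeat the ones above so the block is
+self-contained; `summarize` and the `TaskError` alias are hypothetical
+names introduced for the example.
+
+```python
+from typing import Literal, TypedDict, Union
+
+
+class TaskErrorDict(TypedDict):
+    type: str
+    message: str
+    traceback: str
+
+
+class TaskExhaustedRetriesErrorDict(TypedDict):
+    type: Literal["exhausted_retries"]
+    attempts: int
+    last_error: str
+    last_error_type: str
+    traceback: str
+
+
+TaskError = Union[TaskErrorDict, TaskExhaustedRetriesErrorDict]
+
+
+def summarize(error: TaskError) -> str:
+    """Build a one-line summary, branching on the `type` discriminator."""
+    if error["type"] == "exhausted_retries":
+        return (f"gave up after {error['attempts']} attempts; last error "
+                f"{error['last_error_type']}: {error['last_error']}")
+    return f"{error['type']}: {error['message']}"
+```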
+
+#### Pre-resolution exceptions (raised from `.run()` / `.start()`)
+
+| Exception | Fields | When |
+|---|---|---|
+| `TaskConflictError` | `current_status: str` | `.run` / `.start` against a task in a state that can't accept the call: one-shot in_progress or completed; non-steerable multi-turn in_progress. `current_status` lets the caller distinguish in-flight (attach via `get_active_run`) vs. terminal (need a new `task_id` or accept the existing outcome). |
+| `LastInputIdPreconditionFailed` | `actual_last_input_id: str \| None` | The `if_last_input_id` precondition does not match. Caller already knows what they passed via `if_last_input_id=`; `actual` is the new info. |
+| `SteeringQueueFull` | — (bare) | Multi-turn `steerable=True` only. Steering queue at capacity. Caller backs off / surfaces 429. |
+| `InputTooLarge` | — (bare) | Input write rejected because the serialized input exceeds the per-input cap. Caller shrinks or chunks the input. |
+
+#### Net surface
+
+Seven exceptions: `TaskFailed`, `TaskCancelled`, `TaskDeferred`,
+`TaskConflictError`, `LastInputIdPreconditionFailed`,
+`SteeringQueueFull`, `InputTooLarge`. Plus two `TypedDict`s
+(`TaskErrorDict`, `TaskExhaustedRetriesErrorDict`) and the public
+type alias `JSONValue` for the metadata value space.
+
+#### Internal exceptions (NOT part of the public surface)
+
+| Exception | Purpose |
+|---|---|
+| `TaskNotFound` | Internal classifier raised by the manager / provider when a record is missing. The public surface absorbs this: `MultiTurnTask.delete` is idempotent (no-op on missing record), `get_active_run` returns `None` on missing, and there is no `.get()` / `.refresh()` on `TaskRun`. Developers never catch `TaskNotFound`. |
+| `TaskPreconditionFailed` | Internal precondition-failure base. Specific precondition failures get their own typed subclass (e.g., `LastInputIdPreconditionFailed`); the bare base is not exported. |
+| `EtagConflict` | Optimistic concurrency conflict at the provider boundary. Framework retries internally; only escapes for low-level callers manipulating etags directly. |
+| `_HostedConflict(_code: str, status_code: int, ...)` | Single internal type the hosted provider's response classifier raises for service responses with a structured error code. The framework matches on `_code` to dispatch (see §39.1). The local provider raises the same type with the same `_code` directly, so internal call-site code is provider-agnostic. |
+| `_AttachmentTooLarge` / `_AttachmentLimitExceeded` | Provider-internal cap-violation signals. Framework catches at attachment-write sites and re-raises as `InputTooLarge` (input writes) based on the attachment-key prefix. |
+| `TransportClassifiedError(classification: "transient" \| "evicted" \| "conflict" \| "permanent")` | Hosted provider's classification wrapper around lower-level HTTP failures. Internal to hosted provider; framework dispatches on `classification`. |
+
+The underscore prefix on `_AttachmentTooLarge` /
+`_AttachmentLimitExceeded` / `_HostedConflict` is the Python-canonical
+signal for "package-private; never imported by developer code." Other-
+language implementations MUST place the equivalent exceptions at
+package-private visibility — never as documented developer-facing
+types.
+
+#### 39.1 Service error codes → internal `_HostedConflict` → developer-facing
+
+The hosted task service emits distinct error codes per condition.
+The hosted provider's response classifier wraps each in
+`_HostedConflict(_code=...)`. The framework's lifecycle code then
+matches on `_code` and either retries silently or translates into
+a developer-facing exception. The local provider raises the same
+`_HostedConflict(_code=...)` directly so the framework's dispatch
+table works against either backing.
+
+| Service `code` | HTTP | When emitted | Framework action |
+|---|---|---|---|
+| `task_immutable` | 409 | PATCH on a `completed` task (except no-op completed → completed) | Translate → `TaskConflictError(current_status="completed")`. |
+| `invalid_state_transition` | 409 | PATCH whose declared status transition is not in §24.1 matrix | **Framework bug** — the framework drives transitions, not the developer. Log + raise `RuntimeError`. |
+| `lease_held_by_another` | 409 | Lease acquisition / renewal against a record whose lease is held by a different owner (and not expired) | Translate → `TaskConflictError(current_status="in_progress")`. |
+| `task_already_exists` | 409 | CREATE on an existing `task_id` | Framework's lifecycle resolution branches on existing task; this only escapes if the framework's `.start()` race-resolution path is broken. Translate → `TaskConflictError(current_status=<observed status>)`. |
+| `lease_ownership_changed` | 409 | Service Cosmos race: between read and write, another owner stole the lease | Hosted-only. Treat as `lease_held_by_another`. |
+| `etag_mismatch` | 412 | If-Match precondition failure | **Retry** with re-read (transparent to developer); after bounded retries exhausted, escape as `EtagConflict` (internal — only escapes to low-level callers). |
+| `invalid_request` | 400 | Any field-validation violation (§28a) or lease-rule violation (§22.1) or delete-without-force on non-terminal (§24.3) | Translate → internal `TaskPreconditionFailed`. For the specific `if_last_input_id` mismatch, translate → `LastInputIdPreconditionFailed(actual_last_input_id=<stored>)`. |
+
+**Zero new developer-visible exception types from this table.**
+All translation targets above are either in the seven-name public
+surface or are internal types absorbed before reaching developer
+code. The internal `_HostedConflict._code` strings never appear in
+developer code, error messages, docstrings, or exported names —
+they are pure dispatch keys.
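+
+The dispatch table can be sketched as a single translation function.
+Every class below is a simplified stand-in for the type of the same
+name, and `translate`, `RETRY`, and `observed_status` are hypothetical
+names. The bounded-retry loop for `etag_mismatch` and the
+`if_last_input_id` special case are omitted, and the fallbacks for a
+missing status or an unknown code are choices of this sketch.
+
+```python
+from __future__ import annotations
+
+
+class _HostedConflict(Exception):
+    """Stand-in for the internal type; only `_code` matters for dispatch."""
+
+    def __init__(self, _code: str, status_code: int) -> None:
+        super().__init__(_code)
+        self._code = _code
+        self.status_code = status_code
+
+
+class TaskConflictError(Exception):
+    def __init__(self, current_status: str) -> None:
+        super().__init__(current_status)
+        self.current_status = current_status
+
+
+class TaskPreconditionFailed(Exception):
+    pass
+
+
+RETRY = object()  # sentinel: caller should re-read the record and retry
+
+
+def translate(exc: _HostedConflict, observed_status: str | None = None):
+    """Map a service code to RETRY or to the exception the framework raises."""
+    code = exc._code
+    if code == "task_immutable":
+        return TaskConflictError("completed")
+    if code in ("lease_held_by_another", "lease_ownership_changed"):
+        return TaskConflictError("in_progress")
+    if code == "task_already_exists":
+        return TaskConflictError(observed_status or "unknown")
+    if code == "etag_mismatch":
+        return RETRY
+    if code == "invalid_state_transition":
+        return RuntimeError("framework bug: transition not in the §24.1 matrix")
+    if code == "invalid_request":
+        return TaskPreconditionFailed()
+    return exc  # unknown code: surfaced unchanged in this sketch
+```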
+
+---
+
+## Part VI — Streaming primitive
+
+### §40. Why streaming is decoupled from `@task`
+
+Streaming is a **separate, peer subpackage** of
+`azure-ai-agentserver-core` — it does not nest under `@task`. Three
+reasons:
+
+1. **Lifecycle.** A stream can span multiple `@task` invocations
+   (multi-turn / multi-function fan-in); coupling its lifetime to a
+   single handler's body breaks reconnection on multi-turn UIs.
+2. **Polymorphism.** The same protocol is used by handlers that
+   are not `@task` decorated (plain handlers, HTTP layer, ad-hoc
+   producers).
+3. **Pay-only-for-what-you-use.** Handlers that don't stream pay
+   nothing: no buffer, no factory, no registry tombstone.
+
+The decorator carries NO streaming-related kwarg. `TaskContext`
+has NO streaming attribute. Handlers that want to stream do this
+explicitly:
+
+```python
+from azure.ai.agentserver.core.streaming import streams
+
+stream = await streams.get_or_create(stream_id)
+await stream.emit({"event": "progress"})
+...
+await stream.emit(final_chunk, close=True)
+```
+
+### §41. `EventStream` protocol
+
+The data-flow surface (lifecycle is the registry's job, §42).
+
+```
+class EventStream(Protocol):
+    async def emit(self, payload: Any, *, close: bool = False) -> None: ...
+    async def close(self) -> None: ...
+    def subscribe(self, *, after: int | None = None) -> AsyncIterator[Any]: ...
+    async def last_cursor(self) -> int | None: ...
+```
+
+Method contracts:
+
+- **`emit(payload, close=False)`** — multicast `payload` to all
+  currently-attached subscribers. The framework never inspects,
+  validates, or rewrites the payload. If `close=True`, the emit
+  and the close-of-stream are **observably atomic for currently-
+  attached subscribers**: every subscriber attached BEFORE this
+  call sees BOTH the payload AND the end-of-stream signal.
+  Late-subscriber behavior depends on backing:
+  - **Live-only backings** (`BroadcastEventStream`): late
+    subscribers see neither the payload nor any earlier history.
+  - **Replay backings** (`ReplayEventStream`,
+    `FileBackedReplayEventStream`): late subscribers may replay
+    the buffered payload (including the one delivered with
+    `close=True`) AND then terminate cleanly, subject to TTL
+    eviction (§46).
+
+  Raises `EventStreamClosedError` if already closed,
+  `EventStreamNotFoundError` if destroyed.
+
+- **`close()`** — transition active → closed. **Idempotent**:
+  calling it on an already-closed or destroyed stream is a no-op
+  (never raises). Subscribers attached at close time drain the
+  remaining items, then their iterators terminate cleanly.
+
+- **`subscribe(after=N)`** — return an `AsyncIterator` over
+  payloads. NOT a coroutine: do not `await` it; immediately use it
+  with `async for` / `aiter()` / `anext()`. If `after=N` is
+  supplied AND the active backing supports cursored replay,
+  yield only payloads whose cursor value is strictly greater than
+  `N`; backings without cursor support silently ignore non-`None`
+  values. Raises `EventStreamNotFoundError` synchronously at the
+  call site if the stream is destroyed.
+
+- **`last_cursor()`** — return the highest cursor seen so far, or
+  `None`. While active: highest persisted cursor (`None` if zero
+  emits or backing has no cursor support). After close: the last
+  cursor seen even if those events have since been TTL-evicted —
+  this is load-bearing for the file-backed replay's rehydration
+  path. After destroy: raises `EventStreamNotFoundError`.
+
+  `last_cursor()` is a **read-only watermark query**. It does NOT
+  trigger the destroy transition (which is driven by the TTL-since-
+  close clock, §46). Implementations MUST keep it side-effect-free.
+
+  `last_cursor()` is the EMITTER's recovery primitive. It is NOT
+  the workflow-recovery primitive — workflow watermarks (what work
+  is done) belong in `ctx.metadata`, batched per side-effecting
+  operation, NEVER in stream cursors.
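+
+A minimal sketch of how a live-only backing could satisfy the
+`emit(..., close=True)` contract above, assuming one
+`asyncio.Queue` per attached subscriber and a private end-of-stream
+sentinel. `SketchBroadcastStream` and `_END` are hypothetical names
+for illustration, and the exception class is a local stand-in; this
+is not the SDK's `BroadcastEventStream`.
+
+```python
+from __future__ import annotations
+
+import asyncio
+from typing import Any, AsyncIterator
+
+_END = object()  # hypothetical end-of-stream sentinel
+
+
+class EventStreamClosedError(Exception):
+    """Local stand-in for the SDK exception of the same name."""
+
+
+class SketchBroadcastStream:
+    """Live-only multicast: no buffer, one queue per attached subscriber."""
+
+    def __init__(self) -> None:
+        self._queues: list[asyncio.Queue] = []
+        self._closed = False
+
+    async def emit(self, payload: Any, *, close: bool = False) -> None:
+        if self._closed:
+            raise EventStreamClosedError("emit on closed stream")
+        # No await between queuing the payload and the sentinel, so
+        # attached subscribers observe both or neither.
+        for q in self._queues:
+            q.put_nowait(payload)
+        if close:
+            self._finish()
+
+    async def close(self) -> None:
+        if not self._closed:  # idempotent: a second call is a no-op
+            self._finish()
+
+    def _finish(self) -> None:
+        self._closed = True
+        for q in self._queues:
+            q.put_nowait(_END)
+
+    def subscribe(self, *, after: int | None = None) -> AsyncIterator[Any]:
+        # Plain `def`: the queue is attached synchronously at the call
+        # site. `after` is ignored because there is no cursor support.
+        q: asyncio.Queue = asyncio.Queue()
+        if self._closed:
+            q.put_nowait(_END)
+        else:
+            self._queues.append(q)
+        return self._drain(q)
+
+    async def _drain(self, q: asyncio.Queue) -> AsyncIterator[Any]:
+        while (item := await q.get()) is not _END:
+            yield item
+
+
+async def demo() -> list:
+    stream = SketchBroadcastStream()
+    sub = stream.subscribe()            # attached BEFORE the emits
+    await stream.emit({"event": "progress"})
+    await stream.emit({"event": "done"}, close=True)
+    late = stream.subscribe()           # attached AFTER close
+    return [[e async for e in sub], [e async for e in late]]
+```
+
+Because `emit` never awaits between queuing the payload and queuing
+the sentinel, a subscriber attached before the call sees both, while
+the late subscriber in `demo()` sees neither.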
+
+### §42. The `streams` registry
+
+A process-level singleton that owns the lifecycle of all SDK-bundled
+`EventStream` instances:
+
+```
+streams.use_in_memory_live()                                    # configurator (sync)
+streams.use_in_memory_replay(cursor_fn=..., ttl_seconds=...)    # configurator (sync)
+streams.use_file_backed_replay(storage_dir=..., cursor_fn=...,
+                               ttl_seconds=..., serializer=...,
+                               deserializer=...)                # configurator (sync)
+
+await streams.get(id)                  # raises NotFound if never registered
+await streams.get_or_create(id)        # atomic per id
+await streams.delete(id)               # idempotent; installs tombstone
+```
+
+Six methods total: three sync configurators + three async
+lifecycle methods.
+
+Atomicity: `get_or_create(id)` MUST be safe under concurrent
+callers. The implementation uses a per-id lock to prevent
+split-brain construction when two coroutines race to create the
+same id. The lock is acquired only on the slow path (first
+access for an id); subsequent `get_or_create` calls return the
+cached instance without taking the lock.
+
+Tombstones: `delete(id)` causes the next `get(id)` against that
+id to raise `EventStreamNotFoundError`. The registry uses an
+internal "destroyed" marker to remember the deletion (the
+"delete is symmetric with `rm -f` but still leaves a marker"
+rule), but the **error surface is unified**: every "the id is
+not currently a live stream" condition raises
+`EventStreamNotFoundError`. This covers all three paths
+into the missing-stream state:
+
+- the id was never registered;
+- the id was registered and then explicitly `delete(id)`d;
+- the id was registered, then transitioned to Closed, then the
+  TTL-since-close clock elapsed (§46) and the registry
+  auto-tombstoned the id.
+
+The next `get_or_create(id)` against a tombstoned id clears the
+tombstone and constructs a fresh stream.
+
+Note: `get(id)` does NOT itself install a tombstone — only
+`delete(id)` and the TTL-since-close auto-transition do.
+
+Why this is one error type:
+
+The previous design distinguished `EventStreamGoneError` (the
+resource once existed and is destroyed) from
+`EventStreamNotFoundError` (the resource was never registered).
+That distinction has no actionable value at the consumer:
+either way, the right behavior is the same (subscribe to a new
+id, or treat this id as missing). It also leaked the registry's
+internal bookkeeping (tombstone vs no-tombstone) into the
+developer-facing API. Collapsing into a single
+`EventStreamNotFoundError` makes the rule one-line: "any
+attempt to use an id that is not currently a live stream raises
+`EventStreamNotFoundError`."
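+
+The locking and tombstone rules above can be sketched as follows.
+This is an illustration under stated assumptions, not the SDK's
+registry: `SketchRegistry`, its attribute names, and the `factory`
+parameter are hypothetical, and the exception class is a local
+stand-in.
+
+```python
+from __future__ import annotations
+
+import asyncio
+
+
+class EventStreamNotFoundError(Exception):
+    """Local stand-in for the SDK exception of the same name."""
+
+
+class SketchRegistry:
+    """Hypothetical registry: live map, tombstone set, per-id locks."""
+
+    def __init__(self, factory=object) -> None:
+        self._factory = factory              # builds a stream object
+        self._live: dict[str, object] = {}
+        self._tombstones: set[str] = set()   # bookkeeping only
+        self._locks: dict[str, asyncio.Lock] = {}
+
+    async def get(self, stream_id: str) -> object:
+        # Never-registered and tombstoned ids fail identically.
+        if stream_id not in self._live:
+            raise EventStreamNotFoundError(stream_id)
+        return self._live[stream_id]
+
+    async def get_or_create(self, stream_id: str) -> object:
+        cached = self._live.get(stream_id)
+        if cached is not None:               # fast path: no lock taken
+            return cached
+        lock = self._locks.setdefault(stream_id, asyncio.Lock())
+        async with lock:                     # slow path: first access
+            if stream_id not in self._live:
+                self._tombstones.discard(stream_id)
+                self._live[stream_id] = self._factory()
+            return self._live[stream_id]
+
+    async def delete(self, stream_id: str) -> None:
+        # Idempotent: deleting an unknown id still leaves a marker.
+        self._live.pop(stream_id, None)
+        self._tombstones.add(stream_id)
+
+
+async def demo() -> tuple:
+    reg = SketchRegistry()
+    a, b = await asyncio.gather(reg.get_or_create("s"),
+                                reg.get_or_create("s"))
+    await reg.delete("s")
+    errors = 0
+    for stream_id in ("never-registered", "s"):
+        try:
+            await reg.get(stream_id)
+        except EventStreamNotFoundError:
+            errors += 1
+    fresh = await reg.get_or_create("s")     # clears the tombstone
+    return a is b, errors, fresh is not a
+```
+
+The tombstone set never changes which exception a caller sees; it
+only records that the id once existed, which is the point of the
+unified error surface.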
+
+#### Process-wide factory selection
+
+Each `use_*` configurator replaces the registry's stream factory
+**globally for the process**. Subsequent `get_or_create(id)` calls
+use the new factory; existing stream instances are unaffected.
+Configurators are synchronous and idempotent. The default factory
+(if no configurator is called) produces `BroadcastEventStream`
+instances.
+
+This makes "configure once at app startup, use everywhere"
+trivial: a single `streams.use_in_memory_replay(ttl_seconds=600)`
+at process init is the complete configuration step. There is no
+per-stream factory override on `get_or_create`.
+
+### §43. Stream lifecycle states
+
+Every concrete `EventStream` instance has exactly **two** states:
+
+```
+              emit*
+            ┌──────────┐
+            │          │
+            ▼          │
+┌──────────────────┐   │   ┌─────────────────┐
+│      Active      │ ──┴── │      Closed     │
+└──────────────────┘       └─────────────────┘
+        │                          │
+        │                          │
+        │                          │  (then: registry tombstones
+        │                          │   the id on delete() or
+        │                          │   TTL-since-close elapse —
+        │                          │   see §42, §46. The next
+        │                          │   get(id) raises
+        │                          │   EventStreamNotFoundError.)
+        └─── delete() ─────────────┘
+```
+
+State semantics:
+
+- **Active.** Accepts `emit` and `subscribe`. Always the initial
+  state on construction. `close()` -> Closed (idempotent on
+  already-closed). `delete()` removes the instance from the
+  registry and tombstones the id; subsequent `get(id)` raises
+  `EventStreamNotFoundError`.
+- **Closed.** `emit` raises `EventStreamClosedError`.
+  `subscribe()` still works for replay backings (yields drained
+  history, then terminates cleanly when buffer is exhausted or
+  TTL-since-close elapses). `last_cursor()` still works.
+  `close()` is a no-op. `delete()` removes the instance from
+  the registry and tombstones the id.
+
+There is **no per-instance "destroyed" state** — destruction
+happens at the registry level. The framework tracks an instance
+as Active or Closed; once the registry tombstones the id, the
+instance reference is dropped and any cached reference held by
+a caller is stale (further operations on it raise
+`EventStreamNotFoundError` because the registry routes the call
+to a tombstoned id).
+
+The TTL-since-close auto-transition (§46) governs when the
+registry decides to tombstone a Closed stream's id. For replay
+backings constructed with `ttl_seconds`: once the stream is
+closed, the framework starts a `close_time + ttl_seconds`
+clock; when it elapses, the registry tombstones the id. This is
+deterministic (time-based, not buffer-state-based) and works
+whether or not anyone is currently subscribed.
+
+`BroadcastEventStream` (live-only) and any other backing
+constructed without `ttl_seconds` do NOT auto-tombstone; they
+only tombstone via explicit `delete(id)`.
+
+### §44. Concrete backings
+
+Three SDK-bundled implementations:
+
+| Backing | Use case | Behavior |
+|---|---|---|
+| `BroadcastEventStream` | Live consumers attach before the producer starts. | No buffer. `subscribe(after=...)` is accepted but the `after` argument is silently ignored. Late subscribers miss earlier events. `subscribe()` returns an iterator over events emitted AFTER attach. Multi-subscriber (each gets a private cursor/queue). Goes away ONLY via explicit `delete(id)` — no TTL auto-tombstone. |
+| `ReplayEventStream` | Late subscribers need history. | Per-stream buffer retains all events. `subscribe(after=N)` is honored iff `cursor_fn` was supplied to the configurator; otherwise `after` is ignored. `ttl_seconds`, if supplied, drives per-event eviction (regardless of Active/Closed — events older than `now - ttl_seconds` are evicted from the buffer; see §46). When Closed AND `close_time + ttl_seconds` elapses, the registry auto-tombstones the id. |
+| `FileBackedReplayEventStream` | Crash-recoverable history (multi-turn UIs, durable response streaming). | Persists each emit to `storage_dir/<id>.jsonl`. **Constructor rehydrates** from an existing file if present — restart-safe. Same per-event TTL + close-clock semantics as `ReplayEventStream`. Optional `serializer: Callable[[Any], bytes]` and `deserializer: Callable[[bytes], Any]` for non-JSON payloads (default JSON). `delete()` (and TTL-since-close auto-tombstone) clean up the file BEFORE the registry tombstones the id. |
+
+Per-backing TTL + tombstone matrix:
+
+| Backing | Per-event TTL eviction | Close-clock tombstone |
+|---|---|---|
+| `BroadcastEventStream` | N/A (no buffer) | Never (no `ttl_seconds`) |
+| `ReplayEventStream` (no `ttl_seconds`) | Never (events live forever in buffer) | Never (no clock) |
+| `ReplayEventStream` (with `ttl_seconds=T`) | Active OR Closed: events older than `now - T` evicted from buffer | Closed AND `now >= close_time + T` -> registry tombstones id |
+| `FileBackedReplayEventStream` (no `ttl_seconds`) | Never | Never |
+| `FileBackedReplayEventStream` (with `ttl_seconds=T`) | Same as `ReplayEventStream` with `ttl_seconds=T`; file truncated when events evicted | Same as `ReplayEventStream` with `ttl_seconds=T`; file removed BEFORE tombstone |
+
+Constructor selection happens through the registry's
+configurators (`use_in_memory_live()`, etc.) — application code at
+startup picks the backing once and `streams.get_or_create(id)`
+constructs that kind of stream from then on.
+
+Switching backings mid-flight is allowed (configurator calls are
+idempotent; subsequent `get_or_create` uses the new factory) but
+existing stream instances are unaffected.
+
+### §45. Cursor and `subscribe(after=...)`
+
+A cursor is a strictly increasing integer extracted from each
+payload via a developer-supplied `cursor_fn: Callable[[Any], int]`
+passed to the configurator. The framework:
+
+- Never assumes the payload has any particular field
+  (`sequence_number`, `event_id`, etc.).
+- **Designed for `int` cursors** (string cursors introduce the
+  silent-wrong-comparison footgun — `"10" > "9"` is False).
+  **Known gap (canonical Python implementation):** the registry
+  does NOT validate the return type of `cursor_fn` at construction
+  or use time; an implementation that returns non-int values will
+  silently mis-compare. Other-language implementers SHOULD add the
+  validation (the equivalent of `isinstance(cursor_fn(sample), int)`)
+  at configurator time so the failure is loud, not silent.
+- Uses `cursor_fn` lazily: only when `subscribe(after=...)` is
+  called or `last_cursor()` is asked.
+
+Replay backings without a `cursor_fn` accept `subscribe(after=N)`
+calls but silently ignore the `after` argument and yield the full
+retained history.
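+
+As an illustration of the cursor rules above, the sketch below
+filters a retained buffer by `after` and wraps `cursor_fn` so a
+non-int cursor fails loudly. `checked_cursor_fn` and `replay_after`
+are hypothetical helpers, not SDK functions, and the `seq` field is
+an assumed payload shape.
+
+```python
+from __future__ import annotations
+
+from typing import Any, Callable, Iterable, Iterator
+
+
+def checked_cursor_fn(cursor_fn: Callable[[Any], int]) -> Callable[[Any], int]:
+    """Wrap cursor_fn so a non-int cursor raises instead of mis-comparing."""
+
+    def wrapper(payload: Any) -> int:
+        cursor = cursor_fn(payload)
+        # bool is a subclass of int, so reject it explicitly.
+        if isinstance(cursor, bool) or not isinstance(cursor, int):
+            raise TypeError(
+                f"cursor_fn returned {type(cursor).__name__}, expected int"
+            )
+        return cursor
+
+    return wrapper
+
+
+def replay_after(
+    buffer: Iterable[Any],
+    cursor_fn: Callable[[Any], int] | None,
+    after: int | None,
+) -> Iterator[Any]:
+    """Yield payloads whose cursor is strictly greater than `after`."""
+    for payload in buffer:
+        # Without a cursor_fn, `after` is ignored and everything is yielded.
+        if cursor_fn is None or after is None or cursor_fn(payload) > after:
+            yield payload
+
+
+events = [{"seq": n} for n in (8, 9, 10)]
+by_seq = checked_cursor_fn(lambda e: e["seq"])
+
+resumed = list(replay_after(events, by_seq, after=9))    # only seq 10
+no_cursor = list(replay_after(events, None, after=9))    # full history
+string_pitfall = "10" > "9"                              # False
+```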
+
+### §46. TTL eviction and the close-clock (replay backings)
+
+When constructed with `ttl_seconds=T`, replay backings:
+
+**Per-event eviction** (runs regardless of Active/Closed):
+
+- Stamp each emitted event with an `emit_time`.
+- Evict events whose age >= `T`, on `emit()` and `subscribe()`.
+  The buffer never holds events older than `T` once an operation
+  triggers an eviction sweep.
+
+This rule is what bounds long-running active streams that emit
+continuously for hours or days — the buffer's memory footprint is
+proportional to the emit-rate × `T`, not to the total duration.
+Without per-event TTL on active streams, a multi-day producer
+would buffer indefinitely.
+
+**Close-clock auto-tombstone** (Closed only):
+
+- When the stream transitions to Closed, the framework records
+  `close_time` and starts a wall-clock countdown for `T`.
+- When `now >= close_time + T`, the registry tombstones the id
+  (file-backed: removes the file FIRST). The next `get(id)` raises
+  `EventStreamNotFoundError`.
+
+Why a close-clock, not "buffer empty + at least one emit":
+
+- The previous design ("Closed AND buffer empty AND
+  `total_emit_count > 0`") was observer-driven (the check fired
+  on `emit()` or `subscribe()`), required `total_emit_count > 0`
+  to avoid a fast-path on never-emitted streams, and explicitly
+  excluded `last_cursor()` from the check. All of that complexity
+  came from trying to derive a destroy moment from buffer state.
+- The close-clock is **time-deterministic**: from
+  `close_time + T` onward, the id is tombstoned regardless of
+  who is observing. There is no "buffer briefly not empty when
+  the destroy fires" corner case to reason about, because for
+  every event in the buffer, `emit_time <= close_time`, so
+  `emit_time + T <= close_time + T`. By the time the close-clock
+  fires, every per-event TTL has already elapsed, so any event
+  still buffered is removed by the next eviction sweep. The two
+  rules are consistent by construction.
+- It eliminates the `total_emit_count > 0` carve-out: a stream
+  that was created, closed, and never emitted to behaves like
+  any other Closed stream — it tombstones at `close_time + T`.
+  No special-case for empty-emit streams.
+- Subscribers attached just before close drain naturally (their
+  iterators terminate when the buffer is exhausted), and any
+  late subscriber arriving between `close_time` and
+  `close_time + T` can still replay the (possibly TTL-thinned)
+  history. After `close_time + T`, the id is gone.
+
+Implementation note: implementations MAY drive the close-clock
+either via a wall-clock timer (best for hosted/long-lived
+processes) or via an opportunistic check on `get(id)` / `emit()`
+/ `subscribe()` (best for tests). Either approach yields the same
+observable behavior: at or after `close_time + T`, attempts to look
+up or subscribe to the id raise `EventStreamNotFoundError`.
+
+`last_cursor()` continues to work in the Closed state even after
+all events have been evicted — it returns the last cursor the
+backing ever saw, NOT the current buffered max. This is required
+for the rehydration path (a process restarting picks up the
+high-water mark for resuming a not-yet-tombstoned stream).
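+
+The interaction of per-event eviction, the close-clock, and the
+`last_cursor()` watermark can be sketched with an injectable clock.
+`SketchReplayBuffer`, `snapshot()`, and `should_tombstone()` are
+hypothetical names; a real registry would own the tombstone
+decision, and this sketch only shows the opportunistic-check
+variant.
+
+```python
+from __future__ import annotations
+
+from collections import deque
+from typing import Any, Callable
+
+
+class SketchReplayBuffer:
+    """Hypothetical replay buffer with per-event TTL and a close-clock."""
+
+    def __init__(self, ttl_seconds: float, cursor_fn: Callable[[Any], int],
+                 clock: Callable[[], float]) -> None:
+        self._ttl = ttl_seconds
+        self._cursor_fn = cursor_fn
+        self._clock = clock                    # injectable for tests
+        self._events: deque = deque()          # (emit_time, payload) pairs
+        self._last_cursor: int | None = None   # survives eviction
+        self.close_time: float | None = None
+
+    def _evict(self) -> None:
+        now = self._clock()
+        while self._events and now - self._events[0][0] >= self._ttl:
+            self._events.popleft()
+
+    def emit(self, payload: Any) -> None:
+        self._evict()
+        self._events.append((self._clock(), payload))
+        self._last_cursor = self._cursor_fn(payload)
+
+    def close(self) -> None:
+        if self.close_time is None:
+            self.close_time = self._clock()
+
+    def snapshot(self) -> list:
+        """What a late subscriber could replay right now."""
+        self._evict()
+        return [payload for _, payload in self._events]
+
+    def last_cursor(self) -> int | None:
+        return self._last_cursor               # no sweep, no side effects
+
+    def should_tombstone(self) -> bool:
+        """Opportunistic close-clock check a registry could run."""
+        return (self.close_time is not None
+                and self._clock() >= self.close_time + self._ttl)
+
+
+now = [0.0]
+buf = SketchReplayBuffer(ttl_seconds=10, cursor_fn=lambda e: e["seq"],
+                         clock=lambda: now[0])
+buf.emit({"seq": 1})                           # emit_time = 0
+now[0] = 6.0
+buf.emit({"seq": 2})                           # emit_time = 6
+buf.close()                                    # close_time = 6
+now[0] = 12.0
+mid = (buf.snapshot(), buf.should_tombstone())
+now[0] = 16.0
+end = (buf.snapshot(), buf.last_cursor(), buf.should_tombstone())
+```
+
+At `t=12` only the second event is still replayable and the id is
+still live; at `t=16` the buffer is empty, the watermark is still
+`2`, and the close-clock has fired.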
+
+### §47. Streaming error taxonomy
+
+```
+EventStreamError                     # base
+  ├── EventStreamClosedError         # emit on closed stream
+  └── EventStreamNotFoundError       # any "id is not currently a
+                                     #   live stream" condition —
+                                     #   never registered, deleted,
+                                     #   or close-clock elapsed
+```
+
+Wire mapping (informative — HTTP plumbing is in callers, not the
+framework):
+
+| Exception | Suggested HTTP status |
+|---|---|
+| `EventStreamClosedError` | 5xx (this is a server-side bug — the producer kept emitting after closing). |
+| `EventStreamNotFoundError` | 404 Not Found. |
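+
+A caller-side mapping for the table above might look like the
+following sketch. The exception classes are local stand-ins that
+mirror the taxonomy, `suggested_http_status` is a hypothetical
+helper, and `500` is just one reasonable 5xx choice.
+
+```python
+class EventStreamError(Exception):
+    """Base class for the streaming errors."""
+
+
+class EventStreamClosedError(EventStreamError):
+    """emit() on a closed stream."""
+
+
+class EventStreamNotFoundError(EventStreamError):
+    """The id is not currently a live stream."""
+
+
+def suggested_http_status(exc: EventStreamError) -> int:
+    """Translate a streaming error into an HTTP status code."""
+    if isinstance(exc, EventStreamNotFoundError):
+        return 404
+    # A closed-stream emit is a producer bug, so report a server error.
+    return 500
+```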
+
+#### Consolidated: when is `EventStreamNotFoundError` raised?
+
+`EventStreamNotFoundError` is the single error type for every
+"the id is not currently a live stream" condition. It fires for
+**three independent reasons**, all surfaced as the same
+exception:
+
+| Path to NotFound | Broadcast (live) | Replay (in-memory) | Replay (file-backed) |
+|---|---|---|---|
+| 1. `get(id)` for an id that was never registered. | ✓ | ✓ | ✓ |
+| 2. Explicit `streams.delete(id)` → instance removed + registry tombstones the id. Works in ANY state (Active or Closed). | ✓ | ✓ | ✓ (file removed before tombstone) |
+| 3. Closed stream's close-clock elapses (`now >= close_time + ttl_seconds`) → registry tombstones the id. Requires the backing to have been constructed with `ttl_seconds`. | ✗ (no TTL) | ✓ | ✓ (file removed before tombstone) |
+
+Key invariants to take away:
+
+- `BroadcastEventStream` NEVER auto-tombstones — it has no TTL
+  machinery. The ONLY path is explicit `delete()`.
+- For replay backings, the close-clock fires deterministically at
+  `close_time + ttl_seconds`. There is no `total_emit_count > 0`
+  carve-out and no buffer-state condition; a stream created,
+  closed, and never emitted to behaves like any other Closed
+  stream — tombstoned at `close_time + ttl_seconds`.
+- Per-event TTL runs regardless of Active/Closed, on `emit()` and
+  `subscribe()`. This is what bounds buffer memory for long-lived
+  active streams.
+- `last_cursor()` is side-effect-free — it does not trigger the
+  close-clock check, does not evict events, and does not
+  tombstone. It returns the high-water mark seen so far.
+- Once the registry tombstones an id, any stale instance
+  reference held by a caller raises `EventStreamNotFoundError`
+  on the next operation (the operation is routed through the
+  registry, which sees the tombstone).
+
+### §48. Third-party stream-impl pattern
+
+The `streams` registry owns ONLY the three SDK-bundled backings.
+Third-party `EventStream` implementations ship their OWN peer
+registry (don't try to plug into `streams`). This keeps each
+registry's tombstone/factory state local.
+
+Consumers can hold references to any `EventStream`-shaped instance
+— the registry-vs-not distinction is invisible to consumers.
+
+The `EventStream` Protocol does NOT include a destructive method
+(no `destroy` / `dispose` on the Protocol itself); destruction
+lives on the registry. Third-party registries SHOULD follow the
+same pattern: keep destruction off the data-flow Protocol.
+
+---
+
+## Part VII — Implementation guidance (algorithms)
+
+This part sketches the framework's load-bearing algorithms in
+language-agnostic pseudocode. Implementations MAY structure the
+control flow differently as long as the externally-observable
+behavior matches. References in brackets are to the source files
+in the canonical Python implementation.
+
+### §49. Cold-start sequence
+
+On `TaskManager.startup()`:
+
+```
+1. Register every decorator-discovered function into the resume-callback
+   table, keyed by source.name. [_REGISTERED_DESCRIPTORS]
+2. Resolve self.owner and self.instance_id from env (§7).
+3. Call self._recover_stale_tasks() — list tasks via:
+       provider.list(agent_name = self.agent_name,
+                     session_id  = self.session_id,
+                     status      = "in_progress",
+                     lease_owner = self.owner,
+                     source_type = _SOURCE_TYPE)   # framework-only scope
+   For each result:
+     a. Look at lease.owner and lease.instance_id.
+     b. If lease.owner != self.owner: skip (not ours). [Practically
+        unreachable because the filter already restricts to our
+        owner; defensive.]
+     c. If lease.owner == self.owner AND lease.instance_id == self.instance_id:
+        skip (would be impossible in a fresh process; defensive).
+     d. Otherwise (same-owner different-instance OR expired):
+        — Call self._steering_cleanup_orphan_attachments(task_info)
+          (§58) to clean up any orphan _steering_input_* attachments
+          left by a partial crash.
+        — Call self._reclaim_one(task_info) — PATCH lease to self
+          with if_match=etag, then invoke the registered resume
+          callback with entry_mode='recovered', re-hydrated input,
+          and metadata. On 412: ABANDON (the next scan re-evaluates).
+4. Spawn _periodic_recovery_loop() as a background task.
+5. Return.
+```
+
+The cold-start scan blocks `startup()` until done — handlers
+intended to be recovered must be visible before any HTTP route goes
+live. Implementers exposing the framework over HTTP MUST gate
+route binding on `startup()` having returned.
+
+### §50. `.start()` lifecycle resolution
+
+The framework's most complex decision tree. On `Task.start(task_id, input, ...)`:
+
+```
+1. Validate task_id (§7).
+2. Read task store for task_id (single GET).
+3. Compute lifecycle action:
+
+     - If GET returned None (task not found):
+         -> CREATE
+     - If status == 'pending':
+         -> ADOPT (rare; transition to in_progress)
+     - If status == 'suspended':
+         -> RESUME (transition to in_progress with new input;
+                    clears prior output — see §11, §23.8 item 8)
+     - If status == 'completed':
+         -> RAISE TaskConflictError(current_status='completed')
+     - If status == 'in_progress':
+         If lease is dead (expired OR same-owner different-instance):
+             -> RECLAIM-AND-INVOKE (transition to in_progress with same owner, new instance)
+         Else if task is steerable AND in-process active execution exists for task_id:
+             -> STEERING-APPEND (queue input; do NOT enter handler)
+         Else:
+             -> RAISE TaskConflictError(current_status='in_progress')
+
+4. Execute the chosen action via the appropriate transition PATCH.
+   For RESUME, the PATCH MUST be a single co-PATCH carrying:
+     - status: 'in_progress'
+     - payload['input']: new serialized input (inline or ref)
+     - payload['_turn_started_at']: utc_now_iso()
+     - payload['_retry_attempt']: 0   (fresh retry budget for the resumed turn)
+     - attachments['_input']: new value (or absent if inline)
+5. If action ∈ {CREATE, ADOPT, RESUME, RECLAIM-AND-INVOKE}:
+     Spawn lease_renewal_loop, watchdog (if timeout configured), execute_task_loop.
+     Return a TaskRun bound to this execution.
+6. If action == STEERING-APPEND:
+     Return a TaskRun whose .result() resolves with the NEXT-TURN outcome
+     (the queued steerer is bound to the next turn).
+```
+
+The reclaim sub-case includes input precondition validation
+(`if_last_input_id`) before the transition PATCH.
+
+The framework does NOT write `payload["output"]` on any
+transition. The handler's return value resolves the in-process
+caller's `TaskRun.result()` future and is never projected onto
+the chain record.
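+
+Step 3 of the resolution above is a pure function of the stored
+status and three booleans, which makes it easy to test in
+isolation. A minimal sketch follows; the `Action` enum,
+`resolve_action`, and its keyword names are hypothetical, and
+`TaskConflictError` is a local stand-in for the framework
+exception.
+
+```python
+from __future__ import annotations
+
+from enum import Enum
+
+
+class Action(Enum):
+    CREATE = "create"
+    ADOPT = "adopt"
+    RESUME = "resume"
+    RECLAIM_AND_INVOKE = "reclaim-and-invoke"
+    STEERING_APPEND = "steering-append"
+
+
+class TaskConflictError(Exception):
+    """Local stand-in for the framework exception of the same name."""
+
+    def __init__(self, current_status: str) -> None:
+        super().__init__(f"task is {current_status}")
+        self.current_status = current_status
+
+
+def resolve_action(status: str | None, *, lease_dead: bool = False,
+                   steerable: bool = False,
+                   active_locally: bool = False) -> Action:
+    """Map the stored task status to a lifecycle action."""
+    if status is None:                       # GET found no task
+        return Action.CREATE
+    if status == "pending":
+        return Action.ADOPT
+    if status == "suspended":
+        return Action.RESUME
+    if status == "completed":
+        raise TaskConflictError("completed")
+    if status == "in_progress":
+        if lease_dead:
+            return Action.RECLAIM_AND_INVOKE
+        if steerable and active_locally:
+            return Action.STEERING_APPEND
+        raise TaskConflictError("in_progress")
+    raise ValueError(f"unknown status: {status!r}")
+```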
+
+### §51. Steering append (atomic)
+
+When `.start()` resolves to STEERING-APPEND, the framework
+executes this PATCH as a single round-trip:
+
+```
+1. Read current payload (already in memory from the lifecycle GET).
+2. steering   = payload.get('_steering', {})
+3. pending   = list(steering.get('pending_inputs', []))
+4. If len(pending) >= 9: raise SteeringQueueFull.
+5. serialized = canonical_json(input)
+6. If size(serialized) > 20 KiB:
+     next_seq = steering.get('next_input_seq', 0)
+     key      = f'_steering_input_{next_seq}'
+     ref      = {'__attachment_ref__': {'key': key, 'hash': sha256(serialized)}}
+     pending.append(ref)
+     steering['next_input_seq'] = next_seq + 1
+     attachments_patch = {key: input}
+   else:
+     pending.append(input)         # raw inline
+     attachments_patch = None
+7. steering['pending_inputs']   = pending
+   steering['cancel_requested'] = True
+8. payload_patch = {'_steering': steering}
+   if input_id provided: payload_patch['_last_input_id'] = input_id
+9. PATCH(task_id, payload=payload_patch, attachments=attachments_patch,
+        lease_owner=self.owner, lease_instance_id=self.instance_id,
+        lease_duration_seconds=60, if_match=etag)
+10. Locally: signal the active execution's ctx.cancel via the in-process
+    context registry (no remote signal needed — the active execution
+    is in this process).
+```
+
+The PATCH MUST carry both `payload` and `attachments` (when
+promoted) so the queue entry and its backing attachment are added
+in the same etag transaction.
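+
+Steps 2-8 above amount to building one payload patch and one
+optional attachments patch. The sketch below shows that
+computation only (no PATCH, no cancel signal). It assumes sorted
+keys with compact separators as the canonical JSON form and a hex
+SHA-256 digest; `build_steering_append` is a hypothetical helper
+and `SteeringQueueFull` is a local stand-in.
+
+```python
+from __future__ import annotations
+
+import copy
+import hashlib
+import json
+from typing import Any
+
+MAX_PENDING = 9                  # queue limit from step 4
+INLINE_LIMIT_BYTES = 20 * 1024   # promotion threshold from step 6
+
+
+class SteeringQueueFull(Exception):
+    """Local stand-in for the framework exception of the same name."""
+
+
+def canonical_json(value: Any) -> bytes:
+    # Assumption: sorted keys + compact separators as the canonical form.
+    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode()
+
+
+def build_steering_append(payload: dict, new_input: Any,
+                          input_id: str | None = None):
+    """Return (payload_patch, attachments_patch) for one steering append."""
+    steering = copy.deepcopy(payload.get("_steering", {}))
+    pending = list(steering.get("pending_inputs", []))
+    if len(pending) >= MAX_PENDING:
+        raise SteeringQueueFull(f"{len(pending)} inputs already queued")
+    serialized = canonical_json(new_input)
+    attachments_patch = None
+    if len(serialized) > INLINE_LIMIT_BYTES:
+        # Too large to inline: queue a ref and promote to an attachment.
+        next_seq = steering.get("next_input_seq", 0)
+        key = f"_steering_input_{next_seq}"
+        digest = hashlib.sha256(serialized).hexdigest()
+        pending.append({"__attachment_ref__": {"key": key, "hash": digest}})
+        steering["next_input_seq"] = next_seq + 1
+        attachments_patch = {key: new_input}
+    else:
+        pending.append(new_input)            # raw inline
+    steering["pending_inputs"] = pending
+    steering["cancel_requested"] = True
+    payload_patch = {"_steering": steering}
+    if input_id is not None:
+        payload_patch["_last_input_id"] = input_id
+    return payload_patch, attachments_patch
+
+
+inline_patch, inline_att = build_steering_append({}, {"text": "hi"},
+                                                 input_id="i-1")
+promoted_patch, promoted_att = build_steering_append({}, "x" * 30_000)
+```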
+
+### §52. Steering drain (two-phase, two-PATCH)
+
+At every turn-end boundary (suspend, complete, raise), if there
+are queued steering inputs, the framework drains the head and
+re-enters the handler. The drain is two-phase AND two-PATCH to be
+crash-safe: the durable writes are the "Drain start" PATCH
+(Phase 1) and the "Drain end" PATCH (Phase 3) below, and
+`drain_in_progress=True` between the two PATCHes is the
+breadcrumb recovery uses to know "we are mid-drain":
+
+```
+Phase 1 — "Drain start" PATCH (atomic across payload + attachments):
+  1. Read current task record (we need etag, payload, attachments).
+  2. steering = dict(payload['_steering'])
+  3. pending  = list(steering['pending_inputs'])
+  4. If pending is empty: return None (no drain happens; caller
+     proceeds to suspend/complete normally).
+  5. next_entry  = pending.pop(0)
+  6. attachments_patch = {}
+  7. If next_entry is a ref (§23.3):
+        attachments_patch[ref_key(next_entry)] = None    # delete attachment
+        active_input_value = read attachment at ref_key  # resolve via _read_input_value
+     else:
+        active_input_value = next_entry
+  8. steering['active_input']      = active_input_value
+  9. steering['pending_inputs']    = pending
+ 10. steering['drain_in_progress'] = True
+ 11. steering['cancel_requested']  = len(pending) > 0     # more pending => keep advisory
+ 12. payload['_steering']          = steering
+ 13. payload['_turn_started_at']   = utc_now_iso()        # fresh turn-start boundary
+ 14. PATCH(task_id, status='in_progress', payload=payload,
+        attachments=attachments_patch, lease piggyback, if_match=etag)
+
+     [NB: status MUST be set to 'in_progress' in this PATCH. The turn-end
+      boundary that triggered the drain already wrote status='suspended'
+      (multi-turn return/raise => suspended; see §12). The drain starts a
+      NEW turn, so it reclaims the record suspended->in_progress. This is
+      ALSO required for correctness of the lease piggyback: the task store
+      rejects a lease *renewal* on a non-in_progress task ("lease renewal is
+      only supported for in_progress tasks") but ACCEPTS lease params as part
+      of a suspended->in_progress *claim*. Omitting the status flip makes the
+      Phase-1 PATCH fail with a 409, so the steered turn never runs.]
+
+     [NB: Phase 1 does NOT set payload['input'] or write a ref/attachment
+      for active_input. Only the in-memory ctx receives the value (Phase 2).
+      Recovery from a crash BETWEEN Phase 1 and Phase 3 reads
+      _steering['active_input'] as the source of truth for the input,
+      via the race-recovery contract.
+      No output co-clear is needed — the framework does not write
+      payload['output'] / _output attachments on any transition.]
+
+Phase 2 — Handler re-entry (in-memory only):
+ 15. Construct a fresh TaskContext with:
+       entry_mode='resumed', is_steered_turn=True,
+       input=active_input_value (deserialized via input_type),
+       metadata reused from previous ctx,
+       cancel_event=fresh (re-set if cancel_requested still True),
+       retry_attempt=0.
+ 16. Update the in-process _ActiveTask.context pointer.
+ 17. Invoke the handler with the new ctx.
+
+Phase 3 — "Drain end" PATCH (after handler re-entered):
+ 18. steering['drain_in_progress'] = False
+ 19. payload['_steering']          = steering
+ 20. payload['_retry_attempt']     = 0     # Drain resets retry budget durably
+ 21. PATCH(task_id, payload=payload, lease piggyback)
+     (No attachments touched in Phase 3.)
+
+Phase 4 — On the next turn-end:
+ 22. The handler returns/suspends/raises. The terminal handler clears
+     active_input as part of its suspend/complete PATCH (§53).
+```
+
+**Race-recovery contract.** If the process crashes:
+
+- **Between Phase 1 PATCH and Phase 2 handler entry:** recovery
+  reads `drain_in_progress=True` and `active_input != null` and
+  re-enters with `is_steered_turn=True` using `active_input` as
+  the input.
+- **Between Phase 2 handler entry and Phase 3 PATCH:** same — the
+  new ctx is in-memory only; recovery re-enters from `active_input`.
+- **After Phase 3 PATCH:** `drain_in_progress=False`. Recovery
+  treats the task as a normal mid-turn task; reads `payload['input']`
+  if set (typically null at this point — the handler has not yet
+  written a turn-start input) and re-enters as a normal recovery.
+
+**Atomicity note for Phase 1.** "Single PATCH" here means one
+HTTP round-trip carrying BOTH the payload and the attachment
+changes. The hosted store applies both atomically against the
+etag. There is no in-between state where the attachment is
+deleted but the queue still references it, OR vice-versa.
+
+**Conflict retry.** A 412 (etag conflict) on Phase 1 triggers a
+bounded retry (up to 5 attempts) that re-reads the record and
+replays the drain. Exhausting the retries raises `RuntimeError`
+to the caller.
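+
+The bounded retry can be sketched as a small wrapper around one
+"re-read the record, then attempt the Phase 1 PATCH" callable.
+`EtagConflict` is a hypothetical stand-in for however the task
+store surfaces a 412, and `drain_with_retry` is an illustrative
+name.
+
+```python
+from typing import Callable, TypeVar
+
+T = TypeVar("T")
+MAX_DRAIN_ATTEMPTS = 5
+
+
+class EtagConflict(Exception):
+    """Hypothetical stand-in for a 412 response from the task store."""
+
+
+def drain_with_retry(attempt_drain: Callable[[], T]) -> T:
+    """Call attempt_drain until it stops conflicting, at most 5 times."""
+    for _ in range(MAX_DRAIN_ATTEMPTS):
+        try:
+            return attempt_drain()
+        except EtagConflict:
+            continue    # lost the etag race: re-read and replay
+    raise RuntimeError(
+        f"steering drain failed after {MAX_DRAIN_ATTEMPTS} etag conflicts"
+    )
+
+
+calls = {"n": 0}
+
+
+def flaky_drain() -> str:
+    calls["n"] += 1
+    if calls["n"] < 3:          # first two attempts conflict
+        raise EtagConflict()
+    return "drained"
+
+
+result = drain_with_retry(flaky_drain)
+```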
+
+**Watchdog scope (known gap).** The per-turn timeout watchdog is
+spawned ONCE per execution in `_execute_task` and is NOT
+respawned on drain re-entry today. As a result, a steered turn
+shares the watchdog of the turn that drained it. Other-language
+implementers SHOULD spawn a fresh watchdog on drain re-entry to
+honor the design intent that every turn-start boundary gets a
+fresh per-turn budget (§14, §57). The canonical Python
+implementation has this as a known gap; it compensates only on
+RECOVERY, where the watchdog is re-armed from the persisted
+`_turn_started_at`.
+
+### §53. Suspend write
+
+When a multi-turn handler ends a turn with `return X`:
+
+```
+1. Read current task (we need etag and the input slot to know if it was promoted).
+2. payload_patch = {
+       'metadata': metadata.to_dict(),  # auto-flush of touched namespaces
+       'input': null,                   # consumed input goes away
+       '_retry_attempt': null,          # fresh retry budget for next turn
+   }
+3. If task.payload['_steering'] is set:
+       steering = dict(task.payload['_steering'])
+       steering['active_input'] = null
+       payload_patch['_steering'] = steering
+4. # NB: The framework does NOT persist X anywhere on the task record
+   # (§11, §20, C-OUT). The handler's return value is delivered to
+   # the in-process awaiter of TaskRun.result() ONLY. No payload['output']
+   # write, no '_output' attachment.
+   attachments_patch = {}
+5. If task.payload['input'] was a ref (§23.3):
+       attachments_patch[ref_key(task.payload['input'])] = null
+6. PATCH(task_id, status='suspended', suspension_reason='run_completion',
+        payload=payload_patch, attachments=attachments_patch,
+        lease piggyback, if_match=etag)
+```
+
+Properties this guarantees:
+
+- **No output persistence.** Whether the handler returns a value or
+  not, nothing about that value lands on the durable record. After
+  suspend the record reflects `status=suspended`, no `output` key.
+  Awaiters of `TaskRun.result()` receive the value in-process before
+  the chain enters its next turn; replay-after-crash returns to the
+  handler with no output replay path.
+- **Atomic input + steering + attachment clears.** Single PATCH
+  carries the `input` clear, the `_steering.active_input` clear, the
+  `_retry_attempt` reset, AND the deletion of the promoted `_input`
+  attachment (when applicable). There is no crash window where the
+  attachment exists without its ref or vice-versa.
+- **`_last_input_id` preserved.** Not touched here so the
+  `if_last_input_id` precondition on the next `start()` still resolves.
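+
+The patch construction in steps 2-5 can be sketched as a pure
+function. It assumes a promoted input uses the same
+`__attachment_ref__` shape as the steering refs in §51;
+`build_suspend_patch` and `is_ref` are hypothetical helpers, and
+the PATCH call itself is omitted.
+
+```python
+from typing import Any
+
+REF_MARKER = "__attachment_ref__"   # assumed ref shape, as in §51
+
+
+def is_ref(value: Any) -> bool:
+    return isinstance(value, dict) and REF_MARKER in value
+
+
+def build_suspend_patch(task_payload: dict, metadata: dict):
+    """Return (payload_patch, attachments_patch) for the suspend PATCH."""
+    payload_patch = {
+        "metadata": metadata,
+        "input": None,             # consumed input goes away
+        "_retry_attempt": None,    # fresh retry budget for next turn
+    }
+    if task_payload.get("_steering") is not None:
+        steering = dict(task_payload["_steering"])
+        steering["active_input"] = None
+        payload_patch["_steering"] = steering
+    attachments_patch = {}
+    current_input = task_payload.get("input")
+    if is_ref(current_input):
+        # None marks the promoted attachment for deletion in this PATCH.
+        attachments_patch[current_input[REF_MARKER]["key"]] = None
+    # The handler's return value is deliberately absent from both patches.
+    return payload_patch, attachments_patch
+
+
+record = {
+    "input": {REF_MARKER: {"key": "_input", "hash": "abc"}},
+    "_steering": {"active_input": {"q": 1}, "pending_inputs": []},
+    "_last_input_id": "i-1",
+}
+payload_patch, attachments_patch = build_suspend_patch(record, {"ns": {"k": 1}})
+```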
+
+### §54. Recovery + reclaim
+
+Both reclaim sites (inline and cold-start/periodic) MUST use
+`if_match` for CAS. There is no longer a difference between them
+in this respect.
+
+**Inline reclaim — `_reclaim_one(task_info)` (lifecycle resolver):**
+
+```
+1. Build a PATCH that re-takes the lease:
+      lease_owner            = self.owner       # always self
+      lease_instance_id      = self.instance_id # always self
+      lease_duration_seconds = 60
+      if_match               = task_info.etag   # CAS-guarded
+2. PATCH(task_info.id, ...)
+   On 412: ABANDON per §25.3 — the conflict IS the race-detection;
+   the next caller / scan re-evaluates.
+3. Re-read task_info (now with self as lease owner). Record the new etag.
+4. Look up the resume callback by source.name.
+5. If no callback found: log and skip (decorator not registered in
+   this process — the framework cannot recover what it does not know).
+6. Hydrate ctx.input from payload['input'] (resolving ref via
+   attachments if necessary).
+7. Compute entry_mode based on stored status:
+      in_progress => 'recovered'
+      suspended   => 'resumed'
+      pending     => 'fresh'
+8. If drain_in_progress is True: set is_steered_turn=True; use
+   active_input as ctx.input (NOT payload['input']).
+9. Spawn lease_renewal_loop, watchdog (with remaining-from-turn-start),
+   execute_task_loop with the recovered ctx.
+```
+
+**Cold-start / periodic reclaim — `_recover_stale_tasks()`:**
+
+```
+1. provider.list(agent_name, session_id, status="in_progress",
+                 lease_owner=self.owner,
+                 source_type=_SOURCE_TYPE)
+   The source_type filter scopes to framework-owned tasks ONLY;
+   foreign-typed records in the same scope are never picked up.
+2. For each task_info:
+   a. Build the same reclaim PATCH as inline reclaim, INCLUDING
+      if_match = task_info.etag.
+   b. PATCH(task_info.id, ...). On 412: ABANDON (the conflict IS
+      the race-detection — let the next scan or the next caller
+      re-evaluate).
+   c. Same handler dispatch as steps 3-9 of inline reclaim.
+```
+
+**Liveness predicate (`_lease_is_dead`).** The framework's
+"is this lease dead" check is:
+
+```
+1. If active_locally (this process has an _ActiveTask entry for
+   this id): NOT dead.
+2. If lease.owner == self.lease_owner AND not active_locally:
+   DEAD (previous lifetime of mine).
+3. If lease.owner != self.lease_owner AND lease.owner is set:
+   NOT dead (foreign owner — caller observes the live-elsewhere
+   conflict shape; do not reclaim).
+4. If lease.owner is empty: DEAD (no live executor claims it).
+```
+
+Note: the predicate does NOT directly consult `expires_at`. The
+hosted store enforces expiry server-side at PATCH time by
+rejecting an attempted reclaim against a still-live foreign
+lease; the framework relies on the server response (which the
+classifier turns into `evicted` / `conflict` labels) to handle
+the lost-race case. The local provider mirrors this behavior:
+attempting to reclaim a not-yet-expired foreign lease yields a
+classified conflict, and the local provider bumps `expiry_count`
+when the prior lease's `expires_at` (UTC) has actually passed
+(parity with the hosted store).
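+
+The four rules above can be sketched as a pure function. This is a
+minimal illustration, not the framework's `_lease_is_dead`: the
+`LeaseView` type and the parameter names are assumptions made for
+the example.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class LeaseView:
+    """Hypothetical stand-in for the lease fields the predicate reads."""
+    owner: Optional[str]
+
+
+def lease_is_dead(lease: LeaseView, my_owner: str, active_locally: bool) -> bool:
+    # Rule 1: a task this process is executing is never dead.
+    if active_locally:
+        return False
+    # Rule 4: no owner recorded, so no live executor claims it.
+    if not lease.owner:
+        return True
+    # Rule 2: same owner but not active here, i.e. a previous lifetime of ours.
+    if lease.owner == my_owner:
+        return True
+    # Rule 3: foreign owner; do not reclaim, the store arbitrates expiry.
+    return False
+```
+
+Note that the sketch never reads `expires_at`, matching the note
+above: expiry of a foreign lease is left to the store's PATCH
+response.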
+
+### §55. Periodic recovery loop
+
+```
+loop:
+    await sleep(300 seconds) OR cancel_event
+    if cancel_event set: break
+    await self._recover_stale_tasks()   # same as cold-start scan
+```
+
+The interval is intentionally **NOT** developer-tunable: shortening
+it inflates list-bandwidth without improving recovery latency
+(inline reclaim already catches in-flight starts); lengthening it
+delays reclaim of expired-during-process-lifetime tasks beyond
+acceptable bounds.
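+
+The `sleep(...) OR cancel_event` step can be expressed in `asyncio`
+by waiting on the event with a timeout. The sketch below is one
+possible shape, assuming `cancel_event` is an `asyncio.Event` and
+`recover` is an async callable; the `interval_seconds` parameter is
+exposed here only so the example can be exercised quickly, not
+because the interval is meant to be tunable.
+
+```python
+import asyncio
+
+
+async def periodic_recovery_loop(recover, cancel_event: asyncio.Event,
+                                 interval_seconds: float = 300.0) -> None:
+    """Call `recover()` once per interval until `cancel_event` is set."""
+    while True:
+        try:
+            # Wake early if cancellation is signalled during the sleep.
+            await asyncio.wait_for(cancel_event.wait(), timeout=interval_seconds)
+        except asyncio.TimeoutError:
+            pass  # Interval elapsed without cancellation.
+        if cancel_event.is_set():
+            break
+        await recover()
+```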
+
+### §56. Lease renewal loop
+
+```
+interval = max(1, lease_duration_seconds // 2)
+failures = 0
+loop:
+    await sleep(interval) OR cancel_event
+    if cancel_event set: break
+
+    if last_refresh_provider() shows a recent piggyback refresh:
+        # Skip: a payload PATCH within the last interval already
+        # refreshed the lease as a side effect.
+        continue
+
+    try:
+        PATCH(task_id, lease_owner, lease_instance_id, lease_duration_seconds)
+        failures = 0
+        if steering_poll_callback: await steering_poll_callback()
+    except TransportClassifiedError as exc:
+        if exc.classification == 'evicted':
+            # Orphan-sandbox eviction. Stop renewing immediately;
+            # signal local cleanup callback to cancel execution,
+            # suppress pending terminal write, signal awaiters with
+            # TaskConflictError.
+            on_cancel_callback.set()
+            break
+        failures += 1
+        if failures >= 3 and on_cancel_callback:
+            on_cancel_callback.set()
+            break
+    except Exception:
+        failures += 1
+        ...
+```
+
+The `last_refresh_provider` optimization avoids an extra HTTP
+round-trip on every renewal when the framework already piggybacked
+lease ownership on a payload PATCH within the last interval.
+
+### §57. Per-turn watchdog
+
+```
+async def _timeout_watchdog(timeout_seconds, cancel_event, ctx,
+                            remaining_seconds=None):
+    if remaining_seconds is None:
+        sleep_for = timeout_seconds
+    else:
+        # Clamp to [0, timeout_seconds] for clock-skew safety.
+        sleep_for = max(0.0, min(remaining_seconds, timeout_seconds))
+
+    if sleep_for > 0:
+        await sleep(sleep_for)
+
+    # ORDERING INVARIANT: cause boolean BEFORE cancel event.
+    ctx.timeout_exceeded = True
+    cancel_event.set()
+```
+
+`remaining_seconds=None` is the fresh-entry / drain-re-entry case;
+the budget is the full timeout. A computed `remaining_seconds` is
+the crash-recovery case, where the manager computes
+`opts.timeout_seconds - (now - persisted_turn_started_at)` and
+passes it. A negative or zero value fires immediately so the
+recovered handler sees the cause from its first checkpoint.
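+
+A runnable rendering of the watchdog, together with the
+remaining-budget computation used on crash recovery, might look as
+follows. `TurnContext` and `remaining_budget` are hypothetical names
+introduced for the sketch; only the clamping and the ordering of the
+two writes are taken from the pseudocode above.
+
+```python
+import asyncio
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class TurnContext:
+    """Hypothetical stand-in for the fields the watchdog touches."""
+    timeout_exceeded: bool = False
+
+
+def remaining_budget(timeout_seconds: float, now: float,
+                     turn_started_at: float) -> float:
+    # Clamp to [0, timeout_seconds] so clock skew in either direction
+    # can neither extend the budget nor make it negative.
+    elapsed = now - turn_started_at
+    return max(0.0, min(timeout_seconds - elapsed, timeout_seconds))
+
+
+async def timeout_watchdog(timeout_seconds: float, cancel_event: asyncio.Event,
+                           ctx: TurnContext,
+                           remaining_seconds: Optional[float] = None) -> None:
+    if remaining_seconds is None:
+        sleep_for = timeout_seconds
+    else:
+        sleep_for = max(0.0, min(remaining_seconds, timeout_seconds))
+    if sleep_for > 0:
+        await asyncio.sleep(sleep_for)
+    # Ordering invariant: cause boolean first, then the cancel event.
+    ctx.timeout_exceeded = True
+    cancel_event.set()
+```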
+
+### §58. Orphan attachment cleanup
+
+```
+async def _steering_cleanup_orphan_attachments(task_info):
+    if not task_info.attachments:
+        return
+    steering_keys = {k for k in task_info.attachments
+                       if k.startswith('_steering_input_')}
+    if not steering_keys:
+        return
+    pending = task_info.payload.get('_steering', {}).get('pending_inputs', [])
+    referenced = {ref_key(e) for e in pending if is_ref(e)
+                                              and ref_key(e).startswith('_steering_input_')}
+    orphans = steering_keys - referenced
+    if not orphans:
+        return
+    PATCH(task_info.id, attachments={k: null for k in orphans},
+          if_match=task_info.etag)
+```
+
+This is **defense-in-depth**. The happy path (single-PATCH
+atomicity at append + drain) never produces orphans. A future
+code path that splits a write across multiple PATCHes could
+leave one; this cleanup runs once per recovery and closes that
+window at a cost of about one extra PATCH per task per cold-start.
+
+Implementers MAY omit this if they can prove the single-PATCH
+invariant holds across all transitions (today's framework can).
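+
+The set arithmetic at the heart of the cleanup can be isolated from
+the PATCH call and tested on its own. The sketch below assumes each
+`pending_inputs` entry is either an inline value or an attachment ref
+in the `__attachment_ref__` shape; the helper names are illustrative,
+not the framework's.
+
+```python
+from typing import Any, Dict, List, Set
+
+REF_MARKER = "__attachment_ref__"
+STEERING_PREFIX = "_steering_input_"
+
+
+def is_ref(slot: Any) -> bool:
+    # A ref is a dict with exactly one key whose value has `key` and `hash`.
+    if not isinstance(slot, dict) or set(slot) != {REF_MARKER}:
+        return False
+    inner = slot[REF_MARKER]
+    return isinstance(inner, dict) and "key" in inner and "hash" in inner
+
+
+def find_orphan_steering_keys(attachments: Dict[str, Any],
+                              pending_inputs: List[Any]) -> Set[str]:
+    steering_keys = {k for k in attachments if k.startswith(STEERING_PREFIX)}
+    referenced = {
+        entry[REF_MARKER]["key"]
+        for entry in pending_inputs
+        if is_ref(entry) and entry[REF_MARKER]["key"].startswith(STEERING_PREFIX)
+    }
+    # Orphans are steering attachments that no queued input points at.
+    return steering_keys - referenced
+```
+
+A caller would then issue the single CAS-guarded PATCH only when the
+returned set is non-empty.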
+
+---
+
+## Part VIII — Conformance items
+
+This section enumerates the invariants every conformant implementation
+MUST satisfy. The items are testable; the canonical Python
+implementation has a regression test covering each (see
+`azure-ai-agentserver-core/tests/durable/` and `tests/streaming/`).
+
+Items are grouped by area. Each item is identified as `C-AREA-N`
+(e.g. `C-LCM-1` = Lifecycle item #1).
+
+### C-LCM (lifecycle + state machine)
+
+- **C-LCM-1.** Status MUST be one of exactly four values:
+  `pending`, `in_progress`, `suspended`, `completed`. No other
+  value is legal in the store.
+- **C-LCM-2.** Unsuccessful outcomes (failure, cancellation) are
+  communicated via typed exceptions (NEVER via a fifth status
+  value). For one-shot (`@task`) tasks the record is deleted on
+  terminal exit (one-shot is always ephemeral). For multi-turn
+  (`@multi_turn_task`) tasks the chain transitions to `suspended`
+  with `suspension_reason="run_completion"` on either successful
+  `return X` or a handler raise — the chain stays alive and the
+  caller observes the per-turn outcome via the typed exception
+  (`TaskFailed` / `TaskCancelled`) or the returned `Output`.
+- **C-LCM-3.** `ctx.entry_mode` MUST be one of `fresh`, `resumed`,
+  `recovered`. The combination `(entry_mode=recovered,
+  is_steered_turn=True)` is legal and MUST be supported.
+- **C-LCM-4.** For any given `task_id`, at most one handler runs
+  at a time across the cluster of processes that share the
+  `(agent_name, session_id)` scope. The lease + ETag CAS
+  combination enforces this.
+- **C-LCM-5.** Status transitions MUST be enforced against the §24.1
+  matrix. Invalid transitions raise `_HostedConflict(_code="invalid_state_transition")`
+  — this is a framework bug (framework drives transitions, not the
+  developer) and at the boundary maps to `RuntimeError`.
+- **C-LCM-6.** Terminal-status tasks are immutable per §24.2. PATCH
+  on a `completed` task is rejected EXCEPT for the no-op
+  `completed → completed` with no other field changes. Violations
+  raise `_HostedConflict(_code="task_immutable")` →
+  `TaskConflictError(current_status="completed")`.
+- **C-LCM-7.** DELETE on a non-terminal task without `force=true`
+  MUST be rejected as `invalid_request` (400). DELETE on a terminal
+  task always succeeds without `force`. DELETE honors `If-Match`
+  when supplied (412 / `etag_mismatch` on mismatch). Per §24.3.
+- **C-LCM-8.** PATCHes that include any of `id`, `agent_name`,
+  `session_id`, `title`, `description`, `source` MUST be rejected
+  as `invalid_request` (§28a.6 / §24).
+
+### C-ID (identity)
+
+- **C-ID-1.** `task_id` validation MUST reject empty / length>256 /
+  characters outside `[a-zA-Z0-9\-_.:]` at the call site, before
+  any network is touched.
+- **C-ID-2.** `lease_owner` MUST be derived from BOTH
+  `agent_name` AND `session_id` (format
+  `<agent_name>|session:<session_id>`).
+- **C-ID-3.** `lease_instance_id` MUST be fresh per process; a
+  same-`(owner, instance_id)` lease record indicates "my own task";
+  same-owner-different-instance indicates "previous lifetime of
+  mine, RECLAIM."
+- **C-ID-4.** `source.name` MUST be the routing key for resume
+  callback discovery. Two tasks with the same `source.name` are
+  routed to the same callback on recovery; tasks with no matching
+  registered callback are skipped (logged, not raised) — the
+  framework cannot recover what it does not know.
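+
+The identity rules C-ID-1 through C-ID-3 are small enough to sketch
+directly. The function names, the use of `ValueError`, and the choice
+of a random UUID for the instance id are assumptions made for
+illustration; only the character class, the length bound, and the
+owner format come from the items above.
+
+```python
+import re
+import uuid
+
+# Character class and length bound as stated in C-ID-1.
+_TASK_ID_RE = re.compile(r"[a-zA-Z0-9\-_.:]{1,256}")
+
+
+def validate_task_id(task_id: str) -> str:
+    # Runs at the call site, before any network access.
+    if not isinstance(task_id, str) or _TASK_ID_RE.fullmatch(task_id) is None:
+        raise ValueError(f"invalid task_id: {task_id!r}")
+    return task_id
+
+
+def derive_lease_owner(agent_name: str, session_id: str) -> str:
+    # Format from C-ID-2: both parts contribute to the owner string.
+    return f"{agent_name}|session:{session_id}"
+
+
+def new_lease_instance_id() -> str:
+    # Assumption: a random UUID is one way to get a fresh value per process.
+    return uuid.uuid4().hex
+```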
+
+### C-LSE (lease)
+
+- **C-LSE-1.** Lease renewal MUST run at half the lease duration.
+  Default lease duration is 60 seconds; default renewal interval
+  is 30 seconds.
+- **C-LSE-2.** All reclaim PATCHes — inline (via `_reclaim_one`)
+  AND cold-start / periodic-scan reclaims — MUST be guarded by
+  `if_match=etag`. On `412`, the framework MUST treat the reclaim
+  as ABANDONED for this scan (another process beat us to it; do
+  not retry). This is the unified rule that closes the prior
+  known gap where periodic-scan reclaims wrote without
+  `if_match`.
+- **C-LSE-3.** `expiry_count` MUST be a server-side counter ONLY.
+  Implementations MUST NOT add it to the patch-request shape; the
+  framework MUST NOT write the field. The hosted store bumps it
+  on actual-expiry ownership change (not on same-owner
+  different-instance handoff). The local file provider MUST also
+  bump `expiry_count` on the reclaim write that completes a real
+  lease handoff (parity with the hosted store, so
+  the lease's `expiry_count` works in local mode and so tests
+  asserting recovery behavior can run against the local
+  provider).
+- **C-LSE-4.** Eviction (HTTP 409 + `error.code=binding_mismatch`)
+  classified as `evicted` MUST trigger the local cleanup sequence:
+  cancel local execution, suppress pending terminal write, signal
+  awaiters with `TaskConflictError`.
+- **C-LSE-5.** `ctx.exit_for_recovery()` MUST force-expire the lease
+  and leave status as `in_progress` (NOT `suspended`).
+- **C-LSE-6.** `lease_duration_seconds` MUST be `0` (force-expire) OR
+  in range `10..3600`. Other values MUST be rejected as
+  `invalid_request` by both providers (§22.1 LSE-W-1).
+- **C-LSE-7.** Lease params are an all-or-nothing triplet: supplying
+  any subset of `(lease_owner, lease_instance_id, lease_duration_seconds)`
+  without all three MUST be rejected as `invalid_request` (§22.1 LSE-W-2).
+- **C-LSE-8.** Lease acquisition / renewal against a record whose
+  lease is held by a different owner and not yet expired MUST be
+  rejected as `_HostedConflict(_code="lease_held_by_another")` →
+  developer-observable `TaskConflictError(current_status="in_progress")`
+  (§22.1 LSE-W-3).
+- **C-LSE-9.** `in_progress → pending` transition MUST verify the
+  supplied `(lease_owner, lease_instance_id)` matches the record's
+  current lease (`EnsureLeaseMatches` per §22.1 LSE-W-4).
+- **C-LSE-10.** Lease renewal (no status change, `duration > 0`) MUST
+  be rejected when the current status is anything other than
+  `in_progress` (§22.1 LSE-W-5).
+- **C-LSE-11.** Force-expire (`lease_duration_seconds=0`) MUST NOT be
+  combined with a status transition in the same PATCH (§22.1
+  LSE-W-6).
+- **C-LSE-12.** Force-expire MUST verify lease ownership unless the
+  lease is already expired (§22.1 LSE-W-7).
+- **C-LSE-13.** `started_at` MUST be set exactly once on the first
+  `in_progress` transition and MUST NOT be updated thereafter: lease
+  re-acquisition (different-owner takeover OR same-owner restart
+  after expiry), recovery scanner takeover, and suspend/resume
+  cycles MUST all preserve the original `started_at` value (§22.1
+  LSE-W-8).
+- **C-LSE-14.** On every successful lease write, the provider MUST
+  stamp `lease.heartbeat_at = now` (§22.1 LSE-W-10). The field is
+  on `LeaseInfo`; it is NOT exposed on the public surface.
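+
+The request-shape rules in C-LSE-6 and C-LSE-7, plus the renewal
+interval from C-LSE-1, can be sketched as plain validation helpers.
+`InvalidRequest` is a hypothetical stand-in for the `invalid_request`
+rejection, and the function names are illustrative only.
+
+```python
+from typing import Optional
+
+
+class InvalidRequest(ValueError):
+    """Hypothetical stand-in for the `invalid_request` rejection."""
+
+
+def validate_lease_params(lease_owner: Optional[str],
+                          lease_instance_id: Optional[str],
+                          lease_duration_seconds: Optional[int]) -> None:
+    supplied = [p is not None for p in
+                (lease_owner, lease_instance_id, lease_duration_seconds)]
+    if not any(supplied):
+        return  # A PATCH without lease params is not a lease write.
+    if not all(supplied):
+        raise InvalidRequest("lease params are an all-or-nothing triplet")
+    duration = lease_duration_seconds
+    # 0 means force-expire; otherwise the duration must be in 10..3600.
+    if duration != 0 and not (10 <= duration <= 3600):
+        raise InvalidRequest("lease_duration_seconds must be 0 or 10..3600")
+
+
+def renewal_interval(lease_duration_seconds: int) -> int:
+    # Half the lease duration, floored at one second.
+    return max(1, lease_duration_seconds // 2)
+```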
+
+### C-INP (input + chain)
+
+- **C-INP-1.** `input_id` provided without `if_last_input_id` MUST
+  succeed; the framework records the id in `_last_input_id`.
+- **C-INP-2.** `if_last_input_id` provided without `input_id` MUST
+  raise `TypeError` at the call site.
+- **C-INP-3.** `if_last_input_id` mismatch MUST raise
+  `LastInputIdPreconditionFailed` (subclass of
+  `TaskPreconditionFailed`).
+
+### C-SUS (suspend / resume)
+
+- **C-SUS-1.** A multi-turn handler's `return X` MUST clear
+  `payload["input"]` AND `payload["_steering"]["active_input"]`
+  AND any promoted input attachment, in a single PATCH that also
+  transitions the chain to `suspended`.
+- **C-SUS-2.** The next `.run()` / `.start()` against a `suspended`
+  chain MUST re-invoke the handler with `entry_mode="resumed"`
+  and the NEW `input` (not the consumed one).
+- **C-SUS-3.** The handler's `return X` value MUST be delivered
+  unconditionally to the in-process caller awaiting
+  `TaskRun.result()` — even if steering inputs are queued. `X`
+  resolves the future and is then no longer reachable from the
+  persisted record (the framework does NOT write `payload["output"]`).
+- **C-SUS-4.** The framework MUST NOT write `payload["output"]`
+  and MUST NOT use the `_output` attachment slot. The suspend
+  PATCH writes `status="suspended"`, `suspension_reason="run_completion"`,
+  clears `payload["input"]` and `payload["_retry_attempt"]`, and
+  preserves `payload["_last_input_id"]`. No output / error
+  projection onto the chain record.
+
+### C-STR (steering)
+
+- **C-STR-1.** Steering queue cap MUST be 9; appending past it
+  MUST raise `SteeringQueueFull` from `.start()`.
+- **C-STR-2.** Append MUST set `_steering["cancel_requested"]=True`
+  and signal `ctx.cancel` on the in-process active execution.
+- **C-STR-3.** `next_input_seq` MUST be monotonic and advance ONLY
+  on promotion (inline appends do NOT bump it).
+- **C-STR-4.** A drain MUST NOT renumber any other queue entry's
+  attachment key. Surviving promoted entries keep their
+  original `_steering_input_<seq>` keys.
+- **C-STR-5.** A drain MUST be carried in a single PATCH that
+  removes the head from `pending_inputs`, deletes the
+  corresponding attachment (if any), and sets the new turn's
+  input / `_turn_started_at`.
+- **C-STR-6.** Multi-turn handler ending a turn with `return X`
+  MUST transition the chain to `suspended` and promote the next
+  queued steering input as the next turn's input. The queued
+  steerer's `.result()` resolves with whatever the promoted turn
+  emits.
+- **C-STR-7.** Multi-turn handler ending a turn with `raise` (any
+  non-CancelledError exception) MUST transition the chain to
+  `suspended` (NOT `completed` / `failed`) — the chain stays
+  alive — and promote the next queued steering input as the next
+  turn. The failing turn's caller observes `TaskFailed(error=...)`;
+  the queued steerer's `.result()` resolves with whatever the
+  promoted turn emits.
+- **C-STR-8.** First turn's caller MUST observe the natural
+  multi-turn outcome of the in-flight turn (the handler's
+  `return X` resolved to that caller; or the handler's `raise`
+  raised to that caller as `TaskFailed` / `TaskCancelled`). It
+  MUST NOT be replaced by what a later turn produces.
+
+### C-CAN (cancellation + cause booleans)
+
+- **C-CAN-1.** Cause booleans MUST be `timeout_exceeded`,
+  `cancel_requested`; plus the cause counter `pending_input_count`.
+- **C-CAN-2.** Each cause MUST be set BEFORE `ctx.cancel` is set
+  (ordering invariant). A handler observing
+  `ctx.cancel.is_set() == True` MUST be guaranteed to see at least
+  one cause already set (or `pending_input_count > 0`).
+- **C-CAN-3.** Causes MUST accumulate (never reset within a turn).
+- **C-CAN-4.** `TaskCancelled` MUST NOT inherit from
+  `asyncio.CancelledError` (it would otherwise be swallowed by
+  generic `CancelledError` handlers).
+- **C-CAN-5.** `TaskRun.cancel()` MUST set `ctx.cancel_requested =
+  True` BEFORE setting `ctx.cancel`.
+
+### C-TMO (timeout watchdog)
+
+- **C-TMO-1.** Timeout is **per-turn** and **wall-clock**.
+- **C-TMO-2.** `payload["_turn_started_at"]` MUST be re-stamped at
+  every turn-start boundary (fresh, resumed, drain re-entry — Phase 1
+  of §52). It MUST NOT be re-stamped on crash recovery.
+- **C-TMO-3.** Recovered watchdog MUST compute
+  `remaining = max(0, timeout - (now - _turn_started_at))` and
+  fire immediately if elapsed.
+- **C-TMO-4.** Clock skew MUST be clamped to `[0, timeout]` in
+  both directions.
+- **C-TMO-5.** Watchdog MUST set `ctx.timeout_exceeded = True`
+  BEFORE setting `ctx.cancel` (C-CAN-2 ordering).
+- **C-TMO-6.** Watchdog MUST be cooperative-only. It MUST NOT
+  force-stop the handler, terminate the task, or cancel lease
+  renewal.
+- **C-TMO-7.** A fresh watchdog SHOULD be spawned on every
+  turn-start boundary (fresh, resumed, drain re-entry). The
+  canonical Python implementation today only spawns on fresh /
+  resumed entries; drain re-entry inherits the original watchdog.
+  This is a known gap (see §14).
+
+### C-RET (retry)
+
+- **C-RET-1.** `retry=None` MUST mean "no retry" (the handler's
+  raise propagates directly to the caller as `TaskFailed`).
+- **C-RET-2.** `retry_attempt` MUST be exposed on
+  `TaskContext.retry_attempt` and persisted as
+  `payload["_retry_attempt"]`. Cleared at every turn-start
+  boundary.
+- **C-RET-3.** Crash recovery MUST NOT consume retry budget. A
+  lifetime that died before the handler raised MUST NOT advance
+  `_retry_attempt`.
+- **C-RET-4.** Between attempts, the framework MUST PATCH only
+  `payload["_retry_attempt"]` (the counter advance). NO
+  `payload["error"]` is written between attempts.
+- **C-RET-5.** When `retry_attempt >= max_attempts`, the framework
+  MUST raise `TaskFailed(error=TaskExhaustedRetriesErrorDict(...))`
+  to the awaiting caller. The dict's `type` MUST be the literal
+  `"exhausted_retries"`; `attempts`, `last_error`, `last_error_type`,
+  `traceback` MUST be present.
+- **C-RET-6.** No persisted `error` field on the chain record.
+  The framework's structured ERROR log (named
+  `durable_task_handler_failure`, with `task_id`, `input_id`,
+  `error_type`, `error_message`) is the durable failure
+  observability surface; the chain record itself does not
+  carry the per-turn diagnostic.
+
+### C-MET (metadata)
+
+- **C-MET-1.** Default namespace MUST persist at `payload["metadata"]`.
+- **C-MET-2.** Named namespace `ns` MUST persist at
+  `payload["metadata:<ns>"]`.
+- **C-MET-3.** Top-level keys / namespace names starting with `_`
+  are RESERVED for the framework.
+- **C-MET-4.** Auto-flush MUST persist all touched namespaces at
+  every terminal-of-turn boundary.
+- **C-MET-5.** Flush failures MUST be logged, not raised.
+
+### C-ATT (attachments + promotion)
+
+- **C-ATT-1.** Two wire shapes only: inline (raw value) OR ref
+  (`{"__attachment_ref__": {"key": ..., "hash": "sha256:..."}}`).
+- **C-ATT-2.** Detection rule: a slot is a ref iff it is a dict
+  with exactly one key `__attachment_ref__` whose value is a dict
+  with both `key` and `hash`.
+- **C-ATT-3.** Promotion thresholds: function input > 200 KiB;
+  steering input > 20 KiB. Outputs are not persisted at all
+  (§11, §20, C-OUT) — there is no `_output` attachment. Measured
+  in canonical-JSON bytes. Framework-reserved attachment keys:
+  `_input`, `_steering_input_<seq>`.
+  Worst-case framework attachment usage: 1 + 9 = 10 of 20 slots;
+  10 slots remain free.
+- **C-ATT-4.** Per-attachment cap: 2 MB serialized. Per-task
+  attachment count cap: 20. Per-value cap MUST be enforced
+  client-side on every write site (create + patch) in both
+  providers. Provider-level violations MUST surface as the
+  internal `_AttachmentTooLarge` / `_AttachmentLimitExceeded`
+  (underscore-prefixed; NOT exported). The framework MUST
+  re-raise as the developer-facing `InputTooLarge` (for `_input`
+  / `_steering_input_*` keys).
+  Per-task count cap MUST be enforced on `create` and SHOULD be
+  enforced on `patch` when current state is cheaply available;
+  the canonical Python implementation enforces count on
+  local-provider patches and on framework-orchestrated
+  steering-append patches (which fetch state anyway) but NOT on
+  the bare hosted PATCH (which would require an extra round-trip).
+  The server enforces in the gap.
+- **C-ATT-5.** Promotion / drain / suspend / orphan-cleanup
+  PATCHes MUST carry BOTH `payload` and `attachments` in a single
+  round-trip.
+- **C-ATT-6.** Hash algorithm MUST be SHA-256 over canonical
+  JSON bytes (`sort_keys=True`, separators `(",", ":")`), formatted
+  as `sha256:<64 lowercase hex chars>`.
+- **C-ATT-7.** Orphan attachment cleanup (§58) MUST run on
+  recovery for tasks with `_steering_input_*` keys not referenced
+  in `pending_inputs`.
+- **C-ATT-8.** Attachment keys MUST match `^[a-zA-Z0-9_.\-]{1,64}$`
+  and MUST be non-empty after trim. Validated on every CREATE and
+  PATCH write (§23.9).
+- **C-ATT-9.** Clear-all gesture: PATCH with `attachments: null`
+  (typed-API `TaskPatchRequest.clear_attachments = true`) MUST
+  delete every attachment on the task. Mutually exclusive with
+  per-key `attachments={...}` in the same request — combination
+  MUST be rejected as `invalid_request` (§23.10).
+- **C-ATT-10.** DELETE on a task MUST remove all attachments along
+  with the task. Local achieves this trivially via unlinking the
+  JSON file; hosted relies on the service's blob-cleanup hook
+  (§23.10).
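+
+The hashing and promotion rules (C-ATT-1, C-ATT-3, C-ATT-6) lend
+themselves to a short sketch built on the standard library. The
+helper names are illustrative, and the UTF-8 encoding with
+`json.dumps` defaults beyond `sort_keys` and `separators` is an
+assumption, since the items above fix only those two options.
+
+```python
+import hashlib
+import json
+from typing import Any
+
+INPUT_PROMOTION_BYTES = 200 * 1024     # function input threshold (C-ATT-3)
+STEERING_PROMOTION_BYTES = 20 * 1024   # steering input threshold (C-ATT-3)
+
+
+def canonical_json_bytes(value: Any) -> bytes:
+    # Assumption: UTF-8 bytes of json.dumps with sorted keys, compact separators.
+    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
+
+
+def attachment_hash(value: Any) -> str:
+    # hexdigest() yields 64 lowercase hex characters for SHA-256.
+    return "sha256:" + hashlib.sha256(canonical_json_bytes(value)).hexdigest()
+
+
+def make_ref(key: str, value: Any) -> dict:
+    return {"__attachment_ref__": {"key": key, "hash": attachment_hash(value)}}
+
+
+def should_promote(value: Any, threshold_bytes: int) -> bool:
+    # Promotion applies strictly above the threshold.
+    return len(canonical_json_bytes(value)) > threshold_bytes
+```
+
+Because the bytes are canonical, two dicts that differ only in key
+order hash identically, which is what makes the `hash` field usable
+as an integrity check on the ref.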
+
+### C-VAL (field validation — shared between providers)
+
+- **C-VAL-1.** Task `id` MUST match `^[a-zA-Z0-9_-]{1,128}$`. Empty
+  or non-matching ids rejected as `invalid_request` (§28a.1).
+- **C-VAL-2.** `agent_name`, `session_id`, `title` MUST be required
+  on CREATE (length 1..128 / 1..128 / 1..256 after trim respectively).
+- **C-VAL-3.** `description` MUST be ≤ 1024 chars after trim.
+- **C-VAL-4.** `suspension_reason` MUST be ≤ 256 chars after trim,
+  AND only allowed when target status is `suspended` (§28a.1, §S5).
+- **C-VAL-5.** Tag keys MUST match `^[a-zA-Z0-9_.\-]{1,64}$`. Tag
+  values MUST be ≤ 256 chars. Total tag entries MUST be ≤ 16.
+- **C-VAL-6.** Byte budgets MUST be enforced per §28a.2: `payload`
+  ≤ 1 MB, `error` ≤ 64 KB, `source` ≤ 4 KB (canonical-JSON byte
+  measurement: `sort_keys=True`, separators `(",", ":")`).
+- **C-VAL-7.** `source` when supplied MUST be a JSON object with a
+  non-empty `type` field (§28a.3). Optional structured fields
+  pass through; unknown fields are preserved.
+- **C-VAL-8.** `error` when supplied MUST be a JSON object with
+  non-empty `message` and `type` strings (§28a.4). `code` defaults
+  to `"error"` when missing.
+- **C-VAL-9.** Status `"failed"` MUST be rejected on input. Status
+  `"done"` MUST be normalized to `"completed"` on read and in list
+  filters (§28a.5).
+- **C-VAL-10.** PATCHes including any of `id`, `agent_name`,
+  `session_id`, `title`, `description`, `source` MUST be rejected
+  as `invalid_request` (§28a.6).
+- **C-VAL-11.** Payload PATCH semantics per §F1: when the patch
+  value is a JSON object, shallow-merge into current payload; for
+  any other JSON type (array, string, number), full-replace; null
+  is no-op.
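+
+The payload PATCH semantics in C-VAL-11 can be illustrated with a
+small pure function. This is a sketch under two assumptions not
+stated above: a non-object current payload is treated as empty when
+an object patch arrives, and key-level `null` values inside an object
+patch are stored as-is rather than interpreted as deletions.
+
+```python
+from typing import Any
+
+
+def apply_payload_patch(current: Any, patch: Any) -> Any:
+    if patch is None:
+        return current  # null: no-op
+    if isinstance(patch, dict):
+        # Shallow merge: top-level keys from the patch overwrite; nested
+        # objects are replaced wholesale, not merged recursively.
+        merged = dict(current) if isinstance(current, dict) else {}
+        merged.update(patch)
+        return merged
+    return patch  # array / string / number: full replace
+```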
+
+### C-REC (recovery)
+
+- **C-REC-1.** Cold-start recovery MUST run as part of
+  `TaskManager.startup()` BEFORE any HTTP route binds. Implementers
+  MUST gate route binding on `startup()` returning.
+- **C-REC-2.** Periodic recovery loop MUST run every 300 seconds
+  (default `_PERIODIC_RECOVERY_INTERVAL_SECONDS`). It MUST share
+  the same `_recover_stale_tasks` implementation as the cold-start
+  scan (no divergence between cold-start filters and periodic-scan
+  filters). The shared filter MUST include
+  `source_type=<framework constant>` (C-FLT-1).
+- **C-REC-3.** Inline reclaim MUST be invoked on `.start()` against
+  an `in_progress` task whose lease is dead. The lifecycle resolver
+  MUST NOT block on the periodic loop.
+- **C-REC-4.** Recovery MUST NOT consume the retry budget
+  (C-RET-3 reiterated for emphasis).
+- **C-REC-5.** `drain_in_progress=True` at recovery time MUST be
+  honored: re-enter with `is_steered_turn=True` and use
+  `active_input` as `ctx.input`.
+
+### C-ERR (error taxonomy)
+
+- **C-ERR-1.** `TaskNotFound` MUST be raised only for genuinely
+  missing tasks.
+- **C-ERR-2.** `TaskConflictError` MUST be the SINGLE error type
+  for any "task is busy / not available" state.
+  `current_status` carries the observed status.
+- **C-ERR-3.** `TaskFailed.error` MUST be a structured dict with
+  at minimum `type` and `message`; `cause` is optional.
+- **C-ERR-4.** `_HostedConflict(_code, status_code)` is an internal
+  discriminator type. It is NOT exported and MUST NOT appear in
+  any public exception hierarchy, docstring, or error message.
+  The hosted provider's response classifier raises it for service
+  responses carrying a structured error code; the local provider
+  raises it directly for equivalent conditions. The framework
+  matches on `_code` per the §39.1 translation table.
+- **C-ERR-5.** Service error codes (`task_immutable`,
+  `invalid_state_transition`, `lease_held_by_another`,
+  `task_already_exists`, `lease_ownership_changed`, `etag_mismatch`,
+  `invalid_request`) MUST translate to the developer-facing
+  exceptions per §39.1. The translation table is the contract;
+  no service-code string appears in developer-visible types.
+- **C-ERR-6.** `etag_mismatch` MUST be retried transparently by the
+  framework (bounded retries with re-read). It escapes to
+  low-level callers as `EtagConflict` only when retries are
+  exhausted (the developer never sees it through `Task.run` /
+  `Task.start` / `MultiTurnTask.run` / `MultiTurnTask.start`).
+- **C-ERR-7.** `invalid_state_transition` is a framework bug
+  (framework drives transitions, not the developer). The
+  framework MUST log this condition and convert it to a
+  `RuntimeError` rather than propagating to developer code as a
+  task-API concept.
+
+### C-STM (streaming protocol)
+
+- **C-STM-1.** `EventStream` MUST be a 4-method protocol: `emit`,
+  `close`, `subscribe`, `last_cursor`. No destructive method on
+  the Protocol itself.
+- **C-STM-2.** Stream states are exactly `Active` and `Closed`.
+  There is no per-instance `Gone` state; destruction is a
+  registry-level concept (tombstone) surfaced as
+  `EventStreamNotFoundError` on the next operation against the
+  id.
+- **C-STM-3.** `emit(close=True)` MUST be observably atomic — every
+  subscriber attached BEFORE this call sees both the payload AND
+  the end-of-stream signal.
+- **C-STM-4.** `close()` MUST be idempotent (no-op on already-closed
+  or destroyed).
+- **C-STM-5.** `subscribe()` MUST return an `AsyncIterator`
+  directly (not a coroutine that resolves to one).
+- **C-STM-6.** `subscribe(after=N)`: if the backing supports
+  cursors, yield only payloads with cursor strictly greater than
+  `N`; otherwise, silently ignore the `after` argument.
+- **C-STM-7.** `last_cursor()` MUST work on `Closed` streams even
+  after all events have been TTL-evicted (load-bearing for
+  rehydration).
+- **C-STM-8.** Cursor TYPE is DESIGNED to be `int` (string cursors
+  introduce silent-wrong-comparison bugs). Implementations SHOULD
+  validate `cursor_fn` returns `int` at configurator time. The
+  canonical Python implementation does not validate today (a known
+  gap).
+- **C-STM-9.** Cursored backings MUST honor `cursor_fn` — never
+  assume payload field names (`sequence_number`, `event_id`, etc.).
+
+### C-STR-REG (streaming registry)
+
+- **C-STR-REG-1.** Six methods only on the registry: three sync
+  configurators (`use_in_memory_live`, `use_in_memory_replay`,
+  `use_file_backed_replay`) + three async lifecycle methods
+  (`get`, `get_or_create`, `delete`).
+- **C-STR-REG-2.** Default backing MUST be `BroadcastEventStream`
+  (live, no buffer).
+- **C-STR-REG-3.** `get_or_create(id)` MUST be atomic under
+  concurrent callers (per-id lock).
+- **C-STR-REG-4.** `delete(id)` MUST be idempotent and MUST
+  install a tombstone (even for ids that were never registered)
+  so a subsequent `get(id)` raises `EventStreamNotFoundError`.
+- **C-STR-REG-5.** Tombstone MUST be cleared on the next
+  `get_or_create(id)` for the same id.
+- **C-STR-REG-6.** `get(id)` MUST raise `EventStreamNotFoundError`
+  for ANY id that is not currently a live stream — whether it
+  was never registered, was explicitly `delete(id)`d, or had its
+  close-clock elapse (§46). `get(id)` MUST NOT itself install a
+  tombstone (only `delete(id)` and the close-clock auto-tombstone
+  do). There is no `EventStreamGoneError` — that error type has
+  been removed; every "id is not live" condition surfaces
+  uniformly as `EventStreamNotFoundError`.
+
+### C-STR-TTL (replay TTL)
+
+- **C-STR-TTL-1.** Per-event TTL eviction MUST run on every
+  `emit()` and `subscribe()` call, regardless of whether the
+  stream is `Active` or `Closed`. (Active streams use TTL to
+  bound buffer memory for long-running producers; Closed streams
+  use TTL to keep the per-event lifetime consistent until the
+  close-clock fires.)
+- **C-STR-TTL-2.** Auto-tombstone MUST happen when the stream is
+  `Closed` AND `now >= close_time + ttl_seconds` (the
+  "close-clock"). This is deterministic and time-driven, NOT
+  observer- or buffer-state-driven. There is no
+  `total_emit_count > 0` carve-out; a stream created, closed,
+  and never emitted to tombstones at `close_time + ttl_seconds`
+  like any other Closed stream. Implementations MAY drive the
+  clock via a wall-clock timer (preferred for production) or via
+  an opportunistic check on `get()` / `emit()` / `subscribe()`
+  (acceptable for tests). `last_cursor()` MUST remain
+  side-effect-free and MUST NOT trigger the tombstone check.
+- **C-STR-TTL-3.** `BroadcastEventStream` (live-only) MUST NOT
+  auto-tombstone; it tombstones only via explicit `delete()`.
+- **C-STR-TTL-4.** The close-clock and per-event TTL are
+  consistent by construction: for every event still in the
+  buffer at `close_time`, `emit_time <= close_time`, so
+  `emit_time + ttl_seconds <= close_time + ttl_seconds`. By the
+  time the close-clock fires, every per-event TTL has elapsed
+  and the next eviction sweep removes the events. Implementations
+  do NOT need to special-case "buffer not yet empty when the
+  close-clock fires."
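+
+The two time checks in this group can be written as pure predicates,
+which also makes the consistency argument in C-STR-TTL-4 easy to
+verify. The sketch assumes events are held as `(emit_time, payload)`
+tuples and that an event is evicted once `now >= emit_time +
+ttl_seconds`; both the representation and the function names are
+illustrative.
+
+```python
+from typing import List, Optional, Tuple
+
+
+def evict_expired(events: List[Tuple[float, object]], now: float,
+                  ttl_seconds: float) -> List[Tuple[float, object]]:
+    # Keep only events whose per-event TTL has not yet elapsed.
+    return [(t, p) for (t, p) in events if now < t + ttl_seconds]
+
+
+def close_clock_fired(close_time: Optional[float], now: float,
+                      ttl_seconds: float) -> bool:
+    # Only Closed streams (close_time is set) can auto-tombstone.
+    return close_time is not None and now >= close_time + ttl_seconds
+```
+
+With these definitions, any event emitted at or before `close_time`
+is already evicted at the instant the close-clock fires, so no
+special case for a non-empty buffer is needed.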
+
+### C-STR-FBR (file-backed replay)
+
+- **C-STR-FBR-1.** Each stream MUST persist to
+  `storage_dir/<id>.jsonl`.
+- **C-STR-FBR-2.** Constructor MUST rehydrate from an existing
+  file (crash-recovery friendly).
+- **C-STR-FBR-3.** Optional `serializer` / `deserializer` callbacks
+  MUST be honored for non-JSON payloads. Default uses JSON.
+- **C-STR-FBR-4.** `delete()` and the close-clock auto-tombstone
+  MUST clean up the file before the registry tombstones the id.
+- **C-STR-FBR-5.** **File format.** Each emitted event is a single
+  JSONL line wrapping the payload + arrival time:
+
+  ```
+  {"emit_time": <float seconds>, "payload": <serialized payload>}
+  ```
+
+  On close, a sentinel line is appended:
+
+  ```
+  {"__terminal__": true}
+  ```
+
+- **C-STR-FBR-6.** **Rehydration robustness.** Constructor MUST
+  tolerate a trailing partial line (e.g. from a crash mid-write)
+  by truncating it. Mid-file malformed JSON lines MUST raise
+  (corruption signal, not recoverable). The terminal sentinel MUST
+  be honored only when it is the final line; a sentinel appearing
+  anywhere else in the file MUST be ignored.
+- **C-STR-FBR-7.** **Concurrency.** Implementations MUST use a
+  single-writer lock (POSIX `fcntl` advisory lock preferred,
+  `.lock` sentinel-file fallback) to prevent two processes from
+  appending to the same file concurrently. The lock guards the
+  file for the lifetime of the stream instance.
+- **C-STR-FBR-8.** **Compaction.** After ~1000 evictions,
+  implementations SHOULD rewrite the file to compact away evicted
+  lines (avoids unbounded file growth on long-lived streams with
+  short TTLs).
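+
+A minimal sketch of the rehydration rules in C-STR-FBR-5 and
+C-STR-FBR-6, assuming every complete record ends with `\n`, so any
+text after the last newline is a crash-truncated partial line.
+`rehydrate` and `CorruptStreamFile` are hypothetical names used for
+illustration, not the framework's API. The sketch parses text that
+was already read from the file and leaves out truncating the file
+on disk, locking, and custom deserializers.
+
+```python
+import json
+
+
+class CorruptStreamFile(Exception):
+    """Hypothetical error for unrecoverable mid-file corruption."""
+
+
+def rehydrate(text: str) -> tuple[list[dict], bool]:
+    """Parse stream-file text and return (events, closed)."""
+    lines = text.split("\n")
+    # The element after the last newline is either empty (clean file)
+    # or a partial line left by a crash mid-write; drop it either way.
+    lines.pop()
+    events: list[dict] = []
+    closed = False
+    for index, line in enumerate(lines):
+        try:
+            record = json.loads(line)
+        except json.JSONDecodeError as exc:
+            # A malformed complete line is a corruption signal.
+            raise CorruptStreamFile(f"malformed line {index + 1}") from exc
+        if not isinstance(record, dict):
+            raise CorruptStreamFile(f"unexpected record on line {index + 1}")
+        if record.get("__terminal__") is True:
+            # The sentinel only counts when it is the final line.
+            closed = index == len(lines) - 1
+            continue
+        events.append(record)
+    return events, closed
+```
+
+Treating "everything after the last newline" as the partial line
+keeps the crash case cheap to detect, but it relies on the writer
+always terminating records with a newline, which is an assumption
+of this sketch.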
+
+### C-OUT (output persistence) — *removed*
+
+The framework does NOT persist handler outputs. There is no
+`payload["output"]` key, no `_output` attachment, and no
+`OutputTooLarge` exception. A multi-turn handler's `return X`
+resolves the in-process caller's `TaskRun.result()` future
+directly; a one-shot handler's `return X` does the same and the
+record is then deleted (one-shot is always ephemeral). Per-turn
+outputs that must survive crashes are the handler's responsibility
+(write through your own storage before returning).
+
+### C-INTROSPECT (introspection)
+
+- **C-INTROSPECT-1.** Read-only inspection of a persisted task
+  record MUST be available through the task manager's provider:
+  `await manager.provider.get(task_id)` returns the framework's
+  internal `TaskInfo` envelope (or `None` if the record does not
+  exist). The decorator surface (`Task` / `MultiTurnTask`) does NOT
+  expose a public `.get(task_id)` method; introspection goes
+  through the provider.
+- **C-INTROSPECT-2.** Active-execution inspection MUST be available
+  through `Task.get_active_run(task_id)` / `MultiTurnTask.get_active_run(task_id, input_id)`,
+  which return a `TaskRun` handle bound to the live execution
+  (or `None` if the task is not currently in flight in this
+  process and cannot be reclaimed inline).
+
+### C-WQ (per-task write serialization)
+
+- **C-WQ-1.** All in-process writes to a single `task_id` MUST
+  be serialized through a per-task FIFO write queue (§25.2).
+  Concurrent metadata flushes, lease renewals, steering
+  appends, and drain writes within the same process MUST NOT
+  race against each other.
+- **C-WQ-2.** The write queue is in-process only. Cross-process
+  serialization is provided by the server's ETag/CAS check
+  (412 on mismatch), not by the queue.
+- **C-WQ-3.** Per-op 412 policy MUST follow the table in §25.3:
+  retries with re-read for metadata-flush / steering-append /
+  drain Phase 1 / drain Phase 3 / lease-renewal (with
+  ownership re-check); RE-READ-AND-DECIDE for terminal writes
+  (retry if lease still ours and status still in_progress,
+  ABANDON if lease lost or status already terminal); ABANDON
+  for reclaims; default budget 5 attempts.
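+
+The per-op 412 policy in C-WQ-3 can be read as a small decision
+function. This is a sketch only: the operation labels, the
+`Decision` enum, and `on_412` are hypothetical names, and §25.3
+remains the authority. Abandoning a lease renewal when the
+ownership re-check fails is an assumption about what "with
+ownership re-check" implies, as is counting the budget in attempts
+already made.
+
+```python
+from enum import Enum
+
+
+class Decision(Enum):
+    RETRY = "retry"      # re-read the record, then try the write again
+    ABANDON = "abandon"  # give up on this write
+
+
+# Hypothetical labels for the ops C-WQ-3 lists as "retry with re-read".
+RETRY_WITH_REREAD = {
+    "metadata_flush",
+    "steering_append",
+    "drain_phase_1",
+    "drain_phase_3",
+}
+
+
+def on_412(op, attempts_made, lease_is_ours, status, budget=5):
+    """Decide what to do after a 412, using state from a fresh re-read."""
+    if op == "reclaim":
+        return Decision.ABANDON
+    if attempts_made >= budget:
+        return Decision.ABANDON
+    if op in RETRY_WITH_REREAD:
+        return Decision.RETRY
+    if op == "lease_renewal":
+        # Assumed meaning of the ownership re-check.
+        return Decision.RETRY if lease_is_ours else Decision.ABANDON
+    if op == "terminal_write":
+        still_running = lease_is_ours and status == "in_progress"
+        return Decision.RETRY if still_running else Decision.ABANDON
+    raise ValueError(f"unknown op: {op}")
+```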
+
+### C-FLT (recovery scan filter)
+
+- **C-FLT-1.** The cold-start AND periodic recovery scans MUST
+  include `source_type=<framework constant>` in the `list()`
+  filter so the framework only inspects tasks created by its
+  own decorator. Tasks created by other systems (sharing the
+  same agent_name + session_id scope) MUST NOT be enumerated
+  by the framework's reclaim path. This closes a gap where a
+  multi-tenant session could surface unrelated records and the
+  framework would attempt to dispatch them to nonexistent
+  callbacks.
+
+### C-PRV (provider abstraction)
+
+- **C-PRV-1.** `provider.get(task_id)` MUST return `None` for
+  missing tasks (not raise).
+- **C-PRV-2.** `provider.update()` MUST honor `if_match` for CAS.
+- **C-PRV-3.** Payload merge MUST be shallow (top-level keys
+  merged; nested objects replaced wholesale).
+- **C-PRV-4.** Tags merge MUST be per-key with null-as-delete.
+- **C-PRV-5.** Attachments merge MUST be per-key with null-as-delete
+  (mirrors tags; §23.1).
+- **C-PRV-6.** Provider `delete()` MAY raise on missing records
+  (the canonical Python implementations do — hosted raises on
+  404, local raises on missing file). The user-facing
+  `MultiTurnTask.delete(task_id)` MUST catch "not found" provider exceptions
+  and re-raise as `TaskNotFound`; the higher-level
+  `Task`-managed delete path SHOULD be idempotent (no-op on
+  already-deleted). Implementers MAY make `provider.delete()`
+  itself idempotent if their store cleanly distinguishes a missing
+  record from other failures.
+- **C-PRV-7.** `provider.list(...)` MUST filter server-side.
+- **C-PRV-8.** `provider.list(...)` MUST support `agent_name` and
+  `session_id` as **optional** filters (workspace-wide listing when
+  both are null), matching the service. The local provider MUST
+  also accept both as optional (search across all
+  `<agent_name>/<session_id>/` directories under the storage root).
+- **C-PRV-9.** `provider.list(...)` MUST support these additional
+  filters, all optional, all enforced server-side: `has_error`,
+  `lease_expired`, `lease_owner`, `tag` (list of key:value pairs,
+  AND semantics), `source_type`, `status` (with legacy `"done"` →
+  `"completed"` normalization).
+- **C-PRV-10.** `provider.list(...)` MUST support pagination via
+  an opaque `after` cursor + `limit` (default 20, max 100; the
+  provider clamps larger values to the cap). `before` MUST be
+  rejected as `invalid_request` (cursor pagination is
+  forward-only). `order` accepts `"asc"` or `"desc"` by
+  `created_at` (default `"desc"`). Per §31a.
+- **C-PRV-11.** `provider.list(...)` MUST support
+  `omit_attachment_values` boolean. When true, returned tasks
+  carry attachment keys with `None` values (skip per-row blob
+  reads). Default false. Per §31a.
+- **C-PRV-12.** The pagination cursor in the response
+  (`LastId` / `next_page_token`) MUST be treated as opaque by the
+  framework. The local provider mints its own cursor (plain
+  `task_id`); the hosted provider round-trips whatever opaque
+  token the service returns (up to 4096 chars).
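+
+The merge rules in C-PRV-3 through C-PRV-5 can be illustrated with
+plain dictionaries. This is a sketch under stated assumptions:
+`merge_payload` and `merge_keyed` are hypothetical helper names,
+not provider methods, and because this section does not say what a
+null top-level payload value means, the sketch simply stores it.
+
+```python
+def merge_payload(current: dict, patch: dict) -> dict:
+    """C-PRV-3: shallow merge; nested objects are replaced wholesale."""
+    return {**current, **patch}
+
+
+def merge_keyed(current: dict, patch: dict) -> dict:
+    """C-PRV-4 / C-PRV-5: per-key merge where None deletes the key."""
+    merged = dict(current)
+    for key, value in patch.items():
+        if value is None:
+            merged.pop(key, None)  # null-as-delete; missing keys are fine
+        else:
+            merged[key] = value
+    return merged
+
+
+payload = merge_payload(
+    {"input": "x", "metadata": {"a": 1, "b": 2}},
+    {"metadata": {"a": 9}},
+)
+tags = merge_keyed({"k1": "v1", "k2": "v2"}, {"k2": None, "k3": "v3"})
+```
+
+Here `payload["metadata"]` ends up as `{"a": 9}` (the nested `b` is
+gone because the whole object was replaced), while `tags` keeps
+`k1`, drops `k2`, and gains `k3`.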
+
+### C-OBS (observability — minimal)
+
+- **C-OBS-1.** The framework MUST emit structured log events at:
+  `create`, `lease renewal failure`, `eviction detected`,
+  `reclaim`, `recovery start`, `recovery skip (no callback)`,
+  `suspend`, `complete`, `fail`, `steering append`, `steering
+  drain`, `orphan attachment cleanup`. Log level minimum `INFO`
+  except where noted.
+- **C-OBS-2.** Logger names MUST be hierarchical under
+  `azure.ai.agentserver.durable` (or language-equivalent).
+
+---
+
+
+## Part IX — References
+
+- **Foundry Task Storage Protocol Specification** — the wire-level
+  contract for the hosted task store (routes, request/response
+  envelopes, server-side merge rules, authentication, activation,
+  ETag/CAS, error codes). The framework conforms to that contract;
+  this document only describes how the framework *uses* the store.
+- **Speckit specs (historical, dev-side only)** — `001-durable-tasks`
+  through `018-task-attachments` under contributor `specs/` working
+  trees. Each is a point-in-time record of how a specific feature
+  was scoped and built; the current state of every feature lives
+  in THIS document. These are not source-controlled and are
+  intentionally not linked.
+- **Canonical Python implementation:**
+  `sdk/agentserver/azure-ai-agentserver-core/azure/ai/agentserver/core/durable/`
+  and `.../streaming/`. Tests at `tests/durable/` and
+  `tests/streaming/` cover the conformance items in Part VIII.
+
+## Part X — Appendices (informative)
+
+### §A. Language-mapping cheat sheet
+
+The body of this spec uses Python-style names and types
+(`asyncio.Event`, `MutableMapping`, `AsyncIterator`, `timedelta`,
+`@classmethod`). These are illustrative; the *behavior* is what
+implementers MUST match. Mappings:
+
+| Spec uses | Conceptual meaning | .NET idiom | Notes |
+|---|---|---|---|
+| `asyncio.Event` | Awaitable level-triggered signal. | `ManualResetEventSlim` / `TaskCompletionSource<bool>`. | Must be set-once / observable many times. |
+| `asyncio.CancelledError` | Cooperative-cancel exception that callers may raise to bail. | `OperationCanceledException` (with the framework's own custom subclass). | The framework's `TaskCancelled` MUST NOT inherit the language's generic cancel exception (C-CAN-4). |
+| `MutableMapping` | Dict-like with `__getitem__` / `__setitem__` / `__contains__` / `__iter__` / `.get()`. | `IDictionary<string, object?>` or a custom map type. | Mutation visibility limited to the namespace. |
+| `AsyncIterator` | Iterator over `__anext__` that may suspend. | `IAsyncEnumerable<T>`. | `subscribe()` returns this directly (not an awaitable that resolves to one). |
+| `timedelta` | Duration. | `TimeSpan`. | All durations in the spec MAY be expressed in seconds. |
+| `tuple[type[Exception], ...]` | Type predicate for retryable exceptions. | `Func<Exception, bool>` or `IReadOnlyList<Type>`. | Used by `RetryPolicy.retry_on`. |
+| `@classmethod` factory presets | Static factory methods. | `static` methods. | `RetryPolicy.exponential_backoff()` etc. |
+| Pydantic `model_dump()` | Optional model-aware serialization. | `System.Text.Json` / `Newtonsoft.Json` round-trip. | Implementer note: try model-aware first, fall back to plain JSON. |
+| Starlette `Route` | HTTP route binding. | ASP.NET Core `MapPost`. | The framework does not contribute any HTTP route by itself; route bindings are the host framework's concern. |
+
+The spec uses these Python names because the canonical
+implementation lives in Python. Re-implementations SHOULD use
+language-idiomatic names while preserving the documented behavior.
+
+### §B. Representative full task record
+
+A single JSON document showing how every concept in this spec
+composes. This is a deep-research task mid-life: function input
+was promoted, three steering inputs are queued (one inline, two
+promoted), one drain has already happened so `next_input_seq` is
+ahead of the live keys, both default and named
+metadata namespaces are populated, framework state slots are set.
+
+```json
+{
+  "object": "task",
+  "id": "research-session-abc123",
+  "agent_name": "durable-research-agent",
+  "session_id": "session-abc123",
+  "title": "Deep research on transformer trends 2026",
+  "status": "in_progress",
+
+  "lease": {
+    "owner": "durable-research-agent|session:session-abc123",
+    "instance_id": "worker-12-3f8a9d-1780912345",
+    "generation": 7,
+    "expires_at": "2026-06-09T04:05:30.123Z",
+    "expiry_count": 0
+  },
+
+  "tags":   { "_task_name": "deep_research" },
+  "source": {
+    "type":           "agentserver.task",
+    "name":           "deep_research",
+    "server_version": "azure-ai-agentserver-core/2.0.0b6 (python/3.12)"
+  },
+
+  "payload": {
+    "input": {
+      "__attachment_ref__": {
+        "key":  "_input",
+        "hash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
+      }
+    },
+
+    "metadata": {
+      "completed_phases":  3,
+      "in_progress_phase": 4,
+      "completed_subcalls": 2
+    },
+    "metadata:session": {
+      "history": [
+        { "role": "user",      "content": "Research deep learning trends" },
+        { "role": "assistant", "content": "Phase 3 of 15..." }
+      ],
+      "turn_count": 5
+    },
+
+    "_steering": {
+      "pending_inputs": [
+        "Quick note: prioritise post-2024 papers",
+        {
+          "__attachment_ref__": {
+            "key":  "_steering_input_3",
+            "hash": "sha256:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2"
+          }
+        },
+        {
+          "__attachment_ref__": {
+            "key":  "_steering_input_4",
+            "hash": "sha256:f0e1d2c3b4a5968778695a4b3c2d1e0f9a8b7c6d5e4f3a2b1c0d9e8f7a6b5c4d"
+          }
+        }
+      ],
+      "next_input_seq":    5,
+      "cancel_requested":  true,
+      "drain_in_progress": false,
+      "active_input":      null
+    },
+
+    "_turn_started_at": "2026-06-09T03:50:00.000Z",
+    "_retry_attempt":   0,
+    "_last_input_id":   "msg_abc123"
+  },
+
+  "attachments": {
+    "_input": {
+      "topic":   "deep learning trends 2026",
+      "depth":   "comprehensive",
+      "context": "<~800 KB of caller-supplied reference material>"
+    },
+    "_steering_input_3": {
+      "instruction": "refocus on transformer architectures",
+      "context":     "<~600 KB of caller-supplied reference material>"
+    },
+    "_steering_input_4": {
+      "instruction": "include reinforcement learning hybrids",
+      "context":     "<~500 KB of caller-supplied reference material>"
+    }
+  },
+
+  "etag":         "\"5e00450b-0000-0800-0000-6a223e670000\"",
+  "created_at":   "2026-06-09T03:45:00.000Z",
+  "updated_at":   "2026-06-09T03:55:30.123Z",
+  "started_at":   "2026-06-09T03:45:01.234Z",
+  "completed_at": null,
+  "error":              null,
+  "suspension_reason":  null
+}
+```
+
+What this single document demonstrates:
+
+| Concept | Where to look |
+|---|---|
+| Status, identity, timestamps | top-level fields |
+| Lease (§22) | `lease.owner`, `lease.instance_id`, `lease.generation` |
+| Framework-stamped routing (§21) | `tags._task_name`, `source.name` |
+| Input promoted to attachment (§23) | `payload.input` is a ref; `attachments._input` holds the value |
+| Multiple metadata namespaces (§17) | `payload.metadata` + `payload["metadata:session"]` |
+| Steering queue with mixed shapes (§12, §23) | `_steering.pending_inputs[0]` inline; `[1]`, `[2]` refs |
+| Monotonic seq invariant (§23.5) | `next_input_seq: 5` with live keys `_3` + `_4` — one drain consumed `_0`/`_1`/`_2`, no renumbering |
+| Steering mechanism state (§12) | `cancel_requested`, `drain_in_progress`, `active_input` |
+| Per-turn watchdog source of truth (§14) | `_turn_started_at` |
+| Durable retry counter (§15) | `_retry_attempt` |
+| Last-input-id chain (§11) | `_last_input_id` |
+| ETag CAS (§25) | `etag` |
+| Worst-case attachment count (§23.2) | 3 of 20 slots used here; framework reserves at most 11 (1 + 9 + 1) |
+
+Simpler scenarios drop fields:
+
+- **Small inputs only**: `payload.input` is the raw JSON value;
+  `pending_inputs` is all raw values; `attachments` is absent
+  (no output is ever persisted; §11/§20/C-OUT).
+- **Handler returned `X` from a turn (multi-turn implicit suspend)**:
+  `payload` has no `output` key; `attachments` has no `_output`
+  entry. The handler's return value is delivered to the in-process
+  awaiter of `TaskRun.result()` only.
+- **Just-after-resume**: `payload.input` holds the new input
+  (inline or ref); no `output` key on the record (and never was).
+- **Cold start, no steering**: `_steering` absent; `next_input_seq`
+  doesn't appear.
+
+### §C. Steering sequence (append → cancel → drain → result)
+
+```
+                                                              ┌─ time ─▶
+Caller A                Framework                Caller B              Handler
+   │  .start(t,A) ───▶ create + execute_task ───────────────────────▶ enter(fresh, input=A)
+   │                                                                  │
+   │                                                                  │ doing work...
+   │                            .start(t,B) ◀───────│                 │
+   │                            ↓                                     │
+   │              steering_append PATCH (queue B,                     │
+   │              cancel_requested=true, attachment if >20K)          │
+   │              + signal ctx.cancel locally  ─────────────────────▶ ctx.cancel.is_set() == True
+   │                                                                  │
+   │                                                                  │ winds down via strategy A
+   │                                                                  │  → return X
+   │              ◀──────────── suspend resolves                      │
+   │                            future of A with                      │
+   │                            await run.result() → X                │
+   │                                                                  │
+   │                            _try_drain_steering()                 │
+   │                            ↓                                     │
+   │                            Phase 1 PATCH: pop B,                 │
+   │                            delete _steering_input_<seq>,         │
+   │                            drain_in_progress=true,               │
+   │                            _turn_started_at refreshed            │
+   │                            ↓                                     │
+   │                            build new ctx,                        │
+   │                            entry_mode=resumed,                   │
+   │                            is_steered_turn=true ────────────────▶ enter(resumed steered, input=B)
+   │                            ↓                                     │
+   │                            Phase 3 PATCH: drain_in_progress=     │
+   │                            false, _retry_attempt=0               │
+   │                                                                  │
+   │                                                                  │ handler runs to completion
+   │                                                                  │  → return Y
+   │                       _handle_suspend(): write suspended,        │
+   │                       clear active_input, clear input,           │
+   │                       delete _input attachment if ref            │
+   │                                            ─────▶ B's future     │
+   │                                                  await run.result()
+   │                                                    → Y
+   ▼                                            ▼                     ▼
+```
+
+If the process crashes between Phase 1 and Phase 3, the next
+recovery reads `drain_in_progress=true` and re-enters from
+`active_input` with `is_steered_turn=true` (§52 race-recovery
+contract).
+
+### §D. Cold-start recovery sequence
+
+```
+Process starts:
+   1. TaskManager.__init__():
+       - lease_owner   = "<agent>|session:<sess>"
+       - instance_id   = "worker-<pid>-<rand>-<unix>"
+       - register decorator-discovered functions in
+         _resume_callbacks  by source.name
+   2. await manager.startup():
+       a. Provider.list(agent, sess, status="in_progress",
+                        lease_owner=self.owner,
+                        source_type=_SOURCE_TYPE)   # framework-only scope
+       b. For each task in the list:
+           - if active_locally: skip
+           - _steering_cleanup_orphan_attachments(task) (§58)
+           - reclaim (PATCH lease to self, with if_match=etag —
+             on 412, ABANDON; next scan re-evaluates)
+           - look up resume callback by source.name
+           - if no callback: log and skip (we cannot recover
+             what we did not register)
+           - hydrate ctx.input from payload['input'] (resolve
+             ref via attachments if needed)
+           - entry_mode := computed from status + drain_in_progress
+           - spawn lease_renewal_loop, watchdog, execute_task_loop
+       c. spawn _periodic_recovery_loop as background task
+   3. Bind HTTP routes (only AFTER step 2 returns).
+```
+
+The "bind HTTP routes only after `startup()` returns" rule is
+load-bearing — it guarantees that handlers waiting to be
+recovered are visible before any HTTP traffic could land that
+might call into them.
+
+**Note on the recovery-scan list filter.** The list call passes
+`source_type=_SOURCE_TYPE` so the scan returns ONLY tasks created
+by this framework. Foreign-typed records in the same
+`(agent_name, session_id, lease_owner)` scope are never picked
+up. This avoids the wasted-reclaim case where a foreign record
+matching that scope triple would otherwise be PATCH-touched
+before being dropped by the resume-callback lookup.
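+
+The effect of the scan filter can be sketched over in-memory
+records shaped like the §B example. This is illustrative only: the
+real filter is applied server-side by `Provider.list(...)`
+(C-PRV-7), `recovery_candidates` is a hypothetical helper, and the
+constant's value is assumed from `source.type` in §B.
+
+```python
+SOURCE_TYPE = "agentserver.task"  # assumed value, from source.type in §B
+
+
+def recovery_candidates(records, owner, active_locally):
+    """Return the ids the scan would hand to reclaim, in input order."""
+    picked = []
+    for record in records:
+        if record["status"] != "in_progress":
+            continue
+        if record["lease"]["owner"] != owner:
+            continue
+        if record["source"]["type"] != SOURCE_TYPE:
+            continue  # foreign record: never reclaimed or PATCH-touched
+        if record["id"] in active_locally:
+            continue  # already running in this process
+        picked.append(record["id"])
+    return picked
+```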
+
+---
+
+## Document status
+
+- **Version:** 1.0 (initial unified authoritative spec).
+- **Maintenance:** Update this document on every change that
+  affects developer-visible behavior or wire shape. Update the
+  conformance items in Part VIII when adding new behaviors.
+- **Format:** Markdown; intended for both human reading and agent
+  consumption.
+- **Location:** `sdk/agentserver/azure-ai-agentserver-core/docs/task-and-streaming-spec.md`.
+  This document is source-controlled and is the ground-truth
+  reference for Copilot/agent grounding when building or modifying
+  the primitives.
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/README.md b/sdk/agentserver/azure-ai-agentserver-invocations/README.md
index 5e9dfe515657..77197ff54d9b 100644
--- a/sdk/agentserver/azure-ai-agentserver-invocations/README.md
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/README.md
@@ -190,12 +190,15 @@ To report an issue with the client library, or request additional features, plea
 
 ## Next steps
 
-Visit the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-invocations/samples) folder for complete working examples:
+Visit the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples) folder for the **durable** examples:
 
 | Sample | Description |
 |---|---|
-| [simple_invoke_agent](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/) | Minimal synchronous request-response |
-| [async_invoke_agent](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/) | Long-running operations with polling and cancellation |
+| [durable_research](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/) | Long-running research agent with file-backed checkpoints |
+| [durable_multiturn](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/) | Multi-turn suspend / resume on top of `@multi_turn_task` |
+| [durable_langgraph](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/) | LangGraph integration with durable checkpoints |
+| [durable_copilot](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/) | GitHub Copilot SDK durable agent |
+| [durable-agent-demo](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/) | End-to-end long-running + crash + steer demo |
 
 ## Contributing
 
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/async_invoke_agent.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/async_invoke_agent.py
deleted file mode 100644
index cde877039960..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/async_invoke_agent.py
+++ /dev/null
@@ -1,168 +0,0 @@
-"""Async invoke agent example.
-
-Demonstrates get_invocation and cancel_invocation for long-running work.
-Invocations run in background tasks; callers poll or cancel by ID.
-
-.. warning::
-
-    **In-memory demo only.**  This sample stores all invocation state
-    (``self._tasks``, ``self._results``) in process memory.  Both in-flight
-    ``asyncio.Task`` objects and completed results are lost on process restart
-    — which *will* happen during platform rolling updates, health-check
-    failures, and scaling events.
-
-    For production long-running invocations:
-
-    * Persist results to durable storage (Redis, Cosmos DB, etc.) inside
-      ``_do_work`` **before** the method returns.
-    * On startup, rehydrate any incomplete work or mark it as failed.
-    * Consider an external task queue (Celery, Azure Queue, etc.) instead
-      of ``asyncio.create_task`` for work that must survive restarts.
-
-Usage::
-
-    # Start the agent
-    python async_invoke_agent.py
-
-    # Start a long-running invocation
-    curl -X POST http://localhost:8088/invocations -H "Content-Type: application/json" -d '{"query": "analyze dataset"}'
-    # -> x-agent-invocation-id: abc-123
-    # -> {"invocation_id": "abc-123", "status": "running"}
-
-    # Poll for result
-    curl http://localhost:8088/invocations/abc-123
-    # -> {"invocation_id": "abc-123", "status": "running"}   (still working)
-    # -> {"invocation_id": "abc-123", "status": "completed"} (done)
-
-    # Or cancel
-    curl -X POST http://localhost:8088/invocations/abc-123/cancel
-    # -> {"invocation_id": "abc-123", "status": "cancelled"}
-"""
-import asyncio
-import json
-
-from starlette.requests import Request
-from starlette.responses import JSONResponse, Response
-
-from azure.ai.agentserver.invocations import InvocationAgentServerHost
-
-
-# In-memory state for demo purposes (see module docstring for production caveats)
-_tasks: dict[str, asyncio.Task] = {}
-_results: dict[str, bytes] = {}
-
-app = InvocationAgentServerHost()
-
-
-async def _do_work(invocation_id: str, data: dict) -> bytes:
-    """Simulate long-running work.
-
-    :param invocation_id: The invocation ID for this task.
-    :type invocation_id: str
-    :param data: The parsed request data.
-    :type data: dict
-    :return: JSON result bytes.
-    :rtype: bytes
-    """
-    await asyncio.sleep(5)
-    result = json.dumps({
-        "invocation_id": invocation_id,
-        "status": "completed",
-        "output": f"Processed: {data}",
-    }).encode()
-    _results[invocation_id] = result
-    return result
-
-
-@app.invoke_handler
-async def handle_invoke(request: Request) -> Response:
-    """Start a long-running invocation in a background task.
-
-    :param request: The raw Starlette request.
-    :type request: starlette.requests.Request
-    :return: JSON status indicating the task is running.
-    :rtype: starlette.responses.JSONResponse
-    """
-    data = await request.json()
-    invocation_id = request.state.invocation_id
-
-    task = asyncio.create_task(_do_work(invocation_id, data))
-    _tasks[invocation_id] = task
-
-    return JSONResponse({
-        "invocation_id": invocation_id,
-        "status": "running",
-    })
-
-
-@app.get_invocation_handler
-async def handle_get_invocation(request: Request) -> Response:
-    """Retrieve a previous invocation result.
-
-    :param request: The raw Starlette request.
-    :type request: starlette.requests.Request
-    :return: JSON status or result.
-    :rtype: starlette.responses.JSONResponse
-    """
-    invocation_id = request.state.invocation_id
-
-    if invocation_id in _results:
-        return Response(content=_results[invocation_id], media_type="application/json")
-
-    if invocation_id in _tasks:
-        task = _tasks[invocation_id]
-        if not task.done():
-            return JSONResponse({
-                "invocation_id": invocation_id,
-                "status": "running",
-            })
-        result = task.result()
-        _results[invocation_id] = result
-        del _tasks[invocation_id]
-        return Response(content=result, media_type="application/json")
-
-    return JSONResponse({"error": "not found"}, status_code=404)
-
-
-@app.cancel_invocation_handler
-async def handle_cancel_invocation(request: Request) -> Response:
-    """Cancel a running invocation.
-
-    :param request: The raw Starlette request.
-    :type request: starlette.requests.Request
-    :return: JSON cancellation status.
-    :rtype: starlette.responses.JSONResponse
-    """
-    invocation_id = request.state.invocation_id
-
-    # Already completed — cannot cancel
-    if invocation_id in _results:
-        return JSONResponse({
-            "invocation_id": invocation_id,
-            "status": "completed",
-            "error": "invocation already completed",
-        })
-
-    if invocation_id in _tasks:
-        task = _tasks[invocation_id]
-        if task.done():
-            # Task finished between check — treat as completed
-            _results[invocation_id] = task.result()
-            del _tasks[invocation_id]
-            return JSONResponse({
-                "invocation_id": invocation_id,
-                "status": "completed",
-                "error": "invocation already completed",
-            })
-        task.cancel()
-        del _tasks[invocation_id]
-        return JSONResponse({
-            "invocation_id": invocation_id,
-            "status": "cancelled",
-        })
-
-    return JSONResponse({"error": "not found"}, status_code=404)
-
-
-if __name__ == "__main__":
-    app.run()
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/.gitignore b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/.gitignore
new file mode 100644
index 000000000000..017b94ddacc3
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/.gitignore
@@ -0,0 +1,11 @@
+# azd environment
+.azure/*/state/
+.azure/*/*.env.bak
+
+# Demo client runtime
+.demo-session
+
+# Docker-build staging dir — populated by ./build.sh which copies
+# the checked-in wheels from sdk/agentserver/wheels/ into here. Never
+# committed: source of truth is the central wheels directory.
+src/durable-research-agent/wheels/
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/README.md b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/README.md
new file mode 100644
index 000000000000..b96956a1ee82
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/README.md
@@ -0,0 +1,439 @@
+# Durable Research Agent — Demo
+
+> **▶ Deploy it (hosted, recommended):** `azd deploy` this sample, then drive it
+> with **[`demo-client.sh`](demo-client.sh)** (it auto-resolves the endpoint from
+> your azd env). Validated end-to-end on a hosted Foundry deployment — run,
+> stream, reconnect, and steer all work against the hosted task API. Prefer an
+> offline run? The verified local kit in **[`local/`](local/README.md)**
+> exercises the same run → crash → recover → verify flow file-backed
+> (`cd local && ./setup.sh && ./run.sh`).
+
+A `@multi_turn_task`-decorated long-running research agent that demonstrates three
+platform capabilities of the Azure AI Hosted Agent + durable-task primitive:
+
+1. **Long-running tasks run uninterrupted past the platform's sandbox-eviction window.**
+   The framework's `PATCH .../tasks/<id>` lease-renewal cycle (every ~30s,
+   half of the 60s lease) signals activity through the task-storage API,
+   which refreshes the platform's sandbox idle-reclaim timer. The demo
+   runs for ~33 min with **zero client-side keepalive ingress** and the
+   sandbox stays warm the whole time. Validated end-to-end against a
+   hosted Foundry deployment.
+
+2. **Recovery from container crashes.** When the agent container dies
+   (intentional crash or OOM), the platform's nanny worker brings it
+   back within ~1 min (43s measured) **without any new client ingress**.
+   The durable task automatically resumes from its last checkpoint
+   (`ctx.entry_mode == "recovered"` + a `recovered` SSE event with
+   `completed_phases`). User-visible: any reconnect attempt — whenever
+   the user gets around to it — seamlessly continues the run.
+
+3. **Steering.** Sending a new turn on a running steerable task queues
+   the input and signals cooperative cancel. The agent winds down the
+   current turn at the next checkpoint boundary and re-enters with the
+   queued input as a fresh turn (with the prior topic surfaced for the
+   viewer to see).
+
+What the agent actually does: 15 logical research phases on whatever
+topic the caller supplies. Each phase runs a small agent loop
+(research → critique → refine → synthesize) against `gpt-4o`,
+streaming every token to the consumer. The handler checkpoints to
+`ctx.metadata` and flushes **after each subcall** — so a crash
+mid-phase recovers at the next un-finished subcall (worst case: the
+one that was actively streaming is replayed). A steerer that arrives
+mid-phase causes the handler to wind down at the next phase boundary,
+not abruptly. Hosted defaults target a ~33-min wall-time run (more than
+2x the 15-min sandbox-eviction window, so every demo run exercises the
+lease keep-alive path); local `agent.py` defaults are shorter for dev
+iteration.
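+
+The per-subcall checkpoint rule described above can be sketched in plain
+Python. This is an illustrative stand-in, not the sample's `agent.py`: a
+plain dict plays the role of `ctx.metadata`, and `run_phases`,
+`completed_subcalls`, `crash_after`, and `SimulatedCrash` are hypothetical
+names introduced only for this sketch.
+
+```python
+ROLES = ["research", "critique", "refine", "synthesize"]
+
+
+class SimulatedCrash(Exception):
+    """Stands in for the container dying mid-run (hypothetical)."""
+
+
+def run_phases(metadata, num_phases=2, crash_after=None):
+    """Run every subcall not yet recorded in `metadata`; return what ran."""
+    executed = []
+    done = metadata.get("completed_subcalls", 0)  # flat count across phases
+    total = num_phases * len(ROLES)
+    for idx in range(done, total):
+        if crash_after is not None and len(executed) == crash_after:
+            raise SimulatedCrash()
+        phase_idx, role_idx = divmod(idx, len(ROLES))
+        executed.append((phase_idx + 1, ROLES[role_idx]))
+        # Checkpoint after each subcall; the real handler would also flush here.
+        metadata["completed_subcalls"] = idx + 1
+    return executed
+
+
+store = {}
+try:
+    run_phases(store, crash_after=5)  # dies after 5 finished subcalls
+except SimulatedCrash:
+    pass
+resumed = run_phases(store)  # picks up at the first unfinished subcall
+print(resumed)
+```
+
+Because the counter is written only after a subcall finishes, the second
+call skips the five recorded subcalls and resumes inside phase 2 rather
+than restarting phase 1.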
+
+Between subcalls and between phases the agent sleeps for
+`INTRA_PHASE_COOLDOWN_SEC` / `INTER_PHASE_COOLDOWN_SEC` (30s each in
+the hosted defaults). A `cooldown` SSE event is emitted at the start
+of each pause so the terminal shows a low-key
+`...cooling down 30s (between subcalls) — next: subcall 3/4 in phase 2/15`
+line instead of going silent.
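+
+As a rough check on where the wall time goes, the cooldowns alone can be
+added up. The sketch below assumes one pause between consecutive subcalls
+in a phase and one pause between consecutive phases; that counting rule and
+the helper name `cooldown_floor_sec` are assumptions for illustration, not
+taken from `agent.py`.
+
+```python
+def cooldown_floor_sec(num_phases=15, calls_per_phase=4,
+                       intra_cooldown=30, inter_cooldown=30):
+    """Lower bound on wall time: sleep time only, LLM time excluded."""
+    intra_pauses = num_phases * (calls_per_phase - 1)  # between subcalls
+    inter_pauses = num_phases - 1                      # between phases
+    return intra_pauses * intra_cooldown + inter_pauses * inter_cooldown
+
+
+hosted = cooldown_floor_sec()
+print(f"hosted cooldown floor: {hosted} s ({hosted / 60:.1f} min)")
+```
+
+Under these assumptions the hosted defaults spend 1770 s (29.5 min) sleeping,
+so most of the ~33-min run is cooldown and the remainder is LLM streaming time.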
+
+## Run locally (offline alternative)
+
+For an offline run with no hosted dependency, the durable crash-recovery flow
+can also be exercised **locally** — file-backed task store. A
+ready-to-run, verified kit lives in [`local/`](local/README.md):
+
+```bash
+cd local
+./setup.sh        # builds a venv from ../../../../wheels + deps
+
+az login
+export FOUNDRY_PROJECT_ENDPOINT="https://<account>.services.ai.azure.com/api/projects/<project>"
+export AZURE_AI_MODEL_DEPLOYMENT_NAME="gpt-4o"
+
+./run.sh          # automated: run -> crash -> restart -> recover -> verify
+./serve.sh        # or drive it yourself (curl http://localhost:8088/invocations)
+```
+
+See [`local/README.md`](local/README.md) for the manual curl recipe and how the
+local durable backend works (`AGENTSERVER_TASKS_BACKEND=local` +
+`FOUNDRY_AGENT_SESSION_ID`).
+
+## Prerequisites
+
+- Python 3.11+
+- Azure subscription with AI Foundry access
+- [Azure Developer CLI](https://learn.microsoft.com/azure/developer/azure-developer-cli/install-azd)
+- `azd` AI agents extension: `azd extension install azure.ai.agents`
+
+## Deploy
+
+```bash
+# 1. Stage the checked-in durable-task preview wheels into the docker
+#    build context (build.sh just copies sdk/agentserver/wheels/*.whl
+#    into a per-sample gitignored staging dir — no compilation, no PyPI
+#    fetch)
+./build.sh
+
+# 2. Login + deploy
+azd auth login
+azd up
+```
+
+The deploy provisions infra + ships the container image and prints the
+invocations endpoint. Point `demo-client.sh` at your deployment by
+setting the `ENDPOINT=` env var (or editing the default near the top of
+`demo-client.sh`).
+
+> The durable-task primitive (`@task` / `@multi_turn_task`) is in
+> **private preview** and is not on PyPI. It ships only as the
+> pre-release wheels checked into
+> [`sdk/agentserver/wheels/`](../../../../wheels). See
+> [`sdk/agentserver/wheels/README.md`](../../../../wheels/README.md)
+> for the consumption workflow in your own project.
+
+## demo-client.sh — command reference
+
+The client is a bash CLI. Each command operates on a single session
+tracked locally in `.demo-session`. Run from this directory:
+
+| Command | What it does |
+|---|---|
+| `./demo-client.sh start "<topic>"` | **Allocates a new session id** (UUID), writes it to `.demo-session`, dispatches `POST /invocations` with the topic, then attaches to the SSE stream. |
+| `./demo-client.sh stream` | Reuses the session + invocation from `.demo-session` and (re)attaches to the SSE stream. Passes `?last_event_id=N` so the server skips events you've already seen. |
+| `./demo-client.sh steer "<topic>"` | Reuses the current session and sends a new `POST /invocations` with the new topic. If the run is still active the framework queues this as a steering input; the agent winds down at the next checkpoint boundary and re-enters on the new topic. |
+| `./demo-client.sh cancel` | `POST /invocations/{id}/cancel` on the current invocation. The handler observes `ctx.cancel.is_set()` and winds down cooperatively. |
+| `./demo-client.sh crash` | Sends `POST /invocations` with `{"message": "crash"}`. The agent (gated by `DEMO_MODE=1`) calls `os._exit(137)`. The platform's nanny worker brings the container back within ~1 min on its own — `./demo-client.sh stream` any time after will pick up the recovered run (no need to wait for or trigger anything). |
+| `./demo-client.sh status` | Prints the local `SESSION_ID`, `INV_ID`, and `LAST_EVENT_ID` from `.demo-session`. Useful when you forget which session you're on. |
+| `./demo-client.sh logs` | Tails the agent container's stdout/stderr via `azd ai agent monitor --session-id <current> --follow`. |
+| `./demo-client.sh reset` | Deletes `.demo-session`. The next `start` will allocate a fresh session id. |
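+
+The `stream` row relies on the client remembering the last SSE `id:` it
+saw. A minimal sketch of that bookkeeping is shown here; it is not the
+renderer embedded in `demo-client.sh`, the helper name `parse_sse` is
+hypothetical, and the sample payloads are made up for illustration.
+
+```python
+import json
+
+
+def parse_sse(lines, last_event_id=0):
+    """Yield (event_id, payload) for frames newer than last_event_id."""
+    event_id, data = None, []
+    for raw in lines:
+        line = raw.rstrip("\n")
+        if line.startswith("id:"):
+            event_id = int(line[3:].strip())
+        elif line.startswith("data:"):
+            data.append(line[5:].lstrip())
+        elif line == "":
+            # A blank line terminates the current frame.
+            if data and event_id is not None and event_id > last_event_id:
+                yield event_id, json.loads("\n".join(data))
+            event_id, data = None, []
+
+
+frames = [
+    "id: 1", 'data: {"type": "phase_start", "phase": 1}', "",
+    "id: 2", 'data: {"type": "phase_end", "phase": 1}', "",
+]
+seen = list(parse_sse(frames, last_event_id=1))
+cursor = seen[-1][0] if seen else 1  # value to send as ?last_event_id=N
+print(cursor, seen)
+```
+
+Persisting `cursor` between invocations (the script keeps it as
+`LAST_EVENT_ID` in `.demo-session`) is what lets a later reconnect skip
+events that were already rendered.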
+
+### Session-id lifecycle
+
+There is exactly **one active session per `.demo-session` file**:
+
+```
+./demo-client.sh start "<topic>"
+        │
+        ├─ SESSION_ID = demo-<uuid>     ← newly allocated by the client
+        ├─ INV_ID    = inv_<...>        ← assigned by the platform on POST
+        └─ written to .demo-session
+                │
+                ▼  these commands REUSE the same session id:
+        ./demo-client.sh stream
+        ./demo-client.sh steer "<new topic>"
+        ./demo-client.sh crash
+        ./demo-client.sh cancel
+        ./demo-client.sh logs
+        ./demo-client.sh status
+
+To switch to a NEW session id:
+        ./demo-client.sh reset            # clears .demo-session
+        ./demo-client.sh start "<topic>"  # allocates a fresh demo-<uuid>
+```
+
+### Inspecting container logs
+
+`./demo-client.sh logs` opens a follow tail on the agent container's
+stdout/stderr for the current session. Useful framework log lines:
+
+- `TaskManager starting (owner=..., instance=worker-N-..., hosted=True)` —
+  a fresh container booted.
+- `Reclaimed stale task <task_id>` / `Recovered task <task_id> is now active` —
+  durable recovery picked up where the previous lifetime left off.
+- `Inbound GET /readiness completed with status 200` — the platform's
+  container health probe (a good signal that the container just came up).
+- `HTTP Request: POST .../openai/v1/responses "HTTP/1.1 200 OK"` — each
+  LLM call the agent makes.
+- `Task <task_id> suspended` / `Steering drain: task <task_id> drained next input` —
+  cooperative wind-down + steering re-entry.
+
+For one-shot queries, invoke `azd ai agent monitor` directly:
+
+```bash
+SESSION_ID=$(grep SESSION_ID .demo-session | cut -d'"' -f2)
+azd ai agent monitor --session-id "$SESSION_ID" --tail 100
+azd ai agent monitor --session-id "$SESSION_ID" --type system   # container start/stop events
+```
+
+## Three demo workflows
+
+### A. Long-running run with no client-side keepalive (~33 min wall time)
+
+This run intentionally outlasts the platform's 15-min sandbox-eviction
+window — the framework's lease-renewal cycle keeps the sandbox warm.
+
+```bash
+# t = 0:00
+./demo-client.sh start "the future of nuclear fusion"
+# Stream events. Note server_time_utc + server_uptime_sec on each event.
+
+# t = 5:00
+# Detach (Ctrl-C). Send no ingress at all for the next ~20 min (until t = 25:00).
+
+# t = 25:00 — open a new terminal:
+./demo-client.sh stream
+# The container is the SAME instance (no reclaim happened) because the
+# framework's PATCH .../tasks/<id> lease renewals kept the platform's
+# idle timer fresh. Your reconnect resumes the live SSE stream;
+# server_uptime_sec is now ~25 min, not reset to 0.
+```
+
+### B. Explicit crash + nanny restoration (no ingress required)
+
+```bash
+# Terminal 1: start a run and leave it streaming.
+./demo-client.sh start "fusion energy research priorities"
+# Wait until 3-4 phases have completed.
+
+# Terminal 2: force a crash.
+./demo-client.sh crash
+# Server returns 202 then os._exit(137). Terminal 1's stream disconnects.
+
+# Wait — DO NOT send any new ingress. The platform's nanny brings the
+# container back within ~1 min entirely on its own (validated: 43 sec
+# from crash to new worker_instance_id in a hosted Foundry
+# deployment). The durable task auto-resumes from the last checkpoint
+# inside the new process — you don't need to do anything.
+
+# When you want to verify recovery:
+./demo-client.sh stream
+# You'll see:
+#   🔁 Recovered from crash   completed_phases=3
+#   server_uptime_sec=<some-value-much-larger-than-1>
+# Stream picks up at phase 4, NOT phase 1.
+```
+
+### C. Steering (mid-run topic switch)
+
+```bash
+# Terminal 1:
+./demo-client.sh start "deep learning interpretability"
+# Wait until phase 2 starts streaming.
+
+# Terminal 2:
+./demo-client.sh steer "alignment of frontier models"
+# Server queues the new input; the running turn keeps going until the
+# next phase boundary.
+
+# Terminal 1 (within ~3 min, at the next phase boundary):
+#   ↓ Winding down   cause=steering   completed=2/15
+#   ▶ Run start    topic=alignment of frontier models
+#                  (steered from prior topic: deep learning interpretability)
+#   ▶ Phase 1/15 — Decomposing topic into focused research questions
+#   ...
+```
+
+## Architecture
+
+```
+┌───────────────────────────────────────────────────────────────────────────┐
+│  Foundry Hosted-Agent Sandbox (platform-managed lifecycle)                │
+│  ┌─────────────────────────────────────────────────────────────────────┐  │
+│  │  python app.py        (InvocationAgentServerHost on :8088)          │  │
+│  │  ┌───────────────────────────────────────────────────────────────┐  │  │
+│  │  │  POST /invocations                                            │  │  │
+│  │  │     └─ {"message": "<topic>"} →                               │  │  │
+│  │  │           deep_research.start(task_id=session_id, input=...)  │  │  │
+│  │  │        on an active steerable task: queued as a steering input│  │  │
+│  │  │     └─ {"message": "crash"} (DEMO_MODE=1 only) → os._exit     │  │  │
+│  │  │                                                               │  │  │
+│  │  │  GET /invocations/{id}?last_event_id=N                        │  │  │
+│  │  │     └─ (await streams.get(id)).subscribe(after=N) → SSE       │  │  │
+│  │  │     └─ 404 if id never seen; 410 if stream destroyed (TTL)    │  │  │
+│  │  │                                                               │  │  │
+│  │  │  POST /invocations/{id}/cancel                                │  │  │
+│  │  │     └─ run.cancel()                                           │  │  │
+│  │  │                                                               │  │  │
+│  │  │  GET  /readiness  (called by platform health probe at startup)│  │  │
+│  │  └───────────────────────────────────────────────────────────────┘  │  │
+│  │                                                                     │  │
+│  │  At module import: streams.use_file_backed_replay(                  │  │
+│  │     storage_dir=~/.durable-tasks/_streams,                          │  │
+│  │     cursor_fn=lambda ev: ev["sequence_number"],                     │  │
+│  │     ttl_seconds=600)                                                │  │
+│  │                                                                     │  │
+│  │  deep_research  (agent.py)                                          │  │
+│  │     @multi_turn_task(steerable=True)   ← no streaming kwarg         │  │
+│  │     stream = await streams.get_or_create(ctx.input["invocation_id"])│  │
+│  │     seq    = await stream.last_cursor() or 0   ← resume after crash │  │
+│  │     loop 1..NUM_PHASES:                                             │  │
+│  │        emit phase_start with server_time_utc + server_uptime_sec    │  │
+│  │        run CALLS_PER_PHASE LLM sub-calls (research → critique → …)  │  │
+│  │        ctx.metadata["completed_phases"] = i+1                       │  │
+│  │        await ctx.metadata.flush()       ← crash-recovery boundary   │  │
+│  │        emit phase_end                                               │  │
+│  │        if ctx.cancel.is_set():                                      │  │
+│  │           emit winding_down → stream.close() → return None          │  │
+│  │           (bare return X is the implicit-suspend signal for chains) │  │
+│  └─────────────────────────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────────────────────────┘
+                              ▲          │
+                              │          │  PATCH /api/projects/.../tasks/{id}
+                              │          │  (framework lease renewal + checkpoint flush)
+                              │          ▼
+                ┌─────────────────────────────────────────┐
+                │  Foundry control plane                  │
+                │  ─ Task-storage API (lease, payload,    │
+                │    metadata, checkpoint persistence)    │
+                │  ─ Endpoint proxy: routes /invocations* │
+                │    to the sandbox; brings the container │
+                │    back up via nanny worker after a     │
+                │    crash (no client ingress needed)     │
+                │  ─ Idle-reclaim timer: kept fresh by    │
+                │    framework lease-renewal traffic so   │
+                │    long-running tasks survive past 15min│
+                └─────────────────────────────────────────┘
+```
+
+Notable points:
+
+- The container runs `python app.py` directly. There is **no
+  application-level supervisor or auto-restart wrapper** — the platform's
+  nanny worker handles container restoration on crash.
+- `task_id == session_id`: one durable chain (`@multi_turn_task`) per
+  session. This is what routes a steering POST to the active chain
+  instead of starting a new one.
+- The framework's lease-renewal loop talks to the **task-storage API**
+  every ~30s (half of the 60s lease). This traffic both (a) refreshes
+  the lease so a successor instance won't reclaim the task, and (b)
+  signals activity to the platform's routing layer so the sandbox's
+  idle-reclaim timer stays fresh — letting the run outlive the 15-min
+  eviction window without any client ingress. The `/readiness`
+  endpoint is hit only by the platform's startup health probe;
+  `/liveness` is hit continuously (~every 2s) by the platform.
+- When the platform's nanny restores the container after a crash, the
+  framework's recovery scan finds the stranded task and re-enters the
+  handler with `ctx.entry_mode == "recovered"` and `ctx.metadata`
+  populated from the last checkpoint. A `recovered` SSE event is
+  emitted to any (re)connecting clients.
+
+## Streaming
+
+The agent emits to the SDK's `streams` registry
+(`azure.ai.agentserver.core.streaming`); the HTTP layer subscribes by
+the same id. There is no streaming kwarg on `@multi_turn_task` —
+streaming is explicitly initiated by the handler.
+
+**Public surface used here (5 exports):** `streams`, `EventStream`,
+`EventStreamError`, `EventStreamClosedError`, `EventStreamNotFoundError`.
+The SDK ships three backings (live, in-memory replay, file-backed
+replay) which you pick via the registry's configurators; concrete
+backing classes are not in the public API.
+
+**Backing.** `app.py` calls `streams.use_file_backed_replay(...)`
+once at module import. This persists every event to
+`~/.durable-tasks/_streams/<invocation_id>.jsonl` so the stream
+survives a container crash + restart and a late `GET` can replay the
+full transcript.
+
+**Stream id = per-turn `invocation_id`** (per the streaming guide).
+The HTTP layer reads `request.state.invocation_id` and propagates it
+to the handler via `task.start(input={"invocation_id": inv_id, ...})`.
+The handler reads it from `ctx.input["invocation_id"]`, **not** from
+`ctx.task_id`. `task_id` is the per-session durable-task identity
+that spans multiple turns (steering, recovery). Reusing it as the
+stream id would merge logically separate per-turn streams, so the
+second turn would try to `emit` on a stream the first turn already
+closed. Each turn, including a steered re-entry, gets its own fresh
+`invocation_id` and its own stream.
+
+**Cursor field.** `cursor_fn=lambda ev: ev["sequence_number"]`.
+The handler maintains an in-memory `seq` counter and tags every emit
+with the next value. On crash recovery the handler calls
+`stream.last_cursor()` first to learn the highest sequence number
+that made it to disk, then resumes numbering from there. The HTTP
+layer surfaces `sequence_number` as the SSE `id:` field so a client
+reconnect with `?last_event_id=N` maps cleanly to
+`stream.subscribe(after=N)` — events the client already saw are
+skipped without duplicates.
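+
+The cursor behaviour can be illustrated with a toy in-memory stream. This
+is a sketch under stated assumptions: `ReplayStream` is a hypothetical
+stand-in written for this example, it is synchronous, and it is not the
+SDK's stream class (the sample awaits the SDK calls). Only the method names
+`emit`, `last_cursor`, and `subscribe(after=N)` mirror the text above.
+
+```python
+class ReplayStream:
+    """Toy in-memory stand-in for a replayable stream (not the SDK class)."""
+
+    def __init__(self, cursor_fn):
+        self._events = []
+        self._cursor_fn = cursor_fn
+
+    def emit(self, event):
+        self._events.append(event)
+
+    def last_cursor(self):
+        # Highest cursor stored so far, or None for an empty stream.
+        return self._cursor_fn(self._events[-1]) if self._events else None
+
+    def subscribe(self, after=0):
+        # Replay only events strictly newer than the given cursor.
+        return [e for e in self._events if self._cursor_fn(e) > after]
+
+
+stream = ReplayStream(cursor_fn=lambda ev: ev["sequence_number"])
+for n in (1, 2, 3):
+    stream.emit({"sequence_number": n, "type": "phase_start"})
+
+# After a crash the in-memory counter is gone; recover it from the stream.
+seq = (stream.last_cursor() or 0) + 1
+stream.emit({"sequence_number": seq, "type": "recovered"})
+
+# A client that already saw events 1..2 reconnects with last_event_id=2.
+replayed = [e["sequence_number"] for e in stream.subscribe(after=2)]
+print(replayed)
+```
+
+Resuming the counter from `last_cursor()` keeps sequence numbers strictly
+increasing across the crash, which is what makes the `after=N` filter safe.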
+
+**Retention.** `ttl_seconds=600`. Per-event TTL bounds disk usage:
+once a stream is closed and all its events have aged out, the
+registry destroys the stream and removes the file. The 410 Gone
+wire mapping in the GET handler covers the "client tried to reconnect
+to an expired stream" case.
+
+**Close-before-suspend / close-before-return.** Every exit path in
+the handler (`run_complete`, `winding_down → suspend`,
+`finally` safety net) explicitly closes the stream before the
+framework reports the turn as terminal. This guarantees SSE
+subscribers see a clean stream terminator before any next-turn
+plumbing kicks in.
+
+## Environment variables
+
+These are set in `agent.yaml` (`environment_variables`) and travel with
+the deploy. Override by editing `agent.yaml` and re-deploying.
+
+| Variable | Default (hosted) | Default (`agent.py`) | Description |
+|---|---|---|---|
+| `FOUNDRY_PROJECT_ENDPOINT` | (required, set by platform) | — | Foundry project endpoint. |
+| `AZURE_AI_MODEL_DEPLOYMENT_NAME` | `gpt-4o` | `gpt-4o` | Responses-API model deployment name. |
+| `DEMO_MODE` | `1` (in the demo image) | unset | Enables the `{"message": "crash"}` sentinel on `POST /invocations`. A production image would leave this off. |
+| `NUM_PHASES` | `15` | `15` | Number of research phases. |
+| `CALLS_PER_PHASE` | `4` | `4` | Sub-calls per phase (research, critique, refine, synthesize). |
+| `TARGET_OUTPUT_TOKENS` | `1500` | `1500` | Max tokens per LLM sub-call. |
+| `INTRA_PHASE_COOLDOWN_SEC` | `30` | `10` | Seconds between sub-calls within a phase. Hosted default is bumped to push total wall-time past 30 min. |
+| `INTER_PHASE_COOLDOWN_SEC` | `30` | `20` | Seconds between phases. Hosted default is bumped to push total wall-time past 30 min. |
+
+Note: `azure-ai-agentserver-core` automatically uses `HostedTaskProvider`
+in hosted environments (i.e. when the platform sets
+`FOUNDRY_HOSTING_ENVIRONMENT`) and `LocalFileTaskProvider` otherwise —
+no opt-in env var required.
+
+For a **fast** development loop (~2 min total instead of ~33 min), edit
+`agent.yaml`'s `environment_variables` block:
+
+```yaml
+- name: NUM_PHASES
+  value: "3"
+- name: CALLS_PER_PHASE
+  value: "1"
+- name: INTRA_PHASE_COOLDOWN_SEC
+  value: "2"
+- name: INTER_PHASE_COOLDOWN_SEC
+  value: "2"
+- name: TARGET_OUTPUT_TOKENS
+  value: "200"
+```
+
+## File structure
+
+```
+durable-agent-demo/
+├── demo-client.sh          # bash CLI: start, stream, steer, crash, cancel, logs, status, reset
+├── azure.yaml              # azd service config
+├── build.sh                # copies sdk/agentserver/wheels/*.whl into src/.../wheels/ for docker
+├── infra/                  # Bicep templates
+├── src/durable-research-agent/
+│   ├── agent.py            # @multi_turn_task deep_research — durability + steering logic
+│   ├── app.py              # InvocationAgentServerHost — minimal HTTP plumbing
+│   ├── agent.yaml          # Foundry agent definition
+│   ├── Dockerfile          # python:3.12-slim → python app.py
+│   ├── requirements.txt
+│   └── wheels/             # GITIGNORED — docker-build staging dir populated by build.sh
+└── README.md
+```
+
+The durable-task primitive private-preview wheels are checked in at
+[`sdk/agentserver/wheels/`](../../../../wheels) — `./build.sh` just
+copies them into this sample's `wheels/` so the Dockerfile can `COPY`
+them at image-build time. See
+[`sdk/agentserver/wheels/README.md`](../../../../wheels/README.md)
+for the consumer workflow.
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/azure.yaml b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/azure.yaml
new file mode 100644
index 000000000000..2e5ca76667d5
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/azure.yaml
@@ -0,0 +1,31 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/Azure/azure-dev/main/schemas/v1.0/azure.yaml.json
+
+requiredVersions:
+    extensions:
+        azure.ai.agents: '>=0.1.0-preview'
+name: durable-research-agent-demo
+services:
+    durable-research-agent:
+        project: src/durable-research-agent
+        host: azure.ai.agent
+        language: docker
+        docker:
+            remoteBuild: true
+        config:
+            container:
+                resources:
+                    cpu: "1"
+                    memory: 2Gi
+            deployments:
+                - model:
+                    format: OpenAI
+                    name: gpt-4o
+                    version: "2024-11-20"
+                  name: gpt-4o
+                  sku:
+                    capacity: 50
+                    name: GlobalStandard
+            startupCommand: python app.py
+infra:
+    provider: bicep
+    path: ./infra
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/build.sh b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/build.sh
new file mode 100755
index 000000000000..813ad2896cb6
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/build.sh
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+# Stage agentserver @task preview wheels into the docker build context.
+# Run this BEFORE 'azd up' or 'docker build'.
+#
+# Wheels are checked into the repo at sdk/agentserver/wheels/ — this
+# script just copies them into a per-sample docker-build staging dir
+# (src/durable-research-agent/wheels/, gitignored) so the Dockerfile's
+# `COPY wheels/ /tmp/wheels/` finds them at build time.
+#
+# To refresh the source wheels (maintainer-only — devs shouldn't need
+# to do this), see ../../../../wheels/README.md.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../.." && pwd)"
+CENTRAL_WHEELS="$REPO_ROOT/sdk/agentserver/wheels"
+STAGING_DIR="$SCRIPT_DIR/src/durable-research-agent/wheels"
+
+if [[ ! -d "$CENTRAL_WHEELS" ]] || ! ls "$CENTRAL_WHEELS"/*.whl >/dev/null 2>&1; then
+    echo "ERROR: no checked-in wheels found at $CENTRAL_WHEELS" >&2
+    echo "       Did you pull the latest from feature/agentserver-durable-agent-demo?" >&2
+    exit 1
+fi
+
+echo "==> Staging checked-in @task preview wheels into docker build context"
+echo "    src:  $CENTRAL_WHEELS"
+echo "    dst:  $STAGING_DIR"
+rm -rf "$STAGING_DIR"
+mkdir -p "$STAGING_DIR"
+cp "$CENTRAL_WHEELS"/*.whl "$STAGING_DIR"/
+ls -la "$STAGING_DIR"/*.whl
+
+echo ""
+echo "Done. Now run: azd up   (or docker build)"
+
+
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/demo-client.sh b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/demo-client.sh
new file mode 100755
index 000000000000..5ce5f88fa8b1
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/demo-client.sh
@@ -0,0 +1,607 @@
+#!/usr/bin/env bash
+# ─────────────────────────────────────────────────────────────────────────────
+# Durable Research Agent — Demo Client
+#
+# Showcases three platform capabilities of the durable-task primitive
+# (all empirically validated against a hosted Foundry deployment):
+#   1. LONG-RUNNING TASKS — the framework's PATCH .../tasks/<id> lease
+#      renewals (every ~30s) keep the platform's sandbox idle-reclaim
+#      timer fresh, so a single run stays warm well past the 15-min
+#      eviction window without any client-side keepalive ingress.
+#   2. CRASH RECOVERY — when the container dies, the platform's nanny
+#      worker restarts it within ~1 min on its own (no new ingress
+#      needed); the durable task auto-resumes from its last checkpoint.
+#   3. STEERING — sending a new turn while a turn is still running
+#      causes the agent to wind down at the next checkpoint and start
+#      fresh on the new topic.
+#
+# Commands:
+#   ./demo-client.sh start "<topic>"   Dispatch and stream a fresh research run
+#   ./demo-client.sh stream            Reconnect to the active run (no fresh POST)
+#   ./demo-client.sh steer "<topic>"   Queue a steering input — agent winds down
+#                                      current turn at next checkpoint and switches
+#   ./demo-client.sh crash             Kill the process (DEMO_MODE=1 on server)
+#   ./demo-client.sh cancel            Operator cancel of the active run
+#   ./demo-client.sh status            Show local session info
+#   ./demo-client.sh logs              Stream container stdout/stderr via azd
+#   ./demo-client.sh reset             Clear local session state
+# ─────────────────────────────────────────────────────────────────────────────
+
+set -uo pipefail
+
+# ── Config ────────────────────────────────────────────────────────────────────
+
+# Point at your own hosted deployment. After `azd deploy`, this script
+# AUTO-RESOLVES the endpoint from your azd env
+# (AGENT_DURABLE_RESEARCH_AGENT_INVOCATIONS_ENDPOINT) when run from the demo
+# directory — no manual setup needed. To override (e.g. a different project or
+# a local server), export the ENDPOINT env var instead of editing this default.
+ENDPOINT="${ENDPOINT:-https://<account>.services.ai.azure.com/api/projects/<project>/agents/durable-research-agent/endpoint/protocols}"
+API_VERSION="v1"
+SESSION_FILE=".demo-session"
+
+# ── Colors ────────────────────────────────────────────────────────────────────
+
+BOLD='\033[1m'
+DIM='\033[2m'
+GREEN='\033[32m'
+YELLOW='\033[33m'
+RED='\033[31m'
+CYAN='\033[36m'
+MAGENTA='\033[35m'
+BLUE='\033[34m'
+RESET='\033[0m'
+
+# ── Session state ─────────────────────────────────────────────────────────────
+
+load_session() {
+    if [[ -f "$SESSION_FILE" ]]; then
+        # shellcheck disable=SC1090
+        source "$SESSION_FILE"
+    fi
+}
+
+save_session() {
+    {
+        echo "SESSION_ID=\"${SESSION_ID:-}\""
+        echo "INV_ID=\"${INV_ID:-}\""
+        echo "LAST_EVENT_ID=\"${LAST_EVENT_ID:-0}\""
+    } > "$SESSION_FILE"
+}
+
+ensure_token() {
+    ensure_endpoint
+    if [[ -z "${TOKEN:-}" ]]; then
+        TOKEN=$(az account get-access-token --resource https://ai.azure.com --query accessToken -o tsv 2>/dev/null)
+        if [[ -z "$TOKEN" ]]; then
+            echo -e "${RED}Failed to get Azure token. Run 'az login' first.${RESET}" >&2
+            exit 1
+        fi
+    fi
+}
+
+# Resolve ENDPOINT. If the caller did not override it via the ENDPOINT env
+# var (so it is still the <account>/<project> placeholder), auto-resolve it
+# from the azd environment that `azd deploy` populates. The azd value
+# AGENT_DURABLE_RESEARCH_AGENT_INVOCATIONS_ENDPOINT looks like
+#   .../agents/<name>/endpoint/protocols/invocations?api-version=...
+# and this script appends `/invocations?api-version=$API_VERSION` itself, so
+# we strip the `/invocations?...` tail to recover the protocols base.
+ensure_endpoint() {
+    if [[ "$ENDPOINT" == *"<account>"* || "$ENDPOINT" == *"<project>"* ]]; then
+        local azd_inv
+        azd_inv="$(azd env get-value AGENT_DURABLE_RESEARCH_AGENT_INVOCATIONS_ENDPOINT 2>/dev/null || true)"
+        if [[ "$azd_inv" == http* ]]; then
+            ENDPOINT="${azd_inv%%/invocations*}"
+        fi
+    fi
+    if [[ "$ENDPOINT" == *"<account>"* || "$ENDPOINT" == *"<project>"* ]]; then
+        echo -e "${RED}ENDPOINT is not configured.${RESET}" >&2
+        echo -e "${DIM}Run this from the demo dir after 'azd deploy' (auto-resolves from the azd env)," >&2
+        echo -e "or set it explicitly, e.g.:${RESET}" >&2
+        echo -e "  export ENDPOINT=\"https://<account>.services.ai.azure.com/api/projects/<project>/agents/durable-research-agent/endpoint/protocols\"" >&2
+        exit 1
+    fi
+}
+
+# Read a top-level JSON field. Returns empty string on missing/null. Used
+# only by the one-shot POST helpers below (start / steer) to extract
+# invocation_id / session_id from the dispatch response. The SSE stream
+# path does its own parsing in the python renderer.
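+_jq() {
+    local json="$1"
+    local key="$2"
+    # Pass the key as an argument (sys.argv[1]) instead of interpolating it
+    # into the Python source, so a key containing quotes cannot break the script.
+    printf '%s\n' "$json" | python3 -c '
+import sys, json
+try:
+    d = json.loads(sys.stdin.read())
+    v = d.get(sys.argv[1])
+    print("" if v is None else v)
+except Exception:
+    print("")
+' "$key" 2>/dev/null
+}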
+
+# ── SSE stream renderer (Python — see comment) ───────────────────────────────
+
+# Why a python renderer instead of bash:
+#  - At LLM emit rate (50-100 tok/s) the original bash 'while read |
+#    printf' loop made the real interactive terminal the bottleneck:
+#    one printf-per-token caused syscall thrash and built up a backlog
+#    that hid the EOF (real crash signal) behind minutes of TTY draining.
+#  - python with select() + a small in-memory token buffer (flushed
+#    every FLUSH_MS) writes the terminal in batches — ~20x fewer
+#    syscalls in steady state, no backlog, EOF is observed promptly.
+#  - The renderer trusts EOF on stdin as the authoritative crash signal.
+#    No time-based "is the stream stale?" heuristic — those mis-fire
+#    during the demo's legitimate 30s cooldowns between subcalls/phases.
+#    When curl closes (server crash, network drop, ctrl-c) the renderer
+#    sees EOF and exits. When the server emits 'done' or 'run_complete'
+#    the renderer exits cleanly. There is no third path.
+#  - Renderer formatting and color codes match the previous bash version
+#    exactly so prior demo expectations still hold.
+#
+# Contract with bash:
+#   stdin   = raw SSE frames from curl (id: N / data: ...)
+#   env     = $INITIAL_EVENT_ID (resume cursor), $STATE_FILE (path to write
+#             back LAST_EVENT_ID + STREAM_RESULT on exit), $FLUSH_MS
+#   stdout  = rendered output
+#   exit    = 0 normally; non-zero only on hard errors
+
+_PY_RENDERER='
+import json, os, sys, select, time
+from datetime import datetime, timezone
+
+# Bring the env-provided knobs in once.
+INITIAL_EVENT_ID = int(os.environ.get("INITIAL_EVENT_ID", "0") or "0")
+STATE_FILE       = os.environ.get("STATE_FILE", "")
+FLUSH_MS         = float(os.environ.get("FLUSH_MS", "50"))
+
+# CRITICAL: This entire block lives inside a single-quoted bash string
+# (the bash assignment `_PY_RENDERER=` then an opening apostrophe,
+# opaque content, and a closing apostrophe at column 1 of an otherwise
+# empty line). It is not a heredoc. Any literal apostrophe in the Python
+# code below silently ends the string and truncates the script; the
+# debug symptom is a NameError several lines later. Use double quotes
+# for every Python string literal. Keys we pull from event dicts are
+# aliased to module-level CONSTANTS up here so the per-event code
+# stays readable without inline string literals becoming a foot-gun.
+_DSEC = "duration_sec"
+
+# ANSI palette — mirrors demo-client.sh.
+BOLD, DIM = "\033[1m", "\033[2m"
+GREEN, YELLOW, RED = "\033[32m", "\033[33m", "\033[31m"
+CYAN, MAGENTA, BLUE = "\033[36m", "\033[35m", "\033[34m"
+RESET = "\033[0m"
+
+out = sys.stdout
+def write(s): out.write(s)
+def flush(): out.flush()
+
+def now_utc():
+    return datetime.now(timezone.utc).strftime("%H:%M:%SZ")
+
+last_event_id = INITIAL_EVENT_ID
+result        = "disconnected"
+token_buf     = []                  # collected token content
+last_flush    = time.monotonic()
+
+def flush_tokens():
+    global token_buf, last_flush
+    if token_buf:
+        write("".join(token_buf))
+        flush()
+        token_buf = []
+    last_flush = time.monotonic()
+
+def render_block(evt):
+    """Render any non-token event with the same shape as the old bash render."""
+    t = evt.get("type", "")
+    n = now_utc()
+    if t == "run_start":
+        topic   = evt.get("topic", "")
+        em      = evt.get("entry_mode", "")
+        total   = evt.get("total_phases", "")
+        uptime  = evt.get("server_uptime_sec", "")
+        srv     = evt.get("server_time_utc", "")
+        prior   = evt.get("prior_topic")
+        write("\n")
+        write(f"{BOLD}{CYAN}{chr(0x2550)*62}{RESET}\n")
+        write(f"{DIM}[{n}]{RESET} {BOLD}{CYAN}\u25b6 Run start{RESET}    topic={BOLD}{topic}{RESET}  ({total} phases)\n")
+        if prior:
+            write(f"  {YELLOW}(steered from prior topic: {prior}){RESET}\n")
+        write(f"  entry_mode={em}   server_time={srv}   uptime={uptime}s\n")
+        write(f"{BOLD}{CYAN}{chr(0x2550)*62}{RESET}\n")
+    elif t == "recovered":
+        c, total = evt.get("completed_phases", ""), evt.get("total_phases", "")
+        srv, uptime = evt.get("server_time_utc", ""), evt.get("server_uptime_sec", "")
+        write("\n")
+        write(f"{DIM}[{n}]{RESET} {BOLD}{GREEN}\U0001f501 Recovered from crash{RESET}   resuming from phase {c}/{total}\n")
+        write(f"  server_time={srv}   uptime={uptime}s  {DIM}(uptime ~0s = fresh container){RESET}\n")
+    elif t == "phase_start":
+        ph, total = evt.get("phase", ""), evt.get("total", "")
+        title = evt.get("title", "")
+        srv, uptime = evt.get("server_time_utc", ""), evt.get("server_uptime_sec", "")
+        write("\n")
+        write(f"{BOLD}{BLUE}{chr(0x2500)*62}{RESET}\n")
+        write(f"{DIM}[{n}]{RESET} {BOLD}{BLUE}\u25b6 Phase {ph}/{total}{RESET} \u2014 {title}\n")
+        write(f"  \u23f0 server_time={srv}   uptime={uptime}s\n")
+        write(f"{BOLD}{BLUE}{chr(0x2500)*62}{RESET}\n")
+    elif t == "subcall_start":
+        role = evt.get("role", "")
+        idx, of = evt.get("index", ""), evt.get("of", "")
+        write(f"\n{DIM}  [{n}]  [{role} {idx}/{of}] \u2500\u2500\u2500{RESET}\n")
+    elif t == "subcall_end":
+        write("\n")
+    elif t == "phase_end":
+        ph, total = evt.get("phase", ""), evt.get("total", "")
+        title = evt.get("title", "")
+        srv, uptime, dur = evt.get("server_time_utc", ""), evt.get("server_uptime_sec", ""), evt.get("duration_sec", "")
+        write(f"\n{DIM}[{n}]{RESET} {GREEN}\u2705 Phase {ph}/{total} done{RESET} \u2014 {title}\n")
+        write(f"  \u23f0 server_time={srv}   uptime={uptime}s   \u23f1  duration={dur}s\n")
+    elif t == "winding_down":
+        cause = evt.get("cause", ""); c = evt.get("completed_phases", "")
+        total = evt.get("total_phases", ""); pend = evt.get("pending_steering_inputs", "")
+        srv, uptime = evt.get("server_time_utc", ""), evt.get("server_uptime_sec", "")
+        write(f"\n{DIM}[{n}]{RESET} {BOLD}{MAGENTA}\u2193 Winding down{RESET}   cause={cause}   completed={c}/{total}   pending_steers={pend}\n")
+        write(f"  \u23f0 server_time={srv}   uptime={uptime}s\n")
+    elif t == "cooldown":
+        # Server is intentionally sleeping (between subcalls or phases).
+        # Render a single low-key line so the terminal is not silent.
+        # NOTE: keep Python string literals in this block strictly
+        # double-quoted. A literal apostrophe ends the surrounding
+        # single-quoted bash string and causes a confusing NameError
+        # several lines later when the truncated script is parsed.
+        try:
+            dur_str = f"{float(evt.get(_DSEC, 0)):.0f}"
+        except (TypeError, ValueError):
+            dur_str = str(evt.get(_DSEC, "?"))
+        stage = evt.get("stage", "")
+        ph    = evt.get("phase", "")
+        total = evt.get("total", "")
+        sub   = evt.get("subcall")
+        of    = evt.get("of")
+        label = "between phases" if stage == "inter_phase" else "between subcalls"
+        if stage == "inter_phase":
+            detail = f"next: phase {ph}/{total}"
+        elif sub is not None and of is not None:
+            detail = f"next: subcall {sub}/{of} in phase {ph}/{total}"
+        else:
+            detail = f"phase {ph}/{total}"
+        write(f"{DIM}[{n}]   ...cooling down {dur_str}s ({label}) \u2014 {detail}{RESET}\n")
+    elif t == "run_complete":
+        total = evt.get("phases_completed", "")
+        srv, uptime = evt.get("server_time_utc", ""), evt.get("server_uptime_sec", "")
+        write(f"\n{BOLD}{GREEN}{chr(0x2550)*62}{RESET}\n")
+        write(f"{DIM}[{n}]{RESET} {BOLD}{GREEN}\u2705 Run complete{RESET}   {total} phases   \u23f0 {srv}   uptime={uptime}s\n")
+        write(f"{BOLD}{GREEN}{chr(0x2550)*62}{RESET}\n")
+    elif t == "done":
+        reason = evt.get("reason")
+        msg = f" ({reason})" if reason else ""
+        col = YELLOW if reason else GREEN
+        write(f"\n{DIM}[{n}]{RESET} {col}\u2550\u2550 Stream done{msg} \u2550\u2550{RESET}\n")
+    else:
+        write(f"{DIM}[{n}] [unknown event] {json.dumps(evt)}{RESET}\n")
+    flush()
+
+stdin_fd = sys.stdin.fileno()
+
+try:
+    pending = b""
+    while True:
+        deadline = last_flush + FLUSH_MS / 1000.0
+        timeout = max(0.0, deadline - time.monotonic())
+        r, _, _ = select.select([stdin_fd], [], [], timeout)
+        if r:
+            try:
+                chunk = os.read(stdin_fd, 65536)
+            except OSError:
+                chunk = b""
+            if not chunk:
+                # EOF — server (or proxy) closed the SSE stream. This is
+                # the authoritative crash/disconnect signal.
+                flush_tokens()
+                break
+            pending += chunk
+            # Process complete lines only.
+            while b"\n" in pending:
+                line_b, pending = pending.split(b"\n", 1)
+                line = line_b.decode("utf-8", errors="replace").rstrip("\r")
+                if not line or line.startswith(":"):
+                    continue
+                if line.startswith("id:"):
+                    try:
+                        last_event_id = int(line[3:].strip())
+                    except ValueError:
+                        pass
+                    continue
+                if not line.startswith("data:"):
+                    continue
+                payload = line[5:].lstrip()
+                try:
+                    evt = json.loads(payload)
+                except json.JSONDecodeError:
+                    continue
+                t = evt.get("type", "")
+                if t == "token":
+                    # Hot path: buffer content. Periodic flush + flush on
+                    # non-token gives smooth visual output without
+                    # per-token TTY syscall thrash.
+                    c = evt.get("content")
+                    if isinstance(c, str):
+                        token_buf.append(c)
+                else:
+                    # Flush any pending tokens BEFORE emitting block event
+                    # so they appear in the right place visually.
+                    flush_tokens()
+                    render_block(evt)
+                    if t in ("done", "run_complete"):
+                        result = "complete"
+                        # Drain any remaining buffered tokens (none if we
+                        # just flushed) and exit.
+                        flush_tokens()
+                        raise StopIteration
+        else:
+            # Periodic flush deadline reached with no data.
+            flush_tokens()
+            # No watchdog: EOF on stdin (above) is the authoritative
+            # crash/disconnect signal. The select() timeout just drives
+            # the periodic token-buffer flush.
+except StopIteration:
+    pass
+except KeyboardInterrupt:
+    flush_tokens()
+finally:
+    if STATE_FILE:
+        try:
+            with open(STATE_FILE, "w") as fh:
+                fh.write(f"LAST_EVENT_ID={last_event_id}\n")
+                fh.write(f"STREAM_RESULT={result}\n")
+        except OSError:
+            pass
+'
+
+# ── SSE reader ───────────────────────────────────────────────────────────────
+
+STREAM_RESULT=""  # "complete" | "disconnected" | "error"
+
+stream_sse() {
+    local url="$1"
+    STREAM_RESULT="disconnected"
+
+    local state_file
+    state_file=$(mktemp)
+
+    # Pipe curl directly into the python renderer. EOF on the pipe is
+    # the authoritative disconnect signal — when curl sees the server
+    # close the TCP socket it closes its stdout, the renderer sees EOF
+    # on stdin, and we exit cleanly. No watchdog, no PID juggling.
+    INITIAL_EVENT_ID="${LAST_EVENT_ID:-0}" \
+    STATE_FILE="$state_file" \
+    FLUSH_MS="${FLUSH_MS:-50}" \
+        bash -c 'curl -sN -X GET \
+            -H "Authorization: Bearer $2" \
+            -H "Accept: text/event-stream" \
+            -H "Foundry-Features: HostedAgents=V1Preview" \
+            "$3" | python3 -u -c "$1"' _ "$_PY_RENDERER" "$TOKEN" "$url"
+
+    if [[ -f "$state_file" ]]; then
+        # shellcheck disable=SC1090
+        source "$state_file"
+        rm -f "$state_file"
+    fi
+    save_session
+}
+
+# ── Commands ──────────────────────────────────────────────────────────────────
+
+cmd_start() {
+    local topic="${1:-Research the future of quantum computing}"
+    SESSION_ID="demo-$(uuidgen | tr '[:upper:]' '[:lower:]')"
+    INV_ID=""
+    LAST_EVENT_ID="0"
+    save_session
+    ensure_token
+
+    echo -e "${GREEN}New session: ${SESSION_ID}${RESET}"
+    echo -e "${DIM}Topic: ${topic}${RESET}"
+
+    local response
+    response=$(curl -s -X POST \
+        -H "Authorization: Bearer $TOKEN" \
+        -H "Content-Type: application/json" \
+        -H "Foundry-Features: HostedAgents=V1Preview" \
+        -d "$(python3 -c 'import json, sys; print(json.dumps({"message": sys.argv[1]}))' "$topic")" \
+        "${ENDPOINT}/invocations?api-version=${API_VERSION}&agent_session_id=${SESSION_ID}")
+    INV_ID=$(_jq "$response" invocation_id)
+    SESSION_ID=$(_jq "$response" session_id)
+    save_session
+    echo -e "${DIM}Dispatched: invocation_id=${INV_ID}${RESET}"
+
+    echo ""
+    echo -e "${BOLD}Streaming. ${DIM}Use Ctrl-C to detach; reconnect later with './demo-client.sh stream'.${RESET}"
+    stream_sse "${ENDPOINT}/invocations/${INV_ID}?api-version=${API_VERSION}"
+    _report_stream_result
+}
+
+cmd_stream() {
+    load_session
+    if [[ -z "${INV_ID:-}" ]]; then
+        echo -e "${RED}No active session. Run './demo-client.sh start \"<topic>\"' first.${RESET}" >&2
+        exit 1
+    fi
+    ensure_token
+
+    echo -e "${DIM}Reconnecting to invocation ${INV_ID}${RESET}"
+    local url="${ENDPOINT}/invocations/${INV_ID}?api-version=${API_VERSION}"
+    if [[ "${LAST_EVENT_ID:-0}" != "0" ]]; then
+        url="${url}&last_event_id=${LAST_EVENT_ID}"
+        echo -e "${DIM}Resuming from event ${LAST_EVENT_ID}${RESET}"
+    fi
+    stream_sse "$url"
+    _report_stream_result
+}
+
+cmd_steer() {
+    local topic="${1:-}"
+    if [[ -z "$topic" ]]; then
+        echo -e "${RED}Usage: ./demo-client.sh steer \"<new topic>\"${RESET}" >&2
+        exit 1
+    fi
+    load_session
+    if [[ -z "${SESSION_ID:-}" ]]; then
+        echo -e "${RED}No active session. Run './demo-client.sh start \"<topic>\"' first.${RESET}" >&2
+        exit 1
+    fi
+    ensure_token
+
+    echo -e "${BOLD}${MAGENTA}Steering session ${SESSION_ID} to: ${topic}${RESET}"
+
+    # Send a fresh POST. Because the task is steerable and an in-progress
+    # run exists, the framework queues this as a steering input.
+    local response
+    response=$(curl -s -X POST \
+        -H "Authorization: Bearer $TOKEN" \
+        -H "Content-Type: application/json" \
+        -H "Foundry-Features: HostedAgents=V1Preview" \
+        -d "$(python3 -c 'import json, sys; print(json.dumps({"message": sys.argv[1]}))' "$topic")" \
+        "${ENDPOINT}/invocations?api-version=${API_VERSION}&agent_session_id=${SESSION_ID}")
+    echo -e "${DIM}Response: ${response}${RESET}"
+    local new_inv
+    new_inv=$(_jq "$response" invocation_id)
+    if [[ -n "$new_inv" ]]; then
+        INV_ID="$new_inv"
+        LAST_EVENT_ID="0"
+        save_session
+        echo -e "${DIM}New invocation: ${INV_ID}. Use './demo-client.sh stream' to attach.${RESET}"
+    fi
+}
+
+cmd_crash() {
+    load_session
+    if [[ -z "${SESSION_ID:-}" ]]; then
+        echo -e "${RED}No active session. Run './demo-client.sh start \"<topic>\"' first.${RESET}" >&2
+        exit 1
+    fi
+    ensure_token
+
+    echo -e "${RED}${BOLD}💥 Crashing the agent container...${RESET}"
+    echo -e "${DIM}Session: ${SESSION_ID}${RESET}"
+
+    # The platform only proxies /invocations* — we use the special
+    # "crash" sentinel message, which the agent (when DEMO_MODE=1)
+    # interprets as "exit the process".
+    local response
+    response=$(curl -s -X POST \
+        -H "Authorization: Bearer $TOKEN" \
+        -H "Content-Type: application/json" \
+        -H "Foundry-Features: HostedAgents=V1Preview" \
+        -d '{"message": "crash"}' \
+        "${ENDPOINT}/invocations?api-version=${API_VERSION}&agent_session_id=${SESSION_ID}")
+    echo -e "${DIM}Response: ${response}${RESET}"
+    echo ""
+    echo -e "${YELLOW}The container will exit. The platform's nanny worker brings it back${RESET}"
+    echo -e "${YELLOW}within ~1 min on its own (no client ingress needed) and the durable${RESET}"
+    echo -e "${YELLOW}task auto-recovers from its last checkpoint.${RESET}"
+    echo ""
+    echo -e "${DIM}Run './demo-client.sh stream' whenever you're ready to reconnect.${RESET}"
+    echo -e "${DIM}Look for a 'Recovered from crash' marker (uptime resets to ~0).${RESET}"
+}
+
+cmd_cancel() {
+    load_session
+    if [[ -z "${INV_ID:-}" ]]; then
+        echo -e "${RED}No active session. Run './demo-client.sh start \"<topic>\"' first.${RESET}" >&2
+        exit 1
+    fi
+    ensure_token
+
+    echo -e "${YELLOW}🛑 Cancelling invocation ${INV_ID}${RESET}"
+    local response
+    response=$(curl -s -X POST \
+        -H "Authorization: Bearer $TOKEN" \
+        -H "Content-Type: application/json" \
+        -H "Foundry-Features: HostedAgents=V1Preview" \
+        -d '{}' \
+        "${ENDPOINT}/invocations/${INV_ID}/cancel?api-version=${API_VERSION}")
+    echo -e "${GREEN}${response}${RESET}"
+}
+
+cmd_status() {
+    load_session
+    if [[ -f "$SESSION_FILE" ]]; then
+        echo -e "${CYAN}Session ID:${RESET}    ${SESSION_ID:-<none>}"
+        echo -e "${CYAN}Invocation ID:${RESET} ${INV_ID:-<none>}"
+        echo -e "${CYAN}Last event ID:${RESET} ${LAST_EVENT_ID:-0}"
+    else
+        echo -e "${DIM}No local session.${RESET}"
+    fi
+}
+
+cmd_logs() {
+    load_session
+    if [[ -z "${SESSION_ID:-}" ]]; then
+        echo -e "${RED}No active session. Run './demo-client.sh start \"<topic>\"' first.${RESET}" >&2
+        exit 1
+    fi
+    echo -e "${DIM}Streaming container stdout/stderr for session ${SESSION_ID}${RESET}"
+    azd ai agent monitor --session-id "${SESSION_ID}" --follow
+}
+
+cmd_reset() {
+    rm -f "$SESSION_FILE"
+    echo -e "${GREEN}Session cleared.${RESET}"
+}
+
+_report_stream_result() {
+    case "$STREAM_RESULT" in
+        complete)
+            ;;
+        disconnected)
+            echo ""
+            echo -e "${YELLOW}── Stream disconnected ──${RESET}"
+            echo -e "${DIM}The agent may still be running on the server.${RESET}"
+            echo -e "${DIM}Reconnect with: ./demo-client.sh stream${RESET}"
+            ;;
+        error)
+            echo -e "${RED}── Stream error ──${RESET}" ;;
+    esac
+}
+
+# ── Main ──────────────────────────────────────────────────────────────────────
+
+usage() {
+    cat <<EOF
+${BOLD}Durable Research Agent — Demo Client${RESET}
+
+Commands:
+  ${BOLD}start "<topic>"${RESET}    Dispatch a fresh research run and stream it
+  ${BOLD}stream${RESET}             Reconnect to the active run (resumes from last_event_id)
+  ${BOLD}steer "<topic>"${RESET}    Queue a steering input — agent winds down at next
+                     checkpoint and starts fresh on the new topic
+  ${BOLD}crash${RESET}              Kill the container (POST /invocations with message="crash";
+                     requires DEMO_MODE=1 on the server image)
+  ${BOLD}cancel${RESET}             Cooperative cancel of the active run
+  ${BOLD}status${RESET}             Show local session info
+  ${BOLD}logs${RESET}               Stream container stdout/stderr (azd ai agent monitor)
+  ${BOLD}reset${RESET}              Clear local session state
+
+Three-terminal workflow:
+  Terminal 1: ./demo-client.sh start "quantum computing"     # streams ~33 min of phases
+  Terminal 2: ./demo-client.sh logs                          # peek at server logs
+  Terminal 3: ./demo-client.sh crash                         # any time → nanny restores ~1 min later
+              ./demo-client.sh steer "fusion energy"         # mid-run pivot
+EOF
+}
+
+case "${1:-}" in
+    start)   shift; cmd_start "${1:-}" ;;
+    stream)  cmd_stream ;;
+    steer)   shift; cmd_steer "${1:-}" ;;
+    crash)   cmd_crash ;;
+    cancel)  cmd_cancel ;;
+    status)  cmd_status ;;
+    logs)    cmd_logs ;;
+    reset)   cmd_reset ;;
+    *)       usage ;;
+esac
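
Note on the renderer above: the line handling inside `_PY_RENDERER` can be read as a small state machine over SSE lines. The sketch below is a simplified, standalone restatement of that loop for illustration only. It is not part of the demo, the function name `parse_sse_lines` is hypothetical, and it leaves out the `select()`-driven periodic flush and all rendering. It also skips JSON payloads that are not objects, which is an added guard that the renderer itself does not have.

```python
import json


def parse_sse_lines(lines, last_event_id=0):
    """Consume SSE lines; return (last_event_id, token_text, block_types, result).

    Hypothetical helper mirroring the renderer loop: `id:` lines update the
    resume cursor, `data:` lines carry one JSON event each, and a `done` or
    `run_complete` event ends the stream with result "complete".
    """
    tokens = []
    block_types = []
    result = "disconnected"  # stays this way if input ends without a terminal event
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # blank separator or SSE comment (keep-alive)
        if line.startswith("id:"):
            try:
                last_event_id = int(line[3:].strip())
            except ValueError:
                pass  # ignore non-numeric ids, keep the previous cursor
            continue
        if not line.startswith("data:"):
            continue  # other SSE fields are ignored
        try:
            evt = json.loads(line[5:].lstrip())
        except json.JSONDecodeError:
            continue
        if not isinstance(evt, dict):
            continue  # added guard (assumption): only JSON objects are events
        kind = evt.get("type", "")
        if kind == "token":
            content = evt.get("content")
            if isinstance(content, str):
                tokens.append(content)
        else:
            block_types.append(kind)
            if kind in ("done", "run_complete"):
                result = "complete"
                break
    return last_event_id, "".join(tokens), block_types, result


if __name__ == "__main__":
    demo = [
        "id: 1",
        "data: " + json.dumps({"type": "token", "content": "Hello"}),
        "id: 2",
        "data: " + json.dumps({"type": "done"}),
    ]
    print(parse_sse_lines(demo))
```

The returned `last_event_id` corresponds to the value the renderer writes to `STATE_FILE`, which `cmd_stream` later sends as the `last_event_id` query parameter when reconnecting.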
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/abbreviations.json b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/abbreviations.json
new file mode 100644
index 000000000000..879b2a9507b1
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/abbreviations.json
@@ -0,0 +1,137 @@
+{
+    "aiFoundryAccounts": "aif",
+    "analysisServicesServers": "as",
+    "apiManagementService": "apim-",
+    "appConfigurationStores": "appcs-",
+    "appManagedEnvironments": "cae-",
+    "appContainerApps": "ca-",
+    "authorizationPolicyDefinitions": "policy-",
+    "automationAutomationAccounts": "aa-",
+    "blueprintBlueprints": "bp-",
+    "blueprintBlueprintsArtifacts": "bpa-",
+    "cacheRedis": "redis-",
+    "cdnProfiles": "cdnp-",
+    "cdnProfilesEndpoints": "cdne-",
+    "cognitiveServicesAccounts": "cog-",
+    "cognitiveServicesFormRecognizer": "cog-fr-",
+    "cognitiveServicesTextAnalytics": "cog-ta-",
+    "computeAvailabilitySets": "avail-",
+    "computeCloudServices": "cld-",
+    "computeDiskEncryptionSets": "des",
+    "computeDisks": "disk",
+    "computeDisksOs": "osdisk",
+    "computeGalleries": "gal",
+    "computeSnapshots": "snap-",
+    "computeVirtualMachines": "vm",
+    "computeVirtualMachineScaleSets": "vmss-",
+    "containerInstanceContainerGroups": "ci",
+    "containerRegistryRegistries": "cr",
+    "containerServiceManagedClusters": "aks-",
+    "databricksWorkspaces": "dbw-",
+    "dataFactoryFactories": "adf-",
+    "dataLakeAnalyticsAccounts": "dla",
+    "dataLakeStoreAccounts": "dls",
+    "dataMigrationServices": "dms-",
+    "dBforMySQLServers": "mysql-",
+    "dBforPostgreSQLServers": "psql-",
+    "devicesIotHubs": "iot-",
+    "devicesProvisioningServices": "provs-",
+    "devicesProvisioningServicesCertificates": "pcert-",
+    "documentDBDatabaseAccounts": "cosmos-",
+    "documentDBMongoDatabaseAccounts": "cosmon-",
+    "eventGridDomains": "evgd-",
+    "eventGridDomainsTopics": "evgt-",
+    "eventGridEventSubscriptions": "evgs-",
+    "eventHubNamespaces": "evhns-",
+    "eventHubNamespacesEventHubs": "evh-",
+    "hdInsightClustersHadoop": "hadoop-",
+    "hdInsightClustersHbase": "hbase-",
+    "hdInsightClustersKafka": "kafka-",
+    "hdInsightClustersMl": "mls-",
+    "hdInsightClustersSpark": "spark-",
+    "hdInsightClustersStorm": "storm-",
+    "hybridComputeMachines": "arcs-",
+    "insightsActionGroups": "ag-",
+    "insightsComponents": "appi-",
+    "keyVaultVaults": "kv-",
+    "kubernetesConnectedClusters": "arck",
+    "kustoClusters": "dec",
+    "kustoClustersDatabases": "dedb",
+    "logicIntegrationAccounts": "ia-",
+    "logicWorkflows": "logic-",
+    "machineLearningServicesWorkspaces": "mlw-",
+    "managedIdentityUserAssignedIdentities": "id-",
+    "managementManagementGroups": "mg-",
+    "migrateAssessmentProjects": "migr-",
+    "networkApplicationGateways": "agw-",
+    "networkApplicationSecurityGroups": "asg-",
+    "networkAzureFirewalls": "afw-",
+    "networkBastionHosts": "bas-",
+    "networkConnections": "con-",
+    "networkDnsZones": "dnsz-",
+    "networkExpressRouteCircuits": "erc-",
+    "networkFirewallPolicies": "afwp-",
+    "networkFirewallPoliciesWebApplication": "waf",
+    "networkFirewallPoliciesRuleGroups": "wafrg",
+    "networkFrontDoors": "fd-",
+    "networkFrontdoorWebApplicationFirewallPolicies": "fdfp-",
+    "networkLoadBalancersExternal": "lbe-",
+    "networkLoadBalancersInternal": "lbi-",
+    "networkLoadBalancersInboundNatRules": "rule-",
+    "networkLocalNetworkGateways": "lgw-",
+    "networkNatGateways": "ng-",
+    "networkNetworkInterfaces": "nic-",
+    "networkNetworkSecurityGroups": "nsg-",
+    "networkNetworkSecurityGroupsSecurityRules": "nsgsr-",
+    "networkNetworkWatchers": "nw-",
+    "networkPrivateDnsZones": "pdnsz-",
+    "networkPrivateLinkServices": "pl-",
+    "networkPublicIPAddresses": "pip-",
+    "networkPublicIPPrefixes": "ippre-",
+    "networkRouteFilters": "rf-",
+    "networkRouteTables": "rt-",
+    "networkRouteTablesRoutes": "udr-",
+    "networkTrafficManagerProfiles": "traf-",
+    "networkVirtualNetworkGateways": "vgw-",
+    "networkVirtualNetworks": "vnet-",
+    "networkVirtualNetworksSubnets": "snet-",
+    "networkVirtualNetworksVirtualNetworkPeerings": "peer-",
+    "networkVirtualWans": "vwan-",
+    "networkVpnGateways": "vpng-",
+    "networkVpnGatewaysVpnConnections": "vcn-",
+    "networkVpnGatewaysVpnSites": "vst-",
+    "notificationHubsNamespaces": "ntfns-",
+    "notificationHubsNamespacesNotificationHubs": "ntf-",
+    "operationalInsightsWorkspaces": "log-",
+    "portalDashboards": "dash-",
+    "powerBIDedicatedCapacities": "pbi-",
+    "purviewAccounts": "pview-",
+    "recoveryServicesVaults": "rsv-",
+    "resourcesResourceGroups": "rg-",
+    "searchSearchServices": "srch-",
+    "serviceBusNamespaces": "sb-",
+    "serviceBusNamespacesQueues": "sbq-",
+    "serviceBusNamespacesTopics": "sbt-",
+    "serviceEndPointPolicies": "se-",
+    "serviceFabricClusters": "sf-",
+    "signalRServiceSignalR": "sigr",
+    "sqlManagedInstances": "sqlmi-",
+    "sqlServers": "sql-",
+    "sqlServersDataWarehouse": "sqldw-",
+    "sqlServersDatabases": "sqldb-",
+    "sqlServersDatabasesStretch": "sqlstrdb-",
+    "storageStorageAccounts": "st",
+    "storageStorageAccountsVm": "stvm",
+    "storSimpleManagers": "ssimp",
+    "streamAnalyticsCluster": "asa-",
+    "synapseWorkspaces": "syn",
+    "synapseWorkspacesAnalyticsWorkspaces": "synw",
+    "synapseWorkspacesSqlPoolsDedicated": "syndp",
+    "synapseWorkspacesSqlPoolsSpark": "synsp",
+    "timeSeriesInsightsEnvironments": "tsi-",
+    "webServerFarms": "plan-",
+    "webSitesAppService": "app-",
+    "webSitesAppServiceEnvironment": "ase-",
+    "webSitesFunctions": "func-",
+    "webStaticSites": "stapp-"
+}
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/acr-role-assignment.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/acr-role-assignment.bicep
new file mode 100644
index 000000000000..3e0c2b218be7
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/acr-role-assignment.bicep
@@ -0,0 +1,27 @@
+targetScope = 'resourceGroup'
+
+@description('Name of the existing container registry')
+param acrName string
+
+@description('Principal ID to grant AcrPull role')
+param principalId string
+
+@description('Full resource ID of the ACR (for generating unique GUID)')
+param acrResourceId string
+
+// Reference the existing ACR in this resource group
+resource acr 'Microsoft.ContainerRegistry/registries@2023-07-01' existing = {
+  name: acrName
+}
+
+// Grant AcrPull role to the AI project's managed identity
+resource acrPullRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  scope: acr
+  name: guid(acrResourceId, principalId, '7f951dda-4ed3-4680-a7ca-43fe172d538d')
+  properties: {
+    principalId: principalId
+    principalType: 'ServicePrincipal'
+    // AcrPull role
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '7f951dda-4ed3-4680-a7ca-43fe172d538d')
+  }
+}
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/ai-project.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/ai-project.bicep
new file mode 100644
index 000000000000..662b53c001c8
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/ai-project.bicep
@@ -0,0 +1,413 @@
+targetScope = 'resourceGroup'
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Main location for the resources')
+param location string
+
+var resourceToken = uniqueString(subscription().id, resourceGroup().id, location)
+
+@description('Name of the project')
+param aiFoundryProjectName string
+
+param deployments deploymentsType
+
+@description('Id of the user or app to assign application roles')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('Optional. Name of an existing AI Services account in the current resource group. If not provided, a new one will be created.')
+param existingAiAccountName string = ''
+
+@description('List of connections to provision')
+param connections array = []
+
+@secure()
+@description('Map of connection name to credentials object. Kept as @secure to prevent secrets from appearing in deployment logs. Example: { "my-conn": { "key": "secret" } }')
+param connectionCredentials object = {}
+
+@description('Also provision dependent resources and connect to the project')
+param additionalDependentResources dependentResourcesType
+
+@description('Enable monitoring via appinsights and log analytics')
+param enableMonitoring bool = true
+
+@description('Enable hosted agent deployment')
+param enableHostedAgents bool = false
+
+@description('Enable the capability host for agent conversations. When false and hosted agents are enabled, the capability host is not created (v2 hosted agents handle storage automatically).')
+param enableCapabilityHost bool = true
+
+@description('Optional. Existing container registry resource ID. If provided, a connection will be created to this ACR instead of creating a new one.')
+param existingContainerRegistryResourceId string = ''
+
+@description('Optional. Existing container registry login server (e.g., myregistry.azurecr.io). Required if existingContainerRegistryResourceId is provided.')
+param existingContainerRegistryEndpoint string = ''
+
+@description('Optional. Name of an existing ACR connection on the Foundry project. If provided, no new ACR or connection will be created.')
+param existingAcrConnectionName string = ''
+
+@description('Optional. Existing Application Insights connection string. If provided, a connection will be created but no new App Insights resource.')
+param existingApplicationInsightsConnectionString string = ''
+
+@description('Optional. Existing Application Insights resource ID. Used for connection metadata when providing an existing App Insights.')
+param existingApplicationInsightsResourceId string = ''
+
+@description('Optional. Name of an existing Application Insights connection on the Foundry project. If provided, no new App Insights or connection will be created.')
+param existingAppInsightsConnectionName string = ''
+
+// Load abbreviations
+var abbrs = loadJsonContent('../../abbreviations.json')
+
+// Determine which resources to create based on connections
+var hasStorageConnection = length(filter(additionalDependentResources, conn => conn.resource == 'storage')) > 0
+var hasAcrConnection = length(filter(additionalDependentResources, conn => conn.resource == 'registry')) > 0
+var hasExistingAcr = !empty(existingContainerRegistryResourceId)
+var hasExistingAcrConnection = !empty(existingAcrConnectionName)
+var hasExistingAppInsightsConnection = !empty(existingAppInsightsConnectionName)
+var hasExistingAppInsightsConnectionString = !empty(existingApplicationInsightsConnectionString)
+// Only create new App Insights resources if monitoring enabled and no existing connection/connection string
+var shouldCreateAppInsights = enableMonitoring && !hasExistingAppInsightsConnection && !hasExistingAppInsightsConnectionString
+var hasSearchConnection = length(filter(additionalDependentResources, conn => conn.resource == 'azure_ai_search')) > 0
+var hasBingConnection = length(filter(additionalDependentResources, conn => conn.resource == 'bing_grounding')) > 0
+var hasBingCustomConnection = length(filter(additionalDependentResources, conn => conn.resource == 'bing_custom_grounding')) > 0
+
+// Extract connection names from ai.yaml for each resource type
+var storageConnectionName = hasStorageConnection ? filter(additionalDependentResources, conn => conn.resource == 'storage')[0].connectionName : ''
+var acrConnectionName = hasAcrConnection ? filter(additionalDependentResources, conn => conn.resource == 'registry')[0].connectionName : ''
+var searchConnectionName = hasSearchConnection ? filter(additionalDependentResources, conn => conn.resource == 'azure_ai_search')[0].connectionName : ''
+var bingConnectionName = hasBingConnection ? filter(additionalDependentResources, conn => conn.resource == 'bing_grounding')[0].connectionName : ''
+var bingCustomConnectionName = hasBingCustomConnection ? filter(additionalDependentResources, conn => conn.resource == 'bing_custom_grounding')[0].connectionName : ''
+
+// Enable monitoring via Log Analytics and Application Insights
+module logAnalytics '../monitor/loganalytics.bicep' = if (shouldCreateAppInsights) {
+  name: 'logAnalytics'
+  params: {
+    location: location
+    tags: tags
+    name: 'logs-${resourceToken}'
+  }
+}
+
+module applicationInsights '../monitor/applicationinsights.bicep' = if (shouldCreateAppInsights) {
+  name: 'applicationInsights'
+  params: {
+    location: location
+    tags: tags
+    name: 'appi-${resourceToken}'
+    logAnalyticsWorkspaceId: logAnalytics.outputs.id
+    projectMIPrincipalId: aiAccount::project.identity.principalId
+  }
+}
+
+// Always deploy the AI account from this template (simplified approach); if existingAiAccountName is set, that name is reused.
+// TODO: Add support for existing accounts in a future version
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-06-01' = {
+  name: !empty(existingAiAccountName) ? existingAiAccountName : 'ai-account-${resourceToken}'
+  location: location
+  tags: tags
+  sku: {
+    name: 'S0'
+  }
+  kind: 'AIServices'
+  identity: {
+    type: 'SystemAssigned'
+  }
+  properties: {
+    allowProjectManagement: true
+    customSubDomainName: !empty(existingAiAccountName) ? existingAiAccountName : 'ai-account-${resourceToken}'
+    networkAcls: {
+      defaultAction: 'Allow'
+      virtualNetworkRules: []
+      ipRules: []
+    }
+    publicNetworkAccess: 'Enabled'
+    disableLocalAuth: true
+  }
+  
+  @batchSize(1)
+  resource seqDeployments 'deployments' = [
+    for dep in (deployments??[]): {
+      name: dep.name
+      properties: {
+        model: dep.model
+      }
+      sku: dep.sku
+    }
+  ]
+
+  resource project 'projects' = {
+    name: aiFoundryProjectName
+    location: location
+    identity: {
+      type: 'SystemAssigned'
+    }
+    properties: {
+      description: '${aiFoundryProjectName} Project'
+      displayName: '${aiFoundryProjectName}Project'
+    }
+    dependsOn: [
+      seqDeployments
+    ]
+  }
+
+  resource aiFoundryAccountCapabilityHost 'capabilityHosts@2025-10-01-preview' = if (enableHostedAgents && enableCapabilityHost) {
+    name: 'agents'
+    properties: {
+      capabilityHostKind: 'Agents'
+      // IMPORTANT: required to enable hosted agent deployment
+      // when no bring-your-own (BYO) virtual network is provided
+      enablePublicHostingEnvironment: true
+    }
+  }
+}
+
+
+// Create the Application Insights connection on the project:
+// - when we create a new App Insights resource, OR
+// - when the user provided an existing App Insights connection string + resource ID but no existing connection name
+// Both cases are merged into a single resource to avoid duplicate ARM resource definitions (which fail deployment).
+var shouldCreateExistingAppInsightsConnection = enableMonitoring && hasExistingAppInsightsConnectionString && !hasExistingAppInsightsConnection && !empty(existingApplicationInsightsResourceId)
+var shouldCreateAppInsightsConnection = shouldCreateAppInsights || shouldCreateExistingAppInsightsConnection
+
+resource appInsightConnection 'Microsoft.CognitiveServices/accounts/projects/connections@2025-04-01-preview' = if (shouldCreateAppInsightsConnection) {
+  parent: aiAccount::project
+  name: 'appi-${resourceToken}'
+  properties: {
+    category: 'AppInsights'
+    target: shouldCreateAppInsights ? applicationInsights.outputs.id : existingApplicationInsightsResourceId
+    authType: 'ApiKey'
+    isSharedToAll: true
+    credentials: {
+      key: shouldCreateAppInsights ? applicationInsights.outputs.connectionString : existingApplicationInsightsConnectionString
+    }
+    metadata: {
+      ApiType: 'Azure'
+      ResourceId: shouldCreateAppInsights ? applicationInsights.outputs.id : existingApplicationInsightsResourceId
+    }
+  }
+}
+
+// Create additional connections from ai.yaml configuration
+module aiConnections './connection.bicep' = [for (connection, index) in connections: {
+  name: 'connection-${connection.name}'
+  params: {
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+    connectionConfig: connection
+    credentials: connectionCredentials[?connection.name] ?? {}
+  }
+}]
+
+// Assign the Azure AI User role to the developer, scoped to the Foundry project.
+// Project scope is sufficient for creating/running agents and calling models via the project endpoint.
+resource localUserAzureAIUserRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  scope: aiAccount::project
+  name: guid(subscription().id, resourceGroup().id, principalId, '53ca6127-db72-4b80-b1b0-d745d6d5456d')
+  properties: {
+    principalId: principalId
+    principalType: principalType
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '53ca6127-db72-4b80-b1b0-d745d6d5456d')
+  }
+}
+
+
+// All connections are now created directly within their respective resource modules
+// using the centralized ./connection.bicep module
+
+// Storage module - deploy if storage connection is defined in ai.yaml
+module storage '../storage/storage.bicep' = if (hasStorageConnection) {
+  name: 'storage'
+  params: {
+    location: location
+    tags: tags
+    resourceName: 'st${resourceToken}'
+    connectionName: storageConnectionName
+    principalId: principalId
+    principalType: principalType
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+  }
+}
+
+// Azure Container Registry module - deploy if ACR connection is defined in ai.yaml
+module acr '../host/acr.bicep' = if (hasAcrConnection) {
+  name: 'acr'
+  params: {
+    location: location
+    tags: tags
+    resourceName: '${abbrs.containerRegistryRegistries}${resourceToken}'
+    connectionName: acrConnectionName
+    principalId: principalId
+    principalType: principalType
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+  }
+}
+
+// Connection for existing ACR - create if user provided an existing ACR resource ID but no existing connection
+module existingAcrConnection './connection.bicep' = if (hasExistingAcr && !hasExistingAcrConnection) {
+  name: 'existing-acr-connection'
+  params: {
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+    connectionConfig: {
+      name: 'acr-${resourceToken}'
+      category: 'ContainerRegistry'
+      target: existingContainerRegistryEndpoint
+      authType: 'ManagedIdentity'
+      isSharedToAll: true
+      metadata: {
+        ResourceId: existingContainerRegistryResourceId
+      }
+    }
+    credentials: {
+      clientId: aiAccount::project.identity.principalId
+      resourceId: existingContainerRegistryResourceId
+    }
+  }
+}
+
+// Extract resource group name from the existing ACR resource ID
+// Resource ID format: /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.ContainerRegistry/registries/{name}
+var existingAcrResourceGroup = hasExistingAcr ? split(existingContainerRegistryResourceId, '/')[4] : ''
+var existingAcrName = hasExistingAcr ? last(split(existingContainerRegistryResourceId, '/')) : ''
+
+// Grant AcrPull role to the AI project's managed identity on the existing ACR
+// This allows the hosted agents to pull images from the user-provided registry
+// Note: User must have permission to assign roles on the existing ACR (Owner or User Access Administrator)
+// Using a module allows scoping to a different resource group if the ACR isn't in the same RG
+// Skip if connection already exists (role assignment should already be in place)
+module existingAcrRoleAssignment './acr-role-assignment.bicep' = if (hasExistingAcr && !hasExistingAcrConnection) {
+  name: 'existing-acr-role-assignment'
+  scope: resourceGroup(existingAcrResourceGroup)
+  params: {
+    acrName: existingAcrName
+    acrResourceId: existingContainerRegistryResourceId
+    principalId: aiAccount::project.identity.principalId
+  }
+}
+
+// Bing Search grounding module - deploy if Bing connection is defined in ai.yaml or parameter is enabled
+module bingGrounding '../search/bing_grounding.bicep' = if (hasBingConnection) {
+  name: 'bing-grounding'
+  params: {
+    tags: tags
+    resourceName: 'bing-${resourceToken}'
+    connectionName: bingConnectionName
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+  }
+}
+
+// Bing Custom Search grounding module - deploy if custom Bing connection is defined in ai.yaml or parameter is enabled
+module bingCustomGrounding '../search/bing_custom_grounding.bicep' = if (hasBingCustomConnection) {
+  name: 'bing-custom-grounding'
+  params: {
+    tags: tags
+    resourceName: 'bingcustom-${resourceToken}'
+    connectionName: bingCustomConnectionName
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+  }
+}
+
+// Azure AI Search module - deploy if search connection is defined in ai.yaml
+module azureAiSearch '../search/azure_ai_search.bicep' = if (hasSearchConnection) {
+  name: 'azure-ai-search'
+  params: {
+    tags: tags
+    resourceName: 'search-${resourceToken}'
+    connectionName: searchConnectionName
+    storageAccountResourceId: hasStorageConnection ? storage!.outputs.storageAccountId : ''
+    containerName: 'knowledge'
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+    principalId: principalId
+    principalType: principalType
+    location: location
+  }
+}
+
+// Outputs
+output AZURE_AI_PROJECT_ENDPOINT string = aiAccount::project.properties.endpoints['AI Foundry API']
+output AZURE_OPENAI_ENDPOINT string = aiAccount.properties.endpoints['OpenAI Language Model Instance API']
+output aiServicesEndpoint string = aiAccount.properties.endpoint
+output accountId string = aiAccount.id
+output projectId string = aiAccount::project.id
+output aiServicesAccountName string = aiAccount.name
+output aiServicesProjectName string = aiAccount::project.name
+output aiServicesPrincipalId string = aiAccount.identity.principalId
+output projectName string = aiAccount::project.name
+output APPLICATIONINSIGHTS_CONNECTION_STRING string = shouldCreateAppInsights ? applicationInsights.outputs.connectionString : (hasExistingAppInsightsConnectionString ? existingApplicationInsightsConnectionString : '')
+output APPLICATIONINSIGHTS_RESOURCE_ID string = shouldCreateAppInsights ? applicationInsights.outputs.id : (hasExistingAppInsightsConnectionString ? existingApplicationInsightsResourceId : '')
+
+// Connection outputs from the connections array
+output connectionIds array = [for (connection, index) in (connections ?? []): {
+  name: aiConnections[index].outputs.connectionName
+  id: aiConnections[index].outputs.connectionId
+}]
+
+// Grouped dependent resources outputs
+output dependentResources object = {
+  registry: {
+    name: hasAcrConnection ? acr!.outputs.containerRegistryName : ''
+    loginServer: hasAcrConnection ? acr!.outputs.containerRegistryLoginServer : ((hasExistingAcr || hasExistingAcrConnection) ? existingContainerRegistryEndpoint : '')
+    connectionName: hasAcrConnection ? acr!.outputs.containerRegistryConnectionName : (hasExistingAcrConnection ? existingAcrConnectionName : (hasExistingAcr ? 'acr-${resourceToken}' : ''))
+  }
+  bing_grounding: {
+    name: (hasBingConnection) ? bingGrounding!.outputs.bingGroundingName : ''
+    connectionName: (hasBingConnection) ? bingGrounding!.outputs.bingGroundingConnectionName : ''
+    connectionId: (hasBingConnection) ? bingGrounding!.outputs.bingGroundingConnectionId : ''
+  }
+  bing_custom_grounding: {
+    name: (hasBingCustomConnection) ? bingCustomGrounding!.outputs.bingCustomGroundingName : ''
+    connectionName: (hasBingCustomConnection) ? bingCustomGrounding!.outputs.bingCustomGroundingConnectionName : ''
+    connectionId: (hasBingCustomConnection) ? bingCustomGrounding!.outputs.bingCustomGroundingConnectionId : ''
+  }
+  search: {
+    serviceName: hasSearchConnection ? azureAiSearch!.outputs.searchServiceName : ''
+    connectionName: hasSearchConnection ? azureAiSearch!.outputs.searchConnectionName : ''
+  }
+  storage: {
+    accountName: hasStorageConnection ? storage!.outputs.storageAccountName : ''
+    connectionName: hasStorageConnection ? storage!.outputs.storageConnectionName : ''
+  }
+}
+
+type deploymentsType = {
+  @description('Required. The name of the Cognitive Services account deployment.')
+  name: string
+
+  @description('Required. Properties of Cognitive Services account deployment model.')
+  model: {
+    @description('Required. The name of Cognitive Services account deployment model.')
+    name: string
+
+    @description('Required. The format of Cognitive Services account deployment model.')
+    format: string
+
+    @description('Required. The version of Cognitive Services account deployment model.')
+    version: string
+  }
+
+  @description('The resource model definition representing SKU.')
+  sku: {
+    @description('Required. The name of the resource model definition representing SKU.')
+    name: string
+
+    @description('The capacity of the resource model definition representing SKU.')
+    capacity: int
+  }
+}[]?
+
+type dependentResourcesType = {
+  @description('The type of dependent resource to create')
+  resource: 'storage' | 'registry' | 'azure_ai_search' | 'bing_grounding' | 'bing_custom_grounding'
+
+  @description('The connection name for this resource')
+  connectionName: string
+}[]
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/connection.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/connection.bicep
new file mode 100644
index 000000000000..a08726645243
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/connection.bicep
@@ -0,0 +1,112 @@
+targetScope = 'resourceGroup'
+
+@description('AI Services account name')
+param aiServicesAccountName string
+
+@description('AI project name')
+param aiProjectName string
+
+// Connection configuration type definition
+type ConnectionConfig = {
+  @description('Name of the connection')
+  name: string
+
+  @description('Category of the connection (e.g., ContainerRegistry, AzureStorageAccount, CognitiveSearch, AzureOpenAI)')
+  category: string
+
+  @description('Target endpoint or URL for the connection')
+  target: string
+
+  @description('Authentication type')
+  authType: 'AAD' | 'AccessKey' | 'AccountKey' | 'AgenticIdentity' | 'ApiKey' | 'CustomKeys' | 'ManagedIdentity' | 'None' | 'OAuth2' | 'PAT' | 'SAS' | 'ServicePrincipal' | 'UsernamePassword' | 'UserEntraToken' | 'ProjectManagedIdentity'
+
+  @description('Whether the connection is shared to all users (optional, defaults to true)')
+  isSharedToAll: bool?
+
+  @description('Additional metadata for the connection (optional)')
+  metadata: object?
+
+  @description('Error message if the connection fails (optional)')
+  error: string?
+
+  @description('Expiry time for the connection (optional)')
+  expiryTime: string?
+
+  @description('Private endpoint requirement: Required, NotRequired, or NotApplicable (optional)')
+  peRequirement: ('NotApplicable' | 'NotRequired' | 'Required')?
+
+  @description('Private endpoint status: Active, Inactive, or NotApplicable (optional)')
+  peStatus: ('Active' | 'Inactive' | 'NotApplicable')?
+
+  @description('List of users to share the connection with (optional, alternative to isSharedToAll)')
+  sharedUserList: string[]?
+
+  @description('Whether to use workspace managed identity (optional)')
+  useWorkspaceManagedIdentity: bool?
+
+  @description('OAuth2 authorization endpoint URL (optional, OAuth2 authType only)')
+  authorizationUrl: string?
+
+  @description('OAuth2 token endpoint URL (optional, OAuth2 authType only)')
+  tokenUrl: string?
+
+  @description('OAuth2 refresh token endpoint URL (optional, OAuth2 authType only)')
+  refreshUrl: string?
+
+  @description('OAuth2 scopes to request (optional, OAuth2 authType only)')
+  scopes: string[]?
+
+  @description('Token audience for UserEntraToken / AgenticIdentity auth types (optional)')
+  audience: string?
+
+  @description('Managed connector name for OAuth2 managed connectors (optional)')
+  connectorName: string?
+}
+
+@description('Connection configuration')
+param connectionConfig ConnectionConfig
+
+@secure()
+@description('Credentials for the connection. Kept as a separate @secure parameter to prevent secrets from appearing in deployment logs. Shape depends on authType — e.g. { key: "..." } for ApiKey, { clientId: "...", clientSecret: "..." } for OAuth2/ServicePrincipal.')
+param credentials object = {}
+
+
+// Get reference to the AI Services account and project
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = {
+  name: aiServicesAccountName
+
+  resource project 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Create the connection
+resource connection 'Microsoft.CognitiveServices/accounts/projects/connections@2025-04-01-preview' = {
+  parent: aiAccount::project
+  name: connectionConfig.name
+  properties: {
+    category: connectionConfig.category
+    target: connectionConfig.target
+    authType: connectionConfig.authType
+    isSharedToAll: connectionConfig.?isSharedToAll ?? true
+    credentials: !empty(credentials) ? credentials : null
+    metadata: connectionConfig.?metadata
+    // Only include if they appear in the connectionConfig
+    ...connectionConfig.?error != null ? { error: connectionConfig.?error } : {}
+    ...connectionConfig.?expiryTime != null ? { expiryTime: connectionConfig.?expiryTime } : {}
+    ...connectionConfig.?peRequirement != null ? { peRequirement: connectionConfig.?peRequirement } : {}
+    ...connectionConfig.?peStatus != null ? { peStatus: connectionConfig.?peStatus } : {}
+    ...connectionConfig.?sharedUserList != null ? { sharedUserList: connectionConfig.?sharedUserList } : {}
+    ...connectionConfig.?useWorkspaceManagedIdentity != null ? { useWorkspaceManagedIdentity: connectionConfig.?useWorkspaceManagedIdentity } : {}
+    ...connectionConfig.?authorizationUrl != null ? { authorizationUrl: connectionConfig.?authorizationUrl } : {}
+    ...connectionConfig.?tokenUrl != null ? { tokenUrl: connectionConfig.?tokenUrl } : {}
+    ...connectionConfig.?refreshUrl != null ? { refreshUrl: connectionConfig.?refreshUrl } : {}
+    ...connectionConfig.?scopes != null ? { scopes: connectionConfig.?scopes } : {}
+    ...connectionConfig.?audience != null ? { audience: connectionConfig.?audience } : {}
+    ...connectionConfig.?connectorName != null ? { connectorName: connectionConfig.?connectorName } : {}
+  }
+}
+
+// Outputs
+output connectionName string = connection.name
+output connectionId string = connection.id
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/existing-ai-project.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/existing-ai-project.bicep
new file mode 100644
index 000000000000..fea2782fdfa5
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/ai/existing-ai-project.bicep
@@ -0,0 +1,96 @@
+targetScope = 'resourceGroup'
+
+@description('Name of the existing AI Services account')
+param aiServicesAccountName string
+
+@description('Name of the existing AI Foundry project')
+param aiFoundryProjectName string
+
+@description('Existing ACR connection name (already set in the environment)')
+param existingAcrConnectionName string = ''
+
+@description('Existing container registry endpoint (already set in the environment)')
+param existingContainerRegistryEndpoint string = ''
+
+@description('Existing Application Insights connection string (already set in the environment)')
+param existingApplicationInsightsConnectionString string = ''
+
+@description('Existing Application Insights resource ID (already set in the environment)')
+param existingApplicationInsightsResourceId string = ''
+
+@description('List of connections to provision on the existing project')
+param connections array = []
+
+@secure()
+@description('Map of connection name to credentials object. Kept as @secure to prevent secrets from appearing in deployment logs. Example: { "my-conn": { "key": "secret" } }')
+param connectionCredentials object = {}
+
+// Reference the existing account and project — read-only except for the
+// additional connections provisioned below from the agent manifest.
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-06-01' existing = {
+  name: aiServicesAccountName
+
+  resource project 'projects' existing = {
+    name: aiFoundryProjectName
+  }
+}
+
+// Create additional connections from ai.yaml / agent manifest configuration on
+// the existing project. Mirrors the loop in ai-project.bicep so manifest-declared
+// connections are provisioned regardless of whether the project itself is new or
+// pre-existing.
+module aiConnections './connection.bicep' = [for (connection, index) in connections: {
+  name: 'existing-connection-${connection.name}'
+  params: {
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+    connectionConfig: connection
+    credentials: connectionCredentials[?connection.name] ?? {}
+  }
+}]
+
+// Outputs — same shape as ai-project.bicep so main.bicep can use either interchangeably
+output AZURE_AI_PROJECT_ENDPOINT string = aiAccount::project.properties.endpoints['AI Foundry API']
+output AZURE_OPENAI_ENDPOINT string = aiAccount.properties.endpoints['OpenAI Language Model Instance API']
+output aiServicesEndpoint string = aiAccount.properties.endpoint
+output accountId string = aiAccount.id
+output projectId string = aiAccount::project.id
+output aiServicesAccountName string = aiAccount.name
+output aiServicesProjectName string = aiAccount::project.name
+output aiServicesPrincipalId string = aiAccount.identity.principalId
+output projectName string = aiAccount::project.name
+output APPLICATIONINSIGHTS_CONNECTION_STRING string = existingApplicationInsightsConnectionString
+output APPLICATIONINSIGHTS_RESOURCE_ID string = existingApplicationInsightsResourceId
+
+// Connection outputs from the connections array (provisioned above).
+// The dependentResources output below is left empty or passed through: those values are already set in the azd environment from init.
+output connectionIds array = [for (connection, index) in (connections ?? []): {
+  name: aiConnections[index].outputs.connectionName
+  id: aiConnections[index].outputs.connectionId
+}]
+
+output dependentResources object = {
+  registry: {
+    name: ''
+    loginServer: existingContainerRegistryEndpoint
+    connectionName: existingAcrConnectionName
+  }
+  bing_grounding: {
+    name: ''
+    connectionName: ''
+    connectionId: ''
+  }
+  bing_custom_grounding: {
+    name: ''
+    connectionName: ''
+    connectionId: ''
+  }
+  search: {
+    serviceName: ''
+    connectionName: ''
+  }
+  storage: {
+    accountName: ''
+    connectionName: ''
+  }
+}
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/host/acr.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/host/acr.bicep
new file mode 100644
index 000000000000..f1893d8ff312
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/host/acr.bicep
@@ -0,0 +1,88 @@
+targetScope = 'resourceGroup'
+
+@description('The location used for all deployed resources')
+param location string = resourceGroup().location
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Resource name for the container registry')
+param resourceName string
+
+@description('ID of the user or app to assign application roles to')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('Name for the AI Foundry ACR connection')
+param connectionName string
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Create the Container Registry
+module containerRegistry 'br/public:avm/res/container-registry/registry:0.1.1' = {
+  name: 'registry'
+  params: {
+    name: resourceName
+    location: location
+    tags: tags
+    publicNetworkAccess: 'Enabled'
+    roleAssignments: [
+      {
+        principalId: principalId
+        principalType: principalType
+        // Container Registry Tasks Contributor — build images with ACR tasks and push container images
+        roleDefinitionIdOrName: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'fb382eab-e894-4461-af04-94435c366c3f')
+      }
+      // TODO: handle this role assignment separately
+      {
+        // AcrPull: the Foundry project's managed identity can pull images from the ACR
+        principalId: aiAccount::aiProject.identity.principalId
+        principalType: 'ServicePrincipal'
+        roleDefinitionIdOrName: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '7f951dda-4ed3-4680-a7ca-43fe172d538d')
+      }
+    ]
+  }
+}
+
+// Create the ACR connection using the centralized connection module
+module acrConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'acr-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'ContainerRegistry'
+      target: containerRegistry.outputs.loginServer
+      authType: 'ManagedIdentity'
+      isSharedToAll: true
+      metadata: {
+        ResourceId: containerRegistry.outputs.resourceId
+      }
+    }
+    credentials: {
+      clientId: aiAccount::aiProject.identity.principalId
+      resourceId: containerRegistry.outputs.resourceId
+    }
+  }
+}
+
+output containerRegistryName string = containerRegistry.outputs.name
+output containerRegistryLoginServer string = containerRegistry.outputs.loginServer
+output containerRegistryResourceId string = containerRegistry.outputs.resourceId
+output containerRegistryConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? acrConnection!.outputs.connectionName : ''
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/monitor/applicationinsights-dashboard.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/monitor/applicationinsights-dashboard.bicep
new file mode 100644
index 000000000000..d082e668ed9f
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/monitor/applicationinsights-dashboard.bicep
@@ -0,0 +1,1236 @@
+metadata description = 'Creates a dashboard for an Application Insights instance.'
+param name string
+param applicationInsightsName string
+param location string = resourceGroup().location
+param tags object = {}
+
+// 2020-09-01-preview because that is the latest valid version
+resource applicationInsightsDashboard 'Microsoft.Portal/dashboards@2020-09-01-preview' = {
+  name: name
+  location: location
+  tags: tags
+  properties: {
+    lenses: [
+      {
+        order: 0
+        parts: [
+          {
+            position: {
+              x: 0
+              y: 0
+              colSpan: 2
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'id'
+                  value: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                }
+                {
+                  name: 'Version'
+                  value: '1.0'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/AspNetOverviewPinnedPart'
+              asset: {
+                idInputName: 'id'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'overview'
+            }
+          }
+          {
+            position: {
+              x: 2
+              y: 0
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'Version'
+                  value: '1.0'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/ProactiveDetectionAsyncPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'ProactiveDetection'
+            }
+          }
+          {
+            position: {
+              x: 3
+              y: 0
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'ResourceId'
+                  value: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/QuickPulseButtonSmallPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 0
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'TimeContext'
+                  value: {
+                    durationMs: 86400000
+                    endTime: null
+                    createdTime: '2018-05-04T01:20:33.345Z'
+                    isInitialTime: true
+                    grain: 1
+                    useDashboardTimeRange: false
+                  }
+                }
+                {
+                  name: 'Version'
+                  value: '1.0'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/AvailabilityNavButtonPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+            }
+          }
+          {
+            position: {
+              x: 5
+              y: 0
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'TimeContext'
+                  value: {
+                    durationMs: 86400000
+                    endTime: null
+                    createdTime: '2018-05-08T18:47:35.237Z'
+                    isInitialTime: true
+                    grain: 1
+                    useDashboardTimeRange: false
+                  }
+                }
+                {
+                  name: 'ConfigurationId'
+                  value: '78ce933e-e864-4b05-a27b-71fd55a6afad'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/AppMapButtonPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+            }
+          }
+          {
+            position: {
+              x: 0
+              y: 1
+              colSpan: 3
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: []
+              type: 'Extension/HubsExtension/PartType/MarkdownPart'
+              settings: {
+                content: {
+                  settings: {
+                    content: '# Usage'
+                    title: ''
+                    subtitle: ''
+                  }
+                }
+              }
+            }
+          }
+          {
+            position: {
+              x: 3
+              y: 1
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'TimeContext'
+                  value: {
+                    durationMs: 86400000
+                    endTime: null
+                    createdTime: '2018-05-04T01:22:35.782Z'
+                    isInitialTime: true
+                    grain: 1
+                    useDashboardTimeRange: false
+                  }
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/UsageUsersOverviewPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 1
+              colSpan: 3
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: []
+              type: 'Extension/HubsExtension/PartType/MarkdownPart'
+              settings: {
+                content: {
+                  settings: {
+                    content: '# Reliability'
+                    title: ''
+                    subtitle: ''
+                  }
+                }
+              }
+            }
+          }
+          {
+            position: {
+              x: 7
+              y: 1
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ResourceId'
+                  value: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                }
+                {
+                  name: 'DataModel'
+                  value: {
+                    version: '1.0.0'
+                    timeContext: {
+                      durationMs: 86400000
+                      createdTime: '2018-05-04T23:42:40.072Z'
+                      isInitialTime: false
+                      grain: 1
+                      useDashboardTimeRange: false
+                    }
+                  }
+                  isOptional: true
+                }
+                {
+                  name: 'ConfigurationId'
+                  value: '8a02f7bf-ac0f-40e1-afe9-f0e72cfee77f'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/CuratedBladeFailuresPinnedPart'
+              isAdapter: true
+              asset: {
+                idInputName: 'ResourceId'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'failures'
+            }
+          }
+          {
+            position: {
+              x: 8
+              y: 1
+              colSpan: 3
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: []
+              type: 'Extension/HubsExtension/PartType/MarkdownPart'
+              settings: {
+                content: {
+                  settings: {
+                    content: '# Responsiveness'
+                    title: ''
+                    subtitle: ''
+                  }
+                }
+              }
+            }
+          }
+          {
+            position: {
+              x: 11
+              y: 1
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ResourceId'
+                  value: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                }
+                {
+                  name: 'DataModel'
+                  value: {
+                    version: '1.0.0'
+                    timeContext: {
+                      durationMs: 86400000
+                      createdTime: '2018-05-04T23:43:37.804Z'
+                      isInitialTime: false
+                      grain: 1
+                      useDashboardTimeRange: false
+                    }
+                  }
+                  isOptional: true
+                }
+                {
+                  name: 'ConfigurationId'
+                  value: '2a8ede4f-2bee-4b9c-aed9-2db0e8a01865'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/CuratedBladePerformancePinnedPart'
+              isAdapter: true
+              asset: {
+                idInputName: 'ResourceId'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'performance'
+            }
+          }
+          {
+            position: {
+              x: 12
+              y: 1
+              colSpan: 3
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: []
+              type: 'Extension/HubsExtension/PartType/MarkdownPart'
+              settings: {
+                content: {
+                  settings: {
+                    content: '# Browser'
+                    title: ''
+                    subtitle: ''
+                  }
+                }
+              }
+            }
+          }
+          {
+            position: {
+              x: 15
+              y: 1
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'MetricsExplorerJsonDefinitionId'
+                  value: 'BrowserPerformanceTimelineMetrics'
+                }
+                {
+                  name: 'TimeContext'
+                  value: {
+                    durationMs: 86400000
+                    createdTime: '2018-05-08T12:16:27.534Z'
+                    isInitialTime: false
+                    grain: 1
+                    useDashboardTimeRange: false
+                  }
+                }
+                {
+                  name: 'CurrentFilter'
+                  value: {
+                    eventTypes: [
+                      4
+                      1
+                      3
+                      5
+                      2
+                      6
+                      13
+                    ]
+                    typeFacets: {}
+                    isPermissive: false
+                  }
+                }
+                {
+                  name: 'id'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'Version'
+                  value: '1.0'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/MetricsExplorerBladePinnedPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'browser'
+            }
+          }
+          {
+            position: {
+              x: 0
+              y: 2
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'sessions/count'
+                          aggregationType: 5
+                          namespace: 'microsoft.insights/components/kusto'
+                          metricVisualization: {
+                            displayName: 'Sessions'
+                            color: '#47BDF5'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'users/count'
+                          aggregationType: 5
+                          namespace: 'microsoft.insights/components/kusto'
+                          metricVisualization: {
+                            displayName: 'Users'
+                            color: '#7E58FF'
+                          }
+                        }
+                      ]
+                      title: 'Unique sessions and users'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                      openBladeOnClick: {
+                        openBlade: true
+                        destinationBlade: {
+                          extensionName: 'HubsExtension'
+                          bladeName: 'ResourceMenuBlade'
+                          parameters: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                            menuid: 'segmentationUsers'
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 2
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'requests/failed'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Failed requests'
+                            color: '#EC008C'
+                          }
+                        }
+                      ]
+                      title: 'Failed requests'
+                      visualization: {
+                        chartType: 3
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                      openBladeOnClick: {
+                        openBlade: true
+                        destinationBlade: {
+                          extensionName: 'HubsExtension'
+                          bladeName: 'ResourceMenuBlade'
+                          parameters: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                            menuid: 'failures'
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 8
+              y: 2
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'requests/duration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Server response time'
+                            color: '#00BCF2'
+                          }
+                        }
+                      ]
+                      title: 'Server response time'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                      openBladeOnClick: {
+                        openBlade: true
+                        destinationBlade: {
+                          extensionName: 'HubsExtension'
+                          bladeName: 'ResourceMenuBlade'
+                          parameters: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                            menuid: 'performance'
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 12
+              y: 2
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'browserTimings/networkDuration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Page load network connect time'
+                            color: '#7E58FF'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'browserTimings/processingDuration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Client processing time'
+                            color: '#44F1C8'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'browserTimings/sendDuration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Send request time'
+                            color: '#EB9371'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'browserTimings/receiveDuration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Receiving response time'
+                            color: '#0672F1'
+                          }
+                        }
+                      ]
+                      title: 'Average page load time breakdown'
+                      visualization: {
+                        chartType: 3
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 0
+              y: 5
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'availabilityResults/availabilityPercentage'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Availability'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Average availability'
+                      visualization: {
+                        chartType: 3
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                      openBladeOnClick: {
+                        openBlade: true
+                        destinationBlade: {
+                          extensionName: 'HubsExtension'
+                          bladeName: 'ResourceMenuBlade'
+                          parameters: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                            menuid: 'availability'
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 5
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'exceptions/server'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Server exceptions'
+                            color: '#47BDF5'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'dependencies/failed'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Dependency failures'
+                            color: '#7E58FF'
+                          }
+                        }
+                      ]
+                      title: 'Server exceptions and dependency failures'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 8
+              y: 5
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'performanceCounters/processorCpuPercentage'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Processor time'
+                            color: '#47BDF5'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'performanceCounters/processCpuPercentage'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Process CPU'
+                            color: '#7E58FF'
+                          }
+                        }
+                      ]
+                      title: 'Average processor and process CPU utilization'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 12
+              y: 5
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'exceptions/browser'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Browser exceptions'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Browser exceptions'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 0
+              y: 8
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'availabilityResults/count'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Availability test results count'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Availability test results count'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 8
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'performanceCounters/processIOBytesPerSecond'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Process IO rate'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Average process I/O rate'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 8
+              y: 8
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'performanceCounters/memoryAvailableBytes'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Available memory'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Average available memory'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+        ]
+      }
+    ]
+  }
+}
+
+resource applicationInsights 'Microsoft.Insights/components@2020-02-02' existing = {
+  name: applicationInsightsName
+}
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/monitor/applicationinsights.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/monitor/applicationinsights.bicep
new file mode 100644
index 000000000000..73240d1b1c9a
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/monitor/applicationinsights.bicep
@@ -0,0 +1,47 @@
+metadata description = 'Creates an Application Insights instance based on an existing Log Analytics workspace.'
+param name string
+param dashboardName string = ''
+param location string = resourceGroup().location
+param tags object = {}
+param logAnalyticsWorkspaceId string
+
+@description('Optional. Principal ID of the Foundry Project managed identity to grant Log Analytics Reader.')
+param projectMIPrincipalId string = ''
+
+resource applicationInsights 'Microsoft.Insights/components@2020-02-02' = {
+  name: name
+  location: location
+  tags: tags
+  kind: 'web'
+  properties: {
+    Application_Type: 'web'
+    WorkspaceResourceId: logAnalyticsWorkspaceId
+  }
+}
+
+module applicationInsightsDashboard 'applicationinsights-dashboard.bicep' = if (!empty(dashboardName)) {
+  name: 'application-insights-dashboard'
+  params: {
+    name: dashboardName
+    location: location
+    applicationInsightsName: applicationInsights.name
+  }
+}
+
+// Log Analytics Reader for the Foundry Project managed identity.
+// Required for running evaluations on traces generated by agents.
+resource logAnalyticsReaderRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(projectMIPrincipalId)) {
+  scope: applicationInsights
+  name: guid(applicationInsights.id, projectMIPrincipalId, '73c42c96-874c-492b-b04d-ab87d138a893')
+  properties: {
+    principalId: projectMIPrincipalId
+    principalType: 'ServicePrincipal'
+    // Log Analytics Reader
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '73c42c96-874c-492b-b04d-ab87d138a893')
+  }
+}
+
+output connectionString string = applicationInsights.properties.ConnectionString
+output id string = applicationInsights.id
+output instrumentationKey string = applicationInsights.properties.InstrumentationKey
+output name string = applicationInsights.name
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/monitor/loganalytics.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/monitor/loganalytics.bicep
new file mode 100644
index 000000000000..33f9dc29443a
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/monitor/loganalytics.bicep
@@ -0,0 +1,22 @@
+metadata description = 'Creates a Log Analytics workspace.'
+param name string
+param location string = resourceGroup().location
+param tags object = {}
+
+resource logAnalytics 'Microsoft.OperationalInsights/workspaces@2021-12-01-preview' = {
+  name: name
+  location: location
+  tags: tags
+  properties: any({
+    retentionInDays: 30
+    features: {
+      searchVersion: 1
+    }
+    sku: {
+      name: 'PerGB2018'
+    }
+  })
+}
+
+output id string = logAnalytics.id
+output name string = logAnalytics.name
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/search/azure_ai_search.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/search/azure_ai_search.bicep
new file mode 100644
index 000000000000..7bb8e6350025
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/search/azure_ai_search.bicep
@@ -0,0 +1,211 @@
+targetScope = 'resourceGroup'
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Azure Search resource name')
+param resourceName string
+
+@description('Azure Search SKU name')
+param azureSearchSkuName string = 'basic'
+
+@description('Azure storage account resource ID')
+param storageAccountResourceId string
+
+@description('Blob container name')
+param containerName string = 'knowledgebase'
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('ID of the user or app to assign application roles to')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('Name for the AI Foundry search connection')
+param connectionName string
+
+@description('Location for all resources')
+param location string = resourceGroup().location
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Azure Search Service
+resource searchService 'Microsoft.Search/searchServices@2024-06-01-preview' = {
+  name: resourceName
+  location: location
+  tags: tags
+  sku: {
+    name: azureSearchSkuName
+  }
+  identity: {
+    type: 'SystemAssigned'
+  }
+  properties: {
+    replicaCount: 1
+    partitionCount: 1
+    hostingMode: 'default'
+    authOptions: {
+      aadOrApiKey: {
+        aadAuthFailureMode: 'http401WithBearerChallenge'
+      }
+    }
+    disableLocalAuth: false
+    encryptionWithCmk: {
+      enforcement: 'Unspecified'
+    }
+    publicNetworkAccess: 'enabled'
+  }
+}
+
+// Reference to existing Storage Account
+resource storageAccount 'Microsoft.Storage/storageAccounts@2023-05-01' existing = {
+  name: last(split(storageAccountResourceId, '/'))
+}
+
+// Reference to existing Blob Service
+resource blobService 'Microsoft.Storage/storageAccounts/blobServices@2023-05-01' existing = {
+  parent: storageAccount
+  name: 'default'
+}
+
+// Storage Container (create if it doesn't exist)
+resource storageContainer 'Microsoft.Storage/storageAccounts/blobServices/containers@2023-05-01' = {
+  parent: blobService
+  name: containerName
+  properties: {
+    publicAccess: 'None'
+  }
+}
+
+// RBAC Assignments
+
+// Search needs to read from Storage
+resource searchToStorageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  name: guid(storageAccount.id, searchService.id, 'Storage Blob Data Reader', uniqueString(deployment().name))
+  scope: storageAccount
+  properties: {
+    // Storage Blob Data Reader
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '2a2b9908-6ea1-4ae2-8e65-a410df84e7d1')
+    principalId: searchService.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// Search needs OpenAI access. No scope is set, so this role is assigned at the resource-group scope.
+resource searchToAIServicesRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName)) {
+  name: guid(aiServicesAccountName, searchService.id, 'Cognitive Services OpenAI User', uniqueString(deployment().name))
+  properties: {
+    // Cognitive Services OpenAI User
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd')
+    principalId: searchService.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// AI Project needs Search access - Service Contributor
+resource aiServicesToSearchServiceRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: guid(searchService.id, aiServicesAccountName, aiProjectName, 'Search Service Contributor', uniqueString(deployment().name))
+  scope: searchService
+  properties: {
+    // Search Service Contributor
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '7ca78c08-252a-4471-8644-bb5ff32d4ba0')
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// AI Project needs Search access - Index Data Contributor
+resource aiServicesToSearchDataRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: guid(searchService.id, aiServicesAccountName, aiProjectName, 'Search Index Data Contributor', uniqueString(deployment().name))
+  scope: searchService
+  properties: {
+    // Search Index Data Contributor
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '8ebe5a00-799e-43f5-93ac-243d3dce84a7')
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// User permissions - Search Index Data Contributor
+resource userToSearchRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  name: guid(searchService.id, principalId, 'Search Index Data Contributor', uniqueString(deployment().name))
+  scope: searchService
+  properties: {
+    // Search Index Data Contributor
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '8ebe5a00-799e-43f5-93ac-243d3dce84a7')
+    principalId: principalId
+    principalType: principalType
+  }
+}
+
+// // User permissions - Storage Blob Data Contributor
+// resource userToStorageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+//   name: guid(storageAccount.id, principalId, 'Storage Blob Data Contributor', uniqueString(deployment().name))
+//   scope: storageAccount
+//   properties: {
+//     roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe') // Storage Blob Data Contributor
+//     principalId: principalId
+//     principalType: principalType
+//   }
+// }
+
+// // Project needs Search access - Index Data Contributor
+// resource projectToSearchRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+//   name: guid(searchService.id, aiProjectName, 'Search Index Data Contributor', uniqueString(deployment().name))
+//   scope: searchService
+//   properties: {
+//     roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '8ebe5a00-799e-43f5-93ac-243d3dce84a7') // Search Index Data Contributor
+//     principalId: aiAccountPrincipalId // Using AI account principal ID as project identity
+//     principalType: 'ServicePrincipal'
+//   }
+// }
+
+// Create the AI Search connection using the centralized connection module
+module aiSearchConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'ai-search-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'CognitiveSearch'
+      target: 'https://${searchService.name}.search.windows.net'
+      authType: 'AAD'
+      isSharedToAll: true
+      metadata: {
+        ApiVersion: '2024-07-01'
+        ResourceId: searchService.id
+        ApiType: 'Azure'
+        type: 'azure_ai_search'
+      }
+    }
+  }
+  dependsOn: [
+    aiServicesToSearchDataRoleAssignment
+  ]
+}
+
+// Outputs
+output searchServiceName string = searchService.name
+output searchServiceId string = searchService.id
+output searchServicePrincipalId string = searchService.identity.principalId
+output storageAccountName string = storageAccount.name
+output storageAccountId string = storageAccount.id
+output containerName string = storageContainer.name
+output storageAccountPrincipalId string = storageAccount.identity.principalId
+output searchConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? aiSearchConnection!.outputs.connectionName : ''
+output searchConnectionId string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? aiSearchConnection!.outputs.connectionId : ''
+
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/search/bing_custom_grounding.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/search/bing_custom_grounding.bicep
new file mode 100644
index 000000000000..1fddea079e2e
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/search/bing_custom_grounding.bicep
@@ -0,0 +1,84 @@
+targetScope = 'resourceGroup'
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Bing custom grounding resource name')
+param resourceName string
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('Name for the AI Foundry Bing Custom Search connection')
+param connectionName string
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Bing Search resource for grounding capability
+resource bingCustomSearch 'Microsoft.Bing/accounts@2020-06-10' = {
+  name: resourceName
+  location: 'global'
+  tags: tags
+  sku: {
+    name: 'G1'
+  }
+  properties: {
+    statisticsEnabled: false
+  }
+  kind: 'Bing.CustomGrounding'
+}
+
+// Role assignment to allow AI project to use Bing Search
+resource bingCustomSearchRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  scope: bingCustomSearch
+  name: guid(subscription().id, resourceGroup().id, 'bing-search-role', aiServicesAccountName, aiProjectName)
+  properties: {
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'a97b65f3-24c7-4388-baec-2e87135dc908') // Cognitive Services User
+  }
+}
+
+// Create the Bing Custom Search connection using the centralized connection module
+module aiSearchConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'bing-custom-search-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'GroundingWithCustomSearch'
+      target: bingCustomSearch.properties.endpoint
+      authType: 'ApiKey'
+      isSharedToAll: true
+      metadata: {
+        Location: 'global'
+        ResourceId: bingCustomSearch.id
+        ApiType: 'Azure'
+        type: 'bing_custom_search'
+      }
+    }
+    credentials: {
+      key: bingCustomSearch.listKeys().key1
+    }
+  }
+  dependsOn: [
+    bingCustomSearchRoleAssignment
+  ]
+}
+
+// Outputs
+output bingCustomGroundingName string = bingCustomSearch.name
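+output bingCustomGroundingConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? aiSearchConnection!.outputs.connectionName : ''
+output bingCustomGroundingResourceId string = bingCustomSearch.id
+output bingCustomGroundingConnectionId string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? aiSearchConnection!.outputs.connectionId : ''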
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/search/bing_grounding.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/search/bing_grounding.bicep
new file mode 100644
index 000000000000..20ea5e9f160a
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/search/bing_grounding.bicep
@@ -0,0 +1,83 @@
+targetScope = 'resourceGroup'
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Bing grounding resource name')
+param resourceName string
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('Name for the AI Foundry Bing Search connection')
+param connectionName string
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Bing Search resource for grounding capability
+resource bingSearch 'Microsoft.Bing/accounts@2020-06-10' = {
+  name: resourceName
+  location: 'global'
+  tags: tags
+  sku: {
+    name: 'G1'
+  }
+  properties: {
+    statisticsEnabled: false
+  }
+  kind: 'Bing.Grounding'
+}
+
+// Role assignment to allow AI project to use Bing Search
+resource bingSearchRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  scope: bingSearch
+  name: guid(subscription().id, resourceGroup().id, 'bing-search-role', aiServicesAccountName, aiProjectName)
+  properties: {
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'a97b65f3-24c7-4388-baec-2e87135dc908') // Cognitive Services User
+  }
+}
+
+// Create the Bing Search connection using the centralized connection module
+module bingSearchConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'bing-search-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'GroundingWithBingSearch'
+      target: bingSearch.properties.endpoint
+      authType: 'ApiKey'
+      isSharedToAll: true
+      metadata: {
+        Location: 'global'
+        ResourceId: bingSearch.id
+        ApiType: 'Azure'
+        type: 'bing_grounding'
+      }
+    }
+    credentials: {
+      key: bingSearch.listKeys().key1
+    }
+  }
+  dependsOn: [
+    bingSearchRoleAssignment
+  ]
+}
+
+output bingGroundingName string = bingSearch.name
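+output bingGroundingConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? bingSearchConnection!.outputs.connectionName : ''
+output bingGroundingResourceId string = bingSearch.id
+output bingGroundingConnectionId string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? bingSearchConnection!.outputs.connectionId : ''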
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/storage/storage.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/storage/storage.bicep
new file mode 100644
index 000000000000..18d9535dcd0b
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/core/storage/storage.bicep
@@ -0,0 +1,113 @@
+targetScope = 'resourceGroup'
+
+@description('The location used for all deployed resources')
+param location string = resourceGroup().location
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Storage account resource name')
+param resourceName string
+
+@description('ID of the user or app to assign application roles to')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('Name for the AI Foundry storage connection')
+param connectionName string
+
+// Storage Account for the AI Services account
+resource storageAccount 'Microsoft.Storage/storageAccounts@2023-05-01' = {
+  name: resourceName
+  location: location
+  tags: tags
+  sku: {
+    name: 'Standard_LRS'
+  }
+  kind: 'StorageV2'
+  identity: {
+    type: 'SystemAssigned'
+  }
+  properties: {
+    supportsHttpsTrafficOnly: true
+    allowBlobPublicAccess: false
+    minimumTlsVersion: 'TLS1_2'
+    accessTier: 'Hot'
+    encryption: {
+      services: {
+        blob: {
+          enabled: true
+        }
+        file: {
+          enabled: true
+        }
+      }
+      keySource: 'Microsoft.Storage'
+    }
+  }
+}
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Role assignment for the AI project managed identity to access the storage account
+resource storageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: guid(storageAccount.id, aiAccount.id, 'ai-storage-contributor')
+  scope: storageAccount
+  properties: {
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe') // Storage Blob Data Contributor
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// User permissions - Storage Blob Data Contributor
+resource userStorageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  name: guid(storageAccount.id, principalId, 'Storage Blob Data Contributor')
+  scope: storageAccount
+  properties: {
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe') // Storage Blob Data Contributor
+    principalId: principalId
+    principalType: principalType
+  }
+}
+
+// Create the storage connection using the centralized connection module
+module storageConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'storage-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'AzureStorageAccount'
+      target: storageAccount.properties.primaryEndpoints.blob
+      authType: 'AAD'
+      isSharedToAll: true
+      metadata: {
+        ApiType: 'Azure'
+        ResourceId: storageAccount.id
+        location: storageAccount.location
+      }
+    }
+  }
+}
+
+output storageAccountName string = storageAccount.name
+output storageAccountId string = storageAccount.id
+output storageAccountPrincipalId string = storageAccount.identity.principalId
+output storageConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? storageConnection!.outputs.connectionName : ''
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/main.bicep b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/main.bicep
new file mode 100644
index 000000000000..df29abd59bf6
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/main.bicep
@@ -0,0 +1,239 @@
+targetScope = 'subscription'
+// targetScope = 'resourceGroup'
+
+@minLength(1)
+@maxLength(64)
+@description('Name of the environment, used as part of the resource naming convention')
+param environmentName string
+
+@minLength(1)
+@maxLength(90)
+@description('Name of the resource group to use or create')
+param resourceGroupName string = 'rg-${environmentName}'
+
+// Restricted locations to match list from
+// https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/responses?tabs=python-key#region-availability
+@minLength(1)
+@description('Primary location for all resources')
+@allowed([
+  'australiaeast'
+  'brazilsouth'
+  'canadacentral'
+  'canadaeast'
+  'eastus'
+  'eastus2'
+  'francecentral'
+  'germanywestcentral'
+  'italynorth'
+  'japaneast'
+  'koreacentral'
+  'northcentralus'
+  'norwayeast'
+  'polandcentral'
+  'southafricanorth'
+  'southcentralus'
+  'southeastasia'
+  'southindia'
+  'spaincentral'
+  'swedencentral'
+  'switzerlandnorth'
+  'uaenorth'
+  'uksouth'
+  'westus'
+  'westus2'
+  'westus3'
+])
+param location string
+
+param aiDeploymentsLocation string
+
+@description('ID of the user or app to assign application roles to')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('Optional. Name of an existing AI Services account within the resource group. If not provided, a new one will be created.')
+param aiFoundryResourceName string = ''
+
+@description('Optional. Name of the AI Foundry project. If not provided, a default name will be used.')
+param aiFoundryProjectName string = 'ai-project-${environmentName}'
+
+@description('List of model deployments')
+param aiProjectDeploymentsJson string = '[]'
+
+@description('List of connections')
+param aiProjectConnectionsJson string = '[]'
+
+@secure()
+@description('JSON map of connection name to credentials object. Example: {"my-conn":{"key":"secret"}}')
+param aiProjectConnectionCredentialsJson string = '{}'
+
+@description('List of resources to create and connect to the AI project')
+param aiProjectDependentResourcesJson string = '[]'
+
+var aiProjectDeployments = json(aiProjectDeploymentsJson)
+var aiProjectConnections = json(aiProjectConnectionsJson)
+var aiProjectConnectionCreds = json(aiProjectConnectionCredentialsJson)
+var aiProjectDependentResources = json(aiProjectDependentResourcesJson)
+
+@description('Enable hosted agent deployment')
+param enableHostedAgents bool
+
+@description('Enable the capability host for supporting BYO storage of agent conversations. When false and hosted agents are enabled, the capability host is not created.')
+param enableCapabilityHost bool
+
+@description('Enable monitoring for the AI project')
+param enableMonitoring bool
+
+@description('When true, skip Foundry project/role/connection provisioning and reference the existing project read-only. Use when pointing at an existing Foundry project via --project-id.')
+param useExistingAiProject bool = false
+
+@description('Optional. Existing container registry resource ID. If provided, no new ACR will be created and a connection to this ACR will be established.')
+param existingContainerRegistryResourceId string = ''
+
+@description('Optional. Existing container registry endpoint (login server). Required if existingContainerRegistryResourceId is provided.')
+param existingContainerRegistryEndpoint string = ''
+
+@description('Optional. Name of an existing ACR connection on the Foundry project. If provided, no new ACR or connection will be created.')
+param existingAcrConnectionName string = ''
+
+@description('Optional. Existing Application Insights connection string. If provided, a connection will be created but no new App Insights resource.')
+param existingApplicationInsightsConnectionString string = ''
+
+@description('Optional. Existing Application Insights resource ID. Used for connection metadata when providing an existing App Insights.')
+param existingApplicationInsightsResourceId string = ''
+
+@description('Optional. Name of an existing Application Insights connection on the Foundry project. If provided, no new App Insights or connection will be created.')
+param existingAppInsightsConnectionName string = ''
+
+// Tags that should be applied to all resources.
+// 
+// Note that 'azd-service-name' tags should be applied separately to service host resources.
+// Example usage:
+//   tags: union(tags, { 'azd-service-name': <service name in azure.yaml> })
+var tags = {
+  'azd-env-name': environmentName
+}
+
+// Create the resource group, or update it in place if it already exists (the deployment is idempotent)
+resource rg 'Microsoft.Resources/resourceGroups@2021-04-01' = {
+  name: resourceGroupName
+  location: location
+  tags: tags
+}
+
+// Build dependent resources array conditionally
+// Check if ACR already exists in the user-provided array to avoid duplicates
+// Also skip if user provided an existing container registry endpoint or connection name
+var hasAcr = contains(map(aiProjectDependentResources, r => r.resource), 'registry')
+var shouldCreateAcr = enableHostedAgents && !hasAcr && empty(existingContainerRegistryResourceId) && empty(existingAcrConnectionName)
+var dependentResources = shouldCreateAcr ? union(aiProjectDependentResources, [
+  {
+    resource: 'registry'
+    connectionName: 'acr-${uniqueString(subscription().id, resourceGroupName, location)}'
+  }
+]) : aiProjectDependentResources
+
+// AI Project module — only when creating new resources
+module aiProject 'core/ai/ai-project.bicep' = if (!useExistingAiProject) {
+  scope: rg
+  name: 'ai-project'
+  params: {
+    tags: tags
+    location: aiDeploymentsLocation
+    aiFoundryProjectName: aiFoundryProjectName
+    principalId: principalId
+    principalType: principalType
+    existingAiAccountName: aiFoundryResourceName
+    deployments: aiProjectDeployments
+    connections: aiProjectConnections
+    connectionCredentials: aiProjectConnectionCreds
+    additionalDependentResources: dependentResources
+    enableMonitoring: enableMonitoring
+    enableHostedAgents: enableHostedAgents
+    enableCapabilityHost: enableCapabilityHost
+    existingContainerRegistryResourceId: existingContainerRegistryResourceId
+    existingContainerRegistryEndpoint: existingContainerRegistryEndpoint
+    existingAcrConnectionName: existingAcrConnectionName
+    existingApplicationInsightsConnectionString: existingApplicationInsightsConnectionString
+    existingApplicationInsightsResourceId: existingApplicationInsightsResourceId
+    existingAppInsightsConnectionName: existingAppInsightsConnectionName
+  }
+}
+
+// Existing project module — read-only reference when reusing an existing Foundry project
+module existingAiProject 'core/ai/existing-ai-project.bicep' = if (useExistingAiProject) {
+  scope: rg
+  name: 'existing-ai-project'
+  params: {
+    aiServicesAccountName: aiFoundryResourceName
+    aiFoundryProjectName: aiFoundryProjectName
+    existingAcrConnectionName: existingAcrConnectionName
+    existingContainerRegistryEndpoint: existingContainerRegistryEndpoint
+    existingApplicationInsightsConnectionString: existingApplicationInsightsConnectionString
+    existingApplicationInsightsResourceId: existingApplicationInsightsResourceId
+    connections: aiProjectConnections
+    connectionCredentials: aiProjectConnectionCreds
+  }
+}
+
+// ACR for existing project — create when hosted agents need a registry but the existing project has none
+var shouldCreateAcrForExistingProject = useExistingAiProject && shouldCreateAcr
+var acrConnectionName = 'acr-${uniqueString(subscription().id, resourceGroupName, location)}'
+
+module acrForExistingProject 'core/host/acr.bicep' = if (shouldCreateAcrForExistingProject) {
+  scope: rg
+  name: 'acr-for-existing-project'
+  params: {
+    location: location
+    tags: tags
+    resourceName: 'cr${uniqueString(subscription().id, resourceGroupName, location)}'
+    connectionName: acrConnectionName
+    principalId: principalId
+    principalType: principalType
+    aiServicesAccountName: aiFoundryResourceName
+    aiProjectName: aiFoundryProjectName
+  }
+}
+
+// Resources
+output AZURE_RESOURCE_GROUP string = resourceGroupName
+output AZURE_AI_ACCOUNT_ID string = useExistingAiProject ? existingAiProject.outputs.accountId : aiProject.outputs.accountId
+output AZURE_AI_PROJECT_ID string = useExistingAiProject ? existingAiProject.outputs.projectId : aiProject.outputs.projectId
+output AZURE_AI_FOUNDRY_PROJECT_ID string = useExistingAiProject ? existingAiProject.outputs.projectId : aiProject.outputs.projectId
+output AZURE_AI_ACCOUNT_NAME string = useExistingAiProject ? existingAiProject.outputs.aiServicesAccountName : aiProject.outputs.aiServicesAccountName
+output AZURE_AI_PROJECT_NAME string = useExistingAiProject ? existingAiProject.outputs.projectName : aiProject.outputs.projectName
+
+// Endpoints
+output AZURE_AI_PROJECT_ENDPOINT string = useExistingAiProject ? existingAiProject.outputs.AZURE_AI_PROJECT_ENDPOINT : aiProject.outputs.AZURE_AI_PROJECT_ENDPOINT
+output AZURE_OPENAI_ENDPOINT string = useExistingAiProject ? existingAiProject.outputs.AZURE_OPENAI_ENDPOINT : aiProject.outputs.AZURE_OPENAI_ENDPOINT
+output APPLICATIONINSIGHTS_CONNECTION_STRING string = useExistingAiProject ? existingAiProject.outputs.APPLICATIONINSIGHTS_CONNECTION_STRING : aiProject.outputs.APPLICATIONINSIGHTS_CONNECTION_STRING
+output APPLICATIONINSIGHTS_RESOURCE_ID string = useExistingAiProject ? existingAiProject.outputs.APPLICATIONINSIGHTS_RESOURCE_ID : aiProject.outputs.APPLICATIONINSIGHTS_RESOURCE_ID
+
+// Dependent Resources and Connections
+
+// ACR
+output AZURE_AI_PROJECT_ACR_CONNECTION_NAME string = shouldCreateAcrForExistingProject ? acrForExistingProject.outputs.containerRegistryConnectionName : (useExistingAiProject ? existingAiProject.outputs.dependentResources.registry.connectionName : aiProject.outputs.dependentResources.registry.connectionName)
+output AZURE_CONTAINER_REGISTRY_ENDPOINT string = shouldCreateAcrForExistingProject ? acrForExistingProject.outputs.containerRegistryLoginServer : (useExistingAiProject ? existingAiProject.outputs.dependentResources.registry.loginServer : aiProject.outputs.dependentResources.registry.loginServer)
+
+// Bing Search
+output BING_GROUNDING_CONNECTION_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_grounding.connectionName : aiProject.outputs.dependentResources.bing_grounding.connectionName
+output BING_GROUNDING_RESOURCE_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_grounding.name : aiProject.outputs.dependentResources.bing_grounding.name
+output BING_GROUNDING_CONNECTION_ID string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_grounding.connectionId : aiProject.outputs.dependentResources.bing_grounding.connectionId
+
+// Bing Custom Search
+output BING_CUSTOM_GROUNDING_CONNECTION_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_custom_grounding.connectionName : aiProject.outputs.dependentResources.bing_custom_grounding.connectionName
+output BING_CUSTOM_GROUNDING_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_custom_grounding.name : aiProject.outputs.dependentResources.bing_custom_grounding.name
+output BING_CUSTOM_GROUNDING_CONNECTION_ID string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_custom_grounding.connectionId : aiProject.outputs.dependentResources.bing_custom_grounding.connectionId
+
+// Azure AI Search
+output AZURE_AI_SEARCH_CONNECTION_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.search.connectionName : aiProject.outputs.dependentResources.search.connectionName
+output AZURE_AI_SEARCH_SERVICE_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.search.serviceName : aiProject.outputs.dependentResources.search.serviceName
+
+// Azure Storage
+output AZURE_STORAGE_CONNECTION_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.storage.connectionName : aiProject.outputs.dependentResources.storage.connectionName
+output AZURE_STORAGE_ACCOUNT_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.storage.accountName : aiProject.outputs.dependentResources.storage.accountName
+
+// Connections
+output AI_PROJECT_CONNECTION_IDS_JSON string = useExistingAiProject ? string(existingAiProject.outputs.connectionIds) : string(aiProject.outputs.connectionIds)
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/main.parameters.json b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/main.parameters.json
new file mode 100644
index 000000000000..dbf643f3f48f
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/infra/main.parameters.json
@@ -0,0 +1,72 @@
+{
+    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
+    "contentVersion": "1.0.0.0",
+    "parameters": {
+      "resourceGroupName": {
+        "value": "${AZURE_RESOURCE_GROUP}"
+      },
+      "environmentName": {
+        "value": "${AZURE_ENV_NAME}"
+      },
+      "location": {
+        "value": "${AZURE_LOCATION}"
+      },
+      "aiFoundryResourceName": {
+        "value": "${AZURE_AI_ACCOUNT_NAME}"
+      },
+      "aiFoundryProjectName": {
+        "value": "${AZURE_AI_PROJECT_NAME}"
+      },
+      "aiDeploymentsLocation": {
+        "value": "${AZURE_LOCATION}"
+      },
+      "principalId": {
+        "value": "${AZURE_PRINCIPAL_ID}"
+      },
+      "principalType": {
+        "value": "${AZURE_PRINCIPAL_TYPE}"
+      },
+      "aiProjectDeploymentsJson": {
+        "value": "${AI_PROJECT_DEPLOYMENTS=[]}"
+      },
+      "aiProjectConnectionsJson": {
+        "value": "${AI_PROJECT_CONNECTIONS=[]}"
+      },
+      "aiProjectConnectionCredentialsJson": {
+        "value": "${AI_PROJECT_CONNECTION_CREDENTIALS}"
+      },
+      "aiProjectDependentResourcesJson": {
+        "value": "${AI_PROJECT_DEPENDENT_RESOURCES=[]}"
+      },
+      "enableMonitoring": {
+        "value": "${ENABLE_MONITORING=true}"
+      },
+      "enableHostedAgents": {
+        "value": "${ENABLE_HOSTED_AGENTS=false}"
+      },
+      "enableCapabilityHost": {
+        "value": "${ENABLE_CAPABILITY_HOST=true}"
+      },
+      "useExistingAiProject": {
+        "value": "${USE_EXISTING_AI_PROJECT=false}"
+      },
+      "existingContainerRegistryResourceId": {
+        "value": "${AZURE_CONTAINER_REGISTRY_RESOURCE_ID=}"
+      },
+      "existingContainerRegistryEndpoint": {
+        "value": "${AZURE_CONTAINER_REGISTRY_ENDPOINT=}"
+      },
+      "existingAcrConnectionName": {
+        "value": "${AZURE_AI_PROJECT_ACR_CONNECTION_NAME=}"
+      },
+      "existingApplicationInsightsConnectionString": {
+        "value": "${APPLICATIONINSIGHTS_CONNECTION_STRING=}"
+      },
+      "existingApplicationInsightsResourceId": {
+        "value": "${APPLICATIONINSIGHTS_RESOURCE_ID=}"
+      },
+      "existingAppInsightsConnectionName": {
+        "value": "${APPLICATIONINSIGHTS_CONNECTION_NAME=}"
+      }
+    }
+}
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/.gitignore b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/.gitignore
new file mode 100644
index 000000000000..e4f4657e5654
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/.gitignore
@@ -0,0 +1,4 @@
+# Local-run artifacts — never commit these
+.venv/
+.durable/
+out/
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/README.md b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/README.md
new file mode 100644
index 000000000000..349070c1dc9e
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/README.md
@@ -0,0 +1,150 @@
+# Run the durable research agent locally (crash → recover)
+
+This kit runs the invocations `durable-research-agent` **entirely on your
+machine** and demonstrates durable crash-recovery — **without** the hosted
+Foundry task API.
+
+> **Why local?** Durable recovery normally relies on the hosted task-store
+> `/tasks` API. That API is currently returning **403** for hosted agents, which
+> blocks deployed recovery. Off-platform, the framework auto-selects a
+> **file-backed** task store, and the agent persists its per-turn event streams +
+> checkpoints to disk — so the *exact same* recovery code path runs locally with
+> no hosted dependency. Only the LLM sub-calls go to your Foundry project.
+
+## Prerequisites
+
+- Python 3.10+
+- `az login` (the LLM sub-calls use `DefaultAzureCredential`)
+- A Foundry **project endpoint** and a **model deployment** in it
+
+## Quick start (automated demo)
+
+```bash
+cd local
+./setup.sh                          # builds a venv from ../../../../wheels + deps
+
+az login
+export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o     # a deployment in that project
+
+./run.sh
+```
+
+`run.sh` drives the whole thing and prints a narrated, verified result:
+
+1. **Start** the agent as a local server (file-backed durable store).
+2. **`POST /invocations {"message": "<topic>"}`** starts a 3-phase research run
+   (one durable checkpoint per phase) and returns an `invocation_id`; the SSE
+   stream from `GET /invocations/{id}` is saved to `out/sse_initial.txt`.
+3. **Crash** it after the first phase checkpoint (`POST {"message": "crash"}`
+   forces `os._exit(137)`).
+4. **Restart** → the startup recovery scan reclaims the in-progress task and
+   re-invokes the handler (`ctx.entry_mode == "recovered"`), reading the
+   persisted phase watermark and resuming at the next un-finished phase.
+5. **Reconnect** with `GET …?last_event_id=<seq>` → `out/sse_resumed.txt` (skips
+   already-seen events), and assert the run emits `recovered` and reaches
+   `run_complete` with all phases done.
+
+Example tail:
+
+```
+[4/4] Reconnecting to the same invocation and verifying the run completes across the crash
+  » recovery confirmed: handler re-invoked, 1 phase(s) already done
+  » resumed checkpoint: phase 2/3 done
+  » resumed checkpoint: phase 3/3 done
+  » terminal event: run_complete (3 phases)
+
+RESULT
+{
+  "pre_crash_checkpoints": 1,
+  "recovered_event_completed_phases": 1,
+  "terminal_event": "run_complete",
+  "phases_completed": 3,
+  "expected_phases": 3,
+  "RECOVERED_FULL_PLAN": true
+}
+
+✓ Durable recovery succeeded — the run completed all phases across a crash.
+```
+
+Tunables (env): `NUM_PHASES` (default 3), `CRASH_AFTER` (default 1 phase
+checkpoint), `PORT` (default 8088), `TARGET_OUTPUT_TOKENS` (default 80).
+
+## Manual exploration
+
+Drive the agent yourself in two terminals.
+
+**Terminal 1 — start the agent:**
+
+```bash
+cd local
+az login
+export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o
+./serve.sh
+```
+
+**Terminal 2 — start, stream, crash, reconnect:**
+
+```bash
+# 1) Start a run. Capture the invocation_id from the response.
+INV=$(curl -s http://localhost:8088/invocations \
+  -H 'content-type: application/json' \
+  -d '{"message":"renewable energy supply chains"}' | python3 -c 'import sys,json;print(json.load(sys.stdin)["invocation_id"])')
+echo "invocation_id=$INV"
+
+# 2) Stream it. Note the highest "sequence_number" before you crash it,
+#    and watch for "type":"phase_end" checkpoints.
+curl -N -s "http://localhost:8088/invocations/$INV"
+
+# 3) In a THIRD terminal (step 2 is still streaming in this one), after a phase_end, crash the process:
+curl -s http://localhost:8088/invocations \
+  -H 'content-type: application/json' -d '{"message":"crash"}'
+
+# The server exits (137). Restart it in Terminal 1 (./serve.sh again — SAME
+# session id; serve.sh pins FOUNDRY_AGENT_SESSION_ID). On startup it logs
+# "Reclaimed stale task ... Recovered task ... is now active".
+
+# 4) Reconnect, skipping events you already saw (use the last seq from step 2):
+curl -N -s "http://localhost:8088/invocations/$INV?last_event_id=<last_seq>"
+# First you'll see a {"type":"recovered","completed_phases":N} event, then the
+# remaining phases stream, ending with {"type":"run_complete"}.
+```
+
+> No auth is needed locally. The session is pinned by `FOUNDRY_AGENT_SESSION_ID`
+> (set by `serve.sh`) — both the original run and the restarted process must
+> agree on it for the recovery scan to find the in-progress task.
+
+## How it works locally
+
+`serve.sh` / `run.sh` set the env vars that flip the framework into local mode:
+
+| Env var | Effect |
+|---------|--------|
+| `AGENTSERVER_TASKS_BACKEND=local` | Use the file-backed task store instead of the hosted `/tasks` API. |
+| `AGENTSERVER_DURABLE_ROOT=<dir>` | Where the durable task store lives (`<dir>/tasks`). |
+| `FOUNDRY_AGENT_SESSION_ID=<id>` | The session ID, which is also the durable task ID. Must be identical across restarts. |
+| `DEMO_MODE=1` | Enables the `"crash"` message sentinel. |
+
+The agent additionally persists its per-turn **event streams** and **phase
+checkpoints** under `~/.durable-tasks/` (file-backed replay), so a reconnecting
+client can replay from `last_event_id` after the restart. Recovery works by
+restarting the process against the same task store + session id.
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `setup.sh` | Create a venv and install the preview wheels + demo deps. |
+| `run.sh` | One-command automated crash → recover → verify demo. |
+| `serve.sh` | Start the agent locally for manual exploration. |
+| `recovery_demo.py` | The orchestrator `run.sh` invokes. |
+
+The agent itself is `../src/durable-research-agent/` (`app.py` = HTTP host,
+`agent.py` = the durable task).
+
+## Troubleshooting
+
+**`Address already in use` / `OSError: [Errno 98]`** — a server is still running
+on the port. `run.sh` tries up to 25 higher ports until one is free; for `serve.sh`,
+stop the old server (`Ctrl-C` in its terminal) or pick another port: `PORT=8090 ./serve.sh`.
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/recovery_demo.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/recovery_demo.py
new file mode 100755
index 000000000000..9e6111226954
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/recovery_demo.py
@@ -0,0 +1,330 @@
+#!/usr/bin/env python3
+"""Local durable crash-recovery demo for the invocations durable-agent-demo.
+
+Runs the `durable-research-agent` **entirely on your machine** — the durable
+task store is file-backed (`AGENTSERVER_TASKS_BACKEND=local`) and the per-turn
+event streams + checkpoints are file-backed under the agent's
+``~/.durable-tasks`` dir, so you do **not** need the hosted Foundry task API
+(the one currently returning 403). Only the LLM sub-calls go to your Foundry
+project, so you need ``az login`` + a project endpoint + a model deployment.
+
+What it demonstrates, automatically, in one run:
+
+  1. Starts the agent as a local server (file-backed durable backend).
+  2. ``POST /invocations {"message": "<topic>"}`` starts a long-running research
+     task (one durable phase checkpoint per research phase). It returns an
+     ``invocation_id``; we open ``GET /invocations/{id}`` and stream the SSE to
+     ``out/sse_initial.txt``, tracking the ``sequence_number`` watermark and the
+     ``phase_end`` checkpoints.
+  3. After a checkpoint lands, ``POST /invocations {"message": "crash"}`` forces
+     ``os._exit(137)``. The stream drops mid-flight.
+  4. Restarts the server against the **same** durable root + session id. On
+     startup the framework's recovery scan reclaims the in-progress task and
+     re-invokes the handler (``ctx.entry_mode == "recovered"``); it reads the
+     persisted phase watermark and resumes at the next un-finished phase.
+  5. Reconnects with ``GET /invocations/{id}?last_event_id=<seq>`` →
+     ``out/sse_resumed.txt`` (skips already-seen events), then asserts the run
+     emits ``recovered`` and reaches ``run_complete`` with all phases done.
+
+Run it via ``./run.sh`` (which sets up the venv + env), or directly:
+
+    FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project> \
+    AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o \
+    python recovery_demo.py
+
+Tunables (env): ``NUM_PHASES`` (default 3), ``CRASH_AFTER`` (default 1 phase
+checkpoint), ``PORT`` (default 8088), ``TARGET_OUTPUT_TOKENS`` (default 80).
+"""
+from __future__ import annotations
+
+import json
+import os
+import signal
+import subprocess
+import sys
+import threading
+import time
+from pathlib import Path
+
+try:
+    import httpx
+except ImportError:  # pragma: no cover - guided setup
+    sys.exit("httpx is required. Run ./run.sh, or: pip install httpx")
+
+HERE = Path(__file__).resolve().parent
+APP_PY = HERE.parent / "src" / "durable-research-agent" / "app.py"
+
+PORT = int(os.environ.get("PORT", "8088"))
+
+
+def _port_is_free(port: int) -> bool:
+    import socket
+
+    s = socket.socket()
+    try:
+        s.bind(("0.0.0.0", port))
+        return True
+    except OSError:
+        return False
+    finally:
+        s.close()
+
+
+# Auto-pick a free port if the requested one is busy (e.g. a leftover server).
+_requested_port = PORT
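+while not _port_is_free(PORT) and PORT < _requested_port + 25:
+    PORT += 1
+if not _port_is_free(PORT):  # the scan ran out of candidates without finding a free port
+    sys.exit(f"no free port in {_requested_port}-{PORT}; set PORT to a free port")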
+if PORT != _requested_port:
+    print(f"  » port {_requested_port} is busy; using {PORT} instead", flush=True)
+
+BASE = f"http://localhost:{PORT}"
+NUM_PHASES = int(os.environ.get("NUM_PHASES", "3"))
+CRASH_AFTER = int(os.environ.get("CRASH_AFTER", "1"))
+DURABLE_ROOT = Path(os.environ.get("DURABLE_ROOT", HERE / ".durable")).resolve()
+OUT_DIR = Path(os.environ.get("OUT_DIR", HERE / "out")).resolve()
+SESSION_ID = os.environ.get("FOUNDRY_AGENT_SESSION_ID", "local-demo-session")
+TOPIC = os.environ.get("TOPIC", "The impact of renewable energy adoption on global supply chains")
+# The agent hard-codes its streams + checkpoints under ~/.durable-tasks.
+AGENT_STATE_DIR = Path.home() / ".durable-tasks"
+
+if "FOUNDRY_PROJECT_ENDPOINT" not in os.environ:
+    sys.exit(
+        "FOUNDRY_PROJECT_ENDPOINT is required (your Foundry project endpoint for the LLM\n"
+        "sub-calls). Run `az login` first, then set it. See README.md."
+    )
+
+# Child-process env: real LLM via the project endpoint, durability local.
+# FOUNDRY_AGENT_SESSION_ID pins the task's session across both lifetimes so the
+# restarted process's recovery scan finds the in-progress task to re-invoke.
+CHILD_ENV = {
+    **os.environ,
+    "AZURE_AI_MODEL_DEPLOYMENT_NAME": os.environ.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "gpt-4o"),
+    "DEMO_MODE": "1",  # enables the "crash" message sentinel in app.py
+    "AGENTSERVER_TASKS_BACKEND": "local",
+    "AGENTSERVER_DURABLE_ROOT": str(DURABLE_ROOT),
+    "FOUNDRY_AGENT_SESSION_ID": SESSION_ID,
+    "INTRA_PHASE_COOLDOWN_SEC": os.environ.get("INTRA_PHASE_COOLDOWN_SEC", "1"),
+    "INTER_PHASE_COOLDOWN_SEC": os.environ.get("INTER_PHASE_COOLDOWN_SEC", "1"),
+    "TARGET_OUTPUT_TOKENS": os.environ.get("TARGET_OUTPUT_TOKENS", "80"),
+    "NUM_PHASES": str(NUM_PHASES),
+    "PORT": str(PORT),
+}
+
+st = {"inv": None, "max_seq": 0, "checkpoints": 0, "crashed": False}
+
+
+def log(*a: object) -> None:
+    print("  »", *a, flush=True)
+
+
+def banner(text: str) -> None:
+    print(f"\n\033[1m{text}\033[0m", flush=True)
+
+
+def wait_port(timeout: float = 45.0) -> bool:
+    t0 = time.time()
+    while time.time() - t0 < timeout:
+        try:
+            httpx.get(f"{BASE}/invocations/_ping", timeout=2)
+            return True
+        except Exception:
+            time.sleep(0.5)
+    return False
+
+
+def start_server(tag: str) -> subprocess.Popen:
+    OUT_DIR.mkdir(parents=True, exist_ok=True)
+    logf = open(OUT_DIR / f"server_{tag}.log", "w")
+    proc = subprocess.Popen(
+        [sys.executable, str(APP_PY)],
+        env=CHILD_ENV,
+        stdout=logf,
+        stderr=subprocess.STDOUT,
+        start_new_session=True,
+    )
+    if not wait_port():
+        raise RuntimeError(f"server '{tag}' did not come up — see {OUT_DIR / f'server_{tag}.log'}")
+    log(f"server '{tag}' is up (pid {proc.pid}), logs -> out/server_{tag}.log")
+    return proc
+
+
+def parse_frame(frame: str):
+    data = None
+    for line in frame.split("\n"):
+        if line.startswith("data:"):
+            data = line[5:].strip()
+    if data is None:
+        return {}
+    try:
+        return json.loads(data)
+    except Exception:
+        return {}
+
+
+def start_run() -> None:
+    r = httpx.post(f"{BASE}/invocations", json={"message": TOPIC}, timeout=30)
+    body = r.json()
+    st["inv"] = body.get("invocation_id")
+    log(f"started run (HTTP {r.status_code}); invocation_id={st['inv']}")
+
+
+def inject_crash() -> None:
+    log("injecting crash (POST message='crash') ...")
+    try:
+        httpx.post(f"{BASE}/invocations", json={"message": "crash"}, timeout=10)
+    except Exception as exc:
+        log(f"crash request returned/disconnected (expected): {type(exc).__name__}")
+    st["crashed"] = True
+
+
+def stream_initial() -> None:
+    f = open(OUT_DIR / "sse_initial.txt", "w")
+    buf = ""
+    try:
+        with httpx.stream("GET", f"{BASE}/invocations/{st['inv']}", timeout=None) as r:
+            log(f"initial stream opened (HTTP {r.status_code})")
+            for chunk in r.iter_text():
+                if not chunk:
+                    continue
+                f.write(chunk)
+                f.flush()
+                buf += chunk
+                while "\n\n" in buf:
+                    frame, buf = buf.split("\n\n", 1)
+                    data = parse_frame(frame)
+                    seq = data.get("sequence_number")
+                    if isinstance(seq, int):
+                        st["max_seq"] = max(st["max_seq"], seq)
+                    if data.get("type") == "phase_end":
+                        st["checkpoints"] += 1
+                        log(f"checkpoint: phase {data.get('phase')}/{data.get('total')} done (seq={st['max_seq']})")
+                        if st["checkpoints"] == CRASH_AFTER and not st["crashed"]:
+                            threading.Thread(target=inject_crash, daemon=True).start()
+    except Exception as exc:
+        log(f"initial stream dropped: {type(exc).__name__} (this is the crash)")
+    finally:
+        f.close()
+
+
+def reconnect_and_verify() -> bool:
+    starting_after = st["max_seq"]
+    log(f"reconnecting: GET /invocations/{st['inv']}?last_event_id={starting_after}")
+    f = open(OUT_DIR / "sse_resumed.txt", "w")
+    buf = ""
+    saw_recovered = None
+    terminal = None
+    phases_completed = None
+    deadline = time.time() + 240
+    try:
+        with httpx.stream(
+            "GET",
+            f"{BASE}/invocations/{st['inv']}",
+            params={"last_event_id": starting_after},
+            timeout=None,
+        ) as r:
+            log(f"reconnect stream opened (HTTP {r.status_code})")
+            for chunk in r.iter_text():
+                if time.time() > deadline:
+                    log("reconnect deadline reached")
+                    break
+                if not chunk:
+                    continue
+                f.write(chunk)
+                f.flush()
+                buf += chunk
+                while "\n\n" in buf:
+                    frame, buf = buf.split("\n\n", 1)
+                    data = parse_frame(frame)
+                    t = data.get("type")
+                    if t == "recovered" and saw_recovered is None:
+                        saw_recovered = data.get("completed_phases")
+                        log(f"recovery confirmed: handler re-invoked, {saw_recovered} phase(s) already done")
+                    if t == "phase_end":
+                        log(f"resumed checkpoint: phase {data.get('phase')}/{data.get('total')} done")
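+                    if t == "run_complete":
+                        terminal = t
+                        phases_completed = data.get("phases_completed")
+                        log(f"terminal event: run_complete ({phases_completed} phases)")
+                        break
+                if terminal is not None:
+                    break  # the break above only leaves the frame loop; stop reading the stream too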
+    except Exception as exc:
+        log(f"reconnect stream ended: {type(exc).__name__}")
+    finally:
+        f.close()
+
+    ok = terminal == "run_complete" and phases_completed == NUM_PHASES
+    st["_summary"] = {
+        "invocation_id": st["inv"],
+        "pre_crash_checkpoints": st["checkpoints"],
+        "pre_crash_max_seq": st["max_seq"],
+        "recovered_event_completed_phases": saw_recovered,
+        "terminal_event": terminal,
+        "phases_completed": phases_completed,
+        "expected_phases": NUM_PHASES,
+        "RECOVERED_FULL_PLAN": ok,
+    }
+    return ok
+
+
+def _clean(d: Path) -> None:
+    if not d.exists():
+        return
+    for p in sorted(d.rglob("*"), reverse=True):
+        try:
+            p.unlink() if p.is_file() else p.rmdir()
+        except OSError:
+            pass
+
+
+def main() -> int:
+    if not APP_PY.exists():
+        sys.exit(f"agent entrypoint not found: {APP_PY}")
+    # Fresh state each run: task store + the agent's stream/checkpoint dirs.
+    DURABLE_ROOT.mkdir(parents=True, exist_ok=True)
+    OUT_DIR.mkdir(parents=True, exist_ok=True)
+    for sub in ("tasks", "responses", "streams"):
+        _clean(DURABLE_ROOT / sub)
+    _clean(AGENT_STATE_DIR / "_streams")
+    _clean(AGENT_STATE_DIR / "_checkpoints")
+
+    banner(f"[1/4] Starting local durable agent (task store {DURABLE_ROOT}, session {SESSION_ID})")
+    p1 = start_server("1")
+
+    banner(f"[2/4] Starting a {NUM_PHASES}-phase research run; will crash after {CRASH_AFTER} phase checkpoint(s)")
+    start_run()
+    stream_initial()
+    log(f"pre-crash watermark: {st['checkpoints']} checkpoint(s), max seq {st['max_seq']}, invocation {st['inv']}")
+    for _ in range(60):
+        if p1.poll() is not None:
+            log(f"server '1' exited (rc={p1.returncode}) — crash confirmed")
+            break
+        time.sleep(0.5)
+    else:
+        log("server '1' still alive; killing it to simulate the crash")
+        os.killpg(os.getpgid(p1.pid), signal.SIGKILL)
+    time.sleep(2)
+
+    banner("[3/4] Restarting the agent — startup recovery scan reclaims the in-progress task")
+    p2 = start_server("2")
+    log("giving recovery a moment to re-invoke the handler ...")
+    time.sleep(8)
+
+    banner("[4/4] Reconnecting to the same invocation and verifying the run completes across the crash")
+    ok = reconnect_and_verify()
+
+    try:
+        os.killpg(os.getpgid(p2.pid), signal.SIGTERM)
+    except OSError:
+        pass  # server 2 already exited; nothing left to terminate
+
+    banner("RESULT")
+    print(json.dumps(st["_summary"], indent=2))
+    print(f"\nSSE transcripts: {OUT_DIR / 'sse_initial.txt'}  +  {OUT_DIR / 'sse_resumed.txt'}")
+    if ok:
+        print("\n\033[32m✓ Durable recovery succeeded — the run completed all phases across a crash.\033[0m")
+        return 0
+    print("\n\033[31m✗ Recovery did not complete the run — inspect out/server_2.log.\033[0m")
+    return 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/run.sh b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/run.sh
new file mode 100755
index 000000000000..2fb0acebc6e1
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/run.sh
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+# ─────────────────────────────────────────────────────────────────────────────
+# Automated end-to-end durable crash-recovery demo:
+#   start agent (local store) -> run -> crash -> restart -> recover -> verify.
+#
+#   az login
+#   export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+#   export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o
+#   ./run.sh
+#
+# Tunables (env): NUM_PHASES (default 3), CRASH_AFTER (default 1), PORT (default 8088).
+# ─────────────────────────────────────────────────────────────────────────────
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+VENV="${VENV:-$HERE/.venv}"
+
+if [[ ! -d "$VENV" ]]; then
+    echo "venv not found at $VENV — run ./setup.sh first." >&2
+    exit 1
+fi
+: "${FOUNDRY_PROJECT_ENDPOINT:?set FOUNDRY_PROJECT_ENDPOINT (your Foundry project endpoint) and run 'az login' first}"
+
+exec "$VENV/bin/python" "$HERE/recovery_demo.py"
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/serve.sh b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/serve.sh
new file mode 100755
index 000000000000..83b1b34b887d
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/serve.sh
@@ -0,0 +1,48 @@
+#!/usr/bin/env bash
+# ─────────────────────────────────────────────────────────────────────────────
+# Start the durable research agent locally (file-backed durable store, no hosted
+# task API) so you can drive it yourself. See README.md "Manual exploration".
+#
+#   az login
+#   export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+#   export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o
+#   ./serve.sh
+# ─────────────────────────────────────────────────────────────────────────────
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+VENV="${VENV:-$HERE/.venv}"
+APP="$HERE/../src/durable-research-agent/app.py"
+
+if [[ ! -d "$VENV" ]]; then
+    echo "venv not found at $VENV — run ./setup.sh first." >&2
+    exit 1
+fi
+: "${FOUNDRY_PROJECT_ENDPOINT:?set FOUNDRY_PROJECT_ENDPOINT (your Foundry project endpoint) and run 'az login' first}"
+
+# Local durable backend — this is what removes the hosted /tasks API dependency.
+export AGENTSERVER_TASKS_BACKEND=local
+export AGENTSERVER_DURABLE_ROOT="${AGENTSERVER_DURABLE_ROOT:-$HERE/.durable}"
+# Pin the session so a restart's recovery scan finds the in-progress task.
+export FOUNDRY_AGENT_SESSION_ID="${FOUNDRY_AGENT_SESSION_ID:-local-demo-session}"
+# Enables the "crash" message sentinel so you can trigger a crash on demand.
+export DEMO_MODE=1
+export AZURE_AI_MODEL_DEPLOYMENT_NAME="${AZURE_AI_MODEL_DEPLOYMENT_NAME:-gpt-4o}"
+export NUM_PHASES="${NUM_PHASES:-3}"
+export INTRA_PHASE_COOLDOWN_SEC="${INTRA_PHASE_COOLDOWN_SEC:-1}"
+export INTER_PHASE_COOLDOWN_SEC="${INTER_PHASE_COOLDOWN_SEC:-1}"
+export TARGET_OUTPUT_TOKENS="${TARGET_OUTPUT_TOKENS:-80}"
+export PORT="${PORT:-8088}"
+
+# Fail fast with a clear message if the port is already taken.
+if "$VENV/bin/python" -c "import socket,sys; s=socket.socket(); r=s.connect_ex(('127.0.0.1', ${PORT})); s.close(); sys.exit(0 if r==0 else 1)"; then
+    echo "Port ${PORT} is already in use (a server may still be running). Stop it, or pick another port: PORT=8090 ./serve.sh" >&2
+    exit 1
+fi
+
+echo "Starting durable research agent on http://localhost:${PORT}"
+echo "  task store   : ${AGENTSERVER_DURABLE_ROOT}/tasks   (streams + checkpoints under ~/.durable-tasks)"
+echo "  session id   : ${FOUNDRY_AGENT_SESSION_ID}  (must match across restarts to recover)"
+echo "  crash input  : POST /invocations {\"message\":\"crash\"}   (DEMO_MODE=1)"
+echo "  stop         : Ctrl-C"
+exec "$VENV/bin/python" "$APP"
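Reviewer sketch (not part of the patch): the fail-fast port probe in `serve.sh` is an inline `python -c` one-liner. Written out as a function it reads roughly as below. The name `port_in_use` and the one-second timeout are illustrative assumptions; the script itself only relies on `socket.connect_ex` returning `0` when something is already listening.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Hypothetical timeout so the probe cannot hang on a filtered port.
        s.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise,
        # instead of raising like connect() does.
        return s.connect_ex((host, port)) == 0
```

A caller would exit early when `port_in_use(8088)` is true, which matches the shell branch that prints the "already in use" message.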
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/setup.sh b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/setup.sh
new file mode 100755
index 000000000000..33a1327faab3
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/local/setup.sh
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+# ─────────────────────────────────────────────────────────────────────────────
+# One-time setup: create a local venv and install the preview wheels + the
+# demo's runtime dependencies. Re-run any time to refresh.
+#
+#   ./setup.sh
+#
+# Override the interpreter or venv location:
+#   PYTHON=python3.12 VENV=/tmp/durable-inv-venv ./setup.sh
+# ─────────────────────────────────────────────────────────────────────────────
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+WHEELS="$(cd "$HERE/../../../../wheels" && pwd)"
+VENV="${VENV:-$HERE/.venv}"
+PYTHON="${PYTHON:-python3}"
+
+echo "==> Creating venv: $VENV"
+"$PYTHON" -m venv "$VENV"
+"$VENV/bin/pip" install --quiet --upgrade pip
+
+echo "==> Installing preview wheels from: $WHEELS"
+"$VENV/bin/pip" install --quiet "$WHEELS"/*.whl
+
+echo "==> Installing demo runtime deps (azure-ai-projects, azure-identity, httpx)"
+"$VENV/bin/pip" install --quiet azure-ai-projects==2.0.1 azure-identity==1.25.3 httpx
+
+echo ""
+echo "Done. Next:"
+echo "  az login"
+echo "  export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>"
+echo "  export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o   # a model deployment in that project"
+echo "  ./run.sh        # automated crash -> recover demo"
+echo "  ./serve.sh      # or run the agent yourself for manual exploration"
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/Dockerfile b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/Dockerfile
new file mode 100644
index 000000000000..5407c8a01553
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/Dockerfile
@@ -0,0 +1,22 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+
+# Install local wheel packages first (built by build.sh before docker build)
+COPY wheels/ /tmp/wheels/
+RUN pip install --no-cache-dir /tmp/wheels/*.whl && rm -rf /tmp/wheels
+
+# Install remaining dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+COPY app.py agent.py store.py ./
+
+EXPOSE 8088
+
+# This is a demo image — enables the "crash" sentinel handling.
+# A production image would leave this off (default).
+ENV DEMO_MODE=1
+
+# Platform nanny worker handles restart on crash; we just run the agent.
+CMD ["python", "app.py"]
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/agent.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/agent.py
new file mode 100644
index 000000000000..31e02be1d9f6
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/agent.py
@@ -0,0 +1,510 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""The durable research task — crash-resilient, steerable, long-running.
+
+Streaming uses the SDK ``streams`` registry: events for a given turn
+are emitted to ``streams.get_or_create(invocation_id)``. The HTTP
+layer subscribes to the same stream by id (see ``app.py``). On crash
+recovery, ``stream.last_cursor()`` rehydrates the in-process sequence
+counter from disk so we resume numbering from where we left off — no
+gap, no duplicate cursor value.
+
+Per the durable-task primitive's persistence model (see
+``core/docs/durable-task-guide.md``), ``ctx.metadata`` is a
+*small-watermark* store — never a bulk-data store. This handler
+keeps only three small integer watermarks in ``ctx.metadata``
+(``completed_phases``, ``in_progress_phase``, ``completed_subcalls``)
+and parks the in-flight subcall text (potentially several KB) in a
+separate file-backed :class:`CheckpointStore` keyed by the per-turn
+``invocation_id``. The checkpoint-store entry, the wire stream, and
+the metadata watermarks are all reset together at every turn-
+completion boundary (normal completion AND wind-down-via-suspend) so
+the next turn — steered re-entry or otherwise — starts cleanly. We
+explicitly do NOT reset on crash paths: the watermarks left behind
+are exactly what the recovery re-entry needs to resume mid-turn.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+import time
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Awaitable, Callable
+
+from azure.ai.projects.aio import AIProjectClient
+from azure.identity.aio import DefaultAzureCredential
+
+from azure.ai.agentserver.core.durable import TaskContext, multi_turn_task
+from azure.ai.agentserver.core.streaming import streams
+
+from store import CheckpointStore
+
+logger = logging.getLogger(__name__)
+
+
+# --- Server wall-clock helpers ----------------------------------------------
+
+_APP_STARTED_MONOTONIC = time.monotonic()
+
+
+def _now_iso() -> str:
+    """UTC ISO-8601 timestamp with millisecond precision and Z suffix."""
+    now = datetime.now(timezone.utc)
+    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"
+
+
+def _server_uptime_sec() -> float:
+    """Seconds since this Python process started (resets to ~0 after crash)."""
+    return round(time.monotonic() - _APP_STARTED_MONOTONIC, 1)
+
+
+# --- Azure AI client setup --------------------------------------------------
+
+_endpoint = os.environ.get("FOUNDRY_PROJECT_ENDPOINT")
+if not _endpoint:
+    raise EnvironmentError("FOUNDRY_PROJECT_ENDPOINT is required.")
+
+_model = os.environ.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "gpt-4o")
+_credential = DefaultAzureCredential()
+_project_client = AIProjectClient(endpoint=_endpoint, credential=_credential)
+_openai_client = _project_client.get_openai_client()
+
+
+# --- File-backed checkpoint store (heavy artifacts live here) --------------
+
+# Co-located with the streams directory so a single mount/volume
+# carries everything the handler needs to survive a restart. The
+# directory is created on first write.
+_CHECKPOINT_DIR = Path.home() / ".durable-tasks" / "_checkpoints"
+_checkpoint_store = CheckpointStore(_CHECKPOINT_DIR)
+
+
+# --- Research phase plan ----------------------------------------------------
+
+PHASE_TITLES = [
+    "Decomposing topic into focused research questions",
+    "Surveying foundational literature and key concepts",
+    "Identifying leading researchers and institutions",
+    "Mapping the historical trajectory of the field",
+    "Analyzing recent breakthroughs and publications",
+    "Examining competing theories and methodological debates",
+    "Evaluating experimental evidence and data quality",
+    "Mapping connections to adjacent fields",
+    "Identifying open problems and knowledge gaps",
+    "Assessing real-world applications and current adoption",
+    "Analyzing funding landscape and research trends",
+    "Surveying ethical considerations and societal implications",
+    "Projecting near-term and long-term outlook",
+    "Synthesizing findings into a coherent narrative",
+    "Generating key insights and concrete recommendations",
+]
+
+_SUB_CALL_ROLES = [
+    ("research",
+     "Conduct an in-depth investigation of the assigned aspect. Include "
+     "specific findings, examples, and references where you can. Aim for "
+     "substantive, multi-paragraph content."),
+    ("critique",
+     "Critically evaluate the research above. Identify weak claims, gaps, "
+     "competing interpretations, and quality concerns. Be specific."),
+    ("refine",
+     "Revise the original research, incorporating the critique. Strengthen "
+     "weak claims, address gaps, and clarify uncertainty. Produce a "
+     "tightened, more rigorous version."),
+    ("synthesize",
+     "Distill the refined material into 2-3 paragraphs of key takeaways "
+     "suitable for someone briefing a decision-maker on this phase."),
+]
+
+NUM_PHASES = max(1, int(os.environ.get("NUM_PHASES", str(len(PHASE_TITLES)))))
+CALLS_PER_PHASE = max(1, min(len(_SUB_CALL_ROLES),
+                             int(os.environ.get("CALLS_PER_PHASE", "4"))))
+TARGET_OUTPUT_TOKENS = int(os.environ.get("TARGET_OUTPUT_TOKENS", "1500"))
+INTRA_PHASE_COOLDOWN_SEC = float(os.environ.get("INTRA_PHASE_COOLDOWN_SEC", "10"))
+INTER_PHASE_COOLDOWN_SEC = float(os.environ.get("INTER_PHASE_COOLDOWN_SEC", "20"))
+
+
+def _phase_title(i: int) -> str:
+    return PHASE_TITLES[i] if i < len(PHASE_TITLES) else f"Continued research (phase {i + 1})"
+
+
+# --- The durable task -------------------------------------------------------
+
+# Type alias: the per-turn emit function the helpers below take. It
+# wraps stream.emit() with auto-increment of ``sequence_number``.
+EmitFn = Callable[[dict], Awaitable[None]]
+
+
+async def _finish_turn(stream: Any, ctx: TaskContext, inv_id: str) -> None:
+    """Tear down per-turn resources at every non-crash exit.
+
+    Steered re-entries, operator cancels, timeouts, and normal
+    completions all flow through here. We:
+
+    1. Close the wire stream so SSE subscribers see the terminator
+       before the framework reports the turn as suspended / completed.
+    2. Wipe ``ctx.metadata`` watermarks so the NEXT turn — steered
+       re-entry on the same task, or a fresh ``start()`` — naturally
+       starts at phase 0 without any "is this a steered turn?"
+       branching.
+    3. Delete this invocation's checkpoint-store entry so disk
+       usage doesn't grow with completed turns.
+
+    We explicitly do NOT call this on crash paths: the wire stream
+    must stay OPEN (per the orchestrator's
+    ``leave_stream_open_for_recovery`` contract) and the watermarks
+    must remain so the recovery re-entry can resume mid-turn.
+    """
+    await stream.close()
+    ctx.metadata.pop("completed_phases", None)
+    ctx.metadata.pop("in_progress_phase", None)
+    ctx.metadata.pop("completed_subcalls", None)
+    _checkpoint_store.delete(inv_id)
+
+
+@multi_turn_task(
+    name="deep_research",
+    steerable=True,
+)
+async def deep_research(ctx: TaskContext[dict]) -> None:
+    """Long-running deep-research task: crash-resilient, steerable.
+
+    Checkpointing is **per subcall**, not just per phase. After each
+    LLM subcall finishes we (a) advance the three small integer
+    watermarks on ``ctx.metadata`` (``completed_phases``,
+    ``in_progress_phase``, ``completed_subcalls``) and (b) write the
+    in-flight phase text to the file-backed checkpoint store keyed by
+    the per-invocation id. On recovery we resume the in-progress phase
+    at the next un-finished subcall, re-using the text we had streamed
+    before the crash, so the worst case is one wasted subcall (the one
+    that was actively streaming when the container died).
+
+    Steering is transparent: a new POST while a turn is running
+    enqueues the input on the framework's steering queue and sets
+    ``ctx.cancel``. The handler observes the cancel at the next
+    checkpoint, calls :func:`_finish_turn` to clear all per-turn
+    state, and winds down via a bare ``return``. The framework then
+    re-enters the body with ``ctx.input`` set to the queued steering
+    input. Because state was cleared at suspend, the
+    re-entered handler naturally starts the new topic at phase 0 —
+    no ``is_steered_turn`` check needed in handler code.
+
+    The body returns ``None`` on both normal completion AND the
+    wind-down path. Multi-turn ``return X`` is the framework's only
+    end-of-turn signal: the chain transitions to ``suspended`` with
+    the next turn's input queued (or stays suspended awaiting a
+    future ``.start`` / ``.run`` if nothing is queued). Clients read
+    progress + final content from the per-invocation SSE stream, not
+    from the task's terminal output, so there is no return-value
+    payload to construct.
+    """
+    topic: str = ctx.input["topic"]
+    inv_id: str = ctx.input["invocation_id"]
+
+    stream = await streams.get_or_create(inv_id)
+    # On crash recovery, last_cursor() returns the highest
+    # sequence_number that made it to disk before the crash.
+    last_cursor = await stream.last_cursor()
+    seq = last_cursor or 0
+
+    async def emit(payload: dict) -> None:
+        nonlocal seq
+        seq += 1
+        await stream.emit({"sequence_number": seq, **payload})
+
+    await _emit_run_start(emit, ctx, topic=topic)
+
+    try:
+        completed: int = ctx.metadata.get("completed_phases", 0)
+
+        if ctx.entry_mode == "recovered" and completed > 0:
+            await emit({
+                "type": "recovered",
+                "completed_phases": completed,
+                "total_phases": NUM_PHASES,
+                "server_time_utc": _now_iso(),
+                "server_uptime_sec": _server_uptime_sec(),
+            })
+
+        for phase_idx in range(completed, NUM_PHASES):
+            if ctx.cancel.is_set():
+                return await _wind_down(emit, stream, ctx, inv_id, phase_idx)
+
+            phase_started_mono = time.monotonic()
+            title = _phase_title(phase_idx)
+
+            await emit({
+                "type": "phase_start",
+                "phase": phase_idx + 1,
+                "total": NUM_PHASES,
+                "title": title,
+                "server_time_utc": _now_iso(),
+                "server_uptime_sec": _server_uptime_sec(),
+            })
+
+            await _run_phase(emit, ctx, inv_id, phase_idx, topic, title)
+
+            # --- PHASE-COMPLETE CHECKPOINT ---
+            # Advance the phase watermark, clear the in-phase watermarks +
+            # the checkpoint-store entry. The next iteration starts at
+            # phase_idx+1 with no in-flight text to resume.
+            ctx.metadata["completed_phases"] = phase_idx + 1
+            ctx.metadata["in_progress_phase"] = None
+            ctx.metadata["completed_subcalls"] = 0
+            _checkpoint_store.delete(inv_id)
+            await ctx.metadata.flush()
+
+            phase_duration = round(time.monotonic() - phase_started_mono, 1)
+            await emit({
+                "type": "phase_end",
+                "phase": phase_idx + 1,
+                "total": NUM_PHASES,
+                "title": title,
+                "server_time_utc": _now_iso(),
+                "server_uptime_sec": _server_uptime_sec(),
+                "duration_sec": phase_duration,
+            })
+
+            if ctx.cancel.is_set():
+                return await _wind_down(emit, stream, ctx, inv_id, phase_idx + 1)
+
+            if phase_idx + 1 < NUM_PHASES and INTER_PHASE_COOLDOWN_SEC > 0:
+                await _cooldown(
+                    emit, ctx, INTER_PHASE_COOLDOWN_SEC,
+                    stage="inter_phase",
+                    phase=phase_idx + 2,
+                    total=NUM_PHASES,
+                )
+                if ctx.cancel.is_set():
+                    return await _wind_down(emit, stream, ctx, inv_id, phase_idx + 1)
+
+        await emit({
+            "type": "run_complete",
+            "server_time_utc": _now_iso(),
+            "server_uptime_sec": _server_uptime_sec(),
+            "phases_completed": NUM_PHASES,
+        })
+        # Normal completion: close stream + wipe watermarks + clear
+        # checkpoint entry. Skipped on crash (the handler exits via an
+        # exception and the orchestrator's leave_stream_open_for_recovery
+        # path keeps the stream open for the next-lifetime recovery).
+        await _finish_turn(stream, ctx, inv_id)
+    except Exception as exc:  # pylint: disable=broad-except
+        # Logical-failure path: a downstream call (e.g. the LLM) raised.
+        # Emit a terminal SSE frame so subscribers fast-fail instead of
+        # hanging on the open stream, then close the stream and re-raise
+        # so the framework records the task as failed.
+        #
+        # We catch ``Exception`` (not ``BaseException``) so cooperative
+        # cancellation (``asyncio.CancelledError``) and process death
+        # (SIGKILL, where the handler doesn't run at all) still flow
+        # through their normal paths — the orchestrator's
+        # ``leave_stream_open_for_recovery`` contract still holds for
+        # true crashes.
+        logger.exception("deep_research task failed; emitting terminal SSE frame")
+        try:
+            await emit({
+                "type": "run_failed",
+                "error": {
+                    "type": type(exc).__name__,
+                    "message": str(exc)[:2000],
+                },
+                "server_time_utc": _now_iso(),
+                "server_uptime_sec": _server_uptime_sec(),
+            })
+            await _finish_turn(stream, ctx, inv_id)
+        except Exception:  # pylint: disable=broad-except
+            # If terminal-frame emission itself fails (e.g. stream is
+            # already gone) we still want to surface the original task
+            # failure rather than swallow it.
+            logger.exception("failed to emit terminal run_failed frame")
+        raise
+
+
+# --- Helpers ---------------------------------------------------------------
+
+async def _emit_run_start(
+    emit: "EmitFn", ctx: "TaskContext", *, topic: str,
+) -> None:
+    await emit({
+        "type": "run_start",
+        "topic": topic,
+        "entry_mode": ctx.entry_mode,
+        "total_phases": NUM_PHASES,
+        "calls_per_phase": CALLS_PER_PHASE,
+        "server_time_utc": _now_iso(),
+        "server_uptime_sec": _server_uptime_sec(),
+    })
+
+
+async def _wind_down(
+    emit: "EmitFn", stream, ctx: "TaskContext", inv_id: str,
+    completed_phases: int,
+):
+    """Cooperative wind-down at a phase boundary.
+
+    Tears down per-turn resources (stream close + metadata wipe +
+    checkpoint-store clear) via :func:`_finish_turn` BEFORE returning
+    so the SSE subscriber observes a clean terminator before the
+    framework reports the turn as suspended, and so the steered
+    re-entry (or any future ``.start`` / ``.run``) finds metadata
+    wiped.
+    """
+    if ctx.timeout_exceeded:
+        cause = "timeout"
+    elif ctx.cancel_requested:
+        cause = "operator_cancel"
+    else:
+        cause = "steering"
+
+    await emit({
+        "type": "winding_down",
+        "cause": cause,
+        "completed_phases": completed_phases,
+        "total_phases": NUM_PHASES,
+        "pending_steering_inputs": ctx.pending_input_count,
+        "server_time_utc": _now_iso(),
+        "server_uptime_sec": _server_uptime_sec(),
+    })
+
+    await _finish_turn(stream, ctx, inv_id)
+    return None
+
+
+async def _cooldown(
+    emit: "EmitFn",
+    ctx: "TaskContext",
+    duration_sec: float,
+    *,
+    stage: str,
+    phase: int,
+    total: int,
+    subcall=None,
+    of=None,
+) -> None:
+    """Cooldown wait with a visible client-side marker."""
+    payload = {
+        "type": "cooldown",
+        "duration_sec": duration_sec,
+        "stage": stage,
+        "phase": phase,
+        "total": total,
+        "server_time_utc": _now_iso(),
+        "server_uptime_sec": _server_uptime_sec(),
+    }
+    if subcall is not None:
+        payload["subcall"] = subcall
+    if of is not None:
+        payload["of"] = of
+    await emit(payload)
+    try:
+        await asyncio.wait_for(ctx.cancel.wait(), timeout=duration_sec)
+    except asyncio.TimeoutError:
+        pass
+
+
+async def _run_phase(
+    emit: "EmitFn",
+    ctx: "TaskContext",
+    inv_id: str,
+    phase_idx: int,
+    topic: str,
+    phase_title: str,
+) -> None:
+    """Run the sub-call loop for one phase.
+
+    Checkpoints after each completed subcall so a crash mid-phase
+    recovers at the next un-finished subcall (loses at most the one
+    that was actively streaming). The in-flight phase text lives in
+    the file-backed checkpoint store keyed by ``inv_id``; the
+    subcall index lives in ``ctx.metadata`` as a small watermark.
+    """
+    in_progress = ctx.metadata.get("in_progress_phase")
+    if in_progress == phase_idx:
+        start_sub = int(ctx.metadata.get("completed_subcalls", 0) or 0)
+        current_text = _checkpoint_store.get(inv_id)
+    else:
+        start_sub = 0
+        current_text = ""
+        ctx.metadata["in_progress_phase"] = phase_idx
+        ctx.metadata["completed_subcalls"] = 0
+        _checkpoint_store.delete(inv_id)
+        await ctx.metadata.flush()
+
+    for sub_idx in range(start_sub, CALLS_PER_PHASE):
+        role_name, role_prompt = _SUB_CALL_ROLES[sub_idx]
+        instructions = (
+            "You are a research analyst working on the topic: '" + topic + "'.\n"
+            "Current phase: '" + phase_title + "'.\n"
+            "Your role in this sub-step: " + role_name + ".\n\n"
+            + role_prompt
+        )
+        if current_text:
+            user_input = (
+                "Topic: " + topic + "\nPhase: " + phase_title + "\n\n"
+                "Previous sub-step output:\n" + current_text
+            )
+        else:
+            user_input = "Topic: " + topic + "\nPhase: " + phase_title
+
+        await emit({
+            "type": "subcall_start",
+            "role": role_name,
+            "index": sub_idx + 1,
+            "of": CALLS_PER_PHASE,
+            "server_time_utc": _now_iso(),
+        })
+
+        sub_text = await _stream_llm(
+            emit, instructions=instructions, user_input=user_input,
+        )
+
+        await emit({
+            "type": "subcall_end",
+            "role": role_name,
+            "index": sub_idx + 1,
+            "of": CALLS_PER_PHASE,
+            "server_time_utc": _now_iso(),
+        })
+
+        current_text = sub_text
+
+        # Heavy content -> file-backed checkpoint store. Light
+        # watermark (subcall index) -> ctx.metadata.
+        _checkpoint_store.put(inv_id, current_text)
+        ctx.metadata["completed_subcalls"] = sub_idx + 1
+        await ctx.metadata.flush()
+
+        if sub_idx + 1 < CALLS_PER_PHASE and INTRA_PHASE_COOLDOWN_SEC > 0:
+            await _cooldown(
+                emit, ctx, INTRA_PHASE_COOLDOWN_SEC,
+                stage="intra_phase",
+                phase=phase_idx + 1,
+                total=NUM_PHASES,
+                subcall=sub_idx + 2,
+                of=CALLS_PER_PHASE,
+            )
+            if ctx.cancel.is_set():
+                break
+
+
+async def _stream_llm(
+    emit: "EmitFn", *, instructions: str, user_input: str,
+) -> str:
+    """One streaming LLM call. Forwards token deltas via the per-turn stream."""
+    full_text = ""
+    async for event in await _openai_client.responses.create(
+        model=_model,
+        instructions=instructions,
+        input=user_input,
+        store=False,
+        stream=True,
+        max_output_tokens=TARGET_OUTPUT_TOKENS,
+    ):
+        if event.type == "response.output_text.delta":
+            full_text += event.delta
+            await emit({"type": "token", "content": event.delta})
+    return full_text
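Reviewer sketch (not part of the patch): the per-subcall checkpoint and resume logic in `_run_phase` can be exercised without the SDK. The version below is a simplified stand-in under stated assumptions: a plain `dict` replaces `ctx.metadata`, another `dict` replaces `CheckpointStore`, and the `crash_after` parameter is a hypothetical hook that simulates process death. It is not the SDK's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

Subcall = Callable[[str], Awaitable[str]]


async def run_phase(
    metadata: Dict[str, object],
    checkpoints: Dict[str, str],
    inv_id: str,
    phase_idx: int,
    subcalls: List[Subcall],
    crash_after: Optional[int] = None,
) -> str:
    """Run one phase, checkpointing after every completed subcall."""
    if metadata.get("in_progress_phase") == phase_idx:
        # Recovery: resume at the first un-finished subcall with saved text.
        start = int(metadata.get("completed_subcalls", 0) or 0)
        text = checkpoints.get(inv_id, "")
    else:
        # Fresh phase: reset the in-phase watermarks and the saved text.
        start, text = 0, ""
        metadata["in_progress_phase"] = phase_idx
        metadata["completed_subcalls"] = 0
        checkpoints.pop(inv_id, None)

    for idx in range(start, len(subcalls)):
        if crash_after is not None and idx == crash_after:
            raise RuntimeError("simulated crash")  # hypothetical crash hook
        text = await subcalls[idx](text)
        checkpoints[inv_id] = text                # heavy content
        metadata["completed_subcalls"] = idx + 1  # small integer watermark
    return text
```

Calling `run_phase` once with `crash_after=2` and then again without it, on the same `metadata` and `checkpoints`, runs each subcall exactly once overall. That is the "at most one wasted subcall" property the docstrings describe, with the wasted one being whichever subcall was in flight at the crash.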
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/agent.yaml b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/agent.yaml
new file mode 100644
index 000000000000..159718c5e4cc
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/agent.yaml
@@ -0,0 +1,35 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/microsoft/AgentSchema/refs/heads/main/schemas/v1.0/ContainerAgent.yaml
+
+kind: hosted
+name: durable-research-agent
+description: |
+    Demo agent showcasing crash-resilient long-running tasks using @multi_turn_task.
+    Survives crashes and auto-resumes from the last checkpoint on restart.
+metadata:
+    tags:
+        - AI Agent Hosting
+        - Invocations Protocol
+        - Durable Tasks
+        - Crash Resilience
+        - Python
+protocols:
+    - protocol: invocations
+      version: 1.0.0
+resources:
+    cpu: "1"
+    memory: 2Gi
+environment_variables:
+    - name: AZURE_AI_MODEL_DEPLOYMENT_NAME
+      value: gpt-4o
+    - name: STAGE_DURATION
+      value: "10"
+    # Long-running demo: per-phase ≈ 12s LLM + 3×30s intra + 30s inter ≈ 132s,
+    # × 15 phases ≈ 33 min total — runs ~2x past the platform's 15-min
+    # sandbox-eviction window so each demo run exercises the durable-task
+    # primitive's lease keep-alive path end-to-end (the behavior this
+    # sample exists to showcase). Local agent.py defaults (10/20s, ~15 min)
+    # apply when running outside the hosted container for fast iteration.
+    - name: INTRA_PHASE_COOLDOWN_SEC
+      value: "30"
+    - name: INTER_PHASE_COOLDOWN_SEC
+      value: "30"
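Reviewer sketch (not part of the patch): the run-length estimate in the `agent.yaml` comment can be reproduced with a few lines of arithmetic. The function name is illustrative, and the 12-second LLM figure is taken from that comment rather than measured. Like the comment, it charges an inter-phase cooldown to every phase, although `agent.py` skips the cooldown after the final phase, so the result is a slight overestimate.

```python
def estimated_run_seconds(
    num_phases: int,
    calls_per_phase: int,
    llm_sec_per_phase: float,
    intra_cooldown_sec: float,
    inter_cooldown_sec: float,
) -> float:
    """Approximate wall-clock duration of a full research run."""
    # One intra-phase cooldown sits between each pair of consecutive subcalls.
    intra_total = (calls_per_phase - 1) * intra_cooldown_sec
    per_phase = llm_sec_per_phase + intra_total + inter_cooldown_sec
    return num_phases * per_phase


hosted = estimated_run_seconds(15, 4, 12, 30, 30)  # agent.yaml values
local = estimated_run_seconds(15, 4, 12, 10, 20)   # agent.py defaults
print(hosted / 60, local / 60)  # minutes
```

With the hosted values this gives 1980 seconds (33 minutes), and with the local defaults 930 seconds (about 15.5 minutes), in line with the figures quoted in the comment.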
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/app.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/app.py
new file mode 100644
index 000000000000..1acbb768052c
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/app.py
@@ -0,0 +1,257 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""HTTP host for the durable research agent.
+
+This file is minimal plumbing. The durability + steering logic is in ``agent.py``.
+
+Streaming is wired through the SDK ``streams`` registry: at startup
+we pick the **file-backed replay** backing (events persist to disk so
+they survive a crash + container restart). The POST handler reserves
+the per-turn stream id (``invocation_id``) BEFORE starting the task so
+the GET handler can subscribe deterministically. The handler in
+``agent.py`` emits to the same id; events on the SSE wire carry the
+emitted ``sequence_number`` as the SSE ``id:`` field, so a reconnect
+with ``?last_event_id=N`` skips events the client already received.
+
+Routes (all platform-managed — only ``/invocations*`` is reachable
+through the Foundry endpoint proxy):
+  * ``POST /invocations``                       — fire-and-forget dispatch (or
+                                                   steering input on an in-progress run);
+                                                   special: ``{"message": "crash"}``
+                                                   when ``DEMO_MODE=1`` forces a process
+                                                   exit so the platform nanny restarts us
+  * ``GET  /invocations/{id}?last_event_id=N``  — SSE stream of the active run
+  * ``POST /invocations/{id}/cancel``           — operator cancel
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+import os
+from pathlib import Path
+
+from starlette.requests import Request
+from starlette.responses import JSONResponse, Response, StreamingResponse
+
+from azure.ai.agentserver.core.streaming import (
+    EventStreamNotFoundError,
+    streams,
+)
+from azure.ai.agentserver.invocations import InvocationAgentServerHost
+
+from agent import deep_research
+
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+logger = logging.getLogger(__name__)
+
+
+# --- Streams bootstrap (run once at module import) --------------------------
+
+# Per-turn streams persist to disk so they survive a container crash +
+# restart. ``cursor_fn`` reads the agent's natural sequence number so
+# ``?last_event_id=N`` reconnects skip already-delivered events.
+# ``ttl_seconds=600`` bounds disk usage: once a stream is closed and
+# all its events have aged out, the registry destroys it and removes
+# the file.
+_STREAM_DIR = Path.home() / ".durable-tasks" / "_streams"
+_STREAM_DIR.mkdir(parents=True, exist_ok=True)
+
+streams.use_file_backed_replay(
+    storage_dir=_STREAM_DIR,
+    cursor_fn=lambda ev: ev["sequence_number"],
+    ttl_seconds=600,
+)
+
+
+app = InvocationAgentServerHost()
+
+
+# --- Invocation handlers ---------------------------------------------------
+
+@app.invoke_handler
+async def handle_invoke(request: Request) -> Response:
+    """Dispatch a research task (fire-and-forget).
+
+    Input shape: ``{"message": "<topic>"}``.
+
+    Two special behaviors driven by the request body:
+
+    * ``{"message": "crash"}`` (when the container has ``DEMO_MODE=1``) forces
+      ``os._exit(137)`` shortly after returning ``202``. The platform's nanny
+      worker brings the container back within ~1 min on its own — no new
+      client ingress required — and the durable task auto-resumes from its
+      last checkpoint.
+
+    * Any other ``{"message": "<topic>"}`` dispatches a normal research run.
+      If a steerable run is already in progress on this session, the input is
+      queued as a steering input — the agent winds down the current turn at
+      the next checkpoint and re-enters with the new topic.
+    """
+    body = await request.body()
+    try:
+        data = json.loads(body) if body else {}
+    except json.JSONDecodeError:
+        data = {}
+    topic = str(data.get("message") or "").strip()
+    if not topic:
+        return JSONResponse({"error": "Provide a 'message' field"}, status_code=400)
+
+    # Demo-only crash trigger.
+    if topic.lower() in ("crash", "kill", "💥") and os.environ.get("DEMO_MODE") == "1":
+        logger.critical("CRASH triggered via /invocations message=%r — exiting in 300ms", topic)
+
+        async def _crash() -> None:
+            await asyncio.sleep(0.3)
+            os._exit(137)
+
+        asyncio.get_running_loop().create_task(_crash())
+        return JSONResponse(
+            {
+                "status": "crashing",
+                "message": (
+                    "Process will exit. The platform's nanny worker brings the "
+                    "container back within ~1 min on its own (no new ingress "
+                    "required) and the durable task auto-resumes from its last "
+                    "checkpoint."
+                ),
+            },
+            status_code=202,
+        )
+
+    invocation_id: str = request.state.invocation_id
+    session_id: str = request.state.session_id
+    # ONE durable task per session so steering finds the active run.
+    # invocation_id labels THIS turn; session_id labels the long-lived task.
+    task_id = session_id
+    logger.info("POST handler: session_id=%r task_id=%r invocation_id=%r topic=%r",
+                session_id, task_id, invocation_id, topic)
+
+    # Reserve the per-turn stream id BEFORE starting the task. This
+    # guarantees a GET that races the POST sees the stream (rather than
+    # a 404 NotFound). The file-backed replay backing means we don't
+    # need to wait for a subscriber before the handler starts emitting.
+    await streams.get_or_create(invocation_id)
+
+    # Steering is transparent to callers: for a steerable=True chain,
+    # multi_turn_task.start() queues the input on the in-progress chain's
+    # steering queue WITHOUT raising. The agent's currently-running turn
+    # observes ctx.cancel.is_set(), winds down at its next checkpoint, and
+    # the framework re-enters the body with the queued input as
+    # ctx.input — at which point the new turn streams its events to
+    # the per-turn invocation_id stream reserved above. No status
+    # branching is needed here.
+    #
+    # invocation_id is also the per-turn ``input_id`` — the framework
+    # records it as the chain's last-accepted input id (see
+    # ``payload["_last_input_id"]``) and uses it for the multi-turn
+    # ``get_active_run(task_id, input_id)`` match.
+    await deep_research.start(
+        task_id=task_id,
+        input={"topic": topic, "invocation_id": invocation_id},
+        input_id=invocation_id,
+    )
+
+    return JSONResponse(
+        {
+            "status": "started",
+            "invocation_id": invocation_id,
+            "session_id": session_id,
+        },
+        status_code=202,
+    )
+
+
+@app.get_invocation_handler
+async def handle_get(request: Request) -> Response:
+    """Stream SSE from the per-invocation stream.
+
+    The platform routes ``GET /invocations/{id}`` to this container based on
+    the invocation-to-session mapping set up by the original POST. Clients
+    can pass ``?last_event_id=N`` to skip events they've already seen on a
+    reconnect — we forward this to ``stream.subscribe(after=N)`` which
+    skips events whose sequence_number ≤ N (whether they're being served
+    from in-memory live, from on-disk replay, or from a rehydrated stream
+    after a crash).
+
+    HTTP mapping:
+      - 404 if ``streams.get`` raises ``EventStreamNotFoundError``. This
+        covers both an invocation id that was never seen and a stream
+        that was destroyed (TTL eviction or explicit ``streams.delete``);
+        both cases are reported through the same exception, so they
+        cannot be told apart here and no separate 410 is returned.
+      - A stream destroyed while a client is attached ends the SSE
+        response with an ``event: gone`` frame instead.
+    """
+    invocation_id = request.state.invocation_id
+
+    last_event_id = request.query_params.get("last_event_id", "")
+    skip_count = int(last_event_id) if last_event_id.isdecimal() else 0
+    logger.info("GET handler: invocation_id=%r skip=%d", invocation_id, skip_count)
+
+    try:
+        stream = await streams.get(invocation_id)
+    except EventStreamNotFoundError:
+        # Never seen or already destroyed: both cases arrive as the same
+        # exception, so a second ``except`` clause for it would be
+        # unreachable.
+        return JSONResponse(
+            {"status": "not_found",
+             "message": "No stream for this invocation id."},
+            status_code=404,
+        )
+
+    async def sse_stream():
+        try:
+            async for event in stream.subscribe(after=skip_count or None):
+                seq = event.get("sequence_number")
+                yield f"id: {seq}\ndata: {json.dumps(event)}\n\n"
+        except EventStreamNotFoundError:
+            # Stream destroyed while we were attached (TTL eviction or
+            # explicit delete). Tell the client we're done.
+            yield (
+                f"event: gone\ndata: "
+                + json.dumps({"type": "gone", "invocation_id": invocation_id})
+                + "\n\n"
+            )
+
+    return StreamingResponse(
+        sse_stream(),
+        media_type="text/event-stream",
+        headers={"Cache-Control": "no-cache"},
+    )
+
+
+@app.cancel_invocation_handler
+async def handle_cancel(request: Request) -> Response:
+    """Cancel the running research task.
+
+    Cancel applies to the per-session durable task (task_id == session_id).
+    The handler observes ``ctx.cancel.is_set()`` and runs its
+    cooperative wind-down at the next checkpoint, which closes the
+    per-turn stream before suspending.
+    """
+    invocation_id = request.state.invocation_id
+    # The framework resolves session_id from the platform env var
+    # ``FOUNDRY_AGENT_SESSION_ID`` (or a caller-supplied
+    # ``agent_session_id`` query param override) and stamps it on
+    # ``request.state.session_id``. No local fallback needed.
+    session_id = request.state.session_id
+    task_id = session_id  # one task per session — match POST handler
+    logger.info("CANCEL handler: invocation_id=%r task_id=%r", invocation_id, task_id)
+
+    # ``input_id == invocation_id`` per the POST handler's start() call.
+    # MultiTurnTask.get_active_run requires the input_id of the current
+    # turn so the framework can verify the caller is targeting the
+    # in-flight turn and not a stale one.
+    run = await deep_research.get_active_run(task_id, invocation_id)
+    if run is None:
+        return JSONResponse({"status": "not_found", "message": "No active task to cancel."})
+
+    await run.cancel()
+    return JSONResponse({"status": "cancelled", "message": "Task cancellation requested."})
+
+
+if __name__ == "__main__":
+    app.run()
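The resume contract described in the `app.py` module docstring above (the emitted `sequence_number` becomes the SSE `id:` field, and `?last_event_id=N` skips events whose `sequence_number` is at most N) can be illustrated without the SDK. The sketch below is a minimal stand-alone illustration, assuming events are plain dicts that carry a `sequence_number` key; `parse_last_event_id` and `sse_frames` are hypothetical helpers and are not part of `azure.ai.agentserver`.

```python
import json
from typing import Iterable, Iterator, Optional


def parse_last_event_id(raw: str) -> Optional[int]:
    """Return the reconnect cursor as an int, or None when absent or malformed."""
    # str.isdecimal() only accepts characters that int() can parse, so
    # inputs such as "²" are rejected instead of raising ValueError.
    return int(raw) if raw.isdecimal() else None


def sse_frames(events: Iterable[dict], after: Optional[int] = None) -> Iterator[str]:
    """Yield SSE frames, skipping events whose sequence_number <= after."""
    for event in events:
        seq = event["sequence_number"]
        if after is not None and seq <= after:
            continue  # the client already received this event
        # The sequence number doubles as the SSE id, so a client can send
        # it back as last_event_id on reconnect.
        yield f"id: {seq}\ndata: {json.dumps(event)}\n\n"


if __name__ == "__main__":
    # Hypothetical event log for one turn.
    log = [{"sequence_number": n, "type": "text_delta", "delta": str(n)} for n in range(1, 5)]
    for frame in sse_frames(log, after=parse_last_event_id("2")):
        print(frame, end="")
```

Returning `None` for a missing cursor, rather than `0`, keeps "no cursor" distinct from "cursor is 0", which matters if a producer ever numbers its first event 0.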
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/requirements.txt b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/requirements.txt
new file mode 100644
index 000000000000..95cc4a5a84a7
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/requirements.txt
@@ -0,0 +1,7 @@
+# Azure AI packages (installed from local wheels during build)
+azure-ai-agentserver-core
+azure-ai-agentserver-invocations
+
+# Azure SDKs
+azure-ai-projects>=1.0.0b10
+azure-identity>=1.17.0
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/store.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/store.py
new file mode 100644
index 000000000000..5a56822af1d1
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/store.py
@@ -0,0 +1,87 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""File-backed checkpoint store for in-flight LLM content.
+
+``ctx.metadata`` on the durable-task primitive is a *small-watermark*
+store, not a bulk-data store (see ``core/docs/durable-task-guide.md``
+§"Persistence Model"). For anything heavier than a few bytes — e.g.
+the partially-streamed text of the current phase's in-flight subcall
+chain — the application is expected to maintain its own per-app
+checkpoint store and just keep a *reference* in metadata.
+
+This file is the minimal local checkpoint store for the durable
+research agent. Each phase's in-progress text is a JSON blob keyed by
+``<task_id>:<phase_idx>``. Writes are atomic (tempfile + rename) so a
+crash mid-write leaves either the old value or the new value, never a
+truncated file. The store is deliberately tiny — no metrics, no
+contention handling — because this is a sample, not a production
+component. In production, swap this for a real durable blob store
+(Cosmos, blob storage, etc.).
+
+The store survives container restarts via the platform's per-session
+mounted directory (the same directory the streams registry uses); it
+does not survive task deletion.
+"""
+
+from __future__ import annotations
+
+import json
+import tempfile
+from pathlib import Path
+
+
+class CheckpointStore:
+    """File-backed key->str blob store with atomic writes.
+
+    Used for in-flight phase text — the heaviest non-stream artifact
+    the durable handler keeps around. The agent's per-phase recovery
+    flow loads the previous-subcall text via :meth:`get` at phase
+    entry, advances it after each subcall via :meth:`put`, and clears
+    the phase entry via :meth:`delete` at phase end (so completed
+    phases don't accumulate disk usage).
+    """
+
+    def __init__(self, base_dir: Path) -> None:
+        self._base = base_dir
+        self._base.mkdir(parents=True, exist_ok=True)
+
+    def _path(self, key: str) -> Path:
+        # Hyphens + colons are safe on every fs we target; keep the
+        # original key as-is so a directory listing is self-describing.
+        return self._base / f"{key}.json"
+
+    def get(self, key: str) -> str:
+        """Return the stored text, or empty string if absent."""
+        path = self._path(key)
+        if not path.exists():
+            return ""
+        return json.loads(path.read_text(encoding="utf-8"))
+
+    def put(self, key: str, value: str) -> None:
+        """Atomically write *value* — temp file + rename."""
+        target = self._path(key)
+        fd, tmp = tempfile.mkstemp(
+            dir=str(self._base), prefix=f"{key}_", suffix=".tmp"
+        )
+        try:
+            with open(fd, "w", encoding="utf-8") as fh:
+                json.dump(value, fh)
+            Path(tmp).replace(target)
+        except BaseException:
+            Path(tmp).unlink(missing_ok=True)
+            raise
+
+    def delete(self, key: str) -> None:
+        """Remove *key* if present; no-op otherwise."""
+        path = self._path(key)
+        if path.exists():
+            path.unlink()
+
+    def delete_prefix(self, prefix: str) -> None:
+        """Remove all keys with the given prefix.
+
+        Used on a steered-turn reset to clear all phase entries for a
+        task in one shot, without enumerating each phase index.
+        """
+        for path in self._base.glob(f"{prefix}*.json"):
+            path.unlink(missing_ok=True)
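The atomic-write technique that `CheckpointStore.put` relies on (write to a temp file in the same directory, then rename over the target) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the sample's implementation: `atomic_write_json` is a hypothetical helper, and the `os.fsync` call is an addition the sample does not make.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(target: Path, value: str) -> None:
    """Write *value* as JSON to *target* via a temp file in the same directory."""
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(value, fh)
            fh.flush()
            os.fsync(fh.fileno())  # assumption: flush to disk before the rename
        os.replace(tmp, target)  # readers see the old file or the new one
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        Path(tmp).unlink(missing_ok=True)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "task-0.json"
        atomic_write_json(path, "partial text")
        atomic_write_json(path, "partial text, continued")
        print(json.loads(path.read_text(encoding="utf-8")))
```

Without the `fsync`, the rename is still atomic with respect to other processes, but a power loss shortly after the rename may leave an empty or stale file on some filesystems.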
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/wheels/azure_ai_agentserver_core-2.0.0b7-py3-none-any.whl b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/wheels/azure_ai_agentserver_core-2.0.0b7-py3-none-any.whl
new file mode 100644
index 000000000000..8aa80e81ac92
Binary files /dev/null and b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/wheels/azure_ai_agentserver_core-2.0.0b7-py3-none-any.whl differ
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/wheels/azure_ai_agentserver_invocations-1.0.0b6-py3-none-any.whl b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/wheels/azure_ai_agentserver_invocations-1.0.0b6-py3-none-any.whl
new file mode 100644
index 000000000000..65023295a340
Binary files /dev/null and b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo/src/durable-research-agent/wheels/azure_ai_agentserver_invocations-1.0.0b6-py3-none-any.whl differ
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/__init__.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/__init__.py
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/agent.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/agent.py
new file mode 100644
index 000000000000..f5a5328ae92d
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/agent.py
@@ -0,0 +1,355 @@
+"""Steerable durable Copilot conversation agent (invocations protocol).
+
+Wraps the **GitHub Copilot SDK** in a steerable durable task and bridges
+its session-event stream into the invocations transport.
+
+The handler delivers five key behaviors:
+
+1. ``streaming=True`` is wired into both ``create_session`` and
+   ``resume_session``, so the SDK emits incremental
+   ``AssistantMessageDeltaData`` events rather than batching the whole
+   reply into one ``AssistantMessageData`` envelope at the end.
+2. The handler forwards each ``AssistantMessageDeltaData`` as a
+   ``text_delta`` chunk the moment it arrives — clients see characters
+   appear live.
+3. The handler forwards ``SessionIdleData`` (turn-complete) as a
+   ``session_idle`` chunk so consumers can deterministically detect
+   end-of-turn without polling.
+4. Upstream-history **dedup**: before sending the user's message, the
+   handler reads the Copilot session's persisted event log via
+   ``session.get_messages()`` and skips the send when the most-recent
+   user message already matches this turn's input. This is the source
+   of truth for "did I already send this turn" — no separate metadata
+   watermark, no flush-ordering race.
+5. Recovery **replay**: on ``ctx.entry_mode == "recovered"`` the
+   handler emits the assistant text the previous lifetime had already
+   accumulated (read from ``session.get_messages()``) as a single
+   recovered ``text_delta`` chunk before starting / continuing the
+   stream — so a consumer that reconnects after a crash sees the same
+   transcript a healthy consumer would have seen.
+
+Three-phase steering cancel pattern preserved from the original
+sample:
+
+- Phase 1 — Pre-entry cancel: queued steering input that arrived
+  before this entry. Persist the message into the upstream session
+  (so the cancelled turn does not lose context) and ``session.abort()``
+  immediately.
+- Phase 2 — Mid-stream cancel: ``ctx.cancel`` fires while the assistant
+  is generating; ``session.abort()`` stops it and we suspend.
+- Phase 3 — Post-completion cancel: cancel arrived after the assistant
+  message landed but before we returned; record as superseded.
+
+Input schema: ``{"session_id": str, "message": str, "invocation_id": str}``
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from pathlib import Path
+from typing import Any
+
+from azure.ai.agentserver.core.durable import TaskContext, multi_turn_task
+from azure.ai.agentserver.core.streaming import streams
+
+from .store import FileStore
+
+logger = logging.getLogger(__name__)
+
+_DATA_DIR = Path.home() / ".durable-sessions"
+
+invocation_store = FileStore(_DATA_DIR / "copilot-invocations")
+
+
+# --------------------------------------------------------------------------
+# Helpers
+# --------------------------------------------------------------------------
+
+
+async def _open_session(client: Any, session_id: str, entry_mode: str) -> Any:
+    """Open the Copilot session, choosing create vs. resume by entry mode.
+
+    On ``"fresh"`` we use ``create_session``; on ``"resumed"`` or
+    ``"recovered"`` we use ``resume_session`` (the SDK's reattach API).
+    Both paths set ``streaming=True``.
+
+    If ``resume_session`` raises "Session not found" (the upstream
+    Copilot CLI was not given enough time to persist the session
+    before the previous process exited — most common after SIGTERM
+    with a short grace, or SIGKILL), we fall back to
+    ``create_session``. We lose the pre-crash conversation context
+    for this turn, but the handler makes forward progress instead of
+    failing outright — upstream-dependency hiccups must NOT propagate
+    as task failures (which would orphan the invocation and fail any
+    queued steers). This mirrors the
+    ``sdk/agentserver/azure-ai-agentserver-responses/samples/sample_18_durable_copilot.py``
+    resilience pattern.
+    """
+    from copilot.session import PermissionHandler  # pylint: disable=import-outside-toplevel
+
+    if entry_mode != "fresh":
+        try:
+            return await client.resume_session(
+                session_id,
+                on_permission_request=PermissionHandler.approve_all,
+                streaming=True,
+            )
+        except Exception as exc:  # pylint: disable=broad-exception-caught
+            msg = str(exc)
+            if "not found" not in msg.lower():
+                raise
+            logger.warning(
+                "Copilot session %s not found on resume (%s); creating fresh "
+                "session — pre-crash conversation context for this turn is lost.",
+                session_id,
+                msg,
+            )
+            # Fall through to create_session below.
+    return await client.create_session(
+        session_id=session_id,
+        on_permission_request=PermissionHandler.approve_all,
+        streaming=True,
+    )
+
+
+async def _last_user_message_matches(session: Any, message: str) -> bool:
+    """upstream-history dedup.
+
+    Read the session's persisted event log; the user-turn was already
+    sent if the most recent ``UserMessageData`` event's content equals
+    this turn's input. The upstream session is the source of truth.
+    """
+    from copilot.generated.session_events import (  # pylint: disable=import-outside-toplevel
+        UserMessageData,
+    )
+
+    try:
+        events = await session.get_messages()
+    except (AttributeError, RuntimeError):
+        # get_messages is unavailable (older SDK build) or failed: dedup is
+        # not possible, so report "no match" and let the caller re-send.
+        # Copilot tolerates a duplicate user message on the same turn.
+        return False
+
+    for ev in reversed(events or []):
+        data = getattr(ev, "data", None)
+        if isinstance(data, UserMessageData):
+            content = (getattr(data, "content", "") or "").strip()
+            return content == message.strip()
+    return False
+
+
+async def _recovered_assistant_text(session: Any) -> str:
+    """recovery replay snapshot.
+
+    On crash-recovery, read whatever assistant content the previous
+    lifetime had already accumulated for the current turn from the
+    upstream session log; this is what we replay to the reconnected
+    consumer before resuming the live stream.
+    """
+    from copilot.generated.session_events import (  # pylint: disable=import-outside-toplevel
+        AssistantMessageData,
+        AssistantMessageDeltaData,
+        UserMessageData,
+    )
+
+    try:
+        events = await session.get_messages()
+    except (AttributeError, RuntimeError):
+        return ""
+
+    # Find the last user message; everything after it is the in-flight
+    # assistant turn we are recovering.
+    parts: list[str] = []
+    saw_user = False
+    for ev in events or []:
+        data = getattr(ev, "data", None)
+        if isinstance(data, UserMessageData):
+            saw_user = True
+            parts.clear()
+            continue
+        if not saw_user:
+            continue
+        if isinstance(data, AssistantMessageDeltaData):
+            parts.append(getattr(data, "delta_content", "") or "")
+        elif isinstance(data, AssistantMessageData):
+            # Final assembled message; takes precedence over deltas if present.
+            parts = [getattr(data, "content", "") or ""]
+    return "".join(parts)
+
+
+# --------------------------------------------------------------------------
+# The durable task
+# --------------------------------------------------------------------------
+
+
+@multi_turn_task(name="copilot_session", steerable=True)
+async def copilot_session(ctx: TaskContext[dict]) -> dict[str, Any] | None:
+    """Run one Copilot conversation turn with steering + crash resilience."""
+
+    from copilot import CopilotClient  # pylint: disable=import-outside-toplevel
+    from copilot.generated.session_events import (  # pylint: disable=import-outside-toplevel
+        AssistantMessageData,
+        AssistantMessageDeltaData,
+        SessionIdleData,
+    )
+
+    session_id: str = ctx.input["session_id"]
+    message: str = ctx.input["message"]
+    invocation_id: str = ctx.input["invocation_id"]
+
+    invocation_store.save(invocation_id, {"status": "running"})
+    stream = await streams.get_or_create(invocation_id)
+    await stream.emit({"type": "lifecycle", "status": "running"})
+
+    logger.info(
+        "Copilot session %s steered=%s invocation=%s entry=%s",
+        session_id,
+        ctx.is_steered_turn,
+        invocation_id,
+        ctx.entry_mode,
+    )
+
+    async with CopilotClient() as client:
+        session = await _open_session(client, session_id, ctx.entry_mode)
+
+        # ── recovery replay ─────────────────────────
+        # On recovery, replay whatever the previous lifetime had already
+        # streamed to the consumer, reading from the upstream session log.
+        if ctx.entry_mode == "recovered":
+            recovered_text = await _recovered_assistant_text(session)
+            if recovered_text:
+                logger.info(
+                    "Recovery replay: %d chars from upstream session log",
+                    len(recovered_text),
+                )
+                await stream.emit(
+                    {
+                        "type": "text_delta",
+                        "delta": recovered_text,
+                        "recovered": True,
+                    }
+                )
+
+        # ── Phase 1: Pre-entry cancel (rapid-fire steering) ────────
+        if ctx.cancel.is_set():
+            logger.info("Skipping steered=%s — cancel pre-set", ctx.is_steered_turn)
+            # Still send so the message is preserved in upstream history —
+            # but go through dedup so we don't double-send on recovery.
+            if not await _last_user_message_matches(session, message):
+                await session.send(message)
+            await session.abort()
+            invocation_store.save(
+                invocation_id,
+                {
+                    "status": "cancelled",
+                    "reason": "steered",
+                    "message_preserved": True,
+                },
+            )
+            return None
+        # ── upstream-history dedup ──────────────────
+        # Send the message only if the upstream session does not already
+        # have it as the most recent user message.
+        already_sent = await _last_user_message_matches(session, message)
+        if not already_sent:
+            await session.send(message)
+        else:
+            logger.info("Skipping session.send — upstream history already has this turn")
+
+        # ── Phase 2: Stream the Copilot turn, checking cancel ──────
+        reply_parts: list[str] = []
+        idle_event = asyncio.Event()
+        loop = asyncio.get_running_loop()
+
+        def on_event(event: Any) -> None:
+            """SDK callback — emit deltas live, signal on idle."""
+            data = event.data
+            if isinstance(data, AssistantMessageDeltaData):
+                delta = getattr(data, "delta_content", "") or ""
+                reply_parts.append(delta)
+                # emit delta as it arrives.
+                loop.create_task(_stream_and_persist(stream, invocation_id, delta, reply_parts))
+            elif isinstance(data, AssistantMessageData):
+                # Fallback for SDK builds that emit only the assembled message.
+                if not reply_parts:
+                    content = getattr(data, "content", "") or ""
+                    reply_parts.append(content)
+                    loop.create_task(_stream_and_persist(stream, invocation_id, content, reply_parts))
+            elif isinstance(data, SessionIdleData):
+                # emit session_idle to consumers and unblock us.
+                loop.create_task(stream.emit({"type": "session_idle"}))
+                idle_event.set()
+
+        session.on(on_event)
+
+        # Wait for idle (turn complete) or cancel, whichever first.
+        was_aborted = False
+        cancel_task = asyncio.create_task(ctx.cancel.wait())
+        idle_task = asyncio.create_task(idle_event.wait())
+        try:
+            done, pending = await asyncio.wait(
+                {cancel_task, idle_task},
+                return_when=asyncio.FIRST_COMPLETED,
+            )
+            for t in pending:
+                t.cancel()
+            if cancel_task in done and idle_task not in done:
+                was_aborted = True
+                logger.info("session.abort() — new input queued")
+                await session.abort()
+        finally:
+            for t in (cancel_task, idle_task):
+                if not t.done():
+                    t.cancel()
+
+        reply = "".join(reply_parts)
+
+    # ── Phase 3: Save result + decide suspended-state envelope ────
+    output = {
+        "invocation_id": invocation_id,
+        "reply": reply,
+        "partial": was_aborted,
+    }
+
+    if was_aborted:
+        invocation_store.save(
+            invocation_id,
+            {
+                "status": "superseded",
+                "reason": "steered_mid_stream",
+                "output": output,
+            },
+        )
+        return None
+    if ctx.cancel.is_set():
+        invocation_store.save(
+            invocation_id,
+            {
+                "status": "superseded",
+                "reason": "steered_post_completion",
+                "output": output,
+            },
+        )
+        return None
+    invocation_store.save(invocation_id, {"status": "completed", "output": output})
+    return output
+
+
+async def _stream_and_persist(
+    stream: Any,
+    invocation_id: str,
+    delta: str,
+    parts: list[str],
+) -> None:
+    """Push a streaming delta and persist the running text snapshot."""
+
+    await stream.emit({"type": "text_delta", "delta": delta})
+    invocation_store.save(
+        invocation_id,
+        {
+            "status": "streaming",
+            "text": "".join(parts),
+        },
+    )
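The Phase 2 wait in `copilot_session` above races two events, turn-complete and cancel, and aborts only when cancel finished first. The pattern can be sketched on its own as follows. This is a minimal illustration, assuming both signals are `asyncio.Event` objects; `wait_for_idle_or_cancel` is a hypothetical helper, not an SDK function.

```python
import asyncio


async def wait_for_idle_or_cancel(idle: asyncio.Event, cancel: asyncio.Event) -> bool:
    """Return True when cancel fired before idle (the turn should be aborted)."""
    cancel_task = asyncio.create_task(cancel.wait())
    idle_task = asyncio.create_task(idle.wait())
    try:
        done, _pending = await asyncio.wait(
            {cancel_task, idle_task},
            return_when=asyncio.FIRST_COMPLETED,
        )
        # If the turn also completed, prefer the completed result over abort.
        return cancel_task in done and idle_task not in done
    finally:
        # Always cancel the loser so no waiter task is left dangling.
        for task in (cancel_task, idle_task):
            if not task.done():
                task.cancel()


async def _demo() -> None:
    idle, cancel = asyncio.Event(), asyncio.Event()
    # Simulate a steering input arriving shortly after the turn starts.
    asyncio.get_running_loop().call_later(0.01, cancel.set)
    print("aborted:", await wait_for_idle_or_cancel(idle, cancel))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Cancelling the losing task in `finally` covers the case where the outer coroutine is itself cancelled while waiting.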
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/app.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/app.py
new file mode 100644
index 000000000000..9e88379ae828
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/app.py
@@ -0,0 +1,175 @@
+"""HTTP host for the Copilot durable agent with steering and streaming.
+
+Wires the Copilot durable task (``agent.py``) to the invocations framework.
+With ``steerable=True``, calling ``start()`` on an in-progress task queues
+the new input — no manual cancel/wait/restart logic needed.
+
+**Streaming**: If the POST request includes ``Accept: text/event-stream``,
+the response is an SSE stream of text deltas as they are generated.  If the
+client disconnects mid-stream, it can fall back to ``GET /invocations/<id>``
+which returns the full text snapshot at that moment.
+
+Requires the **GitHub Copilot SDK** (``pip install github-copilot-sdk``)
+and the Copilot CLI installed and authenticated (``gh auth login``).
+
+Usage::
+
+    pip install -r requirements.txt
+
+    python -m durable_copilot.app
+
+    # Turn 1 (async)
+    curl -X POST "http://localhost:8088/invocations?agent_session_id=demo-001" \\
+        -H "Content-Type: application/json" \\
+        -d '{"message": "Explain Python decorators"}'
+
+    # Turn 1 (streaming)
+    curl -N -X POST "http://localhost:8088/invocations?agent_session_id=demo-001" \\
+        -H "Content-Type: application/json" \\
+        -H "Accept: text/event-stream" \\
+        -d '{"message": "Explain Python decorators"}'
+
+    # Poll (recovery after disconnect)
+    curl "http://localhost:8088/invocations/<inv-1>"
+
+    # Steer (while turn 1 is still running)
+    curl -X POST "http://localhost:8088/invocations?agent_session_id=demo-001" \\
+        -H "Content-Type: application/json" \\
+        -d '{"message": "Actually, explain async/await instead"}'
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from collections.abc import AsyncGenerator
+
+from starlette.requests import Request
+from starlette.responses import JSONResponse, Response, StreamingResponse
+
+from azure.ai.agentserver.core.streaming import (
+    EventStream,
+    EventStreamNotFoundError,
+    streams,
+)
+from azure.ai.agentserver.invocations import InvocationAgentServerHost
+
+from .agent import copilot_session, invocation_store
+
+logger = logging.getLogger(__name__)
+
+# Default broadcast (in-memory live) backing: every subscriber attaches
+# BEFORE the producer starts (subscribe-before-start; see handle_invoke),
+# so we don't need replay catch-up. The recovery path after a streaming
+# disconnect is GET /invocations/<id>, which returns the full snapshot
+# the Copilot SDK has accumulated upstream — the framework-level replay
+# buffer was redundant on top of that and held memory unnecessarily.
+# Stream id is the per-turn ``invocation_id`` per streaming.md §7.8.
+streams.use_in_memory_live()
+
+app = InvocationAgentServerHost()
+
+
+async def _sse_from_iter(
+    subscription, invocation_id: str, *, initial_status: str = "queued"
+) -> AsyncGenerator[bytes, None]:
+    """Convert an already-attached subscriber iterator into SSE bytes."""
+
+    yield (
+        f"data: {json.dumps({'type': 'lifecycle', 'status': initial_status, 'invocation_id': invocation_id})}\n\n"
+    ).encode()
+
+    try:
+        async for chunk in subscription:
+            yield f"data: {json.dumps(chunk)}\n\n".encode()
+        done_data = {"type": "done", "invocation_id": invocation_id}
+        yield f"event: done\ndata: {json.dumps(done_data)}\n\n".encode()
+    except EventStreamNotFoundError:
+        yield (
+            f"event: superseded\ndata: {json.dumps({'type': 'superseded', 'invocation_id': invocation_id})}\n\n"
+        ).encode()
+    except Exception as exc:  # pylint: disable=broad-except
+        error_data = {
+            "type": "error",
+            "invocation_id": invocation_id,
+            "error": str(exc),
+        }
+        yield f"event: error\ndata: {json.dumps(error_data)}\n\n".encode()
+
+
+@app.invoke_handler
+async def handle_invoke(request: Request) -> Response:
+    """Start or steer a Copilot session.
+
+    If ``Accept: text/event-stream`` is set, returns an SSE stream.
+    Otherwise returns ``202 Accepted`` for async polling.
+    """
+    data = await request.json()
+    invocation_id: str = request.state.invocation_id
+    session_id: str = request.state.session_id
+    message: str = data.get("message", "")
+    task_id = f"session-{session_id}"
+
+    task_input = {
+        "session_id": session_id,
+        "message": message,
+        "invocation_id": invocation_id,
+    }
+
+    invocation_store.save(invocation_id, {"status": "queued"})
+
+    wants_stream = "text/event-stream" in request.headers.get("accept", "")
+
+    # Subscribe-before-start (streaming.md §5.1): with use_in_memory_live()
+    # late subscribers see no prior events, so the per-subscriber queue
+    # must be attached BEFORE the producer starts emitting. Calling
+    # __aiter__() on the result of subscribe() registers the queue
+    # immediately, so any emit() that lands between task.start() and
+    # the StreamingResponse iteration is captured.
+    stream = await streams.get_or_create(invocation_id)
+    subscription = None
+    if wants_stream:
+        subscription = stream.subscribe()
+        # Force the subscriber queue to register NOW by invoking
+        # __aiter__ directly (subscription is an async iterator, not a
+        # plain iterable — sync ``iter()`` would reject it).
+        subscription = subscription.__aiter__()
+
+    await copilot_session.start(task_id=task_id, input=task_input)
+
+    if wants_stream:
+        return StreamingResponse(
+            _sse_from_iter(subscription, invocation_id),
+            media_type="text/event-stream",
+            headers={"Cache-Control": "no-cache", "Connection": "keep-alive"},
+        )
+
+    # Async mode
+    stored = invocation_store.load(invocation_id)
+    status = stored["status"] if stored else "queued"
+
+    return JSONResponse(
+        {"invocation_id": invocation_id, "status": status},
+        status_code=202,
+    )
+
+
+@app.get_invocation_handler
+async def poll_invocation(request: Request) -> Response:
+    """Poll a specific invocation's result.
+
+    Returns the current snapshot — during streaming this includes the
+    full text generated so far.  This is the recovery path after a
+    streaming disconnect.
+    """
+    invocation_id: str = request.state.invocation_id
+
+    result = invocation_store.load(invocation_id)
+    if result is None:
+        return JSONResponse({"error": "Invocation not found"}, status_code=404)
+
+    return JSONResponse({"invocation_id": invocation_id, **result})
+
+
+if __name__ == "__main__":
+    app.run()
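
The subscribe-before-start ordering in `handle_invoke` above can be illustrated in isolation. The sketch below is a minimal stand-in, not the SDK's stream implementation: `LiveBroadcast`, `attach`, and `sse_frame` are hypothetical names chosen for illustration. It assumes a live-only fan-out in which a subscriber sees only events emitted after it attached, and it shows the SSE framing used by `_sse_from_iter` (an optional `event:` line, a `data:` line, then a blank line).

```python
import json
from collections import deque
from typing import Optional


class LiveBroadcast:
    """Hypothetical live-only fan-out: late subscribers get no replay."""

    def __init__(self) -> None:
        self._subscribers: list = []

    def attach(self) -> deque:
        # Registering the buffer is what makes later emits visible to it.
        buffer: deque = deque()
        self._subscribers.append(buffer)
        return buffer

    def emit(self, event: dict) -> None:
        # Deliver to every buffer that is attached at this moment.
        for buffer in self._subscribers:
            buffer.append(event)


def sse_frame(payload: dict, event: Optional[str] = None) -> bytes:
    """Encode one Server-Sent Events frame; the blank line terminates it."""
    head = f"event: {event}\n" if event else ""
    return f"{head}data: {json.dumps(payload)}\n\n".encode()


def demo() -> tuple:
    broadcast = LiveBroadcast()
    early = broadcast.attach()  # attached before the producer emits
    broadcast.emit({"type": "delta", "text": "hi"})
    late = broadcast.attach()  # attached after the first emit
    return len(early), len(late)
```

In this model the early subscriber holds one event and the late subscriber holds none, which is the gap the sample closes by attaching before `start()`.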
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/requirements.txt b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/requirements.txt
new file mode 100644
index 000000000000..a5c8adee9c42
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/requirements.txt
@@ -0,0 +1,5 @@
+github-copilot-sdk
+azure-ai-agentserver-core
+azure-ai-agentserver-invocations
+starlette
+uvicorn
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/store.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/store.py
new file mode 100644
index 000000000000..b754b2bf7fa2
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot/store.py
@@ -0,0 +1,57 @@
+"""File-based key→JSON store for powering the invocation API.
+
+This module provides a minimal persistence layer that the HTTP host uses to
+store per-invocation results.  It is **not** part of the durable task
+framework — it is the developer's own persistence for powering the API
+contract (``GET /invocations/{invocation_id}``).
+
+.. warning::
+
+    For demonstration only.  In production, use a database (Redis, Cosmos DB,
+    PostgreSQL, etc.).
+"""
+
+from __future__ import annotations
+
+import json
+import tempfile
+from pathlib import Path
+from typing import Any
+
+
+class FileStore:
+    """Minimal file-backed key→JSON store.
+
+    Each entry is a single JSON file.  Writes are atomic (temp + rename).
+    """
+
+    def __init__(self, base_dir: Path) -> None:
+        self._base = base_dir
+        self._base.mkdir(parents=True, exist_ok=True)
+
+    def save(self, key: str, data: dict[str, Any]) -> None:
+        """Atomically write *data* as JSON — temp file + rename."""
+        target = self._base / f"{key}.json"
+        fd, tmp_path = tempfile.mkstemp(dir=str(self._base), suffix=".tmp", prefix=f"{key}_")
+        try:
+            with open(fd, "w", encoding="utf-8") as f:
+                json.dump(data, f, indent=2)
+            Path(tmp_path).replace(target)
+        except BaseException:
+            Path(tmp_path).unlink(missing_ok=True)
+            raise
+
+    def load(self, key: str) -> dict[str, Any] | None:
+        """Return the stored dict, or ``None`` if the key does not exist."""
+        path = self._base / f"{key}.json"
+        if path.exists():
+            return json.loads(path.read_text(encoding="utf-8"))
+        return None
+
+    def delete(self, key: str) -> bool:
+        """Remove the entry for *key*.  Returns ``True`` if it existed."""
+        path = self._base / f"{key}.json"
+        if path.exists():
+            path.unlink()
+            return True
+        return False
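
The temp-file-plus-rename technique that `FileStore.save` relies on can be exercised on its own. This is a minimal sketch under stated assumptions: `atomic_write_json` and `read_json` are hypothetical helper names, not part of the sample. The read side catches `FileNotFoundError` instead of checking `exists()` first, which is one way to avoid a race with a concurrent delete.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_json(target: Path, data: dict) -> None:
    """Write to a temp file in the same directory, then rename over target."""
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        # os.replace swaps the file in one step on the same filesystem.
        os.replace(tmp_name, target)
    except BaseException:
        # Never leave a half-written temp file behind.
        Path(tmp_name).unlink(missing_ok=True)
        raise


def read_json(target: Path) -> Optional[dict]:
    """Return the stored dict, or None when the file does not exist."""
    try:
        return json.loads(target.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        target = Path(root) / "inv-1.json"
        missing = read_json(target)
        atomic_write_json(target, {"status": "queued"})
        atomic_write_json(target, {"status": "completed"})
        leftovers = [p.name for p in Path(root).iterdir() if p.suffix == ".tmp"]
        return missing, read_json(target), leftovers
```

The temp file is created in the target's own directory because a rename is only atomic within a single filesystem.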
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/__init__.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/__init__.py
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/agent.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/agent.py
new file mode 100644
index 000000000000..e32d51f85e63
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/agent.py
@@ -0,0 +1,412 @@
+"""LangGraph conversation agent with durable task lifecycle and steering.
+
+Wraps a LangGraph ``StateGraph`` in a steerable durable task.
+Demonstrates the **checkpoint-and-fork** cancel pattern:
+
+1. Pre-entry check  — short-circuit if cancel is pre-set
+2. Inter-node check — ``_invoke_cancellable`` checks between graph nodes
+3. Fork-on-steer    — roll back to the last stable checkpoint and fork
+   with the new message
+
+LangGraph owns the conversation flow; the durable task owns crash
+resilience and steering orchestration.
+"""
+
+import asyncio
+import logging
+import sqlite3
+import typing
+from pathlib import Path
+from typing import Any
+
+from langchain_core.messages import AIMessage, HumanMessage
+from langgraph.checkpoint.sqlite import SqliteSaver
+from langgraph.graph import END, START, StateGraph, add_messages
+from langgraph.types import Command, interrupt
+from typing_extensions import TypedDict
+
+from azure.ai.agentserver.core.durable import TaskContext, multi_turn_task
+from azure.ai.agentserver.core.streaming import streams
+
+from .store import FileStore
+
+logger = logging.getLogger(__name__)
+
+_DATA_DIR = Path.home() / ".durable-sessions"
+
+# Invocation result store — written inside the durable task so it survives crashes
+invocation_store = FileStore(_DATA_DIR / "invocations")
+
+
+# ---------------------------------------------------------------------------
+# Graph state
+# ---------------------------------------------------------------------------
+
+
+class ConversationState(TypedDict):
+    """Graph state for a multi-turn conversation.
+
+    Uses LangGraph's built-in ``add_messages`` reducer for message
+    accumulation across turns.
+    """
+
+    messages: typing.Annotated[list, add_messages]
+    is_complete: bool
+
+
+# ---------------------------------------------------------------------------
+# Graph nodes
+# ---------------------------------------------------------------------------
+
+# Simulated step delay — distributed across nodes so inter-node
+# cancellation (via ``graph.stream()``) can bail out quickly.
+_STEP_DELAY = 2  # seconds per processing node
+
+
+def analyze_input(state: ConversationState) -> dict[str, Any]:
+    """Simulate analysing the user's message (e.g., intent detection)."""
+    import time  # pylint: disable=import-outside-toplevel
+
+    _ = state  # Would inspect messages in a real implementation
+    time.sleep(_STEP_DELAY)
+    return {}  # No state change — analysis is an internal step
+
+
+def generate_response(state: ConversationState) -> dict[str, Any]:
+    """Generate an AI response.  Replace stub with a real LLM call."""
+    import time  # pylint: disable=import-outside-toplevel
+
+    time.sleep(_STEP_DELAY)
+
+    messages = state["messages"]
+    user_messages = [m for m in messages if isinstance(m, HumanMessage)]
+    turn = len(user_messages)
+    last_msg = user_messages[-1].content if user_messages else ""
+
+    if turn == 1:
+        reply = f"Thanks for reaching out! You said: '{last_msg}'. " "I'd love to help — could you share more details?"
+    elif turn == 2:
+        reply = (
+            f"Great context: '{last_msg}'. Building on our earlier "
+            "exchange, here are some initial thoughts. What else "
+            "would you like to explore?"
+        )
+    else:
+        reply = (
+            f"Turn {turn}: incorporating '{last_msg}' — I now have " f"context from {turn} turns. How shall we proceed?"
+        )
+
+    return {"messages": [AIMessage(content=reply)]}
+
+
+def refine_response(state: ConversationState) -> dict[str, Any]:
+    """Simulate post-processing (e.g., safety checks, formatting)."""
+    import time  # pylint: disable=import-outside-toplevel
+
+    _ = state  # Would inspect the generated reply in a real implementation
+    time.sleep(_STEP_DELAY // 2 or 1)
+    return {}  # No state change — refinement is an internal step
+
+
+def wait_for_user(state: ConversationState) -> dict[str, Any]:
+    """Pause the graph and wait for the next human message."""
+    messages = state["messages"]
+    user_count = len([m for m in messages if isinstance(m, HumanMessage)])
+
+    user_input: str = interrupt(
+        {
+            "prompt": "Please provide your next message (or say 'done' to finish):",
+            "current_turn": user_count,
+        }
+    )
+
+    if user_input.strip().lower() == "done":
+        return {"is_complete": True}
+
+    return {
+        "messages": [HumanMessage(content=user_input)],
+        "is_complete": False,
+    }
+
+
+def _should_continue(state: ConversationState) -> str:
+    """Route: loop back to analyze_input or end the conversation."""
+    if state.get("is_complete", False):
+        return "end"
+    return "continue"
+
+
+# ---------------------------------------------------------------------------
+# Persistent graph checkpointer (survives restarts)
+# ---------------------------------------------------------------------------
+
+_DATA_DIR.mkdir(parents=True, exist_ok=True)
+_DB_PATH = _DATA_DIR / "langgraph_checkpoints.db"
+
+_conn = sqlite3.connect(str(_DB_PATH), check_same_thread=False)
+_checkpointer = SqliteSaver(_conn)
+_checkpointer.setup()
+
+logger.info("LangGraph checkpoints stored at: %s", _DB_PATH)
+
+
+# ---------------------------------------------------------------------------
+# Build and compile the graph
+# ---------------------------------------------------------------------------
+
+
+def _build_graph() -> Any:
+    """Construct the LangGraph StateGraph for multi-turn conversation.
+
+    Processing is split across three nodes (``analyze_input`` →
+    ``generate_response`` → ``refine_response``) so that stream-based
+    cancellation can bail out between any two steps (~2 s granularity).
+    """
+    builder = StateGraph(ConversationState)
+
+    builder.add_node("analyze_input", analyze_input)
+    builder.add_node("generate_response", generate_response)
+    builder.add_node("refine_response", refine_response)
+    builder.add_node("wait_for_user", wait_for_user)
+
+    builder.add_edge(START, "analyze_input")
+    builder.add_edge("analyze_input", "generate_response")
+    builder.add_edge("generate_response", "refine_response")
+    builder.add_edge("refine_response", "wait_for_user")
+
+    builder.add_conditional_edges(
+        "wait_for_user",
+        _should_continue,
+        {
+            "continue": "analyze_input",
+            "end": END,
+        },
+    )
+
+    return builder.compile(checkpointer=_checkpointer)
+
+
+_graph = _build_graph()
+
+
+# ---------------------------------------------------------------------------
+# Steering — cancellable graph invocation and state forking
+# ---------------------------------------------------------------------------
+
+
+def _invoke_cancellable(
+    graph: Any,
+    graph_input: Any,
+    config: dict[str, Any],
+    cancel_event: asyncio.Event,
+    on_node: Any = None,
+) -> bool:
+    """Run the graph using ``stream()`` with inter-node cancellation.
+
+    Instead of ``graph.invoke()`` which blocks until the full graph
+    completes, this streams node-by-node and checks ``cancel_event``
+    between nodes.  If cancellation is detected, execution stops before
+    the next node runs.
+
+    Returns ``True`` if the graph ran to completion (or interrupt),
+    ``False`` if cancelled mid-graph.
+    """
+    for chunk in graph.stream(graph_input, config):
+        if on_node is not None:
+            on_node(chunk)
+        if cancel_event.is_set():
+            return False
+    return True
+
+
+def _fork_from_checkpoint(
+    graph: Any,
+    config: dict[str, Any],
+    target_checkpoint_id: str,
+    new_message: str,
+) -> bool:
+    """Fork the graph from a previous checkpoint with a new message.
+
+    Uses LangGraph's native state forking: ``update_state`` called with
+    an old checkpoint's config creates a new branch.  The graph's head
+    pointer moves to the fork, discarding any state that was added after
+    the target checkpoint.
+
+    After forking, the graph is positioned after ``wait_for_user`` with
+    the new message injected, so the next step is ``analyze_input``.
+
+    Returns ``True`` if the fork was created.
+    """
+    # Load the target checkpoint to get its full config (includes checkpoint_ns)
+    target_config = {
+        "configurable": {
+            **config["configurable"],
+            "checkpoint_id": target_checkpoint_id,
+        }
+    }
+    target = graph.get_state(target_config)
+    if not target or not target.config:
+        return False
+
+    # Fork: update_state at the old checkpoint creates a new branch
+    graph.update_state(
+        target.config,
+        values={"messages": [HumanMessage(content=new_message)]},
+        as_node="wait_for_user",
+    )
+    return True
+
+
+def _build_turn_output(state: Any) -> dict[str, Any]:
+    """Extract turn output from graph state at an interrupt."""
+    messages = state.values.get("messages", [])
+    ai_messages = [m for m in messages if isinstance(m, AIMessage)]
+    user_messages = [m for m in messages if isinstance(m, HumanMessage)]
+    last_reply = ai_messages[-1].content if ai_messages else ""
+    return {"reply": last_reply, "turn": len(user_messages)}
+
+
+def _build_session_output(state: Any) -> dict[str, Any]:
+    """Build final output when the graph conversation is complete."""
+    messages = state.values.get("messages", [])
+    user_count = len([m for m in messages if isinstance(m, HumanMessage)])
+    return {
+        "finished": True,
+        "turn_count": user_count,
+        "total_messages": len(messages),
+        "summary": f"Session complete after {user_count} turns.",
+    }
+
+
+async def _finalize_invocation(
+    ctx: TaskContext[dict],
+    thread_config: dict[str, Any],
+    invocation_id: str,
+) -> dict[str, Any]:
+    """Save results and suspend/return after a graph invoke completes."""
+    state = await asyncio.to_thread(_graph.get_state, thread_config)
+
+    new_cp_id = state.config["configurable"]["checkpoint_id"]
+    ctx.metadata.set("stable_checkpoint_id", new_cp_id)
+    ctx.metadata.set("last_applied_invocation_id", invocation_id)
+
+    if state.next:
+        output = _build_turn_output(state)
+        invocation_store.save(invocation_id, {"status": "completed", "output": output})
+        return output
+    result = _build_session_output(state)
+    invocation_store.save(invocation_id, {"status": "completed", "output": result})
+    return result
+
+
+# ---------------------------------------------------------------------------
+# Durable task — bridges LangGraph with HTTP lifecycle
+# ---------------------------------------------------------------------------
+
+
+@multi_turn_task(name="langgraph_session", steerable=True)
+async def langgraph_session(ctx: TaskContext[dict]) -> typing.Optional[dict[str, Any]]:
+    """Run one LangGraph conversation turn with steering support.
+
+    Input schema: ``{"session_id": str, "message": str, "invocation_id": str}``
+    """
+    session_id: str = ctx.input["session_id"]
+    message: str = ctx.input["message"]
+    invocation_id: str = ctx.input["invocation_id"]
+
+    invocation_store.save(invocation_id, {"status": "running"})
+    stream = await streams.get_or_create(invocation_id)
+    await stream.emit({"type": "lifecycle", "status": "running"})
+
+    thread_config: dict[str, Any] = {"configurable": {"thread_id": session_id}}
+
+    if ctx.entry_mode == "recovered":
+        logger.warning("Recovered stale task for session %s", session_id)
+
+    # ── Fork-on-steer: rollback to stable checkpoint ────────────────
+    # If the previous invocation was cancelled mid-flight, the graph may
+    # have drifted past the stable checkpoint.  Fork from the stable
+    # checkpoint with the new message so the graph processes it cleanly.
+    stable_cp = ctx.metadata.get("stable_checkpoint_id")
+    if stable_cp:
+        state = await asyncio.to_thread(_graph.get_state, thread_config)
+        if state and state.values.get("messages"):
+            current_cp = state.config["configurable"].get("checkpoint_id")
+            if current_cp and current_cp != stable_cp:
+                forked = await asyncio.to_thread(
+                    _fork_from_checkpoint,
+                    _graph,
+                    thread_config,
+                    stable_cp,
+                    message,
+                )
+                if forked:
+                    logger.info(
+                        "Forked session %s from stable checkpoint %s",
+                        session_id,
+                        stable_cp,
+                    )
+                    completed = await asyncio.to_thread(
+                        _invoke_cancellable,
+                        _graph,
+                        None,
+                        thread_config,
+                        ctx.cancel,
+                    )
+
+                    if not completed or ctx.cancel.is_set():
+                        invocation_store.save(
+                            invocation_id,
+                            {"status": "cancelled", "reason": "steered"},
+                        )
+                        return None
+                    return await _finalize_invocation(ctx, thread_config, invocation_id)
+
+    # ── Phase 1: Pre-entry cancel ───────────────────────────────────
+    if ctx.cancel.is_set():
+        invocation_store.save(invocation_id, {"status": "cancelled", "reason": "steered"})
+        return None
+    # ── Phase 2: Invoke graph with inter-node cancellation ──────────
+    state = await asyncio.to_thread(_graph.get_state, thread_config)
+
+    if state.next:
+        graph_input = Command(resume=message)
+    else:
+        graph_input = {
+            "messages": [HumanMessage(content=message)],
+            "is_complete": False,
+        }
+
+    loop = asyncio.get_running_loop()
+
+    def _on_node(chunk: dict) -> None:
+        """Stream node progress events from the sync graph thread."""
+        node_names = list(chunk.keys())
+        for name in node_names:
+            asyncio.run_coroutine_threadsafe(
+                stream.emit({"type": "node_progress", "node": name}),
+                loop,
+            )
+        invocation_store.save(
+            invocation_id,
+            {
+                "status": "streaming",
+                "last_node": node_names[-1] if node_names else None,
+            },
+        )
+
+    completed = await asyncio.to_thread(
+        _invoke_cancellable,
+        _graph,
+        graph_input,
+        thread_config,
+        ctx.cancel,
+        _on_node,
+    )
+
+    # ── Phase 3: Post-completion cancel check ───────────────────────
+    if not completed or ctx.cancel.is_set():
+        invocation_store.save(invocation_id, {"status": "cancelled", "reason": "steered"})
+        return None
+    # Normal completion
+    return await _finalize_invocation(ctx, thread_config, invocation_id)
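
The inter-node cancellation used by `_invoke_cancellable`, together with the thread-to-loop hop in `_on_node`, can be sketched without LangGraph. The code below is an illustration under assumptions, not the sample's implementation: a plain list of step names stands in for `graph.stream()`, a `threading.Event` stands in for `ctx.cancel`, and `run_cancellable` and `record` are hypothetical names.

```python
import asyncio
import threading


def run_cancellable(steps, cancel, on_step=None) -> bool:
    """Run steps in order; stop before the next one once cancel is set."""
    for name in steps:
        if on_step is not None:
            on_step(name)
        if cancel.is_set():
            return False
    return True


async def demo() -> tuple:
    loop = asyncio.get_running_loop()
    seen: list = []
    cancel = threading.Event()

    async def record(name: str) -> None:
        # Runs on the event loop; simulates a steer arriving mid-graph.
        seen.append(name)
        if name == "generate":
            cancel.set()

    def on_step(name: str) -> None:
        # Called from the worker thread: schedule on the loop and wait.
        asyncio.run_coroutine_threadsafe(record(name), loop).result()

    completed = await asyncio.to_thread(
        run_cancellable, ["analyze", "generate", "refine"], cancel, on_step
    )
    return completed, seen
```

Here the third step never runs, because the cancel flag is observed right after the second step reports progress. Unlike this sketch, the sample does not wait on the returned future, so its progress events are fire-and-forget.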
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/app.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/app.py
new file mode 100644
index 000000000000..ae660f0ce4ac
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/app.py
@@ -0,0 +1,176 @@
+"""HTTP host for the LangGraph durable agent with streaming and steering.
+
+Wires the LangGraph durable task (``agent.py``) to the invocations framework.
+Per-invocation results are written by the durable task itself (inside the
+crash-resilient execution boundary), not by a background collector.
+
+Streaming
+~~~~~~~~~
+
+Pass ``Accept: text/event-stream`` on POST to receive an SSE stream of node
+progress events (``node_progress``) plus lifecycle events (``queued``,
+``running``).  Without the header you get the standard 202 JSON response for
+async polling via GET.
+
+Steering is handled by the framework: the durable task is declared with
+``steerable=True``, so calling ``start()`` on an in-progress task **queues**
+the new input instead of raising ``TaskConflictError``.  The running function
+sees ``ctx.cancel`` set and short-circuits.  The framework then drains the
+queue and re-enters the function with the next input.
+
+Usage::
+
+    pip install -r requirements.txt
+
+    python -m durable_langgraph.app
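+    # Run as a module from the parent samples/ directory; a bare
+    # "python app.py" fails because app.py uses relative imports.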
+
+    # Turn 1 — async
+    curl -X POST "http://localhost:8088/invocations?agent_session_id=demo-001" \\
+        -H "Content-Type: application/json" \\
+        -d '{"message": "I need help planning a trip to Tokyo"}'
+    # → 202  (x-agent-invocation-id: <inv-1>)
+
+    # Turn 1 — streaming
+    curl -N -X POST "http://localhost:8088/invocations?agent_session_id=demo-001" \\
+        -H "Content-Type: application/json" \\
+        -H "Accept: text/event-stream" \\
+        -d '{"message": "I need help planning a trip to Tokyo"}'
+    # → SSE stream: lifecycle:queued → lifecycle:running → node_progress → done
+
+    # Poll that invocation (snapshot — always available)
+    curl "http://localhost:8088/invocations/<inv-1>"
+    # → {"invocation_id": "<inv-1>", "status": "completed", "output": {...}}
+
+    # Steer — send a new invocation while a turn is still running.
+    curl -X POST "http://localhost:8088/invocations?agent_session_id=demo-001" \\
+        -H "Content-Type: application/json" \\
+        -d '{"message": "Actually, let us go to Paris instead"}'
+
+    # End session
+    curl -X POST "http://localhost:8088/invocations?agent_session_id=demo-001" \\
+        -H "Content-Type: application/json" \\
+        -d '{"message": "done"}'
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from collections.abc import AsyncGenerator
+
+from starlette.requests import Request
+from starlette.responses import JSONResponse, Response, StreamingResponse
+
+from azure.ai.agentserver.core.streaming import (
+    EventStream,
+    EventStreamNotFoundError,
+    streams,
+)
+from azure.ai.agentserver.invocations import InvocationAgentServerHost
+
+from .agent import invocation_store, langgraph_session
+
+logger = logging.getLogger(__name__)
+
+# In-memory multi-subscriber replay buffer; 10-min sliding window for
+# reconnects within the recovery window. Per streaming.md §7.8 the
+# stream id is the per-turn ``invocation_id``.
+streams.use_in_memory_replay(ttl_seconds=600)
+
+app = InvocationAgentServerHost()
+
+
+async def _sse_from_stream(
+    stream: EventStream, invocation_id: str, *, initial_status: str = "queued"
+) -> AsyncGenerator[bytes, None]:
+    """Convert an EventStream's payloads into SSE-formatted bytes."""
+
+    yield (
+        f"data: {json.dumps({'type': 'lifecycle', 'status': initial_status, 'invocation_id': invocation_id})}\n\n"
+    ).encode()
+
+    try:
+        async for chunk in stream.subscribe():
+            yield f"data: {json.dumps(chunk)}\n\n".encode()
+        done_data = {"type": "done", "invocation_id": invocation_id}
+        yield f"event: done\ndata: {json.dumps(done_data)}\n\n".encode()
+    except EventStreamNotFoundError:
+        yield (
+            f"event: superseded\ndata: {json.dumps({'type': 'superseded', 'invocation_id': invocation_id})}\n\n"
+        ).encode()
+    except Exception as exc:  # pylint: disable=broad-except
+        error_data = {
+            "type": "error",
+            "invocation_id": invocation_id,
+            "error": str(exc),
+        }
+        yield f"event: error\ndata: {json.dumps(error_data)}\n\n".encode()
+
+
+@app.invoke_handler
+async def handle_invoke(request: Request) -> Response:
+    """Start or steer a LangGraph session.
+
+    If ``Accept: text/event-stream`` is set, returns an SSE stream of node
+    progress events.  Otherwise returns ``202 Accepted`` for async polling.
+    """
+    data = await request.json()
+    invocation_id: str = request.state.invocation_id
+    session_id: str = request.state.session_id
+    message: str = data.get("message", "")
+    task_id = f"session-{session_id}"
+
+    task_input = {
+        "session_id": session_id,
+        "message": message,
+        "invocation_id": invocation_id,
+    }
+
+    invocation_store.save(invocation_id, {"status": "queued"})
+
+    # Create the stream BEFORE starting the task (streaming.md §5.1). The
+    # subscriber attaches later and catches up via the replay buffer; the
+    # task reads invocation_id from ctx.input and gets the SAME stream.
+    stream = await streams.get_or_create(invocation_id)
+    await langgraph_session.start(task_id=task_id, input=task_input)
+
+    # SSE streaming mode — return live node progress
+    wants_stream = "text/event-stream" in request.headers.get("accept", "")
+    if wants_stream:
+        return StreamingResponse(
+            _sse_from_stream(stream, invocation_id),
+            media_type="text/event-stream",
+            headers={"X-Agent-Invocation-Id": invocation_id},
+        )
+
+    # Standard async mode — return 202 with status from store
+    stored = invocation_store.load(invocation_id)
+    status = stored["status"] if stored else "queued"
+
+    return JSONResponse(
+        {"invocation_id": invocation_id, "status": status},
+        status_code=202,
+    )
+
+
+@app.get_invocation_handler
+async def poll_invocation(request: Request) -> Response:
+    """Poll a specific invocation's snapshot.
+
+    Returns the durable snapshot from the invocation store.  During streaming
+    this includes ``last_node``; after completion it includes full output.
+    Use this as the recovery path after an SSE disconnect.
+    """
+    invocation_id: str = request.state.invocation_id
+
+    result = invocation_store.load(invocation_id)
+    if result is None:
+        return JSONResponse({"error": "Invocation not found"}, status_code=404)
+
+    return JSONResponse({"invocation_id": invocation_id, **result})
+
+
+if __name__ == "__main__":
+    app.run()
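
This host subscribes inside `_sse_from_stream`, after `start()` has been called, so it depends on the replay buffer to deliver events emitted in between. The sketch below shows the general idea of a sliding-window replay log. It is a hypothetical model, not the behaviour of `use_in_memory_replay`: the `ReplayLog` class, its `replay` method, and the injected `clock` are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable


class ReplayLog:
    """Hypothetical replay buffer: keeps events younger than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float]) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._events: deque = deque()  # (timestamp, event) pairs, oldest first

    def emit(self, event: dict) -> None:
        self._events.append((self._clock(), event))

    def replay(self) -> list:
        # Drop entries that fell out of the window, then return the rest.
        cutoff = self._clock() - self._ttl
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        return [event for _, event in self._events]


def demo() -> tuple:
    now = [0.0]  # mutable fake clock so the example is deterministic
    log = ReplayLog(ttl_seconds=600, clock=lambda: now[0])
    log.emit({"type": "lifecycle", "status": "running"})
    now[0] = 30.0
    log.emit({"type": "node_progress", "node": "analyze_input"})
    caught_up = log.replay()  # a subscriber attaching now sees both events
    now[0] = 620.0
    after_window = log.replay()  # the first event has aged out
    return caught_up, after_window
```

A subscriber that attaches within the window receives everything emitted so far, which is why this host can tolerate subscribing after `start()` while the live-only Copilot host cannot.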
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/requirements.txt b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/requirements.txt
new file mode 100644
index 000000000000..79260e068214
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/requirements.txt
@@ -0,0 +1,4 @@
+azure-ai-agentserver-invocations
+langgraph>=0.2
+langgraph-checkpoint-sqlite>=2.0
+langchain-core>=0.3
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/store.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/store.py
new file mode 100644
index 000000000000..b754b2bf7fa2
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph/store.py
@@ -0,0 +1,57 @@
+"""File-based key→JSON store for powering the invocation API.
+
+This module provides a minimal persistence layer that the HTTP host uses to
+store per-invocation results.  It is **not** part of the durable task
+framework — it is the developer's own persistence for powering the API
+contract (``GET /invocations/{invocation_id}``).
+
+.. warning::
+
+    For demonstration only.  In production, use a database (Redis, Cosmos DB,
+    PostgreSQL, etc.).
+"""
+
+from __future__ import annotations
+
+import json
+import tempfile
+from pathlib import Path
+from typing import Any
+
+
+class FileStore:
+    """Minimal file-backed key→JSON store.
+
+    Each entry is a single JSON file.  Writes are atomic (temp + rename).
+    """
+
+    def __init__(self, base_dir: Path) -> None:
+        self._base = base_dir
+        self._base.mkdir(parents=True, exist_ok=True)
+
+    def save(self, key: str, data: dict[str, Any]) -> None:
+        """Atomically write *data* as JSON — temp file + rename."""
+        target = self._base / f"{key}.json"
+        fd, tmp_path = tempfile.mkstemp(dir=str(self._base), suffix=".tmp", prefix=f"{key}_")
+        try:
+            with open(fd, "w", encoding="utf-8") as f:
+                json.dump(data, f, indent=2)
+            Path(tmp_path).replace(target)
+        except BaseException:
+            Path(tmp_path).unlink(missing_ok=True)
+            raise
+
+    def load(self, key: str) -> dict[str, Any] | None:
+        """Return the stored dict, or ``None`` if the key does not exist."""
+        path = self._base / f"{key}.json"
+        if path.exists():
+            return json.loads(path.read_text(encoding="utf-8"))
+        return None
+
+    def delete(self, key: str) -> bool:
+        """Remove the entry for *key*.  Returns ``True`` if it existed."""
+        path = self._base / f"{key}.json"
+        if path.exists():
+            path.unlink()
+            return True
+        return False
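
The temp-file-plus-rename write used by `FileStore.save` can be tried in isolation. The following is a minimal sketch, not part of the sample: the names `atomic_write_json` and `demo` are hypothetical, and it assumes the temp file is created in the same directory as the target so the final rename stays on one filesystem.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(target: Path, data: dict[str, Any]) -> None:
    """Write *data* to *target* so readers never observe a partial file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target: the rename below stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    try:
        # Opening by file descriptor hands ownership of fd to the file object.
        with open(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        # os.replace overwrites an existing target in one step.
        os.replace(tmp_name, target)
    except BaseException:
        # Leave no stray temp file behind, then surface the original error.
        Path(tmp_name).unlink(missing_ok=True)
        raise


def demo() -> dict[str, Any]:
    """Overwrite the same key twice and report what is left on disk."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "inv-1.json"
        atomic_write_json(target, {"status": "running"})
        atomic_write_json(target, {"status": "completed"})
        leftovers = [p.name for p in Path(d).iterdir() if p.suffix == ".tmp"]
        data = json.loads(target.read_text(encoding="utf-8"))
        return {"data": data, "leftovers": leftovers}
```

Calling `demo()` overwrites one entry twice; the second write replaces the first and no `.tmp` files remain in the directory.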
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/__init__.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/__init__.py
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/agent.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/agent.py
new file mode 100644
index 000000000000..aa0ce72925f8
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/agent.py
@@ -0,0 +1,113 @@
+"""Durable multi-turn session agent (invocations protocol).
+
+Defines the durable task that powers a sticky conversation session.
+Each invocation runs this function from the top — ``ctx.entry_mode``
+tells us whether this is a fresh start, a resume, or a crash recovery.
+
+This sample demonstrates the **named-namespace metadata** facility:
+
+- ``ctx.metadata`` (default namespace) holds invocation-level state —
+  the most-recent reply and turn count for the *current* invocation.
+- ``ctx.metadata("session")`` (named namespace) holds session-level
+  state — the full conversation history that persists across many
+  invocations of the same session.
+
+Both namespaces are durable. On ``ctx.entry_mode == "recovered"`` the
+handler reads the session history out of the named namespace (it was
+already flushed by a prior lifetime), appends the current turn, and
+flushes again before suspending. There is no external file-store
+involved — the durable primitive owns the persistence.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from azure.ai.agentserver.core.durable import TaskContext, multi_turn_task
+
+logger = logging.getLogger(__name__)
+
+
+def _generate_reply(turn: int, last_msg: str) -> str:
+    """Placeholder for an LLM call.  Replace with your model of choice."""
+
+    if turn == 1:
+        return f"Thanks for reaching out! You said: '{last_msg}'. Could you share more details so I can help?"
+    if turn == 2:
+        return (
+            f"Great, noted: '{last_msg}'. Based on our conversation "
+            "so far, here are some initial thoughts. What else?"
+        )
+    return f"Turn {turn}: incorporating '{last_msg}' — " f"I now have context from {turn} turns of conversation."
+
+
+@multi_turn_task(name="session_workflow")
+async def session_workflow(ctx: TaskContext[dict]) -> dict[str, Any]:
+    """Single durable function for the entire session.
+
+    Each invocation runs this function from the top.
+    ``ctx.entry_mode`` tells us why we were entered.
+
+    Two metadata namespaces are used:
+
+    - default (``ctx.metadata``) — per-invocation state.
+    - ``"session"`` — conversation history that survives across many
+      invocations of the same session.
+    """
+
+    session_id: str = ctx.input["session_id"]
+    message: str = ctx.input["message"]
+    invocation_id: str = ctx.input["invocation_id"]
+
+    # Session-level state (history + turn count) lives in a named namespace
+    # so it is logically separated from per-invocation state.
+    session = ctx.metadata("session")
+    history: list[dict[str, str]] = session.get("history", [])
+    turn_count: int = session.get("turn_count", 0)
+
+    ctx.metadata["invocation_id"] = invocation_id
+    ctx.metadata["status"] = "running"
+    await ctx.metadata.flush()
+
+    if ctx.entry_mode == "recovered":
+        logger.warning("Recovered stale task for session %s", session_id)
+
+    # Handle explicit session end
+    if message.strip().lower() == "done":
+        summary = f"Session complete after {turn_count} turns. Total messages exchanged: {len(history)}."
+        # Clear the session history so a future session_id reuse starts clean.
+        session["history"] = []
+        session["turn_count"] = 0
+        await session.flush()
+
+        result = {"reply": summary, "turn": turn_count, "finished": True}
+        ctx.metadata["status"] = "completed"
+        ctx.metadata["output"] = result
+        await ctx.metadata.flush()
+        return result
+
+    # Process this turn
+    history.append({"role": "user", "content": message})
+    turn_count += 1
+
+    reply = _generate_reply(turn_count, message)
+    history.append({"role": "assistant", "content": reply})
+
+    # Checkpoint session state — survives crash.
+    session["history"] = history
+    session["turn_count"] = turn_count
+    await session.flush()
+
+    # Persist invocation result BEFORE suspending (inside durable boundary).
+    output = {"reply": reply, "turn": turn_count}
+    ctx.metadata["status"] = "completed"
+    ctx.metadata["output"] = output
+    await ctx.metadata.flush()
+
+    # Suspend: the client will resume with the next turn.
+    # In a multi-turn task, `return X` is the implicit-suspend signal.
+    # The chain stays alive across turns; ctx.suspend() is not part of
+    # the public surface. The output value flows through
+    # `return output` to the caller's `.result()`.
+    return output
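
The per-turn bookkeeping in `session_workflow` can be followed without the SDK. The sketch below is an illustration only: a plain `dict` stands in for the `ctx.metadata("session")` namespace, `run_turn` is a hypothetical name, and the reply text is a placeholder rather than the sample's `_generate_reply`. Flushing and the per-invocation namespace are omitted.

```python
from __future__ import annotations

from typing import Any


def run_turn(session: dict[str, Any], message: str) -> dict[str, Any]:
    """Apply one turn to *session*, mirroring the sample's control flow."""
    history: list[dict[str, str]] = list(session.get("history", []))
    turn_count: int = session.get("turn_count", 0)

    # Explicit session end: summarize, then reset the session-level state.
    if message.strip().lower() == "done":
        summary = (
            f"Session complete after {turn_count} turns. "
            f"Total messages exchanged: {len(history)}."
        )
        session["history"] = []
        session["turn_count"] = 0
        return {"reply": summary, "turn": turn_count, "finished": True}

    # Normal turn: record the user message and the (placeholder) reply.
    history.append({"role": "user", "content": message})
    turn_count += 1
    reply = f"Turn {turn_count}: received '{message}'"
    history.append({"role": "assistant", "content": reply})

    # Checkpoint the session state for the next invocation.
    session["history"] = history
    session["turn_count"] = turn_count
    return {"reply": reply, "turn": turn_count}
```

Feeding the same `session` dict through several calls shows the turn counter advancing across invocations and resetting once `"done"` arrives.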
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/app.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/app.py
new file mode 100644
index 000000000000..0e3235daae18
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/app.py
@@ -0,0 +1,119 @@
+"""HTTP host for the durable multi-turn agent.
+
+Wires the durable task (``agent.py``) to the invocations framework.
+Per-invocation results are written by the durable task itself (inside the
+crash-resilient execution boundary), not by a background collector.
+
+Usage::
+
+    pip install azure-ai-agentserver-invocations
+
+    python -m durable_multiturn.app
+    # — or —
+    python app.py
+
+    # Turn 1
+    curl -X POST "http://localhost:8088/invocations?agent_session_id=trip-001" \\
+        -H "Content-Type: application/json" \\
+        -d '{"message": "I want to plan a vacation to Japan"}'
+    # → 202  (x-agent-invocation-id: <inv-1>)
+
+    # Poll that invocation
+    curl "http://localhost:8088/invocations/<inv-1>"
+    # → {"invocation_id": "<inv-1>", "status": "completed", "output": {...}}
+
+    # Turn 2
+    curl -X POST "http://localhost:8088/invocations?agent_session_id=trip-001" \\
+        -H "Content-Type: application/json" \\
+        -d '{"message": "Budget is $5000, 2 weeks"}'
+
+    # End session
+    curl -X POST "http://localhost:8088/invocations?agent_session_id=trip-001" \\
+        -H "Content-Type: application/json" \\
+        -d '{"message": "done"}'
+"""
+
+from __future__ import annotations
+
+from starlette.requests import Request
+from starlette.responses import JSONResponse, Response
+
+from azure.ai.agentserver.core.durable import TaskConflictError
+from azure.ai.agentserver.invocations import InvocationAgentServerHost
+
+from .agent import session_workflow
+
+app = InvocationAgentServerHost()
+
+
+@app.invoke_handler
+async def handle_invoke(request: Request) -> Response:
+    """Start or resume a durable session task.
+
+    Each POST is one invocation.  The durable task is an internal detail
+    — the caller only sees ``invocation_id`` (from platform headers).
+
+    The task itself writes the invocation result to ``ctx.metadata`` inside
+    the durable execution boundary, so no background collector is needed.
+    """
+    data = await request.json()
+    invocation_id: str = request.state.invocation_id
+    session_id: str = request.state.session_id
+    message: str = data.get("message", "")
+    task_id = f"session-{session_id}"
+
+    try:
+        await session_workflow.start(
+            task_id=task_id,
+            input={
+                "session_id": session_id,
+                "message": message,
+                "invocation_id": invocation_id,
+            },
+        )
+    except TaskConflictError as e:
+        return JSONResponse({"error": str(e)}, status_code=409)
+
+    return JSONResponse(
+        {"invocation_id": invocation_id, "status": "running"},
+        status_code=202,
+    )
+
+
+@app.get_invocation_handler
+async def poll_invocation(request: Request) -> Response:
+    """Poll a specific invocation's result.
+
+    Reads the per-invocation result out of ``ctx.metadata`` for the
+    current session-level durable task — it was written by the durable
+    handler itself inside the execution boundary, so it survives
+    crashes.
+    """
+    invocation_id: str = request.state.invocation_id
+    session_id: str = request.state.session_id
+    task_id = f"session-{session_id}"
+
+    # Task.get and TaskSnapshot were removed; use the provider directly
+    # for read-only inspection (returns TaskInfo).
+    from azure.ai.agentserver.core.durable._manager import get_task_manager
+
+    mgr = get_task_manager()
+    info = await mgr.provider.get(task_id)
+    if info is None:
+        return JSONResponse({"error": "Invocation not found"}, status_code=404)
+
+    payload = info.payload or {}
+    if payload.get("invocation_id") != invocation_id:
+        return JSONResponse({"error": "Invocation not found for this session"}, status_code=404)
+
+    return JSONResponse(
+        {
+            "invocation_id": invocation_id,
+            "status": payload.get("status", info.status),
+            "output": payload.get("output"),
+        }
+    )
+
+
+if __name__ == "__main__":
+    app.run()
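
The decision logic in `poll_invocation` reduces to a small pure function, which makes the 404 cases easier to see. This is a sketch under stated assumptions: `build_poll_response` is a hypothetical helper, `task_status=None` stands in for "the provider returned no task", and `payload` stands in for `info.payload`.

```python
from __future__ import annotations

from typing import Any


def build_poll_response(
    payload: dict[str, Any] | None,
    task_status: str | None,
    invocation_id: str,
) -> tuple[int, dict[str, Any]]:
    """Return (http_status, body) for GET /invocations/{invocation_id}."""
    if task_status is None:
        # No durable task exists for this session.
        return 404, {"error": "Invocation not found"}

    payload = payload or {}
    if payload.get("invocation_id") != invocation_id:
        # The task exists, but its metadata belongs to a different invocation.
        return 404, {"error": "Invocation not found for this session"}

    return 200, {
        "invocation_id": invocation_id,
        # Prefer the status written by the handler; fall back to the task's.
        "status": payload.get("status", task_status),
        "output": payload.get("output"),
    }
```

One consequence visible here: because the payload holds only the most recent invocation, polling an older invocation id of the same session yields the second 404.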
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/requirements.txt b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/requirements.txt
similarity index 100%
rename from sdk/agentserver/azure-ai-agentserver-invocations/samples/async_invoke_agent/requirements.txt
rename to sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/requirements.txt
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py
new file mode 100644
index 000000000000..c1d147d4d408
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn/store.py
@@ -0,0 +1,55 @@
+"""File-based key→JSON store for powering the invocation API.
+
+This module provides a minimal persistence layer that the HTTP host uses to
+store per-invocation results.  It is **not** part of the durable task
+framework — it is the developer's own persistence for powering the API
+contract (``GET /invocations/{invocation_id}``).
+
+.. warning::
+
+    For demonstration only.  In production, use a database (Redis, Cosmos DB,
+    PostgreSQL, etc.).
+"""
+
+from __future__ import annotations
+
+import json
+import tempfile
+from pathlib import Path
+from typing import Any
+
+
+class FileStore:
+    """Minimal file-backed key→JSON store.
+
+    Each entry is a single JSON file.  Writes are atomic (temp + rename).
+    """
+
+    def __init__(self, base_dir: Path) -> None:
+        self._base = base_dir
+        self._base.mkdir(parents=True, exist_ok=True)
+
+    def save(self, key: str, data: dict[str, Any]) -> None:
+        """Atomically write *data* as JSON — temp file + rename."""
+        target = self._base / f"{key}.json"
+        fd, tmp_path = tempfile.mkstemp(dir=str(self._base), suffix=".tmp", prefix=f"{key}_")
+        try:
+            with open(fd, "w", encoding="utf-8") as f:
+                json.dump(data, f, indent=2)
+            Path(tmp_path).replace(target)
+        except BaseException:
+            Path(tmp_path).unlink(missing_ok=True)
+            raise
+
+    def load(self, key: str) -> dict[str, Any] | None:
+        """Return the stored dict, or ``None`` if the key does not exist."""
+        path = self._base / f"{key}.json"
+        if path.exists():
+            return json.loads(path.read_text(encoding="utf-8"))
+        return None
+
+    def delete(self, key: str) -> None:
+        """Remove the entry for *key* (no-op if missing)."""
+        path = self._base / f"{key}.json"
+        if path.exists():
+            path.unlink()
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/__init__.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/__init__.py
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/agent.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/agent.py
new file mode 100644
index 000000000000..76c057dbd20a
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/agent.py
@@ -0,0 +1,586 @@
+"""The durable research task — crash-resilient, steerable, long-running.
+
+This is the standalone-sample shape of the larger
+``samples/durable-agent-demo/src/durable-research-agent`` reference
+demo. The reference demo includes deployment scaffolding (Dockerfile,
+agent.yaml) for the Foundry hosting platform; this sample strips all
+of that away and ships only the three files every invocations sample
+ships: ``agent.py``, ``app.py``, and ``requirements.txt`` (plus a
+small co-located ``store.py``). The reference demo remains in tree
+for users who want to see the full hosting layout.
+
+Streaming uses the SDK ``streams`` registry: events for a given turn
+are emitted to ``streams.get_or_create(invocation_id)``. The HTTP
+layer subscribes to the same stream by id (see ``app.py``). On crash
+recovery, ``stream.last_cursor()`` rehydrates the in-process sequence
+counter from disk so we resume numbering from where we left off — no
+gap, no duplicate cursor value.
+
+Per the durable-task primitive's persistence model (see
+``core/docs/durable-task-guide.md``), ``ctx.metadata`` is a
+*small-watermark* store — never a bulk-data store. This handler
+keeps only three small integer watermarks in ``ctx.metadata``
+(``completed_phases``, ``in_progress_phase``, ``completed_subcalls``)
+and parks the in-flight subcall text (potentially several KB) in a
+separate file-backed :class:`CheckpointStore` keyed by the per-turn
+``invocation_id``. The checkpoint-store entry, the wire stream, and
+the metadata watermarks are all reset together at every turn-
+completion boundary (normal completion AND wind-down-via-suspend) so
+the next turn — steered re-entry or otherwise — starts cleanly. We
+explicitly do NOT reset on crash paths: the watermarks left behind
+are exactly what the recovery re-entry needs to resume mid-turn.
+
+Steering is transparent: a new POST while a turn is running enqueues
+the input on the framework's steering queue and sets ``ctx.cancel``.
+The handler observes the cancel at the next checkpoint, winds down
+via ``return None``,
+and the framework re-enters the body with the new ``ctx.input``.
+Because state was cleared at suspend, the re-entered handler naturally
+starts the new topic at phase 0 — no ``is_steered_turn`` check needed
+in handler code.
+
+Input schema: ``{"topic": str, "invocation_id": str}``
+
+Environment:
+
+- ``FOUNDRY_PROJECT_ENDPOINT`` — Azure AI Foundry project endpoint.
+- ``AZURE_AI_MODEL_DEPLOYMENT_NAME`` — model deployment name
+  (default: ``gpt-4.1-mini``).
+- ``NUM_PHASES`` — number of research phases (default: 15).
+- ``CALLS_PER_PHASE`` — sub-calls per phase (default: 4, max 4).
+- ``TARGET_OUTPUT_TOKENS`` — soft cap for per-subcall LLM output
+  (default: 1500).
+- ``INTRA_PHASE_COOLDOWN_SEC`` — wait between subcalls in a phase
+  (default: 10).
+- ``INTER_PHASE_COOLDOWN_SEC`` — wait between phases (default: 20).
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+import time
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Awaitable, Callable
+
+from azure.ai.agentserver.core.durable import TaskContext, multi_turn_task
+from azure.ai.agentserver.core.streaming import streams
+
+from .store import CheckpointStore
+
+logger = logging.getLogger(__name__)
+
+
+# --- Server wall-clock helpers ---------------------------------------------
+
+_APP_STARTED_MONOTONIC = time.monotonic()
+
+
+def _now_iso() -> str:
+    """UTC ISO-8601 timestamp with millisecond precision and Z suffix."""
+    now = datetime.now(timezone.utc)
+    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"
+
+
+def _server_uptime_sec() -> float:
+    """Seconds since this Python process started (resets to ~0 after crash)."""
+    return round(time.monotonic() - _APP_STARTED_MONOTONIC, 1)
+
+
+# --- Azure AI client setup -------------------------------------------------
+
+_endpoint = os.environ.get("FOUNDRY_PROJECT_ENDPOINT")
+_model = os.environ.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "gpt-4.1-mini")
+
+_openai_client: Any = None
+
+
+def _get_client() -> Any:
+    """Lazy Azure AI client construction — kept out of import-time so the
+    sample can be imported in test / static-analysis contexts that don't
+    have an Azure endpoint configured."""
+
+    global _openai_client  # pylint: disable=global-statement
+    if _openai_client is not None:
+        return _openai_client
+    if not _endpoint:
+        raise EnvironmentError("FOUNDRY_PROJECT_ENDPOINT is required to run the deep-research sample.")
+    from azure.ai.projects.aio import (  # pylint: disable=import-outside-toplevel
+        AIProjectClient,
+    )
+
+    # Local-dev escape hatch: ``AZURE_AI_CREDENTIAL=cli`` forces use of
+    # AzureCliCredential alone. Useful in environments where IMDS is
+    # available but the assigned MSI doesn't have access to the target
+    # Foundry resource (e.g., dev VMs with their own MSI), so
+    # DefaultAzureCredential would grab the wrong identity from the
+    # chain. Production / hosted runs leave the env var unset and use
+    # the standard DefaultAzureCredential chain.
+    cred_mode = os.environ.get("AZURE_AI_CREDENTIAL", "").strip().lower()
+    if cred_mode == "cli":
+        from azure.identity.aio import (  # pylint: disable=import-outside-toplevel
+            AzureCliCredential,
+        )
+
+        credential: Any = AzureCliCredential()
+    else:
+        from azure.identity.aio import (  # pylint: disable=import-outside-toplevel
+            DefaultAzureCredential,
+        )
+
+        credential = DefaultAzureCredential()
+
+    project = AIProjectClient(endpoint=_endpoint, credential=credential)
+    _openai_client = project.get_openai_client()
+    return _openai_client
+
+
+# --- File-backed checkpoint store (heavy artifacts live here) --------------
+
+_CHECKPOINT_DIR = Path.home() / ".durable" / "_checkpoints"
+_checkpoint_store = CheckpointStore(_CHECKPOINT_DIR)
+
+
+# --- Research phase plan ---------------------------------------------------
+
+PHASE_TITLES = [
+    "Decomposing topic into focused research questions",
+    "Surveying foundational literature and key concepts",
+    "Identifying leading researchers and institutions",
+    "Mapping the historical trajectory of the field",
+    "Analyzing recent breakthroughs and publications",
+    "Examining competing theories and methodological debates",
+    "Evaluating experimental evidence and data quality",
+    "Mapping connections to adjacent fields",
+    "Identifying open problems and knowledge gaps",
+    "Assessing real-world applications and current adoption",
+    "Analyzing funding landscape and research trends",
+    "Surveying ethical considerations and societal implications",
+    "Projecting near-term and long-term outlook",
+    "Synthesizing findings into a coherent narrative",
+    "Generating key insights and concrete recommendations",
+]
+
+_SUB_CALL_ROLES = [
+    (
+        "research",
+        "Conduct an in-depth investigation of the assigned aspect. Include "
+        "specific findings, examples, and references where you can. Aim for "
+        "substantive, multi-paragraph content.",
+    ),
+    (
+        "critique",
+        "Critically evaluate the research above. Identify weak claims, gaps, "
+        "competing interpretations, and quality concerns. Be specific.",
+    ),
+    (
+        "refine",
+        "Revise the original research, incorporating the critique. Strengthen "
+        "weak claims, address gaps, and clarify uncertainty. Produce a "
+        "tightened, more rigorous version.",
+    ),
+    (
+        "synthesize",
+        "Distill the refined material into 2-3 paragraphs of key takeaways "
+        "suitable for someone briefing a decision-maker on this phase.",
+    ),
+]
+
+NUM_PHASES = max(1, int(os.environ.get("NUM_PHASES", str(len(PHASE_TITLES)))))
+CALLS_PER_PHASE = max(1, min(len(_SUB_CALL_ROLES), int(os.environ.get("CALLS_PER_PHASE", "4"))))
+TARGET_OUTPUT_TOKENS = int(os.environ.get("TARGET_OUTPUT_TOKENS", "1500"))
+INTRA_PHASE_COOLDOWN_SEC = float(os.environ.get("INTRA_PHASE_COOLDOWN_SEC", "10"))
+INTER_PHASE_COOLDOWN_SEC = float(os.environ.get("INTER_PHASE_COOLDOWN_SEC", "20"))
+
+
+def _phase_title(i: int) -> str:
+    return PHASE_TITLES[i] if i < len(PHASE_TITLES) else f"Continued research (phase {i + 1})"
+
+
+# --- The durable task ------------------------------------------------------
+
+# Type alias: the per-turn emit function the helpers below take. It
+# wraps stream.emit() with auto-increment of ``sequence_number``.
+EmitFn = Callable[[dict], Awaitable[None]]
+
+
+async def _finish_turn(stream: Any, ctx: TaskContext, inv_id: str) -> None:
+    """Tear down per-turn resources at every non-crash exit.
+
+    Steered re-entries, operator cancels, timeouts, and normal
+    completions all flow through here. We:
+
+    1. Close the wire stream so SSE subscribers see the terminator
+       before the framework reports the turn as suspended / completed.
+    2. Wipe ``ctx.metadata`` watermarks so the NEXT turn — steered
+       re-entry on the same task, or a fresh ``start()`` — naturally
+       starts at phase 0 without any "is this a steered turn?"
+       branching.
+    3. Delete this invocation's checkpoint-store entry so disk
+       usage doesn't grow with completed turns.
+
+    We explicitly do NOT call this on crash paths: the wire stream
+    must stay OPEN (per the orchestrator's
+    ``leave_stream_open_for_recovery`` contract) and the watermarks
+    must remain so the recovery re-entry can resume mid-turn.
+    """
+    await stream.close()
+    ctx.metadata.pop("completed_phases", None)
+    ctx.metadata.pop("in_progress_phase", None)
+    ctx.metadata.pop("completed_subcalls", None)
+    _checkpoint_store.delete(inv_id)
+
+
+@multi_turn_task(name="deep_research", steerable=True)
+async def deep_research(ctx: TaskContext[dict]) -> None:
+    """Long-running deep-research task: crash-resilient, steerable.
+
+    Checkpointing is **per subcall**, not just per phase. After each
+    LLM subcall finishes we (a) advance the three small integer
+    watermarks on ``ctx.metadata`` and (b) write the in-flight phase
+    text to the file-backed checkpoint store keyed by the
+    per-invocation id. On recovery we resume the in-progress phase at
+    the next un-finished subcall, re-using the text we had streamed
+    before the crash — so the worst case is one wasted subcall (the
+    one that was actively streaming when the container died).
+
+    The body returns ``None`` on normal completion (and also on the
+    steered-wind-down path — bare ``return`` is the
+    implicit-suspend signal; the chain stays alive across turns).
+    Clients read progress + final content from the per-turn SSE
+    stream, not from the task's terminal output, so there is no
+    return-value payload to construct.
+    """
+    topic: str = ctx.input["topic"]
+    inv_id: str = ctx.input["invocation_id"]
+
+    stream = await streams.get_or_create(inv_id)
+    # On crash recovery, last_cursor() returns the highest
+    # sequence_number that made it to disk before the crash.
+    last_cursor = await stream.last_cursor()
+    seq = last_cursor or 0
+
+    async def emit(payload: dict) -> None:
+        nonlocal seq
+        seq += 1
+        await stream.emit({"sequence_number": seq, **payload})
+
+    await _emit_run_start(emit, ctx, topic=topic)
+
+    try:
+        completed: int = ctx.metadata.get("completed_phases", 0)
+
+        if ctx.entry_mode == "recovered" and completed > 0:
+            await emit(
+                {
+                    "type": "recovered",
+                    "completed_phases": completed,
+                    "total_phases": NUM_PHASES,
+                    "server_time_utc": _now_iso(),
+                    "server_uptime_sec": _server_uptime_sec(),
+                }
+            )
+
+        for phase_idx in range(completed, NUM_PHASES):
+            if ctx.cancel.is_set():
+                return await _wind_down(emit, stream, ctx, inv_id, phase_idx)
+
+            phase_started_mono = time.monotonic()
+            title = _phase_title(phase_idx)
+
+            await emit(
+                {
+                    "type": "phase_start",
+                    "phase": phase_idx + 1,
+                    "total": NUM_PHASES,
+                    "title": title,
+                    "server_time_utc": _now_iso(),
+                    "server_uptime_sec": _server_uptime_sec(),
+                }
+            )
+
+            await _run_phase(emit, ctx, inv_id, phase_idx, topic, title)
+
+            # --- PHASE-COMPLETE CHECKPOINT ---
+            ctx.metadata["completed_phases"] = phase_idx + 1
+            ctx.metadata["in_progress_phase"] = None
+            ctx.metadata["completed_subcalls"] = 0
+            _checkpoint_store.delete(inv_id)
+            await ctx.metadata.flush()
+
+            phase_duration = round(time.monotonic() - phase_started_mono, 1)
+            await emit(
+                {
+                    "type": "phase_end",
+                    "phase": phase_idx + 1,
+                    "total": NUM_PHASES,
+                    "title": title,
+                    "server_time_utc": _now_iso(),
+                    "server_uptime_sec": _server_uptime_sec(),
+                    "duration_sec": phase_duration,
+                }
+            )
+
+            if ctx.cancel.is_set():
+                return await _wind_down(emit, stream, ctx, inv_id, phase_idx + 1)
+
+            if phase_idx + 1 < NUM_PHASES and INTER_PHASE_COOLDOWN_SEC > 0:
+                await _cooldown(
+                    emit,
+                    ctx,
+                    INTER_PHASE_COOLDOWN_SEC,
+                    stage="inter_phase",
+                    phase=phase_idx + 2,
+                    total=NUM_PHASES,
+                )
+                if ctx.cancel.is_set():
+                    return await _wind_down(emit, stream, ctx, inv_id, phase_idx + 1)
+
+        await emit(
+            {
+                "type": "run_complete",
+                "server_time_utc": _now_iso(),
+                "server_uptime_sec": _server_uptime_sec(),
+                "phases_completed": NUM_PHASES,
+            }
+        )
+        # Normal completion: close stream + wipe watermarks + clear
+        # checkpoint entry. Never reached on a true crash (process death),
+        # where the orchestrator's leave_stream_open_for_recovery path
+        # keeps the stream open for the next-lifetime recovery.
+        await _finish_turn(stream, ctx, inv_id)
+    except Exception as exc:  # pylint: disable=broad-except
+        # Logical-failure path: a downstream call (e.g. the LLM) raised.
+        # Emit a terminal SSE frame so subscribers fast-fail instead of
+        # hanging on the open stream, then close the stream and re-raise
+        # so the framework records the task as failed.
+        #
+        # We catch ``Exception`` (not ``BaseException``) so cooperative
+        # cancellation (``asyncio.CancelledError``) and process death
+        # (SIGKILL, where the handler doesn't run at all) still flow
+        # through their normal paths — the orchestrator's
+        # ``leave_stream_open_for_recovery`` contract still holds for
+        # true crashes.
+        logger.exception("deep_research task failed; emitting terminal SSE frame")
+        try:
+            await emit(
+                {
+                    "type": "run_failed",
+                    "error": {
+                        "type": type(exc).__name__,
+                        "message": str(exc)[:2000],
+                    },
+                    "server_time_utc": _now_iso(),
+                    "server_uptime_sec": _server_uptime_sec(),
+                }
+            )
+            await _finish_turn(stream, ctx, inv_id)
+        except Exception:  # pylint: disable=broad-except
+            # If terminal-frame emission itself fails (e.g. stream is
+            # already gone) we still want to surface the original task
+            # failure rather than swallow it.
+            logger.exception("failed to emit terminal run_failed frame")
+        raise
+
+
+# --- Helpers ---------------------------------------------------------------
+
+
+async def _emit_run_start(emit: EmitFn, ctx: TaskContext, *, topic: str) -> None:
+    await emit(
+        {
+            "type": "run_start",
+            "topic": topic,
+            "entry_mode": ctx.entry_mode,
+            "total_phases": NUM_PHASES,
+            "calls_per_phase": CALLS_PER_PHASE,
+            "server_time_utc": _now_iso(),
+            "server_uptime_sec": _server_uptime_sec(),
+        }
+    )
+
+
+async def _wind_down(
+    emit: EmitFn,
+    stream: Any,
+    ctx: TaskContext,
+    inv_id: str,
+    completed_phases: int,
+) -> None:
+    """Cooperative wind-down at a phase boundary.
+
+    Tears down per-turn resources (stream close + metadata wipe +
+    checkpoint-store clear) via :func:`_finish_turn` BEFORE the handler
+    returns. The multi-turn ``return`` is the
+    implicit-suspend signal — so the SSE subscriber observes a clean
+    terminator before the framework reports the turn as suspended, and
+    the steered re-entry (or any future ``start()``) finds metadata wiped.
+    """
+    if ctx.timeout_exceeded:
+        cause = "timeout"
+    elif ctx.cancel_requested:
+        cause = "operator_cancel"
+    else:
+        cause = "steering"
+
+    await emit(
+        {
+            "type": "winding_down",
+            "cause": cause,
+            "completed_phases": completed_phases,
+            "total_phases": NUM_PHASES,
+            "pending_steering_inputs": ctx.pending_input_count,
+            "server_time_utc": _now_iso(),
+            "server_uptime_sec": _server_uptime_sec(),
+        }
+    )
+
+    await _finish_turn(stream, ctx, inv_id)
+    # multi-turn `return` is the implicit-suspend signal.
+    # The chain stays alive across turns; ctx.suspend() is not part of
+    # the public surface.
+    return None
+
+
+async def _cooldown(
+    emit: EmitFn,
+    ctx: TaskContext,
+    duration_sec: float,
+    *,
+    stage: str,
+    phase: int,
+    total: int,
+    subcall=None,
+    of=None,
+) -> None:
+    """Cooldown wait with a visible client-side marker."""
+    payload = {
+        "type": "cooldown",
+        "duration_sec": duration_sec,
+        "stage": stage,
+        "phase": phase,
+        "total": total,
+        "server_time_utc": _now_iso(),
+        "server_uptime_sec": _server_uptime_sec(),
+    }
+    if subcall is not None:
+        payload["subcall"] = subcall
+    if of is not None:
+        payload["of"] = of
+    await emit(payload)
+    try:
+        await asyncio.wait_for(ctx.cancel.wait(), timeout=duration_sec)
+    except asyncio.TimeoutError:
+        pass
+
+
+async def _run_phase(
+    emit: EmitFn,
+    ctx: TaskContext,
+    inv_id: str,
+    phase_idx: int,
+    topic: str,
+    phase_title: str,
+) -> None:
+    """Run the sub-call loop for one phase.
+
+    Checkpoints after each completed subcall so a crash mid-phase
+    recovers at the next un-finished subcall (loses at most the one
+    that was actively streaming). The in-flight phase text lives in
+    the file-backed checkpoint store keyed by ``inv_id``; the
+    subcall index lives in ``ctx.metadata`` as a small watermark.
+    """
+    in_progress = ctx.metadata.get("in_progress_phase")
+    if in_progress == phase_idx:
+        start_sub = int(ctx.metadata.get("completed_subcalls", 0) or 0)
+        current_text = _checkpoint_store.get(inv_id)
+    else:
+        start_sub = 0
+        current_text = ""
+        ctx.metadata["in_progress_phase"] = phase_idx
+        ctx.metadata["completed_subcalls"] = 0
+        _checkpoint_store.delete(inv_id)
+        await ctx.metadata.flush()
+
+    for sub_idx in range(start_sub, CALLS_PER_PHASE):
+        role_name, role_prompt = _SUB_CALL_ROLES[sub_idx]
+        instructions = (
+            "You are a research analyst working on the topic: '" + topic + "'.\n"
+            "Current phase: '" + phase_title + "'.\n"
+            "Your role in this sub-step: " + role_name + ".\n\n" + role_prompt
+        )
+        if current_text:
+            user_input = (
+                "Topic: " + topic + "\nPhase: " + phase_title + "\n\nPrevious sub-step output:\n" + current_text
+            )
+        else:
+            user_input = "Topic: " + topic + "\nPhase: " + phase_title
+
+        await emit(
+            {
+                "type": "subcall_start",
+                "role": role_name,
+                "index": sub_idx + 1,
+                "of": CALLS_PER_PHASE,
+                "server_time_utc": _now_iso(),
+            }
+        )
+
+        sub_text = await _stream_llm(
+            emit,
+            instructions=instructions,
+            user_input=user_input,
+        )
+
+        await emit(
+            {
+                "type": "subcall_end",
+                "role": role_name,
+                "index": sub_idx + 1,
+                "of": CALLS_PER_PHASE,
+                "server_time_utc": _now_iso(),
+            }
+        )
+
+        current_text = sub_text
+
+        _checkpoint_store.put(inv_id, current_text)
+        ctx.metadata["completed_subcalls"] = sub_idx + 1
+        await ctx.metadata.flush()
+
+        if sub_idx + 1 < CALLS_PER_PHASE and INTRA_PHASE_COOLDOWN_SEC > 0:
+            await _cooldown(
+                emit,
+                ctx,
+                INTRA_PHASE_COOLDOWN_SEC,
+                stage="intra_phase",
+                phase=phase_idx + 1,
+                total=NUM_PHASES,
+                subcall=sub_idx + 2,
+                of=CALLS_PER_PHASE,
+            )
+            if ctx.cancel.is_set():
+                break
+
+
+async def _stream_llm(emit: EmitFn, *, instructions: str, user_input: str) -> str:
+    """One streaming LLM call. Forwards token deltas via the per-turn stream."""
+    full_text = ""
+    client = _get_client()
+    async for event in await client.responses.create(
+        model=_model,
+        instructions=instructions,
+        input=user_input,
+        store=False,
+        stream=True,
+        max_output_tokens=TARGET_OUTPUT_TOKENS,
+    ):
+        if event.type == "response.output_text.delta":
+            full_text += event.delta
+            await emit({"type": "token", "content": event.delta})
+    return full_text
+
+
+__all__ = ["deep_research", "PHASE_TITLES", "NUM_PHASES", "CALLS_PER_PHASE"]
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/app.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/app.py
new file mode 100644
index 000000000000..5a764e4b1fbf
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/app.py
@@ -0,0 +1,299 @@
+"""HTTP host for the durable deep-research agent.
+
+Exposes the ``deep_research`` durable task over the invocations
+protocol with the FULL pattern matrix:
+
+- ``POST /invocations`` with body ``{"topic": "..."}`` and an
+  ``Accept: text/event-stream`` header — returns a live SSE stream of
+  events as the research progresses.
+- ``POST /invocations`` without the header — returns ``202`` with the
+  ``invocation_id``; clients then connect to the GET endpoint to
+  stream OR poll.
+- ``GET /invocations/{id}`` with ``Accept: text/event-stream`` and an
+  optional ``?last_event_id=N`` query — streams the per-turn events,
+  skipping anything the client already saw (the cursor is the
+  event's monotonic ``sequence_number``). Works for both freshly-
+  started turns and turns that have been running for a while.
+- ``GET /invocations/{id}`` without the SSE accept header — returns a
+  JSON snapshot of the task's current status / payload.
+- ``POST /invocations/{id}/cancel`` — operator cancel of the
+  per-session task (steering is automatic via re-POSTing instead).
+
+Streaming wiring:
+
+- ``streams.use_file_backed_replay(...)`` is called once at module
+  import (app startup) per streaming.md §7.8. The file-backed
+  backing persists events to disk so a subscriber reconnecting after
+  a container crash + restart sees the pre-crash + post-crash
+  events with no gap.
+- ``cursor_fn`` reads the event's ``sequence_number`` (stamped by
+  the agent's ``emit`` closure) so ``?last_event_id=N`` reconnects
+  skip exactly the events the client already received.
+- The HTTP layer extracts ``invocation_id`` from
+  ``request.state.invocation_id`` (per-turn identifier per §7.8),
+  reserves the stream id BEFORE starting the task, and propagates
+  ``invocation_id`` to the handler via
+  ``task.start(input={"invocation_id": inv_id, ...})``.
+- The handler reads ``ctx.input["invocation_id"]`` and calls
+  ``await streams.get_or_create(inv_id)`` — gets the SAME
+  registry-cached instance.
+
+Recovery: if the container crashes mid-research and is restarted,
+the framework re-invokes ``deep_research`` with
+``ctx.entry_mode == "recovered"`` and the same input. The same
+``invocation_id`` is preserved; the file-backed stream is rehydrated
+from disk so reconnecting subscribers (including the original POST-
+SSE client if it reattaches via GET) see the pre-crash events plus
+a fresh ``type: "recovered"`` marker plus the post-crash continuation.
+
+Steering: a new POST while a turn is running enqueues the input as a
+steering input — the agent winds down the current turn at the next
+checkpoint via ``_finish_turn`` (which closes the per-turn stream
+cleanly) and the framework re-enters with the new ``ctx.input``.
+The new turn gets a new ``invocation_id`` from the platform; the
+new ``invocation_id`` is the new stream id. The HTTP layer does not
+need to distinguish steered turns from fresh turns — see
+``agent.py`` for the discipline.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+from collections.abc import AsyncGenerator
+from pathlib import Path
+from typing import Any
+
+from starlette.requests import Request
+from starlette.responses import JSONResponse, Response, StreamingResponse
+
+# Streaming primitives: the stream type, its lookup error, and the ``streams`` registry.
+from azure.ai.agentserver.core.streaming import (
+    EventStream,
+    EventStreamNotFoundError,
+    streams,
+)
+from azure.ai.agentserver.invocations import InvocationAgentServerHost
+
+from .agent import deep_research
+
+logger = logging.getLogger(__name__)
+
+# --- Streams bootstrap (run once at module import) -------------------------
+
+# Per-turn streams persist to disk so they survive a container crash +
+# restart. ``cursor_fn`` reads the agent's natural sequence number so
+# ``?last_event_id=N`` reconnects skip already-delivered events.
+# ``ttl_seconds=600`` bounds disk usage: once a stream is closed and
+# all its events have aged out, the registry destroys it and removes
+# the file.
+# (Spec 024 Phase 3a) Default streams dir lives under the unified
+# AGENTSERVER_DURABLE_ROOT layout at ``<root>/streams/`` — same place
+# the responses package puts its SSE event store.
+from azure.ai.agentserver.core.storage_paths import resolve_durable_subdir
+
+_STREAM_DIR = Path(os.environ.get("AGENTSERVER_STREAMS_DIR", str(resolve_durable_subdir("streams"))))
+_STREAM_DIR.mkdir(parents=True, exist_ok=True)
+
+streams.use_file_backed_replay(
+    storage_dir=_STREAM_DIR,
+    cursor_fn=lambda ev: ev["sequence_number"],
+    ttl_seconds=600,
+)
+
+app = InvocationAgentServerHost()
+
+
+# --- SSE rendering ---------------------------------------------------------
+
+
+async def _sse_from_stream(
+    stream: EventStream,
+    invocation_id: str,
+    *,
+    skip_after: int | None = None,
+) -> AsyncGenerator[bytes, None]:
+    """Render a stream's events as SSE-formatted bytes.
+
+    Each event's ``sequence_number`` becomes the SSE ``id:`` field so
+    a reconnecting client can pass it back as ``Last-Event-ID`` (or
+    ``?last_event_id=N``) and pick up from there. The terminator
+    payload is emitted on clean stream close; ``EventStreamNotFoundError``
+    (the stream was destroyed under us) flushes a ``superseded``
+    event so the consumer can tell stream-end from "you got cut off".
+    """
+    try:
+        async for chunk in stream.subscribe(after=skip_after):
+            seq = chunk.get("sequence_number", "")
+            yield f"id: {seq}\ndata: {json.dumps(chunk)}\n\n".encode()
+        done = {"type": "done", "invocation_id": invocation_id}
+        yield f"event: done\ndata: {json.dumps(done)}\n\n".encode()
+    except EventStreamNotFoundError:
+        superseded = {"type": "superseded", "invocation_id": invocation_id}
+        yield f"event: superseded\ndata: {json.dumps(superseded)}\n\n".encode()
+
+
+# --- Invocation handlers ---------------------------------------------------
+
+
+@app.invoke_handler
+async def handle_invoke(request: Request) -> Response:
+    """Dispatch a research task with full pattern coverage.
+
+    Body: ``{"topic": "<topic>"}``.
+
+    If ``Accept: text/event-stream`` is set, returns a live SSE
+    stream of the new turn's events (POST-SSE pattern). Otherwise
+    returns ``202 Accepted`` with the ``invocation_id`` for clients
+    that prefer to connect via GET (poll OR GET-SSE pattern).
+
+    A POST while a steerable run is already in progress on this
+    session enqueues the input as a steering input — the running
+    turn winds down at the next checkpoint and the framework
+    re-enters with the new topic. The new turn streams to the new
+    ``invocation_id`` reserved here.
+    """
+    try:
+        data = json.loads(await request.body() or b"{}")
+    except ValueError:  # malformed JSON or undecodable bytes
+        data = {}
+    data = data if isinstance(data, dict) else {}  # tolerate non-object JSON bodies
+    topic = str(data.get("topic") or data.get("message") or "").strip()
+    if not topic:
+        return JSONResponse(
+            {"error": "Provide a 'topic' field"},
+            status_code=400,
+        )
+
+    invocation_id: str = request.state.invocation_id
+    session_id: str = request.state.session_id
+    # ONE durable task per session so steering finds the active run.
+    # invocation_id labels THIS turn; session_id labels the long-
+    # lived task.
+    task_id = f"research-{session_id}"
+
+    # Reserve the per-turn stream id BEFORE starting the task. The
+    # file-backed replay backing means even if no subscriber attaches
+    # before the handler emits, the events go to disk and a later
+    # subscriber catches up via ``?last_event_id=N``.
+    stream = await streams.get_or_create(invocation_id)
+
+    # Steering is transparent: for a ``steerable=True`` task,
+    # ``task.start()`` queues the input on the in-progress task's
+    # steering queue WITHOUT raising. See ``agent.py`` for the
+    # ``_finish_turn`` discipline that makes this safe.
+    await deep_research.start(
+        task_id=task_id,
+        input={"topic": topic, "invocation_id": invocation_id},
+    )
+
+    if "text/event-stream" in request.headers.get("accept", ""):
+        return StreamingResponse(
+            _sse_from_stream(stream, invocation_id),
+            media_type="text/event-stream",
+            headers={"Cache-Control": "no-cache", "Connection": "keep-alive"},
+        )
+
+    return JSONResponse(
+        {
+            "status": "started",
+            "invocation_id": invocation_id,
+            "session_id": session_id,
+            "task_id": task_id,
+        },
+        status_code=202,
+    )
+
+
+@app.get_invocation_handler
+async def handle_get(request: Request) -> Response:
+    """Stream OR poll the per-invocation state.
+
+    With ``Accept: text/event-stream``: SSE stream of the turn's
+    events. ``?last_event_id=N`` (or the standard ``Last-Event-ID``
+    header) skips events whose ``sequence_number`` <= N — the
+    file-backed replay backing serves the gap from disk before
+    live-tailing.
+
+    Without the SSE accept header: returns the task's current status
+    and payload, read from the task manager's provider.
+
+    HTTP mapping (from streaming.md §exceptions table):
+      - 404 if the invocation id was never seen
+        (``EventStreamNotFoundError``).
+      - 410 if the stream was destroyed via TTL eviction or explicit
+        ``streams.delete`` (``EventStreamNotFoundError``).
+    """
+    invocation_id: str = request.state.invocation_id
+
+    wants_stream = "text/event-stream" in request.headers.get("accept", "")
+    if wants_stream:
+        last_event_id_q = request.query_params.get("last_event_id", "")
+        last_event_id_h = request.headers.get("last-event-id", "")
+        raw = last_event_id_q or last_event_id_h
+        skip_after: int | None = int(raw) if raw.isdecimal() else None
+
+        try:
+            stream = await streams.get(invocation_id)
+        except EventStreamNotFoundError:
+            return JSONResponse(
+                {"status": "not_found", "message": "No stream for this invocation id."},
+                status_code=404,
+            )
+        except EventStreamNotFoundError:  # NOTE: unreachable, same type as the clause above
+            return JSONResponse(
+                {"status": "gone", "message": "Stream for this invocation id has been destroyed."},
+                status_code=410,
+            )
+
+        return StreamingResponse(
+            _sse_from_stream(stream, invocation_id, skip_after=skip_after),
+            media_type="text/event-stream",
+            headers={"Cache-Control": "no-cache", "Connection": "keep-alive"},
+        )
+
+    # JSON-snapshot path (polling clients).
+    session_id: str = request.state.session_id
+    task_id = f"research-{session_id}"
+    # ``Task.get`` and ``TaskSnapshot`` have been removed, so read the record
+    # from the provider directly for read-only inspection (returns TaskInfo).
+    from azure.ai.agentserver.core.durable._manager import get_task_manager
+
+    mgr = get_task_manager()
+    info: Any = await mgr.provider.get(task_id)
+    if info is None:
+        return JSONResponse({"error": "Task not found"}, status_code=404)
+    return JSONResponse(
+        {
+            "task_id": task_id,
+            "invocation_id": invocation_id,
+            "status": info.status,
+            "payload": info.payload,
+        }
+    )
+
+
+@app.cancel_invocation_handler
+async def handle_cancel(request: Request) -> Response:
+    """Cancel the running research task.
+
+    Cancel applies to the per-session durable task (``task_id ==
+    f"research-{session_id}"``). The handler observes
+    ``ctx.cancel.is_set()`` and runs its cooperative wind-down at
+    the next checkpoint, which closes the per-turn stream before
+    suspending.
+    """
+    session_id: str = request.state.session_id
+    task_id = f"research-{session_id}"
+
+    run = await deep_research.get_active_run(task_id)  # type: ignore[attr-defined]
+    if run is None:
+        return JSONResponse({"status": "not_found", "message": "No active task to cancel."})
+
+    await run.cancel()
+    return JSONResponse({"status": "cancelled", "message": "Task cancellation requested."})
+
+
+if __name__ == "__main__":
+    app.run()
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/requirements.txt b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/requirements.txt
new file mode 100644
index 000000000000..7418b4f2e1be
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/requirements.txt
@@ -0,0 +1,5 @@
+azure-ai-agentserver-core
+azure-ai-agentserver-invocations
+azure-ai-projects
+azure-identity
+openai
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/store.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/store.py
new file mode 100644
index 000000000000..4c8a9a004f7a
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research/store.py
@@ -0,0 +1,63 @@
+"""File-backed checkpoint store for in-flight LLM content.
+
+``ctx.metadata`` on the durable-task primitive is a *small-watermark*
+store, not a bulk-data store (see ``core/docs/durable-task-guide.md``
+§"Persistence Model"). For anything heavier than a few bytes — e.g.
+the partially-streamed text of the current phase's in-flight subcall
+chain — the application is expected to maintain its own per-app
+checkpoint store and just keep a *reference* in metadata.
+
+This file is the minimal local checkpoint store for the durable
+research agent. Each in-flight invocation's text is a JSON blob keyed
+by ``invocation_id``. Writes are atomic (tempfile + rename) so a
+crash mid-write leaves either the old value or the new value, never a
+truncated file. The store is deliberately tiny — no metrics, no
+contention handling — because this is a sample, not a production
+component. In production, swap this for a real durable blob store
+(Cosmos, blob storage, etc.).
+
+The store survives container restarts via the same on-disk directory
+used by the streams registry; it does not survive task deletion.
+"""
+
+from __future__ import annotations
+
+import json
+import tempfile
+from pathlib import Path
+
+
+class CheckpointStore:
+    """File-backed key->str blob store with atomic writes."""
+
+    def __init__(self, base_dir: Path) -> None:
+        self._base = base_dir
+        self._base.mkdir(parents=True, exist_ok=True)
+
+    def _path(self, key: str) -> Path:
+        return self._base / f"{key}.json"
+
+    def get(self, key: str) -> str:
+        """Return the stored text, or empty string if absent."""
+        path = self._path(key)
+        if not path.exists():
+            return ""
+        return json.loads(path.read_text(encoding="utf-8"))
+
+    def put(self, key: str, value: str) -> None:
+        """Atomically write *value* — temp file + rename."""
+        target = self._path(key)
+        fd, tmp = tempfile.mkstemp(dir=str(self._base), prefix=f"{key}_", suffix=".tmp")
+        try:
+            with open(fd, "w", encoding="utf-8") as fh:
+                json.dump(value, fh)
+            Path(tmp).replace(target)
+        except BaseException:
+            Path(tmp).unlink(missing_ok=True)
+            raise
+
+    def delete(self, key: str) -> None:
+        """Remove *key* if present; no-op otherwise."""
+        path = self._path(key)
+        if path.exists():
+            path.unlink()
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/multiturn_invoke_agent.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/multiturn_invoke_agent.py
deleted file mode 100644
index 96fa857bf02c..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/multiturn_invoke_agent.py
+++ /dev/null
@@ -1,102 +0,0 @@
-"""Multi-turn session invoke agent example.
-
-Demonstrates session-based conversations where context accumulates
-across multiple invocations via the ``agent_session_id`` query parameter.
-
-.. warning::
-
-    **In-memory demo only.**  Session history is stored in process memory
-    and is lost on restart.  For production use, persist history to
-    durable storage (Redis, Cosmos DB, etc.).
-
-Usage::
-
-    # Start the agent
-    python multiturn_invoke_agent.py
-
-    # Turn 1 — start planning
-    curl -X POST "http://localhost:8088/invocations?agent_session_id=trip-001" \
-        -H "Content-Type: application/json" \
-        -d '{"message": "I want to plan a vacation"}'
-    # -> {"reply": "Welcome! Where would you like to go, and for how long?", ...}
-
-    # Turn 2 — provide details
-    curl -X POST "http://localhost:8088/invocations?agent_session_id=trip-001" \
-        -H "Content-Type: application/json" \
-        -d '{"message": "Japan for 2 weeks, interested in culture and food"}'
-    # -> {"reply": "Great choice! What is your budget ...?", ...}
-
-    # Turn 3 — add constraints
-    curl -X POST "http://localhost:8088/invocations?agent_session_id=trip-001" \
-        -H "Content-Type: application/json" \
-        -d '{"message": "Budget is $5000, prefer direct flights"}'
-    # -> {"reply": "Here is a suggested itinerary ...", ...}
-"""
-from starlette.requests import Request
-from starlette.responses import JSONResponse, Response
-
-from azure.ai.agentserver.invocations import InvocationAgentServerHost
-
-
-app = InvocationAgentServerHost()
-
-# In-memory session store — keyed by session ID.
-_sessions: dict[str, list[dict[str, str]]] = {}
-
-
-def _build_reply(history: list[dict[str, str]]) -> str:
-    """Generate a contextual reply based on conversation history.
-
-    In production this would call a language model with the full history.
-
-    :param history: List of message dicts with ``role`` and ``content`` keys.
-    :type history: list[dict[str, str]]
-    :return: The assistant reply text.
-    :rtype: str
-    """
-    turn = len([m for m in history if m["role"] == "user"])
-    if turn == 1:
-        return "Welcome! Where would you like to go, and for how long?"
-    if turn == 2:
-        return (
-            "Great choice! Could you share your budget range "
-            "and any travel preferences (direct flights, accommodation type)?"
-        )
-    return (
-        f"Thanks for all the details! Based on our {turn}-turn conversation, "
-        "here is a suggested itinerary. Let me know if you'd like to adjust anything."
-    )
-
-
-@app.invoke_handler
-async def handle_invoke(request: Request) -> Response:
-    """Process a conversational turn, accumulating session context.
-
-    The session ID comes from the ``agent_session_id`` query parameter
-    (set automatically on ``request.state.session_id`` by the framework).
-
-    :param request: The raw Starlette request.
-    :type request: starlette.requests.Request
-    :return: JSON reply with session metadata.
-    :rtype: starlette.responses.JSONResponse
-    """
-    data = await request.json()
-    session_id = request.state.session_id
-    user_message = data.get("message", "")
-
-    # Retrieve or create session history
-    history = _sessions.setdefault(session_id, [])
-    history.append({"role": "user", "content": user_message})
-
-    reply = _build_reply(history)
-    history.append({"role": "assistant", "content": reply})
-
-    return JSONResponse({
-        "reply": reply,
-        "session_id": session_id,
-        "turn": len([m for m in history if m["role"] == "user"]),
-    })
-
-
-if __name__ == "__main__":
-    app.run()
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/requirements.txt b/sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/requirements.txt
deleted file mode 100644
index bc5cf4644e14..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-invocations/samples/multiturn_invoke_agent/requirements.txt
+++ /dev/null
@@ -1 +0,0 @@
-azure-ai-agentserver-invocations
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/requirements.txt b/sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/requirements.txt
deleted file mode 100644
index bc5cf4644e14..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/requirements.txt
+++ /dev/null
@@ -1 +0,0 @@
-azure-ai-agentserver-invocations
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/simple_invoke_agent.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/simple_invoke_agent.py
deleted file mode 100644
index a2e7fdb32d3b..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-invocations/samples/simple_invoke_agent/simple_invoke_agent.py
+++ /dev/null
@@ -1,32 +0,0 @@
-"""Simple invoke agent example.
-
-Accepts JSON requests, echoes back with a greeting.
-
-Usage::
-
-    # Start the agent
-    python simple_invoke_agent.py
-
-    # Send a greeting request
-    curl -X POST http://localhost:8088/invocations -H "Content-Type: application/json" -d '{"name": "Alice"}'
-    # -> {"greeting": "Hello, Alice!"}
-"""
-from starlette.requests import Request
-from starlette.responses import JSONResponse, Response
-
-from azure.ai.agentserver.invocations import InvocationAgentServerHost
-
-
-app = InvocationAgentServerHost()
-
-
-@app.invoke_handler
-async def handle_invoke(request: Request) -> Response:
-    """Process the invocation by echoing a greeting."""
-    data = await request.json()
-    greeting = f"Hello, {data['name']}!"
-    return JSONResponse({"greeting": greeting})
-
-
-if __name__ == "__main__":
-    app.run()
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/requirements.txt b/sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/requirements.txt
deleted file mode 100644
index bc5cf4644e14..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/requirements.txt
+++ /dev/null
@@ -1 +0,0 @@
-azure-ai-agentserver-invocations
diff --git a/sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/streaming_invoke_agent.py b/sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/streaming_invoke_agent.py
deleted file mode 100644
index a207a93cca0d..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-invocations/samples/streaming_invoke_agent/streaming_invoke_agent.py
+++ /dev/null
@@ -1,79 +0,0 @@
-"""Streaming invoke agent example (SSE).
-
-Demonstrates returning results incrementally via Server-Sent Events.
-Callers receive real-time partial output as tokens are generated.
-
-Usage::
-
-    # Start the agent
-    python streaming_invoke_agent.py
-
-    # Send a streaming request
-    curl -N -X POST http://localhost:8088/invocations \
-        -H "Content-Type: application/json" \
-        -d '{"prompt": "Write a Calculator class with an Add method"}'
-    # -> data: {"token": "class"}
-    # -> data: {"token": " Calculator"}
-    # -> ...
-    # -> event: done
-    # -> data: {"invocation_id": "..."}
-"""
-import asyncio
-import json
-from collections.abc import AsyncGenerator  # pylint: disable=import-error
-
-from starlette.requests import Request
-from starlette.responses import Response, StreamingResponse
-
-from azure.ai.agentserver.invocations import InvocationAgentServerHost
-
-
-app = InvocationAgentServerHost()
-
-# Simulated tokens — in production these would come from a model.
-_SIMULATED_TOKENS = [
-    "class", " Calculator", ":", "\n",
-    "    ", "def", " add", "(", "self", ",", " a", ",", " b", ")", ":", "\n",
-    "        ", "return", " a", " +", " b", "\n",
-]
-
-
-async def _generate_tokens(
-    invocation_id: str, prompt: str  # pylint: disable=unused-argument
-) -> AsyncGenerator[bytes, None]:
-    """Yield SSE-formatted token events with simulated latency.
-
-    Each token is sent as a ``data:`` line per the SSE specification.
-    A final ``event: done`` signals stream completion.
-
-    :param invocation_id: The invocation ID for this request.
-    :type invocation_id: str
-    :param prompt: The user prompt (unused in this demo).
-    :type prompt: str
-    """
-    for token in _SIMULATED_TOKENS:
-        payload = json.dumps({"token": token})
-        yield f"data: {payload}\n\n".encode()
-        await asyncio.sleep(0.15)  # simulate model latency
-
-    # Signal completion
-    done_payload = json.dumps({"invocation_id": invocation_id})
-    yield f"event: done\ndata: {done_payload}\n\n".encode()
-
-
-@app.invoke_handler
-async def handle_invoke(request: Request) -> Response:
-    """Stream code-generation tokens back to the caller via SSE."""
-    data = await request.json()
-    invocation_id = request.state.invocation_id
-    prompt = data.get("prompt", "")
-
-    return StreamingResponse(
-        _generate_tokens(invocation_id, prompt),
-        media_type="text/event-stream",
-        headers={"Cache-Control": "no-cache", "Connection": "keep-alive"},
-    )
-
-
-if __name__ == "__main__":
-    app.run()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/README.md b/sdk/agentserver/azure-ai-agentserver-responses/README.md
index da041d5d926b..434adf4a0dde 100644
--- a/sdk/agentserver/azure-ai-agentserver-responses/README.md
+++ b/sdk/agentserver/azure-ai-agentserver-responses/README.md
@@ -197,23 +197,15 @@ To report an issue with the client library, or request additional features, plea
 
 ## Next steps
 
-Visit the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples) folder for complete working examples:
+Visit the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/samples) folder for the **durable** examples, and the [`durable-responses-agent-demo/local/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local) kit to run a crash → recover demo locally:
 
 | Sample | Description |
 |---|---|
-| [Getting Started](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_01_getting_started.py) | Minimal echo handler using `TextResponse` |
-| [Streaming Text Deltas](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_02_streaming_text_deltas.py) | Token-by-token streaming with `configure` callback |
-| [Full Control](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_03_full_control.py) | Convenience, streaming, and builder — three ways to emit output |
-| [Function Calling](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_04_function_calling.py) | Two-turn function calling with convenience and builder variants |
-| [Conversation History](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_05_conversation_history.py) | Multi-turn study tutor with `context.get_history()` |
-| [Multi-Output](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_06_multi_output.py) | Reasoning + message in a single response |
-| [Streaming Upstream](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_10_streaming_upstream.py) | Forward to upstream streaming LLM via `openai` SDK |
-| [Non-Streaming Upstream](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_11_non_streaming_upstream.py) | Forward to upstream non-streaming LLM, emit items via builders |
-| [Image Generation](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_12_image_generation.py) | Image gen convenience, streaming partials, and full-control builder |
-| [Image Input](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_13_image_input.py) | Receive images via URL, base64 data URL, or file ID |
-| [File Inputs](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_14_file_inputs.py) | Receive files via base64 data URL, URL, or file ID |
-| [Annotations](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_15_annotations.py) | Attach file_path, file_citation, and url_citation annotations |
-| [Structured Outputs](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_16_structured_outputs.py) | Return structured JSON as a `structured_outputs` item |
+| [Durable Copilot](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_18_durable_copilot.py) | Copilot SDK durable + steerable session flow |
+| [Durable Streaming](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_19_durable_streaming.py) | Phase watermarks; skip completed phases on recovery |
+| [Durable Steering](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_20_durable_steering.py) | `context.is_steered_turn` mid-turn steering drain |
+| [Durable LangGraph](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_21_durable_langgraph.py) | LangGraph thread id = `context.conversation_chain_id` |
+| [Durable Multiturn](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_22_durable_multiturn.py) | Per-turn counters in `context.conversation_chain_metadata` |
 
 - [Handler implementation guide](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/docs/handler-implementation-guide.md) — Detailed reference for building handlers
 
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/docs/durability-contract.md b/sdk/agentserver/azure-ai-agentserver-responses/docs/durability-contract.md
new file mode 100644
index 000000000000..8d0a15745029
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/docs/durability-contract.md
@@ -0,0 +1,383 @@
+# Durability Contract — Conformance Specification
+
+**Status**: Authoritative conformance contract for the durability behaviour of
+`azure-ai-agentserver-responses`. This document defines the per-row × per-path
+guarantees that the durability-contract conformance suite
+(`tests/e2e/durability_contract/`) enforces. It is the test-facing companion
+to the design source-of-truth `docs/responses-durability-spec.md`: where that
+document explains *why* and *how* durability works, this one states the
+precise, testable promises and binds each to its conformance test.
+
+**Normative ownership (single edit point).** This document is the **single
+normative source** for the dispatch matrix and its per-cell dispositions, the
+streaming sub-contract, the recovered-entry precondition, and the
+handler/framework obligations — they are parsed by the conformance meta-tests
+and pinned by the Constitution. `responses-durability-spec.md` may summarize
+these clauses for readability, but the normative edit for any of them is made
+**here**; on conflict, this contract is authoritative. The design spec is
+authoritative for everything this contract does not carry (terminology, chain
+identity, the reserved metadata namespace, perpetual-task internals,
+cancellation, steering, and the worked sequences).
+
+**Audience**: Framework maintainers, handler authors, SDK reviewers, and the
+conformance meta-test.
+
+This document defines:
+
+- The **flags and server option** that select a durability behaviour.
+- The **termination lifecycle** — the three paths a server lifetime can take
+  when a request is in flight.
+- The **matrix** — for each flag combination, what the framework promises on
+  each termination path.
+- The **developer checkpoint-write contract** (Row 11) — the
+  `yield stream.checkpoint()` write point and its recovery semantics.
+- The **streaming sub-contract** layered on top when `stream=true`.
+- The **composition rules** (which flag combinations require which providers).
+- The **test discipline** the conformance suite follows.
+
+---
+
+## How to read this document
+
+1. Handler authors asking "what happens if the server dies?" read **The
+   matrix**, then their row's **Per-row contract**, then **Handler obligations**.
+2. Maintainers changing anything near durability read the whole document and
+   keep every row × applicable-path behaviour intact (see **Test discipline**).
+
+The terms `MUST`, `MUST NOT`, `SHOULD`, `MAY` follow RFC 2119.
+
+---
+
+## Concepts
+
+### Request flags
+
+Three boolean request flags are relevant here. `store` and `background` select
+the durability shape; `stream` selects only how the response is delivered:
+
+- **`store`** *(request body, default `true`)* — whether the response and its
+  events are persisted to the configured `ResponseStore`.
+- **`background`** *(request body, default `false`)* — whether the request
+  returns immediately with an `in_progress` response that clients poll or
+  stream-reconnect to observe.
+- **`stream`** *(request body, default `false`)* — whether the response is
+  delivered as SSE events on the original connection. Independent of the
+  durability shape; see the **Streaming sub-contract**.
+
+### Server option
+
+- **`durable_background`** *(server option, default `False`)* — whether the
+  framework engages full crash-recovery for `background=true, store=true`
+  requests. When `True`, the supporting providers MUST be present (see
+  **Composition rules**); the server fails loud at startup otherwise.
+
+### Termination paths
+
+Every in-flight request faces one of three paths from the moment the process
+receives a termination signal (or crashes). The matrix specifies a contract
+per path.
+
+- **Path A — graceful shutdown, handler reaches terminal within grace.** New
+  requests are refused; in-flight handlers continue; the handler reaches a
+  terminal state before grace expires. The happy path; identical across rows.
+- **Path B — graceful shutdown, grace exhausted with handler still running.**
+  The framework MUST act in-process before the runtime exits, per the row's
+  contract, and respond to waiting clients in this lifetime.
+- **Path C — crash, or a graceful shutdown whose Path-B action did not run**
+  (SIGKILL, OOM, power loss, a hang during the shutdown loop). On the next
+  process lifetime the framework scans persisted state and applies the row's
+  restart contract. Path C is the complete fallback for Path B.
+
+A single termination event is handled by exactly one path.
+
+### Durable record
+
+Every accepted `store=true` request is registered with the underlying
+durable-task primitive at acceptance time. The registration carries the
+response id, the row's Path-C disposition (`re-invoke` for Row 1,
+`mark-failed` for Rows 2 and 3), and (for re-invocation rows) the handler
+reference. `store=false` requests have no durable record; Path C does not
+apply.
+
+### Recovered entry
+
+On a recovered re-invocation (Row 1 Path B post-restart, or Path C) the
+handler observes `context.is_recovery == True`. Its cross-turn checkpoint
+store is `context.conversation_chain_metadata`; its single-turn,
+per-response watermark surface is the `internal_metadata` map. The handler
+seeds its resumption from `context.persisted_response` (the last durably
+persisted snapshot — see Row 11).
+
+**Recovery precondition (persisted response required).** The framework
+re-invokes the handler only if the response was durably created in the
+response store. If the response is **definitively absent** on recovery
+(a typed not-found from the store), the original `POST /responses`
+connection closed without ever returning a response id, so no client can
+fetch it — the framework MUST drop the durable execution (no
+re-invocation, no `response.*` stream events, no terminal write) and settle
+the task so the recovery scan does not re-select it. This applies to **both
+`stream=false` and `stream=true`** durable background recovery — the gate
+runs before the stream-vs-non-stream dispatch. A transient/ambiguous store
+error is NOT a definitive absence and MUST NOT trigger a drop.
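+
+The gate above can be sketched as follows. This is a minimal illustration, not
+the framework's code: `ResponseNotFound`, `TransientStoreError`, the `get`
+callable, and the returned action strings are hypothetical names introduced
+here, and `retry_later` is an assumption about what "MUST NOT trigger a drop"
+looks like in practice.
+
+```python
+from typing import Callable
+
+
+class ResponseNotFound(Exception):
+    """Hypothetical typed not-found: the response was never durably created."""
+
+
+class TransientStoreError(Exception):
+    """Hypothetical transient or ambiguous store failure."""
+
+
+def recovery_gate(get: Callable[[str], dict], response_id: str) -> str:
+    """Decide what recovery does for one durable task (illustrative only)."""
+    try:
+        get(response_id)
+    except ResponseNotFound:
+        # Definitive absence: no client can hold this id, so drop and settle.
+        return "drop_and_settle"
+    except TransientStoreError:
+        # Ambiguous: not a definitive absence, so the task is NOT dropped.
+        return "retry_later"
+    # The response exists: continue to the stream / non-stream dispatch.
+    return "re_invoke"
+```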
+
+---
+
+## The matrix
+
+The matrix is the per-row × per-path contract. Rows 1–4 are keyed on the two
+request flags `store` and `background` plus the server option
+`durable_background`; `stream` is intentionally NOT a row key (the contract is
+the same for both values of `stream`, and the streaming sub-contract specifies
+how it is delivered). Row 11 is a **checkpoint-write extension of Row 1**: it
+has Row 1's flags and adds the developer `stream.checkpoint()` write point; its
+cutpoints are detailed in its per-row contract.
+
+| Row | `store` | `background` | `durable_background` | Path A (within-grace) | Path B (grace exhausted) | Path C (crash / Path-B failure) |
+|----:|---------|--------------|----------------------|-----------------------|--------------------------|---------------------------------|
+|  1  | `true`  | `true`       | `True`               | natural terminal      | hand the in-flight handler to the durable-task primitive's recovery; runtime exits; next lifetime re-invokes the handler with `is_recovery=True` | next lifetime re-invokes the handler with `is_recovery=True` |
+|  2  | `true`  | `true`       | `False`              | natural terminal      | mark response `failed` (`code=server_error`) in-process before exit; respond to waiting clients | next lifetime marks response `failed` (`code=server_error`) |
+|  3  | `true`  | `false`      | any                  | natural terminal      | mark response `failed` (`code=server_error`) in-process before exit; respond to waiting clients | next lifetime marks response `failed` (`code=server_error`) |
+|  4  | `false` | any          | any                  | natural terminal      | best-effort `failed` marker in-process; original HTTP connection may already be closing | no recovery applies (no persisted state) |
+| 11  | `true`  | `true`       | `True`               | all phases checkpoint + complete; final `response.output` reflects every phase | handler at a checkpoint boundary calls `await context.exit_for_recovery()`; recovery resumes from the last checkpointed snapshot | SIGKILL at a checkpoint boundary; recovery resumes from the last checkpointed snapshot |
+
+Read every cell as a MUST for the framework. Path A is identical across Rows
+1–4 because no framework intervention is needed.
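+
+As a reading aid, the row selection and the Path-C column for Rows 1–4 can be
+written as a small lookup. The function names `row_for` and
+`path_c_disposition` are hypothetical and exist only for this sketch; Row 11
+shares Row 1's flags, so it maps to row 1 here.
+
+```python
+from typing import Optional
+
+
+def row_for(store: bool, background: bool, durable_background: bool) -> int:
+    """Map the two request flags plus the server option to a matrix row."""
+    if not store:
+        return 4  # best-effort, nothing persisted
+    if not background:
+        return 3  # stored foreground response
+    return 1 if durable_background else 2
+
+
+def path_c_disposition(
+    store: bool, background: bool, durable_background: bool
+) -> Optional[str]:
+    """Return the Path-C disposition, or None when no recovery applies."""
+    row = row_for(store, background, durable_background)
+    return {1: "re-invoke", 2: "mark-failed", 3: "mark-failed", 4: None}[row]
+```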
+
+---
+
+## Per-row contracts
+
+### Row 1 — Full recovery (`store=true, background=true, durable_background=True`)
+
+**Path A.** Handler completes within grace. Standard happy path.
+
+**Path B.** Grace expires with the handler still running. The framework MUST
+hand the in-flight handler to the durable-task primitive's recovery (NOT mark
+it `failed`) and exit; the next lifetime re-invokes the handler with
+`context.is_recovery == True`.
+
+**Path C.** SIGKILL or a Path-B action that did not complete. On the next
+lifetime the framework finds the durable record and re-invokes the handler
+with `context.is_recovery == True`.
+
+**Recovered handler entry contract** (Path B post-restart and Path C):
+
+- `context.is_recovery == True`.
+- `context.conversation_chain_metadata` carries any cross-turn checkpoint
+  state the handler flushed in a prior lifetime.
+- The framework does not impose a watermark schema. The handler chooses what
+  it stores and how it resumes.
+- For streaming, the recovered handler emits a `response.in_progress` reset
+  event as its first event (see **Streaming sub-contract**).
+- Graceful-shutdown recovery is requested with the single uniform primitive
+  `await context.exit_for_recovery()`, which works in every handler shape
+  (coroutine, async generator, sync).
+
+### Row 2 — Marked failed (`store=true, background=true, durable_background=False`)
+
+A stored, observable response without crash recovery.
+
+**Path A.** Handler completes within grace. Standard.
+
+**Path B.** The in-process shutdown loop MUST mark the response `failed`
+(`code=server_error`, path cause in `message`), persist any final events, and
+respond to waiting clients in this lifetime.
+
+**Path C.** On the next lifetime the framework finds the durable record
+(disposition `mark-failed`) and marks the response `failed`
+(`code=server_error`) with a synthetic terminal event so subsequent polling
+and stream-reconnect see terminal.
+
+### Row 3 — Marked failed, foreground (`store=true, background=false`, any `durable_background`)
+
+A stored response observable over the original (foreground) HTTP connection.
+`durable_background` is a free axis: foreground responses do not benefit from
+durable handler recovery because the client connection is gone. Paths A, B,
+and C have the same shape as Row 2; all failure markers use
+`code=server_error` with the path-specific cause in `message`.
+
+### Row 4 — Best-effort (`store=false`, any `background`, any `durable_background`)
+
+In-memory-only, no persistence, no recovery.
+
+**Path A.** Handler completes within grace. Standard.
+
+**Path B.** The shutdown loop MAY write a best-effort `failed` event to the
+open connection. No persistence is required (there is nowhere to persist).
+
+**Path C.** No persisted state, so no next-lifetime action applies.
+
+### Row 11 — Developer checkpoint write (extension of Row 1)
+
+Row 11 covers the `yield stream.checkpoint()` write point used by the
+**one-OutputItem-per-phase** durable pattern. A handler emits one output item
+per logical phase and checkpoints at each phase boundary; the checkpoint
+persists a snapshot whose `output` holds exactly the phases completed so far.
+On recovery the handler **seeds the stream** from `context.persisted_response`
+(so the already-checkpointed phases' items are present in
+`stream.response.output`, keeping their original lifetime marker) and resumes
+at `len(stream.response.output)`, running only the remaining phases. This makes
+the recovery resume-point directly observable in the recovered
+`response.output`.
+
+`checkpoint()` is gated to durable background responses
+(`durable_background=True` + `store=true` + `background=true`) and is a no-op
+otherwise.
+
+**Cutpoints** (the failure boundaries the contract guarantees, expressed in
+the one-item-per-phase model):
+
+- **C1 — crash after a successful checkpoint.** Phase N's item is emitted and
+  its `checkpoint()` succeeds, then the process is lost before phase N+1's item
+  is emitted. Recovery's `persisted_response.output` holds N+1 items; the
+  handler resumes at phase N+1. Phase N survives with its original lifetime
+  marker; only later phases re-run. No data loss, no duplication.
+- **C3 — crash before a checkpoint.** Phase N's item is emitted but the handler
+  is lost *before* calling `checkpoint()`. The snapshot still holds N items
+  (the un-checkpointed item N never persisted); recovery re-runs phase N.
+  **This is the central guarantee of the one-item-per-phase pattern.**
+- **C2 — crash mid-checkpoint-write (provider-atomicity limitation).** The
+  `FileResponseStore` provider commits the response envelope via an atomic
+  `os.replace`, and writes each output item to the shared `items/` store
+  **before** the envelope (items-first). Items are immutable by id
+  (re-stores are idempotent same-content), so a crash during
+  `update_response` exposes either the prior committed snapshot or the newly
+  committed one — **never a torn snapshot** (and never an envelope pointing
+  at a missing item). Whether recovery sees N or N+1 items therefore depends
+  on the provider's commit point, not on a torn write. The contract
+  guarantees *no corruption*; it does NOT promise "prior snapshot only" for a
+  mid-write crash with this provider. No torn-write recovery is asserted.
+- **C4 — checkpoint after terminal.** A checkpoint event yielded after the
+  terminal event is dropped (the terminal write is authoritative); no
+  overwrite, no exception.
+- **C5 — provider failure swallowed.** A transient `update_response` failure
+  during `checkpoint()` is swallowed; the handler does not observe it and
+  recovery sees the prior snapshot.
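+
+The items-first ordering described under C2 can be illustrated with the
+standard library alone. This is a sketch of the general technique, not the
+`FileResponseStore` implementation: the `commit_snapshot` name, the
+`response.json` file name, and the envelope layout are assumptions, and the
+item writes are simplified (they are not themselves atomic).
+
+```python
+import json
+import os
+import tempfile
+from pathlib import Path
+
+
+def commit_snapshot(root: Path, response: dict) -> None:
+    """Write items first, then swap the envelope in atomically."""
+    items_dir = root / "items"
+    items_dir.mkdir(parents=True, exist_ok=True)
+    # 1. Write every output item before the envelope. Items are immutable
+    #    by id, so rewriting one with the same content is harmless.
+    for item in response["output"]:
+        (items_dir / f"{item['id']}.json").write_text(json.dumps(item))
+    # 2. Write the envelope to a temp file in the same directory, then
+    #    rename it over the old one. A reader sees the old or the new
+    #    envelope, never a partial one.
+    envelope = {
+        "id": response["id"],
+        "output": [item["id"] for item in response["output"]],
+    }
+    fd, tmp = tempfile.mkstemp(dir=root, suffix=".tmp")
+    with os.fdopen(fd, "w") as f:
+        json.dump(envelope, f)
+        f.flush()
+        os.fsync(f.fileno())
+    os.replace(tmp, root / "response.json")
+```
+
+A crash before step 2 completes leaves the previous envelope in place, and a
+crash after it leaves the new one; in both cases every item the envelope
+names has already been written.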
+
+**Path A.** All phases checkpoint and the handler reaches a natural terminal;
+the final `response.output` reflects every phase produced by the fresh entry.
+
+**Path B.** The handler is parked at a checkpoint cutpoint when grace is
+exhausted; it observes `context.shutdown`, calls
+`await context.exit_for_recovery()`, and the framework leaves the response
+`in_progress`. On restart the handler resumes from the checkpointed snapshot.
+The deferral MUST NOT overwrite the last checkpoint snapshot with a
+pre-terminal record.
+
+**Path C.** SIGKILL at a checkpoint cutpoint; on restart recovery resumes from
+the last checkpointed snapshot.
+
+**Contract-surface depth (Principle XI).** Row 11 conformance tests assert the
+recovered `response.output` *content* using per-lifetime-identifiable markers
+(`L{lifetime}_phase{n}`) so the resume-point — and the absence of loss or
+duplication — is directly visible (e.g. C1 →
+`[L0_phase0, L0_phase1, L1_phase2]` vs C3 →
+`[L0_phase0, L1_phase1, L1_phase2]`), not just terminal `status`.
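+
+The two marker sequences above can be reproduced with a small in-memory
+simulation of the one-item-per-phase pattern. Nothing here calls the SDK:
+`run_lifetime` and its `crash_after_*` parameters are hypothetical, and a
+list copy stands in for `yield stream.checkpoint()`.
+
+```python
+from typing import List, Optional
+
+TOTAL_PHASES = 3
+
+
+def run_lifetime(
+    lifetime: int,
+    persisted: List[str],
+    crash_after_emit: Optional[int] = None,
+    crash_after_checkpoint: Optional[int] = None,
+) -> List[str]:
+    """Run one process lifetime; return the snapshot that survives it."""
+    output = list(persisted)    # seed the stream from the persisted snapshot
+    snapshot = list(persisted)  # what the store holds right now
+    for phase in range(len(output), TOTAL_PHASES):  # resume at len(output)
+        output.append(f"L{lifetime}_phase{phase}")
+        if crash_after_emit == phase:
+            return snapshot     # C3: item emitted but never checkpointed
+        snapshot = list(output)  # stands in for `yield stream.checkpoint()`
+        if crash_after_checkpoint == phase:
+            return snapshot     # C1: lost right after a good checkpoint
+    return snapshot
+
+
+# Lifetime 0 crashes around phase 1; lifetime 1 recovers from the snapshot.
+c1 = run_lifetime(1, run_lifetime(0, [], crash_after_checkpoint=1))
+c3 = run_lifetime(1, run_lifetime(0, [], crash_after_emit=1))
+```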
+
+---
+
+## Streaming sub-contract
+
+When `stream=true`, the row's contract applies as written, PLUS:
+
+1. **Event persistence (Rows 1, 11).** Every emitted SSE event MUST be appended
+   to the durable stream provider in order BEFORE being flushed to the
+   original connection, so a reconnecting client is served the same prefix.
+2. **Resumable reconnect endpoint.** `GET /responses/{id}?stream=true&starting_after=<event_id>`
+   MUST return durable events strictly after `<event_id>` and then live-tail
+   (or return the terminal event if the response is complete).
+3. **`response.in_progress` reset event.** On re-invocation the recovered
+   handler MUST emit a `response.in_progress` event as its first **client-visible**
+   event, carrying the corrected output items. The recovered handler may still
+   emit `response.created` first (to seed its in-memory stream and satisfy the
+   first-event validator), but the framework MUST NOT append a second
+   `response.created` to the durable stream — see clause 5.
+4. **Stable event ids across recovery.** Pre-crash events retain their ids;
+   recovered events get fresh monotonic ids after the last pre-crash id.
+5. **Single `response.created` per durable stream.** `response.created` is, by
+   definition, the first event of a durable stream. The framework appends it to
+   the durable stream provider **only when the stream is empty** (no events ever
+   appended). On a recovered entry the stream already carries the pre-crash
+   `response.created`, so the re-emitted one is suppressed at the provider
+   write; a reconnecting/replaying client therefore observes `response.created`
+   exactly once across the full (pre-crash + recovered) sequence. The
+   persisted-but-stream-empty window (response created, crash before the first
+   stream emit) correctly re-appends `response.created` because the stream is
+   genuinely empty.
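+
+Clauses 2, 4, and 5 can be pictured with an in-memory stand-in for the
+durable stream provider. `DurableStreamSketch` is a hypothetical class, and
+integer event ids are an assumption made to keep the ordering visible; the
+real provider's id format may differ.
+
+```python
+from typing import Dict, List
+
+
+class DurableStreamSketch:
+    """In-memory stand-in for a durable stream provider (illustrative)."""
+
+    def __init__(self) -> None:
+        self.events: List[Dict] = []
+
+    def append(self, event_type: str) -> bool:
+        """Append an event; return False when it is suppressed."""
+        if event_type == "response.created" and self.events:
+            # Clause 5: only an empty stream accepts `response.created`.
+            return False
+        # Clause 4: ids continue monotonically after the last stored id.
+        next_id = self.events[-1]["id"] + 1 if self.events else 0
+        self.events.append({"id": next_id, "type": event_type})
+        return True
+
+    def after(self, event_id: int) -> List[Dict]:
+        """Clause 2: events strictly after `event_id`."""
+        return [e for e in self.events if e["id"] > event_id]
+```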
+
+**Client-side rule.** A streaming client MUST reset its accumulator on every
+`response.in_progress` event after the first.
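+
+A minimal client-side accumulator that follows this rule might look like the
+sketch below. It assumes events arrive as dicts with a `type` key and that
+text arrives in `response.output_text.delta` events carrying a `delta` field;
+it also ignores any output items carried by the reset event itself, which a
+real client would use to re-seed its state.
+
+```python
+from typing import Dict, Iterable
+
+
+def accumulate(events: Iterable[Dict]) -> str:
+    """Fold stream events into text, resetting on a recovery reset event."""
+    text = ""
+    seen_in_progress = False
+    for event in events:
+        if event["type"] == "response.in_progress":
+            if seen_in_progress:
+                text = ""  # a recovered handler restarted the output
+            seen_in_progress = True
+        elif event["type"] == "response.output_text.delta":
+            text += event["delta"]
+    return text
+```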
+
+---
+
+## Composition rules
+
+The framework MUST validate at startup and fail loud if a required provider is
+absent; it MUST NOT silently downgrade to a weaker row.
+
+| Server config | Required providers | If missing |
+|---|---|---|
+| `durable_background=True` | `ResponseStore` supporting durable task records; a durable stream provider for streamed durable responses | Startup error naming the missing provider |
+| `store=true` requests accepted (any row) | `ResponseStore` | Startup error |
+| `stream=true` requests accepted (any row) | A streaming-capable transport configuration | Startup error |
+
+---
+
+## Handler obligations
+
+- Emit output via builder events (`add_output_item_*` → `emit_*`); do NOT
+  pre-populate `response.created` with output items on a **fresh** entry. (On a
+  **recovered** entry, seeding the stream from `context.persisted_response` —
+  which carries the already-persisted items on `response.created` — is the
+  intended recovery pattern and is accepted by the framework.)
+- For durable graceful shutdown, call `await context.exit_for_recovery()` to
+  leave the response `in_progress` for next-lifetime recovery.
+- For the checkpoint pattern (Row 11), checkpoint at safe phase boundaries and,
+  on recovery, resume from `context.persisted_response`.
+- For at-most-once side effects across recovery, write a dedup marker to
+  `context.conversation_chain_metadata` and `await ...flush()` before the
+  side effect.
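+
+The dedup-marker obligation can be sketched with a fake metadata object.
+`FakeChainMetadata` and `send_once` are hypothetical; the only behaviour
+borrowed from the text above is "write the marker, `await ...flush()`, then
+run the side effect".
+
+```python
+import asyncio
+from typing import Awaitable, Callable, Dict
+
+
+class FakeChainMetadata(dict):
+    """Stand-in for `context.conversation_chain_metadata` (illustrative)."""
+
+    def __init__(self) -> None:
+        super().__init__()
+        self.durable: Dict[str, object] = {}
+
+    async def flush(self) -> None:
+        self.durable = dict(self)  # pretend this copy survives a crash
+
+
+async def send_once(
+    metadata: FakeChainMetadata,
+    key: str,
+    side_effect: Callable[[], Awaitable[None]],
+) -> bool:
+    """Run `side_effect` unless a prior lifetime already claimed `key`."""
+    if metadata.get(key):
+        return False
+    metadata[key] = True
+    await metadata.flush()  # the marker is durable before the effect runs
+    await side_effect()
+    return True
+```
+
+Because the marker is flushed first, a crash between the flush and the side
+effect means the effect never runs: the ordering gives at-most-once, not
+exactly-once.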
+
+---
+
+## Framework obligations
+
+- Deliver every row × applicable-path cell above as a MUST.
+- Persist the checkpoint snapshot durably on success; on a swallowed provider
+  failure, preserve the prior snapshot (C5).
+- On recovery deferral (`exit_for_recovery`), preserve the last checkpoint
+  snapshot — do NOT overwrite it with a pre-terminal record (Row 11 Path B).
+- **Append `response.created` to the durable stream only when the stream is
+  empty** — never re-append it on a recovered entry (Streaming sub-contract
+  clause 5).
+- **Drop recovery when the response was never durably created** — on a
+  definitive store not-found, do not re-invoke the handler; settle the task
+  (Recovered entry § Recovery precondition).
+- Strip `internal_metadata` (item-level and the response-level reserved key)
+  from every client egress; never persist client-injected internal metadata.
+
+---
+
+## Test discipline
+
+The matrix is the contract, enforced by the behavioural suite at
+`tests/e2e/durability_contract/` and codified by Constitution Principle X.
+
+1. **One test module per (row × path)** — `test_row_<N>_path_{a,b,c}.py`. Each
+   module drives the contract end-to-end through a real HTTP client.
+2. **Real signals only.** Path A uses SIGTERM with a long grace; Path B uses
+   SIGTERM with a deliberately short grace; Path C uses SIGKILL via
+   `_crash_harness` then restart. No mocking, no synthetic-crash shortcuts, no
+   fabricated recovery state.
+3. **`stream` is parametrized** — every module runs both `stream=False` and
+   `stream=True`.
+4. **Completeness meta-test.** `test_contract_completeness.py` parses **The
+   matrix** here and fails if any (row × applicable path) lacks a test module,
+   and requires `CONTRACT_COVERAGE.md` to map every conformance test.
+5. **Contract-surface depth (Principle XI).** Per-cell tests assert on event
+   content / `response.output` / sequence numbers as applicable, not just
+   terminal status. Row 11 uses per-lifetime markers (above).
+
+For Row 11, the real-crash cutpoints **C1** and **C3** are exercised e2e under
+Path B (graceful `exit_for_recovery`) and Path C (SIGKILL); **C2** is the
+documented provider-atomicity limitation above (no torn-write assertion);
+**C4** and **C5** are unit-tested in `tests/unit/test_checkpoint.py`.
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/docs/durable-responses-developer-guide.md b/sdk/agentserver/azure-ai-agentserver-responses/docs/durable-responses-developer-guide.md
new file mode 100644
index 000000000000..687e07f6aa3a
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/docs/durable-responses-developer-guide.md
@@ -0,0 +1,672 @@
+# Durable Responses Developer Guide
+
+This guide explains how to build crash-recoverable response handlers using the
+durable background responses feature. It covers what the framework provides
+automatically, what developers need to implement, and best practices.
+
+## Overview
+
+When `durable_background=True` (opt-in — the default is `False`), the
+framework automatically wraps your response handler in a **durable
+task**. If the server crashes mid-response:
+
+- Background responses are automatically re-invoked on restart
+- Stream events are preserved for client reconnection
+- Conversation state is maintained across crashes
+
+**Opting in (`durable_background=True`) gets you the framework half for
+free**: re-invocation on restart, event replay for reconnecting clients, and
+conversation continuity — with no handler changes. A naive handler re-invoked
+this way still produces a correct response (it just re-runs the whole turn).
+The *handler* half — making the recovered attempt resume *where it left off*
+and not repeat non-idempotent side effects — is optional work you take on when
+you want it; see [Choosing a resume strategy](#choosing-a-resume-strategy).
+
+> **Default**: `durable_background` defaults to `False`. Without the
+> opt-in, a crash mid-handler leaves the response in the
+> "crash-failed" state: the next-lifetime recovery scanner marks it
+> `failed` (`server_error` / `shutdown_reason=crash_recovery`) instead
+> of re-invoking the handler. Set `durable_background=True` on
+> `ResponsesServerOptions` to engage the re-invoke recovery path.
+
+## What the Framework Provides (Zero Code)
+
+| Feature | Behavior |
+|---------|----------|
+| Crash recovery | Handler re-invoked on server restart (requires `durable_background=True`) |
+| Stream replay | Events persisted incrementally; clients reconnect seamlessly |
+| Conversation lock | Prevents conflicting concurrent writes |
+| Foreground cleanup | Foreground (`background=false`) responses marked `failed` on crash (no ghost re-invocation) |
+| TTL-based cleanup | Stream events auto-expire after 10 minutes (framework-internal) |
+
+## Decision Tree
+
+### What is `context.conversation_chain_metadata` for?
+
+`context.conversation_chain_metadata` is a **small key-value store of references
+and watermarks** — it is NOT a place to keep your application's
+checkpoint data.
+
+Use it for things like:
+
+- An upstream session UUID (Copilot session id, a
+  LangGraph thread id).
+- A small pointer to your most recently processed input or output (e.g.
+  `last_processed_input_item_id`).
+- A short workflow step counter (`step: 3`) so the recovered handler
+  knows where to resume.
+
+The actual checkpoint *data* — graph state, conversation history,
+generated content, intermediate work — lives in the upstream framework
+or in your own external storage (Redis, Cosmos DB, files on disk). The
+metadata pointer is what lets the recovered handler find that data.
+
+```python
+@app.response_handler
+async def handler(request, context, cancellation_signal):
+    # `total_steps`, `run_step`, and `upstream_store` are placeholders for
+    # your own workflow and storage code.
+    # Small watermark: which workflow step is next?
+    step = int(context.conversation_chain_metadata.get("workflow_step", 0))
+
+    for i in range(step, total_steps):
+        # Do the work, then write any bulk data to your upstream store
+        # directly, NOT to context.conversation_chain_metadata.
+        result = await run_step(i)
+        await upstream_store.write_step_result(i, result)
+        # Advance the watermark, then explicitly flush so the next
+        # process lifetime (after a crash) skips the already-committed
+        # step. Persistence is not implicit — flush before any side
+        # effect whose effect must survive a crash.
+        context.conversation_chain_metadata["workflow_step"] = i + 1
+        await context.conversation_chain_metadata.flush()
+```
+
+Why this distinction matters: metadata is persisted alongside the
+durable task — small writes are cheap and fast, but bulk writes will
+hit task-store payload limits and slow down recovery. Treating metadata
+as a checkpoint *index* (not a checkpoint *store*) keeps it fast and
+keeps your actual durable data in the storage system best suited to it.
+
+### Do you need multi-turn conversations?
+
+Enable steerable conversations for agents that maintain context across turns:
+
+```python
+options = ResponsesServerOptions(
+    durable_background=True,
+    steerable_conversations=True,
+)
+```
+
+With steering enabled:
+- Each turn shares the same durable task (conversation continuity)
+- New turns can cancel the current in-progress turn
+- The `pending_input_count` field tells you how many turns are queued
+
+### Do you need a custom acceptance hook?
+
+When a new turn is queued onto an **already-active steerable conversation**
+(steering pressure — never the first turn of a conversation), the framework
+returns a "queued" response to that POST. By default it's a minimal
+`status="queued"` envelope. Register `@app.response_acceptor` to customize it
+— the hook returns a strongly-typed `ResponseObject`:
+
+```python
+from azure.ai.agentserver.responses import (
+    CreateResponse, ResponseContext, ResponseObject,
+)
+
+@app.response_acceptor
+def my_acceptor(request: CreateResponse, context: ResponseContext) -> ResponseObject:
+    return ResponseObject(
+        {
+            "id": context.response_id,
+            "object": "response",
+            "status": "queued",
+        }
+    )
+```
+
+This is optional — the default queued envelope is fine for most agents. See
+the handler guide's
+[steering API](handler-implementation-guide.md#steering-api) for the hook
+mechanics.
+
+## Configuration
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `durable_background` | `False` | Opt INTO crash-recoverable background responses |
+| `steerable_conversations` | `False` | Enable multi-turn steering with cooperative cancel |
+
+## Configuration Matrix
+
+Recovery semantics depend on three request flags and one server option. The
+table below is a quick orientation. For the **normative** per-row,
+per-termination-path matrix, see
+[`durability-contract.md`](durability-contract.md); for the design rationale
+and everything the contract does not carry, see
+[`responses-durability-spec.md`](responses-durability-spec.md). Those documents
+are the source of truth; this section summarises them for developer ergonomics.
+
+| `store` | `background` | `durable_background` | Summary |
+|---|---|---|---|
+| `true` | `true` | `True` | **Full recovery.** Handler is re-invoked with `context.is_recovery == True`. Persisted events replay to reconnecting clients. See [Crash Recovery](#crash-recovery). |
+| `true` | `true` | `False` (default) | **Failed marker.** Response is marked `failed` on restart. Handler is NOT re-invoked. Pre-crash persisted events remain replayable until TTL expires. |
+| `true` | `false` (foreground) | any | **Failed marker.** Response is marked `failed` with `code=server_error`. Handler is NOT re-invoked (the client's HTTP connection is already dead). Persisted events remain queryable. |
+| `false` | any | any | **Best-effort failed marker** during shutdown grace period only. No persistence. Recovery does not apply. |
+
+Each row × termination-path cell — Path A (handler completes within grace),
+Path B (grace exhausted, in-process marker fires), Path C (crash or Path-B
+failure, next-lifetime recovery fires) — is covered by a dedicated
+conformance test in `tests/e2e/durability_contract/`. If something behaves
+differently from what the spec says, that's a bug in either the implementation
+or the spec — open an issue.
+
+`steerable_conversations=True` composes orthogonally: it enables multi-turn
+steering on top of any row above. Recovery composes with steering — see the
+[handler guide's Recovery × Cancellation Composition](handler-implementation-guide.md#recovery--cancellation-composition).
+
+> **`conversation_id` chains**: when a request supplies
+> `conversation_id`, sequential turns extend the chain even when
+> `steerable_conversations=False`. Only **concurrent overlap** (a new
+> turn arriving while a prior turn's handler is still in progress)
+> returns 409 `conversation_locked`. This is independent of the
+> `steerable_conversations` option — that option only controls whether
+> mid-turn inputs are queued (steerable) or rejected (non-steerable).
+
+### Steerable conversations: no forking
+
+When `steerable_conversations=True`, each turn after the first must reference
+the previous turn's `response_id` via `previous_response_id`. The framework
+rejects forks with HTTP 409:
+
+```json
+{
+  "error": {
+    "message": "Conversation forking is not supported — previous_response_id must reference the most recent turn.",
+    "type": "conflict",
+    "code": "conversation_fork_not_supported",
+    "param": "previous_response_id"
+  }
+}
+```
+
+This includes both stale-predecessor cases (you sent a `previous_response_id`
+that refers to a turn other than the most recent one) and concurrent races
+(two POSTs arrive together with the same `previous_response_id` — exactly one
+wins; the other gets the 409). There is no soft path through; a steerable
+conversation cannot be branched.
+
+The check is enforced by the core durable layer's input-precondition primitive;
+see the core `durable-task-guide.md` §4 (Concepts → "Input-acceptance
+preconditions") for the mechanism. From a responses-API consumer's
+perspective: keep `previous_response_id` pointing at the latest `response_id`
+you have seen for this conversation.
+
+### Provider configuration for local-dev recovery testing
+
+Real cross-process recovery requires durable storage that survives subprocess
+restarts. The framework defaults provide this automatically; the list below
+describes what they do and how to override them for specific scenarios.
+
+- **Durable task store**: in a hosted environment the framework uses
+  the Foundry task storage API; in local development it auto-selects
+  a file-backed task store under
+  `${AGENTSERVER_DURABLE_ROOT:-~/.durable}/tasks/`. Either way, tasks
+  survive process restarts so a recovered handler re-enters its prior
+  task body. Operators can override the auto-selection by setting
+  `AGENTSERVER_TASKS_BACKEND=local` (to force file-backed in hosted)
+  or `AGENTSERVER_TASKS_BACKEND=hosted` (to force the hosted API in
+  local).
+- **Response store**: in a hosted environment the framework uses the
+  Foundry hosted responses storage API; in local development the
+  default is `FileResponseStore` under
+  `${AGENTSERVER_DURABLE_ROOT:-~/.durable}/responses/`. No explicit
+  construction needed in either case. `InMemoryResponseProvider`
+  remains importable for in-memory-specific unit tests. To target a
+  different directory in local development, pass
+  `store=FileResponseStore(storage_dir=…)` to `ResponsesAgentServerHost`.
+- **Stream event store**: configured automatically — file-backed when
+  `durable_background=True`, in-memory otherwise. Files land under
+  `${AGENTSERVER_DURABLE_ROOT:-~/.durable}/streams/`. No per-store env
+  var to set; the unified `AGENTSERVER_DURABLE_ROOT` covers all three
+  local subdirs (`tasks/`, `streams/`, `responses/`).
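+
+To see where local files will land before running a recovery test, the layout
+described above can be resolved by hand. This is a minimal sketch:
+`local_durable_dirs` is a hypothetical helper, not a library API, and it assumes
+an empty `AGENTSERVER_DURABLE_ROOT` is treated like an unset one (as the `:-`
+notation suggests).
+
+```python
+import os
+from pathlib import Path
+
+
+def local_durable_dirs(env=None) -> dict:
+    """Resolve the three local subdirectories (hypothetical helper)."""
+    env = os.environ if env is None else env
+    # Mirrors ${AGENTSERVER_DURABLE_ROOT:-~/.durable}: unset or empty falls back.
+    root = Path(env.get("AGENTSERVER_DURABLE_ROOT") or "~/.durable").expanduser()
+    return {name: root / name for name in ("tasks", "streams", "responses")}
+```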
+
+For production, your deployment hosts the response store externally —
+typically via the Foundry response provider, which is auto-configured
+when `FOUNDRY_PROJECT_ENDPOINT` is set. The stream event store
+continues to use the framework's file-backed registry under
+`${AGENTSERVER_DURABLE_ROOT}/streams/` (the durable-task primitive
+owns the equivalent migration for its task store).
+
+## Recovery + steering surface on `ResponseContext`
+
+When `durable_background=True`, the framework populates flat fields
+on the response context for every handler invocation. The fields
+mirror the underlying task primitive's classifiers and are safe to
+read regardless of `is_recovery`:
+
+```python
+@app.response_handler
+async def handler(request, context, cancellation_signal):
+    # True if this invocation is a re-entry after a crash.
+    if context.is_recovery:
+        # Recovery code path — build a resumption response, emit a
+        # reset response.in_progress event, continue from the last
+        # checkpoint your handler's metadata watermark recorded.
+        ...
+
+    # True only on the drain re-entry that follows a steering input
+    # (steerable_conversations=True). NOT set on the cancelled
+    # current turn that produced the steering pressure.
+    if context.is_steered_turn:
+        ...
+
+    # Number of additional steering inputs queued behind this turn.
+    # Live count — decreases as the framework drains the queue.
+    print(f"{context.pending_input_count} turns waiting")
+
+    # Persistent metadata namespace. Safe across crashes and turns.
+    # The default namespace is `context.conversation_chain_metadata["key"]`;
+    # named namespaces are `context.conversation_chain_metadata("name")["key"]`.
+    # Call `await context.conversation_chain_metadata.flush()` before any side
+    # effect that depends on the write surviving a crash. Snapshots
+    # also happen at lifecycle boundaries automatically.
+    context.conversation_chain_metadata["my_checkpoint_id"] = "abc-123"
+```
+
+These fields are always present on the context (even for `store=false`
+Row 4 responses, where the metadata facade is backed by an in-memory
+mapping that evaporates on restart).
+
+### Conversation chain identity
+
+`ResponseContext.conversation_chain_id: str` is a **derived, stable chain
+identifier**: the framework computes it so that **every turn of the same
+conversation resolves to the same value**, and so it stays constant across all
+attempts of a turn (fresh, recovered, multiply-recovered). It is the same value
+the framework uses internally to partition durable tasks. Think of it as "the
+stable name of this conversation", not as any single request field.
+
+It is derived by anchoring to the conversation's root rather than to the current
+turn. A `conversation_id` (explicit conversation scope) or the head of a
+`previous_response_id` chain pins every turn to one identifier; a first turn that
+has neither falls back to its own `response_id` as the chain root. That pinning
+is the point of the derivation: you get **one durable key per conversation**,
+not a new one per turn.
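+
+As a mental model only, the anchoring can be sketched as below. Everything here
+is an assumption made for illustration: `derive_chain_id` and the `predecessors`
+mapping are hypothetical, the precedence of `conversation_id` over
+`previous_response_id` is a guess, and the framework may transform or hash the
+root rather than return it verbatim.
+
+```python
+from typing import Mapping, Optional
+
+
+def derive_chain_id(
+    response_id: str,
+    conversation_id: Optional[str] = None,
+    previous_response_id: Optional[str] = None,
+    predecessors: Optional[Mapping[str, str]] = None,
+) -> str:
+    """Illustrative only; the framework's real derivation may differ."""
+    if conversation_id is not None:
+        return conversation_id          # explicit conversation scope
+    if previous_response_id is not None:
+        # Walk back to the head of the previous_response_id chain.
+        head, seen = previous_response_id, {previous_response_id}
+        while predecessors and head in predecessors:
+            head = predecessors[head]
+            if head in seen:            # defensive guard against a cyclic mapping
+                break
+            seen.add(head)
+        return head
+    return response_id                  # first turn: its own id is the chain root
+```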
+
+Handlers that wrap a stateful upstream framework (Copilot SDK, LangGraph, …) can
+use it as their upstream session id — a convenient way to avoid allocating (and
+persisting) your own UUID, though you're free to use your own identifier:
+
+```python
+session = await upstream_client.create_or_resume_session(
+    session_id=context.conversation_chain_id,
+)
+```
+
+What snapshot does the library hand you on recovery? It depends on your resume
+model (see [Choosing a resume strategy](#choosing-a-resume-strategy)):
+
+- If you use **framework checkpoints** (`stream.checkpoint()`), the library
+  persists the response snapshot at `response.created`, at each checkpoint, and
+  at the terminal event — and exposes the **last** such snapshot on a recovered
+  entry as `context.persisted_response`. That snapshot is your watermark.
+- If your durable state lives in an **upstream framework/store**, the library
+  does not hold a useful in-flight snapshot of the crashed attempt — you build
+  the resumption response from the upstream's state.
+
+Either way, the library never keeps a *running* snapshot of in-flight items
+between persistence points; what it persists is the SSE event stream (for
+client replay) plus the snapshot at each of the points above.
+
+### Notes on `context.conversation_chain_metadata`
+
+- The metadata API is a **callable namespace facade**. Use
+  `context.conversation_chain_metadata["key"] = value` for the default namespace;
+  use `context.conversation_chain_metadata("name")["key"] = value` for a sibling
+  namespace (each namespace tracks dirty state independently and can be
+  `await context.conversation_chain_metadata("name").flush()`-ed in isolation).
+- Persistence is **explicit**, not auto-flushed. Call
+  `await context.conversation_chain_metadata.flush()` (or
+  `await context.conversation_chain_metadata("name").flush()`) before any side
+  effect that depends on a metadata write surviving a crash. The
+  framework also snapshots all touched namespaces at lifecycle
+  boundaries (start/suspend/complete/fail/cancel/terminate), so values
+  written and forgotten will still be visible on a clean recovery — but
+  the fence for at-most-once side-effect patterns is your explicit
+  `flush()`.
+- Keys and namespace names **starting with `_` are rejected** (raise `ValueError`). Those prefixes are reserved for framework-internal namespaces (e.g. `_responses` for the responses orchestrator) — pick your own prefix-free names.
+- Metadata survives crashes — use it for small watermarks (session IDs, checkpoint references, "side effect issued" flags).
+- Keep values JSON-serializable (strings, numbers, lists, dicts).
+- **DO NOT** store conversation history, LLM outputs, or any bulk data in metadata. Use the upstream framework's own storage (session JSONL, checkpoint DB, etc.) for that.
+
+## Choosing a resume strategy
+
+When the framework re-invokes your handler after a crash
+(`context.is_recovery == True`), how the recovered attempt resumes coherently is
+**your choice**, driven by one question: **where does your durable progress
+state live?**
+
+| Where state lives | Strategy | On recovery |
+|---|---|---|
+| Nowhere (cheap to re-run) | **Naive re-run** | Do nothing recovery-specific; the whole turn re-runs. Correct, just duplicative — only unsafe if it repeats non-idempotent side effects. |
+| In the response snapshot | **Framework checkpoint** | Emit one `OutputItem` per phase + `yield stream.checkpoint()`. `context.persisted_response` is the last snapshot — seed the stream from it and resume past the items already there. |
+| In an upstream framework/store | **Upstream-owned** | Rebuild a resumption `ResponseObject` from the upstream's state (Copilot session, LangGraph checkpoint, your DB) and emit it as the reset. |
+
+Minimal skeletons (full templates are in the handler guide's
+[Durability section](handler-implementation-guide.md#durability)):
+
+```python
+# Framework checkpoint — state lives in the response snapshot
+if context.is_recovery and context.persisted_response is not None:
+    stream = ResponseEventStream(response=context.persisted_response,
+                                 response_id=context.response_id)
+    start = len(stream.response.output)          # resume past checkpointed phases
+else:
+    stream = ResponseEventStream(request=request, response_id=context.response_id)
+    start = 0
+
+# Upstream-owned — state lives in your framework/store
+resumption = build_response_from(upstream.load(context.conversation_chain_id))
+stream = ResponseEventStream(response=resumption, response_id=context.response_id)
+```
+
+**Watermark overlay (composable — not a fourth strategy).** Independently of the
+strategy you pick: if your handler makes a **non-idempotent side effect** (sending
+a user message upstream, charging a card) that the upstream can't dedup for you,
+fence it with a metadata watermark so a recovered attempt doesn't repeat it:
+
+```python
+context.conversation_chain_metadata["sent_msg"] = True
+await context.conversation_chain_metadata.flush()   # durable BEFORE the side effect
+await upstream.send_message(...)                    # the non-idempotent call
+del context.conversation_chain_metadata["sent_msg"]
+await context.conversation_chain_metadata.flush()   # clear AFTER it durably committed
+```
+
+These compose: a handler may checkpoint its response output **and** watermark a
+non-response side effect in the same turn.
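+
+The fence only helps if the recovered attempt checks the watermark before
+sending. The simulation below is a sketch under stated assumptions:
+`FakeChainMetadata`, `send_once`, and `demo` are hypothetical stand-ins, and the
+real facade's read API may not be `dict.get`. It models a crash after the send
+but before the watermark is cleared.
+
+```python
+import asyncio
+
+
+class FakeChainMetadata(dict):
+    """In-memory stand-in for the facade; `durable` models what flush() saved."""
+
+    def __init__(self, durable: dict):
+        super().__init__(durable)       # a new attempt loads the flushed state
+        self.durable = durable
+
+    async def flush(self) -> None:
+        self.durable.clear()
+        self.durable.update(self)
+
+
+async def send_once(metadata, send, crash_after_send: bool = False) -> None:
+    if metadata.get("sent_msg"):
+        return                          # watermark still set: do not re-issue
+    metadata["sent_msg"] = True
+    await metadata.flush()              # durable BEFORE the side effect
+    await send()
+    if crash_after_send:
+        raise RuntimeError("simulated crash")
+    del metadata["sent_msg"]
+    await metadata.flush()              # clear AFTER it durably committed
+
+
+async def demo() -> int:
+    durable, calls = {}, []
+
+    async def send() -> None:
+        calls.append("msg")
+
+    try:
+        await send_once(FakeChainMetadata(durable), send, crash_after_send=True)
+    except RuntimeError:
+        pass
+    # Recovered attempt: a fresh facade built from what was flushed.
+    await send_once(FakeChainMetadata(durable), send)
+    return len(calls)
+```
+
+Note the trade-off: a crash between the first `flush()` and the send leaves the
+watermark set without the message having gone out, so the pattern is
+at-most-once.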
+
+## Crash recovery — what you get, what you owe
+
+Re-entry is governed by the recovery contract in the
+[handler guide's Durability section](handler-implementation-guide.md#durability)
+(the canonical mental model and worked templates). This section is the
+configuration / decision context.
+
+### What you get on recovered entry
+
+- `context.is_recovery == True`, plus `context.persisted_response` — the last
+  durably-persisted snapshot (last `stream.checkpoint()`, else the
+  `response.created` snapshot, else `None`).
+- `context.conversation_chain_metadata` carrying whatever watermarks you stamped.
+- The cancellation contract from the [Cancellation guide](handler-implementation-guide.md#cancellation) continues to apply. If the prior attempt was cancelled (steering, client cancel, shutdown), the cancel surface is pre-set with the appropriate cause-boolean (`context.client_cancelled` for explicit cancel / non-bg disconnect; `context.shutdown.is_set()` for graceful shutdown; neither for steering pressure) on re-entry.
+- The framework persists the response object at `response.created`, at **each
+  successful `stream.checkpoint()`**, and at the terminal event; the
+  `response.created` and terminal writes are **deduplicated** across recovery
+  attempts keyed on `response_id`, so you never branch for them. The SSE event
+  stream is persisted as you emit it (no dedup) — except that a recovered
+  handler's re-emitted `response.created` is **not** re-appended to the
+  already-non-empty durable stream, so a replaying client sees `response.created`
+  exactly once.
+
+### What you owe on recovered entry (only if you chose a non-naive strategy)
+
+- Seed or build your resumption response (framework-checkpoint: from
+  `context.persisted_response`; upstream-owned: from upstream state).
+- Emit `response.in_progress` early — it is the client-visible reset point.
+- For non-idempotent side effects without upstream idempotency, honour your
+  watermarks: don't re-issue a call whose watermark is still set from the prior
+  attempt.
+
+### Naive opt-out
+
+A handler that does nothing recovery-specific still produces a correct response:
+it re-runs from scratch, the recovered stream's first client-visible event is a
+fresh `response.in_progress` (the duplicate `response.created` is suppressed at
+the durable stream), and everything re-streams. The one real risk is **repeating
+non-idempotent side effects** (a second upstream user message, a double charge) —
+if your handler has any, reach for the watermark overlay or a strategy that
+resumes past them.
+
+## Checkpoint-driven recovery — one item per phase
+
+When your work decomposes into phases, the simplest correct recovery shape
+is **one `OutputItem` per phase + `yield stream.checkpoint()` at each phase
+boundary**. The persisted response *is* the watermark: on recovery you seed
+the stream from `context.persisted_response` and resume from
+`len(stream.response.output)`. A phase that finished (`output_item.done` +
+`checkpoint()`) is already in the seeded output; a phase interrupted before
+its checkpoint never entered the snapshot, so it re-runs cleanly — no
+hand-rolled breadcrumb reconstruction.
+
+```python
+from azure.ai.agentserver.responses import (
+    CreateResponse, ResponseContext, ResponseEventStream,
+)
+
+PHASES = ("gather", "analyze", "synthesize", "review", "publish")
+
+
+@app.response_handler
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal):
+    # Recovery branch: seed from the persisted snapshot. The completed
+    # phases' items are already in stream.response.output; count them to
+    # know where to resume.
+    if context.is_recovery and context.persisted_response is not None:
+        stream = ResponseEventStream(
+            response_id=context.response_id, response=context.persisted_response,
+        )
+        done_phases = len(stream.response.output)
+    else:
+        stream = ResponseEventStream(response_id=context.response_id, request=request)
+        done_phases = 0
+
+    yield stream.emit_created()      # framework dedups the duplicate on recovery
+    if context.shutdown.is_set():
+        await context.exit_for_recovery()
+    yield stream.emit_in_progress()  # client-visible reset point on recovery
+
+    prompt = await context.get_input_text()
+    for phase_idx in range(done_phases, len(PHASES)):
+        message = stream.add_output_item_message()
+        message.internal_metadata["phase"] = PHASES[phase_idx]  # stripped on egress
+        yield message.emit_added()
+        text = message.add_text_content()
+        yield text.emit_added()
+        async for token in run_phase(PHASES[phase_idx], prompt):
+            if context.shutdown.is_set():
+                await context.exit_for_recovery()  # item not closed → phase re-runs
+            yield text.emit_delta(token)
+        yield text.emit_text_done()
+        yield text.emit_done()
+        yield message.emit_done()        # item now in stream.response.output
+        yield stream.checkpoint()        # phase durable; on to the next
+
+    yield stream.emit_completed()
+```
+
+`yield stream.checkpoint()` durably persists the current `stream.response`
+snapshot (gated to durable background responses; a no-op otherwise) and is
+backpressured — control does not return from the `yield` until the write
+completes. See the handler guide's
+[Stream Checkpoints](handler-implementation-guide.md#stream-checkpoints) for
+the full semantics and `durability-contract.md` Row 11 for the conformance
+contract.
+
+### Which metadata facility?
+
+There are **two** internal-metadata facilities at **different scopes**:
+
+- **`context.conversation_chain_metadata`** — **cross-turn**, named-scope,
+  explicit-`flush()` durable state over the whole conversation chain. Use it
+  for state a *later turn* needs from an earlier one, or for coordination
+  between layers/parallel nodes spanning the chain.
+- **`internal_metadata`** (on items via `item.internal_metadata`, and on the
+  response via `stream.internal_metadata`) — a **single-turn** live
+  `MutableMapping[str, Any]` that rides on the response/items, is persisted
+  with the response (so it survives recovery, read back via
+  `context.persisted_response`), and is **stripped before every client-facing
+  payload** (egress and ingress). Use it for lightweight per-turn watermarks,
+  id mappings, or in-turn stale-message detection.
+
+**Rule of thumb:** need it in a *later turn* → `conversation_chain_metadata`;
+need it only to reconstruct *this* response on crash →
+`internal_metadata` + `stream.checkpoint()`. Both are distinct from the
+*public* `ResponseObject.metadata` (the client's own metadata — never
+stripped).
+
+## Stream Recovery (client-side reconciliation)
+
+The library persists every SSE event in order — including events emitted
+across multiple recovery attempts. Reconnecting clients use the standard
+`starting_after=` query parameter to resume:
+
+```
+GET /responses/{id}?stream=true&starting_after=42
+```
+
+This returns only events with `sequence_number > 42`.
+
+A durable stream has **exactly one** `response.created` — it is the first
+event of the stream. On a recovered entry the framework does **not** append a
+second `response.created` (it is suppressed at the durable-stream write because
+the stream is non-empty), so the full replayed sequence a reconnecting client
+sees end-to-end is:
+
+```
+response.created
+response.in_progress
+<events emitted before the crash>
+response.in_progress        ← recovery reset: carries the stable
+                              (already-persisted) output items at the
+                              resumption point
+<events emitted after recovery>
+response.completed
+```
+
+The post-recovery part of this guarantee is normative per
+[`responses-durability-spec.md`](responses-durability-spec.md): for
+`(store=true, background=true, durable_background=True, stream=true)` —
+the row that supports handler re-invoke — a client reconnecting AFTER a
+crash receives the events the recovered handler emits, framed by the
+reset-on-`in_progress` rule below. The conformance suite covers this
+under Row 1 Path C.
+
+### The reset-on-`in_progress` rule
+
+Clients that want to support durable+background recovery MUST observe the
+following rule:
+
+> **Any `response.in_progress` event received after the first one in a
+> stream is a snapshot reset.** Replace the local `response.output` with
+> the event's `response.output`. Discard any partial in-flight item
+> content you had been accumulating. Treat subsequent events as additive
+> on top of the new snapshot.
+
+This rule applies whether the client is reading the live stream or
+replaying via `starting_after=`. The reset event is in-band — no
+separate signal is needed.
+
+### Output indexes are slot IDs, not monotonic counters
+
+After a snapshot reset, the handler MAY re-use `output_index` values that
+appeared before the reset. Clients MUST treat indexes as authoritative
+slot identifiers:
+
+- `output_item.added` at an index already present in the snapshot →
+  replace the slot.
+- `output_item.added` at a new index → append a slot.
+- Subsequent `output_item.delta` / `output_item.done` apply to the slot
+  identified by `output_index`.
+
+Clients that assume indexes are strictly monotonic will see a coherent
+final response but may render intermediate states incorrectly.
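+
+The reset rule and the slot rule can be combined into one small client-side
+reducer. This is a sketch, not a reference client: `ResponseView` is a
+hypothetical class, the event dictionaries (`type`, `sequence_number`,
+`output_index`, `item`, `response.output`) are assumed shapes, delta handling
+is omitted, and new indexes are assumed to arrive contiguously.
+
+```python
+from typing import Any, Dict, List
+
+
+class ResponseView:
+    """Client-side accumulator (hypothetical); event shapes are assumptions."""
+
+    def __init__(self) -> None:
+        self.output: List[Dict[str, Any]] = []
+        self.seen_in_progress = False
+        self.last_sequence_number = -1  # candidate value for starting_after=
+
+    def apply(self, event: Dict[str, Any]) -> None:
+        self.last_sequence_number = event.get(
+            "sequence_number", self.last_sequence_number
+        )
+        kind = event["type"]
+        if kind == "response.in_progress":
+            if self.seen_in_progress:
+                # Snapshot reset: replace local output, dropping partial items.
+                self.output = [dict(i) for i in event["response"]["output"]]
+            self.seen_in_progress = True
+        elif kind in ("output_item.added", "output_item.done"):
+            index, item = event["output_index"], dict(event["item"])
+            if index < len(self.output):
+                self.output[index] = item   # index already present: replace slot
+            else:
+                self.output.append(item)    # new index: append a slot
+```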
+
+## Non-Background Response Behavior
+
+When `background=false` (foreground streaming):
+
+- Response is tied to the HTTP connection lifetime.
+- If the server crashes: response is marked `failed` with `code=server_error`.
+- The handler is NOT re-invoked (client is already disconnected).
+- Conversation lock still applies (prevents concurrent modifications).
+
+## Layered Concerns
+
+This guide and the handler guide together describe three layered concerns
+that compose to give you durable response handlers:
+
+- **The durable background runtime** provides the runtime primitives
+  (flat recovery + steering fields on `ResponseContext` —
+  `is_recovery`, `is_steered_turn`, `pending_input_count`,
+  `conversation_chain_metadata` — task store wiring, steerable conversation
+  orchestration).
+- **The cancellation contract** provides two distinct surfaces — the
+  3rd positional handler arg `cancellation_signal: asyncio.Event`
+  (set on client cancel, `/cancel` API, or steering pressure) and
+  `context.shutdown: asyncio.Event` (set on server shutdown), plus
+  the cause flag `context.client_cancelled: bool` and the recovery
+  primitive `await context.exit_for_recovery()`. The pre-entry /
+  mid-stream / post-stream rules: steering and shutdown never produce
+  `cancelled`, the framework never sets `incomplete`, and the framework
+  sets `failed` when a naive handler leaves a cancellation unhandled.
+- **The recovery contract** provides the multi-attempt
+  reconciliation pattern: resumption response, snapshot reset on
+  `response.in_progress`, watermark-guarded side effects, naive
+  fallback.
+
+The three compose cleanly: the runtime surfaces the recovery hooks, the
+cancellation contract is what recovered handlers must honour, and the
+recovery contract prescribes how the recovered attempt produces coherent
+output.
+
+## Best Practices
+
+These are recommendations, not framework requirements — adapt them to your
+handler. (The genuine hard rules are few: a `ResponseEventStream` handler emits
+`response.created` then `response.in_progress` first and exactly one terminal
+event; a recovered streaming entry emits `response.in_progress` as the reset
+point; and clients supporting durable streams treat any later
+`response.in_progress` as a snapshot reset.)
+
+1. **Keep the recovery branch easy to find.** A recovery-aware handler usually
+   diverges from a fresh handler near the top (`if context.is_recovery:`).
+   Branching early keeps the two paths readable — a readability tip, not a rule.
+
+2. **Prefer your upstream framework's own resume facility** when you have one.
+   Copilot SDK has `create_session(session_id=...)` / `resume_session(...)`;
+   LangGraph has `SqliteSaver` checkpoints. Reconstructing upstream state from
+   your own metadata is usually more work and more fragile.
+
+3. **Watermark non-idempotent side effects — when the upstream can't dedup them.**
+   If a recovered attempt could repeat an observable side effect (sending a user
+   message, charging a card) and the upstream offers no idempotency key or
+   "already done?" query, fence it: stamp + `flush()` `context.conversation_chain_metadata`
+   BEFORE the call, clear + `flush()` AFTER it durably commits. If the upstream is
+   already idempotent, or you use the framework-checkpoint model where the snapshot
+   is your side-effect boundary, you may not need this.
+
+4. **Keep metadata small.** Watermarks, session IDs, checkpoint references —
+   never bulk data (it hits task-store payload limits and slows recovery).
+
+5. **Honour the cancellation contract on recovery.** Recovery doesn't change the
+   cancellation contract from the [Cancellation guide](handler-implementation-guide.md#cancellation):
+   the same pre-entry / mid-stream / shutdown rules apply on recovered entries.
+
+6. **Don't store secrets in metadata.** The task store persists it.
+
+## Examples
+
+See the `samples/` directory for canonical durable handler shapes:
+
+- `sample_18_durable_copilot.py` — Stateful GitHub Copilot SDK conversation
+  (session resume on recovery).
+- `sample_19_durable_streaming.py` — Handler-managed checkpointing
+  (no upstream framework).
+- `sample_20_durable_steering.py` — Steerable variant of 19, demonstrating
+  cancellation × recovery composition.
+- `sample_21_durable_langgraph.py` — LangGraph with `SqliteSaver`
+  checkpointer (upstream-framework-owned durability).
+- `sample_22_durable_multiturn.py` — Multi-turn conversation with
+  `durable_background=True, steerable_conversations=False`.
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/docs/handler-implementation-guide.md b/sdk/agentserver/azure-ai-agentserver-responses/docs/handler-implementation-guide.md
index b6b2d7d9dbba..73a7bf53f31b 100644
--- a/sdk/agentserver/azure-ai-agentserver-responses/docs/handler-implementation-guide.md
+++ b/sdk/agentserver/azure-ai-agentserver-responses/docs/handler-implementation-guide.md
@@ -34,6 +34,17 @@
 - [Configuration](#configuration)
   - [Distributed Tracing](#distributed-tracing)
   - [SSE Keep-Alive](#sse-keep-alive)
+- [Durability](#durability)
+  - [Mental Model](#mental-model)
+  - [The Recovery Loop](#the-recovery-loop)
+  - [Stream Checkpoints](#stream-checkpoints)
+  - [Item and Response `internal_metadata`](#item-and-response-internal_metadata)
+  - [Which metadata facility?](#which-metadata-facility)
+  - [Default Pattern (recovery-aware)](#default-pattern-recovery-aware)
+  - [Fallback Pattern (no opt-in)](#fallback-pattern-no-opt-in)
+  - [Upstream History Pattern](#upstream-history-pattern)
+  - [Watermark Pattern](#watermark-pattern)
+  - [Resumption Response Construction](#resumption-response-construction)
 - [Best Practices](#best-practices)
 - [Common Mistakes](#common-mistakes)
 
@@ -82,7 +93,7 @@ app = ResponsesAgentServerHost()
 
 
 @app.response_handler
-async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal):
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     text = await context.get_input_text()
     return TextResponse(context, request, text=f"Echo: {text}")
 ```
@@ -117,7 +128,7 @@ When you have the full text available at once:
 
 ```python
 @app.response_handler
-async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal):
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     text = await context.get_input_text()
     return TextResponse(context, request, text=f"Echo: {text}")
 ```
@@ -126,7 +137,7 @@ async def handler(request: CreateResponse, context: ResponseContext, cancellatio
 
 ```python
 @app.response_handler
-async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal):
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     async def _build():
         text = await context.get_input_text()
         answer = await model.generate(text)
@@ -144,7 +155,7 @@ When an LLM produces tokens incrementally, pass an `AsyncIterable[str]` to
 import asyncio
 
 @app.response_handler
-def handler(request: CreateResponse, context: ResponseContext, cancellation_signal):
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     async def generate_tokens():
         tokens = ["Hello", ", ", "world", "!"]
         for token in tokens:
@@ -192,7 +203,7 @@ The primary way to register a handler is the `@app.response_handler` decorator:
 app = ResponsesAgentServerHost()
 
 @app.response_handler
-def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     return TextResponse(context, request, text="Hello!")
 
 app.run()
@@ -240,7 +251,7 @@ from starlette.routing import Mount
 responses_app = ResponsesAgentServerHost()
 
 @responses_app.response_handler
-def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     return TextResponse(context, request, text="Hello!")
 
 app = Starlette(routes=[
@@ -284,7 +295,7 @@ no custom provider registration is needed.
 
 ```python
 @app.response_handler
-def handler(
+async def handler(
     request: CreateResponse,
     context: ResponseContext,
     cancellation_signal: asyncio.Event,
@@ -295,13 +306,22 @@ def handler(
 | Parameter | Description |
 |-----------|-------------|
 | `request` | The deserialized `CreateResponse` body from the client (model, input, tools, instructions, etc.) |
-| `context` | Provides the response ID, history resolution, and ID generation helpers |
-| `cancellation_signal` | An `asyncio.Event` set on cancellation (explicit `/cancel` call or client disconnection for non-background) |
+| `context` | The handler-facing `ResponseContext` — request-scoped state, async input/history helpers, the shutdown signal (`context.shutdown`), cancellation cause flags (`context.client_cancelled`), and recovery + steering fields (`context.is_recovery`, `context.is_steered_turn`, `context.pending_input_count`, `context.conversation_chain_metadata`, `context.exit_for_recovery()`) |
+| `cancellation_signal` | An `asyncio.Event` set on client cancel (`/cancel` API or non-bg POST disconnect) or steering pressure. Distinct from `context.shutdown` — shutdown does NOT fire this signal; handlers that care about both must observe each independently. |
+
+Handlers MUST be `async def` and take exactly three positional
+parameters `(request, context, cancellation_signal)`. Sync handlers and
+the 2-arg signature `(request, context)` are hard-rejected at
+decoration time with `TypeError`. Observe cancellation via
+`cancellation_signal.is_set()`; observe shutdown via
+`context.shutdown.is_set()`; see the [Cancellation](#cancellation)
+section for the cause-boolean shape and the
+[Shutdown](#shutdown-and-recovery) section for the recovery primitive.
 
 Your handler can either:
 
 1. **Return a `TextResponse`** — the simplest approach for text-only responses.
-2. **Be a Python generator** — `yield` events one at a time for full control.
+2. **Be an async generator** — `yield` events one at a time for full control.
 
 The library consumes the events, assigns sequence numbers, manages the response
 lifecycle, and delivers them to the client.
@@ -312,25 +332,28 @@ Use `return` — no generator yield needed:
 
 ```python
 @app.response_handler
-def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     return TextResponse(context, request, text="Hello!")
 ```
 
 ### Generator handlers (ResponseEventStream)
 
-Use `yield` for full control. Can be **sync** or **async**:
+Use `yield` for full control. Handlers are always `async def`; they
+can be plain async functions that return an iterable, or async
+generators that `yield` events directly:
 
 ```python
-# Sync handler
+# Async generator — yields events one at a time
 @app.response_handler
-def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     stream = ResponseEventStream(response_id=context.response_id, request=request)
     yield stream.emit_created()
     yield stream.emit_in_progress()
-    yield from stream.output_item_message("Hello!")
+    for event in stream.output_item_message("Hello!"):
+        yield event
     yield stream.emit_completed()
 
-# Async handler
+# Async generator with an async builder (token streaming)
 @app.response_handler
 async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     stream = ResponseEventStream(response_id=context.response_id, request=request)
@@ -502,11 +525,32 @@ order. This prevents protocol violations at development time.
 
 ```python
 class ResponseContext:
-    response_id: str                        # Library-generated response ID
-    is_shutdown_requested: bool             # True when host is shutting down
-    request: CreateResponse | None          # Parsed request model
-    client_headers: dict[str, str]          # x-client-* headers from request (keys lowercase)
-    query_parameters: dict[str, str]        # Query parameters from the HTTP request
+    response_id: str                                # Library-generated response ID
+    conversation_chain_id: str                      # Stable identity for the multi-turn chain (see Durability)
+    request: CreateResponse | None                  # Parsed request model
+    client_headers: dict[str, str]                  # x-client-* headers from request (keys lowercase)
+    query_parameters: dict[str, str]                # Query parameters from the HTTP request
+    isolation: IsolationContext                     # Multi-tenant partition keys (user_key / chat_key)
+
+    # Shutdown surface (distinct from per-request cancellation_signal — see Cancellation)
+    shutdown: asyncio.Event                         # Set on graceful server shutdown
+    client_cancelled: bool                          # True for explicit /cancel call OR non-bg POST disconnect
+
+    async def exit_for_recovery() -> NoReturn
+        # Unified graceful-shutdown recovery primitive — call as a bare
+        # `await context.exit_for_recovery()` in any handler shape. Raises
+        # internally to leave the response in_progress for next-lifetime recovery.
+
+    # Recovery + steering classifiers (see Durability)
+    is_recovery: bool                               # True on a crash-recovered re-entry
+    persisted_response: ResponseObject | None       # Entry-only: last durably-persisted snapshot
+                                                    # (last stream.checkpoint(), else created snapshot,
+                                                    # else None). See Durability → persisted_response.
+    is_steered_turn: bool                           # True on the drain re-entry that follows a steering input
+    pending_input_count: int                        # Live count of queued steering inputs
+    conversation_chain_metadata: ConversationChainMetadataNamespace      # Persistent checkpoint store (Mapping + Callable facade)
+
+    # Async helpers
     async def get_input_items() -> Sequence[Item]   # Resolved input items as Item subtypes
     async def get_input_text() -> str               # Extract all text content from input items
     async def get_history() -> Sequence[OutputItem]  # Conversation history items
@@ -589,7 +633,7 @@ approach.
 
 ```python
 @app.response_handler
-def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
     return TextResponse(context, request, text="Hello, world!")
 ```
 
@@ -601,7 +645,8 @@ yield stream.emit_created()
 yield stream.emit_in_progress()
 
 # Complete text — full value up-front
-yield from stream.output_item_message("Hello, world!")
+for evt in stream.output_item_message("Hello, world!"):
+    yield evt
 
 yield stream.emit_completed()
 ```
@@ -650,7 +695,8 @@ yield stream.emit_created()
 yield stream.emit_in_progress()
 
 args = json.dumps({"location": "Seattle"})
-yield from stream.output_item_function_call("get_weather", "call_1", args)
+for evt in stream.output_item_function_call("get_weather", "call_1", args):
+    yield evt
 
 yield stream.emit_completed()
 ```
@@ -702,7 +748,8 @@ When your handler itself executes a tool and includes the output in the response
 (no client round-trip):
 
 ```python
-yield from stream.output_item_function_call_output("call_weather_1", weather_json)
+for evt in stream.output_item_function_call_output("call_weather_1", weather_json):
+    yield evt
 ```
 
 Function call outputs have no deltas — only `output_item.added` and
@@ -720,10 +767,12 @@ yield stream.emit_created()
 yield stream.emit_in_progress()
 
 # Output 0: Reasoning
-yield from stream.output_item_reasoning_item("Let me think about this...")
+for evt in stream.output_item_reasoning_item("Let me think about this..."):
+    yield evt
 
 # Output 1: Message with the answer
-yield from stream.output_item_message("The answer is 42.")
+for evt in stream.output_item_message("The answer is 42."):
+    yield evt
 
 yield stream.emit_completed()
 ```
@@ -752,10 +801,12 @@ yield stream.emit_created()
 yield stream.emit_in_progress()
 
 # Output 0
-yield from stream.output_item_message("First message.")
+for evt in stream.output_item_message("First message."):
+    yield evt
 
 # Output 1
-yield from stream.output_item_message("Second message.")
+for evt in stream.output_item_message("Second message."):
+    yield evt
 
 yield stream.emit_completed()
 ```
@@ -795,20 +846,23 @@ avoid the builder ceremony entirely:
 
 ```python
 # Image generation — emits full lifecycle automatically
-yield from stream.output_item_image_gen_call(result_base64)
+for evt in stream.output_item_image_gen_call(result_base64):
+    yield evt
 
 # Structured outputs
-yield from stream.output_item_structured_outputs({"sentiment": "positive", "confidence": 0.95})
+for evt in stream.output_item_structured_outputs({"sentiment": "positive", "confidence": 0.95}):
+    yield evt
 
 # Message with annotations
 from azure.ai.agentserver.responses.models import FilePath, UrlCitationBody
-yield from stream.output_item_message(
+for evt in stream.output_item_message(
     "Here are your sources.",
     annotations=[
         FilePath(file_id="/reports/summary.pdf", index=0),
         UrlCitationBody(url="https://example.com", start_index=0, end_index=5, title="Link"),
     ],
-)
+):
+    yield evt
 ```
 
 All convenience generators have async variants (prefixed with `a`):
@@ -854,107 +908,206 @@ The `CreateResponse` object also provides:
 
 ## Cancellation
 
-The `cancellation_signal` (`asyncio.Event`) is set when:
-
-- A client calls `POST /responses/{id}/cancel` (background mode only)
-- A client disconnects the HTTP connection (non-background mode)
-
-### TextResponse Handlers
-
-`TextResponse` handlers use `return TextResponse(...)`. Cancellation is propagated
-automatically — if the signal fires while producing text, remaining events are
-suppressed and the library handles the winddown.
-
-For streaming, check cancellation between chunks:
-
-```python
-@app.response_handler
-def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    async def stream_tokens():
-        async for token in model.stream(prompt):
-            if cancellation_signal.is_set():
-                return
-            yield token
-
-    return TextResponse(context, request, text=stream_tokens())
-```
-
-### ResponseEventStream Handlers — Sync
-
-Check the signal between iterations:
+The handler observes cancellation via two **distinct** surfaces and a
+cause-flag boolean:
+
+- **`cancellation_signal`** (3rd positional handler arg, `asyncio.Event`)
+  — set when the request itself is being cancelled. Three triggers fire
+  this signal: an explicit `POST /v1/responses/{id}/cancel` API call, a
+  non-background POST whose client disconnects mid-stream, or steering
+  pressure (a new turn arriving on the same steerable chain). This is
+  the wake-up signal handlers await / poll on inside their work loop.
+- **`context.shutdown`** (`asyncio.Event`) — set when the server is
+  shutting down (e.g. SIGTERM). Shutdown is a **separate** surface —
+  it does NOT fire the cancellation signal. The handler expectation
+  for shutdown is different from cancel: durable handlers should call
+  `await context.exit_for_recovery()` to leave the response
+  `in_progress` for re-entry on restart; non-durable handlers should
+  emit `response.failed` quickly. Handlers that care about both must
+  inspect each surface independently.
+- **`context.client_cancelled`** (`bool`) — cause flag stamped at the
+  HTTP boundary when the cancellation was an explicit client
+  cancellation (the `/cancel` endpoint OR a non-bg POST disconnect).
+  When `cancellation_signal` fires but `client_cancelled` is False
+  and `context.shutdown` is not set, the cause is steering pressure.
+
+| Cause | `cancellation_signal` | `context.shutdown` | `context.client_cancelled` | Framework Behaviour | What Handler Should Do |
+|-------|:---:|:---:|:---:|---|---|
+| **Steering** | set | not set | False | If no terminal emitted → auto-emit `response.failed`. If terminal emitted → honour it. | Break loop → close builders → `emit_completed()` |
+| **Client Cancel** | set | not set | True | Framework forces `cancelled` regardless of handler output. Output items abandoned. | Return as soon as cleanup is done. |
+| **Shutdown** | not set | set | False | Hard cutoff after `shutdown_grace_period_seconds`. Durable+bg: `await context.exit_for_recovery()` leaves the response `in_progress` for re-entry. Others: mark failed. | Checkpoint progress → `await context.exit_for_recovery()`. Or complete quickly. |
+| **Shutdown + Client Cancel race** | set | set | True | Each surface reflects its independent cause; framework prefers the cancel-status path. | Inspect each surface as needed; typically prefer shutdown's `exit_for_recovery()` for durable bg. |
+
+**Key status rules:**
+- `cancelled` is ONLY produced by explicit client cancellation (`/cancel` or non-bg POST disconnect). Never by steering or shutdown.
+- `incomplete` is NEVER set by the framework — it's exclusively developer-controlled.
+- `context.exit_for_recovery()` is the single, uniform graceful-shutdown recovery primitive — **it works in every handler shape** (coroutine or async generator; sync handlers are rejected at decoration time). Call it as a bare statement: `await context.exit_for_recovery()`. It raises internally (never returns), so there is no `return <value>` form to trip the async-generator `SyntaxError`. (A bare `return` without a terminal while `context.shutdown` is set still works as an implicit fallback, but the explicit primitive is the recommended idiom.)
+
+> **On shutdown for durable handlers**: leaving the response `in_progress` makes the framework re-invoke your handler on restart (when `durable_background=True`). Every handler shape uses the same line — `await context.exit_for_recovery()`. See [Durability](#durability) for the recovery contract — what the recovered handler must do, what the library guarantees on re-entry, and how clients reconcile the multi-attempt stream.
+
+### Default Pattern (handles cancel + shutdown)
+
+Most handlers need to observe BOTH `cancellation_signal` and
+`context.shutdown` in their work loop — cancel triggers graceful
+finish, shutdown triggers `exit_for_recovery()`:
 
 ```python
 @app.response_handler
-def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    stream = ResponseEventStream(...)
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
+    stream = ResponseEventStream(response_id=context.response_id, request=request)
     yield stream.emit_created()
     yield stream.emit_in_progress()
 
-    for chunk in get_chunks():
+    message = stream.add_output_item_message()
+    yield message.emit_added()
+    text = message.add_text_content()
+    yield text.emit_added()
+
+    async for token in model.stream(prompt):
+        if context.shutdown.is_set():
+            # Defer to next-lifetime recovery. The unified primitive
+            # raises internally and works in this async-generator shape.
+            await context.exit_for_recovery()
         if cancellation_signal.is_set():
             break
-        yield text.emit_delta(chunk)
+        yield text.emit_delta(token)
 
+    yield text.emit_text_done()
+    yield text.emit_done()
+    yield message.emit_done()
     yield stream.emit_completed()
 ```
 
-### ResponseEventStream Handlers — Async
+This works for all three causes:
+- **Steering**: partial output is preserved, `completed` status is correct
+- **Client cancel**: framework overrides status to `cancelled` regardless
+- **Shutdown**: the loop calls `await context.exit_for_recovery()` at the next
+  token, leaving the response `in_progress`; for pre-entry checks, see the advanced pattern.
+
+### Advanced Pattern (pre-entry steering, durable shutdown recovery)
+
+For steerable + durable handlers, either surface may be pre-set when
+the handler is (re)entered: `context.shutdown` if the server is
+mid-shutdown, or `cancellation_signal` if a newer turn is already
+queued (steering) or the client cancelled. **These are distinct,
+(mostly) mutually-exclusive surfaces — shutdown does NOT fire
+`cancellation_signal` (see the table above) — so check each one
+independently, shutdown first.** Routing: for shutdown, call
+`await context.exit_for_recovery()`; for steering, emit `completed` (the
+turn was superseded); for explicit client cancel, just return:
 
 ```python
 @app.response_handler
 async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    stream = ResponseEventStream(...)
+    stream = ResponseEventStream(response_id=context.response_id, request=request)
     yield stream.emit_created()
+
+    # Pre-entry: shutdown and cancellation are SEPARATE surfaces. Check
+    # shutdown first (it does not set cancellation_signal); this also
+    # resolves the rare both-set race in favour of recovery.
+    if context.shutdown.is_set():
+        # Server is shutting down; defer to next-lifetime recovery.
+        await context.exit_for_recovery()
+    if cancellation_signal.is_set():
+        if context.client_cancelled:
+            # Explicit client cancel — framework forces "cancelled" status.
+            return
+        # Steering — emit completed so the superseded turn finishes cleanly.
+        yield stream.emit_completed()
+        return
+
     yield stream.emit_in_progress()
 
+    message = stream.add_output_item_message()
+    yield message.emit_added()
+    text = message.add_text_content()
+    yield text.emit_added()
+
     async for token in model.stream(prompt):
-        if cancellation_signal.is_set():
+        if cancellation_signal.is_set() or context.shutdown.is_set():
             break
         yield text.emit_delta(token)
 
+    # Shutdown mid-stream: defer to next-lifetime recovery — the framework
+    # leaves the response in_progress and re-invokes on restart.
+    if context.shutdown.is_set():
+        await context.exit_for_recovery()
+
+    yield text.emit_text_done()
+    yield text.emit_done()
+    yield message.emit_done()
     yield stream.emit_completed()
 ```
 
-### What the Library Does on Cancellation
+After the streaming loop exits, check `context.shutdown.is_set()`
+BEFORE closing builders. If shutdown interrupted the stream, call
+`await context.exit_for_recovery()`: the response stays `in_progress`
+and the handler is re-entered on the next process lifetime to
+produce the full output (this requires
+`durable_background=True`).
 
-Let the handler exit cleanly — the server handles the winddown automatically:
+For all other cases (steering, client cancel, normal completion), close
+builders and emit `completed`:
 
-1. The library sets the `cancellation_signal` event.
-2. It waits up to 10 seconds for the handler to wind down. If the handler doesn't
-   cooperate, the cancel endpoint returns the response in its current state.
-3. Once the handler finishes (within or beyond the grace period), the response
-   transitions to `cancelled` status and a `response.failed` terminal event is
-   emitted and persisted.
+- **Steering/Normal**: `completed` is the correct status.
+- **Client cancel**: framework overrides to `cancelled` regardless.
+- **Shutdown**: the handler hasn't finished its work, so call
+  `await context.exit_for_recovery()` and let re-entry finish it.
 
-You don't need to emit any terminal event on cancellation — just check the signal
-and exit your generator cleanly.
+### Metadata Usage in Cancellation
 
-### Graceful Shutdown
+`context.conversation_chain_metadata` is appropriate for storing lightweight progress signals
+that help on re-entry — for example `last_processed_item_id` so you can
+take unprocessed items from response history after that point, or a step index
+for multi-phase workflows.
 
-When the host shuts down (e.g., SIGTERM), `context.is_shutdown_requested` is set to
-`True` and the cancellation signal is triggered. Use this to distinguish shutdown
-from explicit cancel:
+**Acceptable**: step counters, message IDs, phase indicators, checkpoint
+references for framework-native stores (e.g., a SqliteSaver checkpoint ID).
+
+**Not acceptable**: full conversation history, LLM outputs, or framework
+checkpoint data. These belong in framework-native stores (SqliteSaver for
+LangGraph, Copilot SDK sessions, or your own backing store).
+
+### TextResponse Handlers
+
+`TextResponse` handlers handle cancellation automatically. For streaming
+text with cancellation awareness:
 
 ```python
 @app.response_handler
 async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    stream = ResponseEventStream(...)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    try:
-        result = await do_long_running_work()
-    except asyncio.CancelledError:
-        if context.is_shutdown_requested:
-            yield stream.emit_incomplete()
-            return
-        raise
+    async def stream_tokens():
+        async for token in model.stream(prompt):
+            if cancellation_signal.is_set():
+                return
+            yield token
 
-    async for event in stream.aoutput_item_message(result):
-        yield event
-    yield stream.emit_completed()
+    return TextResponse(context, request, text=stream_tokens())
 ```
 
+### Rules
+
+1. **MUST emit `response.created` before any early return** — the framework
+   cannot persist or track a response until `emit_created()` is yielded.
+
+2. **MUST emit a terminal event** (`emit_completed()`, `emit_incomplete()`,
+   or `emit_failed()`) in normal and cancellation paths. If the handler exits
+   without a terminal event, the framework forces `failed` status.
+
+3. **Do NOT yield `emit_cancelled()`** — the `cancelled` status is reserved
+   for the framework when the client cancel API is used. Handlers should
+   always emit `completed` (or `incomplete`/`failed` for errors).
+
+4. **Steering and client cancel are fully cooperative** — the framework
+   waits indefinitely for the handler to yield/return. Keep your cleanup fast
+   but you're not racing a deadline.
+
+5. **Shutdown has a hard cutoff** — after `shutdown_grace_period_seconds`
+   the process exits. Keep post-signal work under a few seconds.
+
+6. **`return` in an async generator is a bare statement** — you cannot
+   `return value`. Use `yield` for events, then `return` to exit.
+
 ---
 
 ## Error Handling
@@ -1131,6 +1284,513 @@ to disable nginx buffering.
 
 ---
 
+## Durability
+
+The framework re-invokes your handler when the server crashes mid-response
+(if `durable_background=True` and the request had `store=true, background=true`).
+What that re-invocation gives you, what you have to do to take advantage of it,
+and how clients reconcile a multi-attempt stream is the **recovery contract**.
+
+The deeper "how does this all fit together" view — the four-row dispatch matrix,
+the three termination paths (handler completes within grace, grace exhausted,
+crash), the exact persistence guarantees the framework makes, and the full
+conformance items — is in
+[`responses-durability-spec.md`](responses-durability-spec.md). That document is
+language-agnostic and intentionally exhaustive; this section is the developer
+how-to with worked Python examples. The conformance suite at
+`tests/e2e/durability_contract/` exercises every cell of the matrix.
+
+You can opt out of all of this and your response will still be correct (just
+duplicative). You opt in when you want the recovered attempt to pick up where
+the crashed one left off instead of re-running the whole turn.
+
+### Mental Model
+
+Three layers, each owning a specific slice of state:
+
+| Layer | Owns | On crash recovery, surfaces / provides |
+|---|---|---|
+| **Library** (this SDK) | Persisted SSE event stream (every event you emitted, in order) — used for client replay via `starting_after=`. The library persists the response *object* at the first attempt's `response.created`, at **each successful `yield stream.checkpoint()`**, and at the terminal event; the `response.created` and terminal writes are deduplicated across recovery attempts (idempotent persistence keyed on `response_id`). The last persisted snapshot is exposed on re-entry as `context.persisted_response`. It does NOT keep a *running* snapshot of in-flight state between those persistence points. | Re-invokes the handler. Surfaces `context.is_recovery == True`, `context.persisted_response`, `context.is_steered_turn`, `context.pending_input_count`, and `context.conversation_chain_metadata`. Replays persisted events to reconnecting clients. Rebuilds your `ResponseContext` transparently — the handler sees the same `response_id` it had on the first attempt. |
+| **Handler** (your code) | The "what was safely committed" decision, plus side-effect watermarks in `context.conversation_chain_metadata`. | Decides the resumption point. Constructs the **resumption response**. Emits a fresh `response.in_progress` carrying it. Continues producing new output items. |
+| **Upstream framework** (Copilot SDK, LangGraph, your own LLM client) | The conversational / graph / agent state that has to outlive a process death. | Has its own resume facility (session ID, checkpoint store) that you call from the handler. |
+
+You do NOT own response event durability — that's the library. The library
+does NOT own conversational durability — that's upstream. You glue them
+together.
+
+### The Recovery Loop
+
+When the server restarts after a crash and your handler is re-invoked:
+
+1. The library calls your handler with `context.is_recovery == True`.
+2. You query upstream (and your own `context.conversation_chain_metadata` watermarks) to determine the **resumption point** — the most recent state you are confident is durably committed.
+3. You build a **resumption response**: a `ResponseObject` reflecting only the output items you trust at the resumption point. **In-flight items from the crashed attempt are excluded.** Construct this from upstream framework state + your own metadata watermarks — the library does NOT give you a snapshot of the prior attempt's in-flight state, because none exists in a useful form.
+4. You construct `ResponseEventStream(response=resumption_response, ...)` instead of the usual `request=request` form.
+5. You emit `response.created` exactly as you would on a fresh attempt — the framework dedups the response-store write so it happens exactly once across all recovery attempts. You do not need to branch on `is_recovery` to decide whether to emit `response.created`.
+6. You emit `response.in_progress`. This event's `response` payload IS the resumption response — and the library treats it as a **client-visible snapshot reset**. Reconnecting clients discard any partial in-progress state they had and adopt this payload as authoritative.
+7. You continue producing new output items, potentially at the same `output_index` values you used before the crash. Content does NOT have to match the pre-crash content (LLMs are non-deterministic; that's fine).
+8. You emit your terminal event.
+
+The library guarantees that step 6's `in_progress` is treated as a reset:
+- The persisted response state is REPLACED with the event payload.
+- Subsequent `output_item.added` at indexes already present in the resumption response REPLACE the prior item (don't append a duplicate).
+
+The library does NOT deduplicate handler-emitted events. If you don't emit a
+reset `in_progress`, the persisted state grows by whatever you emit, which
+is the naive fallback (see below).
+
+### What the Library Does
+
+- Persists every SSE event in order. No reordering, no deduplication of stream events — **except** that a recovered handler's re-emitted `response.created` is not re-appended to an already-non-empty durable stream (so a replaying client sees `response.created` exactly once; spec 026).
+- Persists the response *object* at the first attempt's `response.created`, at **each successful `yield stream.checkpoint()`**, and at the terminal event. The `response.created` and terminal writes are deduplicated across recovery attempts (idempotent persistence keyed on `response_id`); the handler does not branch for them. The last persisted snapshot is exposed on re-entry as `context.persisted_response`.
+- Rebuilds your `ResponseContext` transparently on any cross-process recovery — the recovered handler sees the same `response_id`, the same `request`, the same `conversation_chain_id`, and the same cancellation surface (`cancellation_signal` (3rd positional handler arg), `context.shutdown`, `context.client_cancelled`) it had on the first attempt. Id generation is a fresh-entry-only concern.
+- Surfaces flat recovery + steering classifiers on `ResponseContext`: `context.is_recovery`, `context.persisted_response`, `context.is_steered_turn`, `context.pending_input_count`, `context.conversation_chain_metadata`. For the framework-checkpoint model, `context.persisted_response` is the last durably-checkpointed snapshot; for upstream-owned recovery, the library holds no useful in-flight snapshot and you consult your upstream framework for resumption state.
+- Treats any `response.in_progress` event after the first one as a snapshot reset.
+- Replays persisted events to reconnecting clients on `starting_after=`. The reset `in_progress` is part of the replay; clients use it as the reconciliation signal.
+- **Surfaces graceful-shutdown recovery via one uniform signal in every handler shape.** The framework leaves the response `in_progress` so the next process lifetime re-invokes your handler with `context.is_recovery=True` when, on `context.shutdown`, the handler calls `await context.exit_for_recovery()`. This single idiom works identically in coroutine/`TextResponse` and streaming async-generator handlers — it raises internally (never returns), so there is no `return <value>` form to trip the async-generator `SyntaxError`. (An implicit fallback also applies: a streaming handler that simply `return`s without a terminal **while `context.shutdown` is set** still recovers — but `await context.exit_for_recovery()` is the recommended explicit idiom. A bare `return` during normal execution still yields the default terminal.)
+- For `background=false` responses (or `durable_background=False` background responses): marks the response `failed` on crash and does NOT re-invoke the handler.
+- For `store=false` responses: best-effort `failed` marker during shutdown grace period; no recovery.
+
+### What the Handler Does
+
+- Branches on `context.is_recovery` to choose fresh-entry vs recovered-entry code paths.
+- Builds the resumption response from upstream-framework state + own metadata watermarks. **Excludes in-flight items.**
+- Constructs `ResponseEventStream(response=resumption_response)` on recovered entry.
+- Emits `response.in_progress` early in the recovered path (this is the reset).
+- Uses upstream framework's native resume facility (e.g. session resume, checkpoint replay) — never re-runs a side-effecting upstream call without checking a watermark first.
+- Watermarks any upstream side-effecting call by writing a small marker to `context.conversation_chain_metadata` **before** the call and clearing it **after** the call has been durably committed upstream. Call `await context.conversation_chain_metadata.flush()` between the watermark write and the side effect to ensure the marker survives a crash.
+- For upstream-session-id needs: `context.conversation_chain_id` is a derived, stable chain identifier — the framework computes it so every turn of the same conversation resolves to the same value (anchored to the conversation's root: a `conversation_id`, or the head of a `previous_response_id` chain, falling back to a first turn's own `response_id`), stable across all attempts of a turn. It's a convenient session id to pass to upstream frameworks (Copilot `session_id`, LangGraph `thread_id`) — using it avoids allocating and persisting your own UUID, though you may use your own identifier if you prefer.
+
+### Stream Checkpoints
+
+For durable background responses you can persist a snapshot of the response at
+explicit, developer-chosen boundaries with `yield stream.checkpoint()`. A
+checkpoint durably writes the current `stream.response` (every output item you
+have finished emitting) via the storage provider, so a crashed attempt can
+resume from the last checkpoint instead of re-running the whole turn.
+
+```python
+@app.response_handler
+async def handler(request, context, cancellation_signal):
+    # On recovery, seed the stream from the last durably-checkpointed
+    # snapshot — the completed phases' items are already in
+    # stream.response.output, so resume from their count.
+    if context.is_recovery and context.persisted_response is not None:
+        stream = ResponseEventStream(
+            response_id=context.response_id, response=context.persisted_response,
+        )
+        start_phase = len(stream.response.output)
+    else:
+        stream = ResponseEventStream(response_id=context.response_id, request=request)
+        start_phase = 0
+
+    yield stream.emit_created()      # recovery: framework suppresses the durable-stream
+                                     # write (stream already has the pre-crash created);
+                                     # this seeds the in-memory stream + first-event validator
+    yield stream.emit_in_progress()  # client-visible reset point on recovery (carries seeded items)
+
+    for phase in range(start_phase, NUM_PHASES):
+        message = stream.add_output_item_message()
+        yield message.emit_added()
+        text = message.add_text_content()
+        yield text.emit_added()
+        yield text.emit_delta(await run_phase(phase))   # the expensive work
+        yield text.emit_done()
+        yield message.emit_done()
+        yield stream.checkpoint()        # phase N is now durable
+
+    yield stream.emit_completed()
+```
+
+Semantics (the full normative list is in
+[`responses-durability-spec.md`](responses-durability-spec.md) and
+[`durability-contract.md`](durability-contract.md) Row 11):
+
+- **Deterministic + developer-driven.** Checkpoints happen ONLY where you yield
+  one. There are no periodic, timer, or implicit checkpoints.
+- **Backpressured.** The handler is suspended at the `yield` until the provider
+  write completes — "I checkpointed" means "it is durable now". The handler
+  cannot race ahead while a slow write is in flight.
+- **No-op unless durable background.** The write happens ONLY when the
+  deployment has `durable_background=True` and the request is `background=true`
+  (which implies `store=true`). In every other configuration the checkpoint
+  event is dropped (no provider write), so you may yield it unconditionally.
+- **Idempotent.** A snapshot byte-identical to the last persisted one is
+  skipped.
+- **Failures swallowed.** A provider error is logged and ignored; recovery
+  falls back to the previously-persisted snapshot.
+- **After terminal.** A checkpoint yielded after a terminal event is dropped
+  (the terminal write is authoritative); no exception.
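+
+The rules above can be read as one decision function. The sketch below is a toy model, not the library's code: `CheckpointGate`, its `write` callback, and the JSON serialization are assumptions standing in for the framework internals and the storage provider.
+
+```python
+import json
+import logging
+from typing import Any, Callable, Dict, Optional
+
+logger = logging.getLogger(__name__)
+
+
+class CheckpointGate:
+    """Toy model of the decision made for each `yield stream.checkpoint()`."""
+
+    def __init__(
+        self,
+        write: Callable[[bytes], None],
+        durable_background: bool,
+        background: bool,
+    ) -> None:
+        self._write = write
+        self._enabled = durable_background and background
+        self._last: Optional[bytes] = None
+        self.terminal_seen = False
+
+    def checkpoint(self, response: Dict[str, Any]) -> bool:
+        """Return True only when a new snapshot was durably written."""
+        if not self._enabled or self.terminal_seen:
+            return False  # dropped: not durable background, or after terminal
+        blob = json.dumps(response, sort_keys=True).encode()
+        if blob == self._last:
+            return False  # idempotent: a byte-identical snapshot is skipped
+        try:
+            self._write(blob)  # the caller waits here (backpressure)
+        except Exception:
+            logger.warning("checkpoint write failed; keeping previous snapshot")
+            return False  # failure swallowed, previous snapshot stays current
+        self._last = blob
+        return True
+```
+
+The return value exists only to make the model observable; the handler-facing `yield stream.checkpoint()` does not report whether a write happened.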
+
+#### `context.persisted_response`
+
+On a recovered entry, `context.persisted_response` is the last durably-persisted
+`ResponseObject` snapshot (the last checkpoint, or the `response.created`
+snapshot if no checkpoint ran), or `None` if nothing was persisted before the
+crash. It is an **entry-only** cache — read it at the start of a recovered
+invocation to decide where to resume; it is not refreshed mid-execution.
+
+The **one-OutputItem-per-phase** pattern composes naturally with it: emit one
+output item per phase and checkpoint at each boundary, then on recovery **seed
+the stream** with `context.persisted_response` and resume from
+`len(stream.response.output)`. A phase whose `output_item.done` + checkpoint
+completed survives (it is already in the seeded output, carrying its original
+content); a phase interrupted before its checkpoint is re-run — correct by
+construction, with no extra watermark bookkeeping.
+
+> On recovery you seed `ResponseEventStream(response=context.persisted_response)`
+> so the already-checkpointed items are present in `stream.response.output` and
+> the builder's output-index continues past them. You then `yield
+> stream.emit_created()` exactly as on a fresh attempt — the framework
+> recognises the recovered entry and accepts the seeded output (it dedups the
+> response-store write). You emit ONLY the remaining phases via builder events;
+> the persisted response is the watermark, so there is no replay or breadcrumb
+> reconstruction.
+
+### Item and Response `internal_metadata`
+
+`internal_metadata` is a **single-turn**, platform-internal key/value bag that
+rides on output items and on the response, is persisted with the response (so
+it survives crash recovery), and is **always stripped before any client-facing
+HTTP or SSE payload** — clients never see it.
+
+```python
+# Item-level — a live MutableMapping[str, Any], lazily created, never None.
+message = stream.add_output_item_message()
+message.internal_metadata["upstream_msg_id"] = "abc-123"
+message.internal_metadata["attempt"] = 2
+
+# Response-level — read/write/delete via the stream proxy.
+stream.internal_metadata["resume_phase"] = 3
+del stream.internal_metadata["scratch"]
+```
+
+Use it for lightweight per-turn watermarks, id mappings (e.g. an upstream
+framework's message id ↔ the emitted item), or stale-message / crash-recovery
+detection within the turn. It is persisted whenever the response is persisted —
+at `response.created`, at each `yield stream.checkpoint()`, and at terminal — so
+on recovery you read it back from `context.persisted_response`. It is distinct
+from the *public* `ResponseObject.metadata` dict (the client's own metadata,
+which is NOT stripped).
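+
+To make the stripping rule concrete, here is a minimal sketch of what "stripped before any client-facing payload" means for a dict-shaped response. The function name `strip_internal_metadata` and the plain-dict shape are assumptions for illustration; the framework performs this step for you.
+
+```python
+import copy
+from typing import Any, Dict
+
+
+def strip_internal_metadata(response: Dict[str, Any]) -> Dict[str, Any]:
+    """Return a client-facing copy with every internal_metadata bag removed."""
+    public = copy.deepcopy(response)  # never mutate the persisted record
+    public.pop("internal_metadata", None)  # response-level bag
+    for item in public.get("output", []):
+        if isinstance(item, dict):
+            item.pop("internal_metadata", None)  # item-level bag
+    return public  # the public `metadata` dict is left untouched
+```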
+
+### Which metadata facility?
+
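+The framework exposes **two** internal-metadata facilities at **different
+scopes**. `conversation_chain_metadata` hangs off the context; `internal_metadata`
+hangs off the stream and its items. Do not confuse them: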
+
+| Aspect | `context.conversation_chain_metadata` | `internal_metadata` (item + response) |
+|---|---|---|
+| **Scope** | **Cross-turn** — persists across turns/responses on the same conversation chain (steerable multi-turn, recovery re-entries). | **Single turn** — lives on this response (or its items) only. |
+| **Best for** | Cross-turn watermarks; state a later turn needs from an earlier one; coordination between layers/nodes spanning the chain. | Lightweight per-turn watermarks; id mappings; in-turn crash-recovery / stale-message detection. |
+| **Structure** | **Named scopes** — `conversation_chain_metadata(name)` returns an isolated sibling namespace, so parallel nodes/layers track + `flush()` independently. | Flat per-object map (use key prefixes if you need grouping). |
+| **Durability trigger** | Explicit `await …flush()` (+ durable-task lifecycle). | Persisted when the owning response is persisted (`created`, each `checkpoint()`, terminal). No separate flush. |
+| **Visibility** | Task/durability state — never on the wire. | Rides on the response/items but **stripped on egress/ingress** — clients never see it. |
+| **Lifetime** | The conversation chain / durable-task lifetime. | This response's persisted record; readable on recovery via `context.persisted_response`. |
+
+**Rule of thumb:** need it in a *later turn* → `conversation_chain_metadata`;
+need it only to reconstruct *this* response on crash recovery →
+`internal_metadata` (+ `stream.checkpoint()`).
+
+### Default Pattern (recovery-aware)
+
+A framework-agnostic recovery-aware handler. The upstream-specific reconciliation
+(how to query upstream for its state, how to resume a session) is in your
+sample's docstring; the pattern below stays uniform.
+
+```python
+import asyncio
+
+from azure.ai.agentserver.responses import (
+    CreateResponse, ResponseContext, ResponseEventStream,
+)
+from azure.ai.agentserver.responses.models._generated import ResponseObject
+
+
+@app.response_handler
+async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
+    # ── Choose between fresh and recovered entry ────────────────────
+    if context.is_recovery:
+        # Ask upstream (or read context.conversation_chain_metadata) for what was
+        # safely committed.
+        resumption = _build_resumption_response(context, request)
+        stream = ResponseEventStream(
+            response_id=context.response_id, response=resumption,
+        )
+    else:
+        stream = ResponseEventStream(
+            response_id=context.response_id, request=request,
+        )
+
+    yield stream.emit_created()  # same call on fresh and recovered; framework dedups
+
+    # The cancellation contract still applies on recovered entry. Shutdown
+    # and cancellation are DISTINCT, (mostly) mutually-exclusive surfaces —
+    # shutdown does NOT fire cancellation_signal — so check each one
+    # independently, shutdown first. Defer to recovery for shutdown; emit
+    # `completed` for steering pressure; return for explicit client cancel.
+    if context.shutdown.is_set():
+        await context.exit_for_recovery()  # defer to next-lifetime recovery
+    if cancellation_signal.is_set():
+        if context.client_cancelled:
+            return  # framework forces "cancelled" status
+        # Steering pressure — emit completed so the superseded turn
+        # finishes cleanly.
+        yield stream.emit_completed()
+        return
+
+    # ── This is the client-visible reset point on recovery ──────────
+    yield stream.emit_in_progress()
+
+    # Now produce new content. Use upstream's resume facility before any
+    # side-effecting call. Watermark before; clear after upstream commit.
+    async for event in _produce_new_output(stream, request, context):
+        yield event
+
+    # On graceful shutdown mid-work, defer to next-lifetime recovery —
+    # the framework leaves the response `in_progress` and re-invokes on
+    # the next process restart (requires durable_background=True).
+    if context.shutdown.is_set():
+        await context.exit_for_recovery()
+
+    yield stream.emit_completed()
+```
+
+### Fallback Pattern (no opt-in)
+
+A handler that does nothing recovery-specific still produces a correct response.
+The library:
+- accepts the duplicate `created` from re-entry,
+- accepts a fresh `in_progress` with empty output as the reset,
+- accumulates the re-streamed content as the new authoritative view.
+
+The cost: clients that reconnected with `starting_after=` see a reset to empty
+and a full re-stream. The final response is correct; the UX is jarring.
+Upstream side-effecting calls (LLM queries, agent session writes) may be
+issued twice — this corrupts upstream session history. If your upstream has
+durable history that matters, you MUST adopt the recovery-aware pattern. If
+your handler has no upstream side effects (e.g. it streams from an
+idempotent source), the fallback is fine.
+
+### Upstream History Pattern (preferred when available)
+
+Many stateful upstream SDKs expose their persisted conversation log directly —
+e.g. `claude_agent_sdk.get_session_messages(session_id)` returns the list of
+messages the SDK has durably committed, and Copilot's `session.get_messages()`
+does the same for its event log. When that API is available, use it as the
+source of truth for "did my prior attempt already send this turn?" — no handler
+metadata, no watermark, no flush ordering.
+
+```python
+async def _send_input_if_not_in_session(session, session_id, user_input):
+    history = await session.get_messages()
+    # If the most recent user message in upstream history matches the current
+    # input, the prior attempt already sent it — skip the upstream call.
+    last_user = next(
+        (evt for evt in reversed(history) if _is_user_message(evt)),
+        None,
+    )
+    if last_user is not None and _extract_user_text(last_user) == user_input:
+        return
+    await session.send(user_input)
+```
+
+Why this beats a handler-managed watermark:
+
+- The detection input is the upstream's own durable log — there is no window
+  between "we sent the call" and "we wrote our watermark" where a crash leaves
+  the handler and the upstream out of sync.
+- No `context.conversation_chain_metadata` write, no `flush()`, and no decision
+  about flush-before vs flush-after.
+- On any attempt (fresh, recovered, multiply-recovered) the same check
+  works: query history, compare, send only if needed.
+
+Edge case to document in your sample: if a prior turn's input was byte-equal to
+the current turn's input AND that prior turn completed normally, the
+"last user message in history equals current input" heuristic incorrectly
+skips. Rare in practice for human-driven conversations; if your domain has
+machine-generated identical-input replays, fall back to the watermark pattern
+below.
+
+### Watermark Pattern (fallback when upstream exposes no persisted history)
+
+When the upstream SDK does **not** expose its committed log — or does not
+distinguish "queued but unacked" from "durably committed" — the framework
+cannot know which of your calls have side effects, so you stamp a marker in
+`context.conversation_chain_metadata` before the call and clear it after the upstream commit.
+
+The strict at-most-once pattern is **write → flush → side effect → write →
+flush**. The explicit `await context.conversation_chain_metadata.flush()`
+ensures the watermark hits durable storage before the side effect runs.
+Without it, the framework only snapshots metadata at durable-task lifecycle
+boundaries (start/suspend/complete/fail/cancel), so a crash between "side
+effect issued" and the next lifecycle boundary would leave the watermark in
+memory only, and recovery would re-issue the side effect. The explicit
+`flush()` is the fence.
+
+```python
+# Flat context surface: no nested durability object.
+# Stamp BEFORE the side-effecting call, and FLUSH to make the marker durable.
+context.conversation_chain_metadata["upstream_query_in_flight"] = True
+await context.conversation_chain_metadata.flush()
+
+await upstream.send_message(prompt)
+
+# Stream the response back…
+async for chunk in upstream.receive_response():
+    if cancellation_signal.is_set():
+        break
+    yield ...emit_delta(chunk)
+
+# Clear AFTER the upstream durably committed the result
+# (e.g. assistant message landed in the upstream's session log), and
+# FLUSH so the cleared marker survives a subsequent crash.
+context.conversation_chain_metadata["upstream_query_in_flight"] = False
+await context.conversation_chain_metadata.flush()
+```
+
+On recovery you check the marker:
+
+- Marker `True`: prior attempt called the upstream API. Use upstream's resume
+  facility (and, if available, fork primitive) to avoid duplicating the
+  message in upstream history. **Do NOT call `upstream.send_message(prompt)` again.**
+- Marker `False` (or missing): no prior side effect. Treat as fresh entry from
+  the upstream's perspective.
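+
+The two recovery cases above reduce to a single guard. The sketch below is illustrative only: `should_send_to_upstream` is a hypothetical helper, and a plain mapping stands in for `context.conversation_chain_metadata`.
+
+```python
+from typing import Any, Mapping
+
+MARKER = "upstream_query_in_flight"
+
+
+def should_send_to_upstream(is_recovery: bool, metadata: Mapping[str, Any]) -> bool:
+    """True when it is safe to issue the side-effecting upstream call."""
+    if not is_recovery:
+        return True  # fresh entry: nothing was sent by an earlier attempt
+    # Recovered entry: a truthy marker means the prior attempt already called
+    # upstream, so resume instead of sending the prompt again.
+    return not metadata.get(MARKER, False)
+```
+
+When the guard returns `False`, take the resume path instead of calling `upstream.send_message(prompt)`.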
+
+The two flushes are the cost of at-most-once. If your side effect is naturally
+idempotent (e.g. it carries a client-supplied request id and the upstream
+dedupes), you can skip both flushes and rely on the upstream's dedup. The
+upstream-history pattern above is preferred whenever it's available because
+it removes the watermark window entirely.
+
+Watermark naming convention (recommended): `<upstream>_<operation>_in_flight: bool`.
+SDK-specific names belong in your sample's docstring.
+
+### Resumption Response Construction
+
+The resumption response is the `ResponseObject` you hand to
+`ResponseEventStream(response=…)` on a recovered entry; its `output` is the
+client-visible reset point. How much you build depends on your resume model.
+
+**Simplest case — return the persisted snapshot as-is.** If you used framework
+checkpoints (`stream.checkpoint()`), `context.persisted_response` already holds
+exactly the items that were durably committed at the last checkpoint. You can
+seed straight from it, no construction needed:
+
+```python
+if context.is_recovery and context.persisted_response is not None:
+    stream = ResponseEventStream(
+        response=context.persisted_response, response_id=context.response_id,
+    )
+    start_phase = len(stream.response.output)   # resume past committed items
+```
+
+**Involved case — trim items you can't trust.** If the snapshot (or your
+upstream's view) may contain items emitted by work that did NOT durably commit,
+you trim `output` down to only the items you trust, then resume. *What* to trim
+is your decision, and you can drive it from any durable signal you stamped:
+
+- **An upstream framework's checkpoint state** (which steps it actually saved).
+- **Item-level `internal_metadata`** — tag each emitted item with, say, the step
+  that produced it (`message.internal_metadata["step"] = step_id`); it rides on
+  the persisted item and is stripped before the client ever sees it.
+- **Response-level `internal_metadata`** (`stream.internal_metadata[...]`).
+- **`context.conversation_chain_metadata`** watermarks.
+
+For example: tag each message with the step that emitted it, then on recovery
+keep only items whose step is in your checkpoint store and drop the rest:
+
+```python
+def _build_resumption_response(context, request) -> ResponseObject:
+    snapshot = context.persisted_response
+    committed_steps = upstream.checkpointed_step_ids(context.conversation_chain_id)
+
+    kept = [
+        item for item in (snapshot.output if snapshot else [])
+        # the step tag we stamped on each item when we first emitted it
+        if (item.get("internal_metadata") or {}).get("step") in committed_steps
+    ]
+    return ResponseObject({
+        "id": context.response_id,
+        "object": "response",
+        "status": "in_progress",
+        "output": kept,          # only items from steps we know were checkpointed
+        "model": request.model,
+    })
+```
+
+The library persists the response object at `response.created`, at **each
+successful `stream.checkpoint()`**, and at the terminal event (the
+`response.created` and terminal writes are deduped across attempts keyed on
+`response_id`). It does not keep a *running* snapshot between those points — so
+for any item whose commit status falls between persistence points, you are the
+source of truth for whether to keep it, via the watermarks above.
+
+### Recovery × Cancellation Composition
+
+The cancellation contract from the [Cancellation](#cancellation) section composes
+with recovery cleanly:
+
+- **Recovered entry + `cancellation_signal` (3rd positional handler arg) pre-set**: same as fresh entry. Inspect the cause flags: on steering pressure (no cause flag) emit `completed`; on explicit client cancel, return; on shutdown, call `await context.exit_for_recovery()`.
+- **Recovered entry + `cancellation_signal` fires mid-stream**: same as fresh entry. Break the loop, then check `context.shutdown.is_set()` for the recovery-deferral path; otherwise close the builders and emit `completed`.
+- **Crash during recovery itself**: same code path; each attempt queries upstream for its current state, computes a (possibly different) resumption response, emits a fresh reset `in_progress`. The loop is re-entrant.
+
+### Configuration
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `durable_background` | `False` | Opt INTO crash-recoverable background responses |
+| `steerable_conversations` | `False` | Multi-turn conversation steering (see [Cancellation](#cancellation)) |
+
+See the [Durable Responses Developer Guide](durable-responses-developer-guide.md)
+for the configuration matrix (`store` × `background` × `durable_background`),
+the flat `ResponseContext` recovery + steering surface, and client-side
+reconciliation rules.
+
+---
+
+## Steering API
+
+Steering (`steerable_conversations=True`) lets a new turn arrive on an
+already-active conversation: the framework cancels the in-progress turn via
+`cancellation_signal` (see [Cancellation](#cancellation)), then re-invokes the
+handler to drain the queued input. The handler-facing surface:
+
+- **`context.is_steered_turn: bool`** — `True` on the drain re-entry that
+  follows a steering input (not on the turn that was superseded).
+- **`context.pending_input_count: int`** — live count of additional inputs
+  queued behind the current turn; decreases as the framework drains them.
+- **`@app.response_acceptor`** — the hook that produces the `"queued"`
+  `ResponseObject` returned to the POST that was queued onto an
+  **already-active** steerable conversation (never the first turn).
+
+### `@app.response_acceptor`
+
+When a new turn is queued onto an active steerable conversation, the framework
+immediately returns a `status="queued"` response to that POST while the prior
+turn finishes. By default this is a minimal queued envelope; register a hook to
+customize it. The hook is **synchronous**, receives `(request, context)`, and
+returns a strongly-typed `ResponseObject`:
+
+```python
+from azure.ai.agentserver.responses import (
+    CreateResponse, ResponseContext, ResponseObject,
+)
+
+@app.response_acceptor
+def acceptor(request: CreateResponse, context: ResponseContext) -> ResponseObject:
+    return ResponseObject(
+        {
+            "id": context.response_id,
+            "object": "response",
+            "status": "queued",
+        }
+    )
+```
+
+- The framework ensures `status` defaults to `"queued"` if you omit it.
+- If the hook raises, the framework logs a warning and falls back to the
+  default queued envelope — a buggy hook never breaks queueing.
+- The hook is optional; omit it to use the default envelope.
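+
+The three rules above amount to a small wrapper around the hook. The following sketch models that behavior with plain dicts. `build_queued_response` and the zero-argument `hook` are hypothetical simplifications, not the framework's internals, and the real hook receives `(request, context)`.
+
+```python
+import logging
+from typing import Any, Callable, Dict, Optional
+
+logger = logging.getLogger(__name__)
+
+
+def build_queued_response(
+    response_id: str,
+    hook: Optional[Callable[[], Dict[str, Any]]] = None,
+) -> Dict[str, Any]:
+    """Return the envelope sent back to a POST that was queued."""
+    default = {"id": response_id, "object": "response", "status": "queued"}
+    if hook is None:
+        return default  # no acceptor registered: default envelope
+    try:
+        custom = dict(hook())
+    except Exception:
+        # A buggy hook must never break queueing: warn and fall back.
+        logger.warning("response_acceptor failed; using default envelope")
+        return default
+    custom.setdefault("status", "queued")  # fill in status if omitted
+    return custom
+```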
+
+---
+
 ## Best Practices
 
 ### 1. Start with TextResponse
@@ -1158,7 +1818,7 @@ for word in words:
 
 ### 4. Check Cancellation in Loops
 
-Any long-running loop should check `cancellation_signal`:
+Any long-running loop should check `cancellation_signal.is_set()`:
 
 ```python
 for item in large_collection:
@@ -1179,9 +1839,11 @@ Start with `output_item_message()` / `aoutput_item_message()`. Drop down to
 
 ### 7. Let the Library Handle Mode Negotiation
 
-Never branch on `request.stream` or `request.background` in your handler. The
-library handles these — your handler always produces the same event sequence
-regardless of mode.
+You usually don't need to branch on `request.stream` or `request.background` —
+the library negotiates the wire mode and replays the same event sequence for
+streaming, non-streaming, and background callers. Emit one event sequence and
+let the framework adapt it; reach for mode-specific behaviour only if your
+application genuinely needs it.
 
 ```python
 # ❌ Don't do this
@@ -1193,7 +1855,8 @@ else:
 # ✅ Same event sequence for all modes
 yield stream.emit_created()
 yield stream.emit_in_progress()
-yield from stream.output_item_message("Hello!")
+for evt in stream.output_item_message("Hello!"):
+    yield evt
 yield stream.emit_completed()
 ```
 
@@ -1204,6 +1867,79 @@ yield stream.emit_completed()
 
 ## Common Mistakes
 
+### Returning Without Emitting Events
+
+```python
+# ❌ Handler exits without producing anything — framework forces "failed"
+@app.response_handler
+async def handler(request, context, cancellation_signal):
+    if cancellation_signal.is_set():
+        return  # No events emitted! The framework has to force a "failed" status.
+
+# ✅ Always emit response.created and a terminal event
+@app.response_handler
+async def handler(request, context, cancellation_signal):
+    stream = ResponseEventStream(response_id=context.response_id, request=request)
+    yield stream.emit_created()
+    if cancellation_signal.is_set():
+        yield stream.emit_completed()
+        return
+    # ... normal processing
+    yield stream.emit_completed()
+```
+
+### Not Emitting response.created Before Early Return
+
+```python
+# ❌ Skips emit_created — framework cannot persist or track this response
+@app.response_handler
+async def handler(request, context, cancellation_signal):
+    stream = ResponseEventStream(response_id=context.response_id, request=request)
+    if some_condition:
+        yield stream.emit_completed()  # Created was never emitted!
+        return
+
+# ✅ Always emit_created first, regardless of path
+@app.response_handler
+async def handler(request, context, cancellation_signal):
+    stream = ResponseEventStream(response_id=context.response_id, request=request)
+    yield stream.emit_created()  # ALWAYS first
+    if some_condition:
+        yield stream.emit_completed()
+        return
+```
+
+### Emitting cancelled Status on Steering
+
+```python
+# ❌ "cancelled" is reserved for client cancel API — don't emit it yourself
+if cancellation_signal.is_set():
+    yield stream.emit_cancelled()  # WRONG — only framework sets cancelled
+
+# ✅ Emit completed — steering means "finish this turn, partial output is valid"
+if cancellation_signal.is_set():
+    yield text.emit_text_done()
+    yield text.emit_done()
+    yield message.emit_done()
+    yield stream.emit_completed()
+```
+
+### Returning None from Handler
+
+```python
+# ❌ Returning None (implicit or explicit) produces no events
+@app.response_handler
+async def handler(request, context, cancellation_signal):
+    result = await do_work()
+    # Forgot to return/yield! Python returns None implicitly.
+
+# ✅ Always return TextResponse or yield events from ResponseEventStream
+@app.response_handler
+async def handler(request, context, cancellation_signal):
+    result = await do_work()
+    return TextResponse(context, request, text=result)
+```
+
 ### Using ResponseEventStream When TextResponse Suffices
 
 ```python
@@ -1211,7 +1947,8 @@ yield stream.emit_completed()
 stream = ResponseEventStream(response_id=context.response_id, request=request)
 yield stream.emit_created()
 yield stream.emit_in_progress()
-yield from stream.output_item_message("Hello!")
+for evt in stream.output_item_message("Hello!"):
+    yield evt
 yield stream.emit_completed()
 
 # ✅ Use TextResponse — one line, same result
@@ -1272,6 +2009,100 @@ else:
 # ✅ Same event sequence regardless of mode
 yield stream.emit_created()
 yield stream.emit_in_progress()
-yield from stream.output_item_message("Hello!")
+for evt in stream.output_item_message("Hello!"):
+    yield evt
 yield stream.emit_completed()
 ```
+
+### Expecting a Running Snapshot of the Prior Attempt's In-Flight State
+
+```python
+# ❌ There is no "running" snapshot of in-flight state, and no such attribute.
+# The library persists the response object at created, at each checkpoint,
+# and at terminal — not continuously.
+stream = ResponseEventStream(
+    response_id=context.response_id,
+    response=context.prior_attempt_snapshot,  # AttributeError — no such field
+)
+
+# ✅ Use the snapshot that fits your resume model:
+#  - framework-checkpoint: context.persisted_response is the LAST durably
+#    checkpointed snapshot (or the created snapshot, or None).
+if context.is_recovery and context.persisted_response is not None:
+    stream = ResponseEventStream(
+        response_id=context.response_id, response=context.persisted_response,
+    )
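+#  - upstream-owned: build a resumption response from your upstream state.
+elif context.is_recovery:
+    resumption = _build_resumption_response(context, request)
+    stream = ResponseEventStream(response_id=context.response_id, response=resumption)
+#  - fresh entry: nothing to resume from.
+else:
+    stream = ResponseEventStream(response_id=context.response_id, request=request)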
+```
+
+The library does not keep a *running* snapshot between persistence points — but
+`context.persisted_response` gives you the last checkpointed one. See
+[Durability](#durability) for both resume models.
+
+### Calling Upstream Side-Effecting APIs on Recovery Without a Watermark
+
+```python
+# ❌ Re-calls upstream.send_message() on every recovery → duplicate user
+# messages in the upstream session history forever.
+async def handler(request, context, cancellation_signal):
+    if context.is_recovery:
+        ... # rebuild stream
+    await upstream.send_message(prompt)  # called on every attempt!
+
+# ✅ Watermark before the side-effecting call; check before re-issuing.
+async def handler(request, context, cancellation_signal):
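+    if not context.conversation_chain_metadata.get("upstream_query_in_flight"):
+        context.conversation_chain_metadata["upstream_query_in_flight"] = True
+        # Flush so the marker is durable before the side effect runs.
+        await context.conversation_chain_metadata.flush()
+        await upstream.send_message(prompt)
+    # On recovery with watermark set, skip the send and just receive.
+    async for chunk in upstream.receive_response():
+        ...
+    context.conversation_chain_metadata["upstream_query_in_flight"] = False
+    # Flush again so the cleared marker survives a later crash.
+    await context.conversation_chain_metadata.flush()
+```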
+
+See [Durability → Watermark Pattern](#durability).
+
+### Emitting `response.created` Without `response.in_progress` on Recovery
+
+```python
+# ❌ Recovery code path emits created and jumps to output items. No
+# reset point — clients merge new items with pre-crash partial state.
+async def handler(request, context, cancellation_signal):
+    if context.is_recovery:
+        stream = ResponseEventStream(
+            response_id=context.response_id,
+            response=_build_resumption_response(...),
+        )
+        yield stream.emit_created()
+        # Jumps straight to producing output → no reset signal for clients
+
+# ✅ Emit response.in_progress before any output items on recovery.
+# That event IS the snapshot reset point.
+async def handler(request, context, cancellation_signal):
+    if context.is_recovery:
+        stream = ResponseEventStream(
+            response_id=context.response_id,
+            response=_build_resumption_response(...),
+        )
+        yield stream.emit_created()
+        yield stream.emit_in_progress()  # ← client reset point
+        # ... then produce output
+```
+
+### Storing Conversation History in `context.conversation_chain_metadata`
+
+```python
+# ❌ Metadata isn't for bulk data. Hits payload limits, and the upstream
+# framework should be the source of truth for conversation history.
+context.conversation_chain_metadata["messages"] = [m.as_dict() for m in conversation]
+
+# ✅ Stash a small reference (session ID, checkpoint ID) and ask upstream
+# for the actual state when you need it.
+context.conversation_chain_metadata["claude_session_id"] = session_id  # a UUID string
+```
+
+See [Durability → Mental Model](#durability) for why upstream owns
+conversation state.
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/docs/responses-durability-spec.md b/sdk/agentserver/azure-ai-agentserver-responses/docs/responses-durability-spec.md
new file mode 100644
index 000000000000..3111b0f28d6f
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/docs/responses-durability-spec.md
@@ -0,0 +1,1612 @@
+# Responses Durability — Authoritative Specification
+
+> **Status**: Living specification. Authoritative **design** reference for the
+> responses durability surface — the full mental model, internals, cancellation,
+> steering, worked sequences, and the conformance-item index.
+>
+> **Normative ownership (single edit point).** The machine-verified
+> **conformance contract** — the dispatch matrix and its per-cell dispositions,
+> the streaming sub-contract, the recovered-entry precondition, and the
+> handler/framework obligations — is owned by
+> [`durability-contract.md`](durability-contract.md). That doc is parsed by the
+> conformance meta-tests and pinned by the Constitution. Where this spec restates
+> any of those clauses it is a **non-normative summary for readability**; on any
+> conflict, `durability-contract.md` is authoritative, and the normative edit is
+> made there. This spec is authoritative for everything the contract does NOT
+> carry (terminology, chain identity, the reserved metadata namespace, the
+> perpetual-task internals, cancellation §10, steering §11, the worked sequences
+> §12–13, and the C-* conformance index §14).
+>
+> **Audience**: Library implementers porting this contract to another
+> language; framework reviewers verifying behavior against the
+> implementation; integrators building reference clients.
+>
+> **Scope**: The durability, recovery, steering, conversation-locking,
+> and stream-reconciliation contract that the agentserver responses
+> layer adds on top of an underlying durable-task primitive (see
+> `azure-ai-agentserver-core/docs/task-and-streaming-spec.md`). The
+> public OpenAI-compatible Responses HTTP/SSE surface is OUT OF SCOPE
+> here except where this layer adds new headers, error codes, or
+> event semantics on top of it.
+>
+> **Stability promise**: The contract terms (matrix rows, disposition
+> values, reserved namespaces, reset semantics) are normative. The
+> Python class names cited throughout are illustrative — port them as
+> idiomatic in the target language.
+
+This document is intentionally redundant in places (every section can
+be read in isolation; cross-references are hints, not prerequisites)
+to keep each contract surface independently understandable.
+
+---
+
+## §1 — Why this document exists
+
+The responses durability layer sits between (a) the OpenAI-compatible
+Responses HTTP/SSE protocol that end-users call, and (b) the durable
+task primitive that gives the host process crash-recovery. The layer's
+job is to translate the per-request HTTP shape — `(store, background,
+stream, conversation_id, previous_response_id)` plus server options
+`(durable_background, steerable_conversations)` — into one of a small
+set of durability behaviors, and to give recovered handlers the
+context they need to produce a coherent response after a process
+restart.
+
+The *behavior* of each request (when does the framework re-invoke the
+handler? when does it mark `failed`? when does it return HTTP 409?) is
+fully determined by the per-row dispatch matrix in §3 below. Once a
+row is selected, the row's recovery, cancellation, and steering rules
+fall out from the contracts in §§ 6–11. There is no other source of
+behavioral variation a port should need to model.
+
+Anything not explicitly stated here is unspecified and SHOULD NOT be
+relied on; in particular, the layer makes no guarantees about
+multi-replica concurrent recovery (single-node-restart only) or about
+foundry-backed storage providers (the contract is validated against
+the file-based provider and is the same contract the foundry provider
+implements).
+
+---
+
+## §2 — Terminology
+
+| Term | Meaning |
+|---|---|
+| **Response** | A single `POST /v1/responses` call's logical output, identified by a server-issued `response_id`. |
+| **Conversation chain** | A sequence of responses sharing a stable chain identity (see §4) — either via `conversation_id` or via a sequence of `previous_response_id` links. |
+| **Durable task** | A record in the underlying task store representing the perpetual execution loop for a conversation chain. Identified by a deterministic `task_id` (§4). |
+| **Handler** | The user-written response handler — an `async def` function (or async generator) that produces output for one turn of one conversation chain. |
+| **Fresh entry** | A handler invocation that is not a recovery — either the chain's very first turn, or a subsequent turn delivered to a live task body. |
+| **Recovered entry** | A handler invocation triggered by the durable-task recovery scanner, after a previous lifetime's task body did not reach a terminal state. |
+| **Steered turn** | A turn whose input arrived while a previous turn for the same chain was still in progress; the steered turn was queued and is now being delivered. |
+| **Acceptance hook** | Optional developer-provided callback that produces the initial `status="queued"` response object the HTTP caller of a steered turn sees synchronously, before the handler runs. |
+| **Disposition** | Per-task framework metadata key telling the recovery scanner what to do on a recovered entry: `re-invoke` or `mark-failed`. |
+| **Resumption response** | Handler-built `ResponseObject` reflecting the safe-to-resume-from state; carried as the `response` payload of the recovery `response.in_progress` event. |
+| **Reset event** | The second-or-later `response.in_progress` event in a stream — clients MUST treat it as a snapshot reset of the local response view. |
+| **Response store** | The persistent store of `ResponseObject` envelopes; written at `response.created` and at terminal events. |
+| **Stream event store** | The persistent ordered log of SSE events emitted during a response's execution; used for `starting_after=` reconnection. |
+| **Termination path A / B / C** | (A) handler completes within grace window; (B) grace exhausted, in-process marker fires; (C) crash or Path-B failure, next-lifetime recovery scanner fires. |
+| **Row 1 / 2 / 3 / 4** | The four behaviour rows of the matrix (§3). |
+
+---
+
+## §3 — The dispatch matrix
+
+Every `POST /v1/responses` falls in exactly one of four rows, keyed on
+three flags:
+
+- `store` — request-controlled, defaults to `true`.
+- `background` — request-controlled, defaults to `false`.
+- `durable_background` — developer-controlled server option, defaults
+  to `false`. Developers opt INTO crash-recovery re-invocation by
+  setting it to `true`; the default lands the response in
+  "crash-failed" mode (Row 2 disposition), where a crash mid-handler
+  surfaces as a `failed` terminal in the next lifetime rather than
+  re-invoking the handler.
+
+The end-user (HTTP caller) sets `store`, `background`, and `stream`.
+The developer sets `durable_background` and `steerable_conversations`
+on `ResponsesServerOptions`. End-users CANNOT override developer
+decisions; developers CANNOT override end-user request flags. This
+separation is normative.
+
+> **Normative source:** the four rows and their per-cell dispositions are the
+> matrix in [`durability-contract.md` § The matrix](durability-contract.md). The
+> table below is a readability summary; the contract is authoritative.
+
+| # | `store` | `background` | `durable_background` | Behaviour |
+|---|---|---|---|---|
+| 1 | true | true  | true  | **Full durability.** Handler runs inside the durable task body. Recovery re-invokes the handler. |
+| 2 | true | true  | false | **Crash-failed durability.** Handler runs inside the durable task body; disposition is `mark-failed`. If the process dies before terminal, recovery marks the response `failed` (no re-invoke). |
+| 3 | true | false | (any) | **Crash-failed durability.** Same shape as Row 2: handler runs inside the durable task body (HTTP request awaits via `TaskRun.result()`); recovery marks the response `failed` on crash. |
+| 4 | false | (any) | (any) | **No durability.** Best-effort failed marker during graceful shutdown. No persistence. No recovery. |
+
+`stream` is orthogonal: it collapses out of the row keys. Each row × `stream`
+combination is its own conformance cell.
+
+`steerable_conversations` is orthogonal to the row but composes only with
+`store=true` (Rows 1, 2, 3) — see §11.
+
+`starting_after=` reconnection is supported only for `store=true` requests
+(Rows 1, 2, and 3). Row 4 has no persisted event log, so reconnection is
+not meaningful there.
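The row selection above can be sketched as a small pure function. This is an illustrative reading of the summary table, not the shipped implementation: `RowDecision` and `select_row` are hypothetical names, and `durability-contract.md` remains the authority on the per-cell dispositions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowDecision:
    """Hypothetical result type: the matrix row and its recovery disposition."""
    row: int                    # 1 to 4, per the summary table
    disposition: Optional[str]  # "re-invoke", "mark-failed", or None for Row 4


def select_row(store: bool = True,
               background: bool = False,
               durable_background: bool = False) -> RowDecision:
    """Map the three row-key flags to a row. `stream` is deliberately absent."""
    if not store:
        return RowDecision(4, None)            # no durability, no recovery
    if not background:
        return RowDecision(3, "mark-failed")   # foreground + store
    if durable_background:
        return RowDecision(1, "re-invoke")     # full durability
    return RowDecision(2, "mark-failed")       # background, crash-failed
```

With the documented defaults (`store=true`, `background=false`, `durable_background=false`) the sketch lands in Row 3.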
+
+### §3.1 — Termination paths
+
+Each row × stream cell has three termination paths the framework MUST
+deliver per the table below:
+
+| Path | Trigger | Row 1 (`durable_bg`) | Rows 2/3 (`store`, no `durable_bg`) | Row 4 (no store) |
+|---|---|---|---|---|
+| **A** | Handler returns within grace | Persist terminal; task body returns | Persist terminal; task body returns | Persist terminal (best-effort) |
+| **B** | Grace exhausted (graceful shutdown) | Task left `in_progress`; handler stops; **next lifetime re-invokes** | Task body persists `failed` (server_error, shutdown_reason=grace_exhausted) | Best-effort in-process `failed` marker |
+| **C** | SIGKILL or Path-B failure | Next-lifetime recovery scanner re-fires task → handler re-invoked with `context.is_recovery=True` | Next-lifetime recovery scanner re-fires task → marks response `failed` (server_error, shutdown_reason=crash_recovery) | No recovery applies (no persistence) |
+
+The framework MUST implement Path B and Path C independently, with
+Path C acting as a complete fallback for Path B. A Path-B in-process
+marker that does not durably persist before the process exits MUST be
+backed by a Path-C next-lifetime marker; the Row 2/3 recovery scanner
+closes that window.
+
+### §3.2 — `stream` × row interaction
+
+`stream` does not alter row selection, but it MUST alter the
+implementation path:
+
+- **`stream=false`** — the handler is invoked, its terminal result is
+  persisted to the response store, and the HTTP caller receives the
+  full `ResponseObject` envelope (background: `200 OK` with the
+  envelope reflecting the current state; foreground: `200 OK` with the
+  terminal envelope).
+- **`stream=true`** — the handler's emitted SSE events are persisted
+  to the stream event store in order, and the HTTP caller receives a
+  live SSE feed. Reconnection via `GET /responses/{id}?stream=true&starting_after=N`
+  returns only events with `sequence_number > N`.
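As a minimal sketch of the `starting_after=N` rule, assuming persisted events are dictionaries carrying a `sequence_number` field (the event shape and the `replay_after` name are illustrative, not part of the contract):

```python
from typing import Any, Dict, List


def replay_after(events: List[Dict[str, Any]],
                 starting_after: int) -> List[Dict[str, Any]]:
    """Return only events strictly after the cursor, in persisted order."""
    return [e for e in events if e["sequence_number"] > starting_after]
```

The comparison is strict, so a client that passes the last sequence number it processed does not receive that event again.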
+
+For Row 1 × `stream=true`, recovery MUST re-engage the durable task
+body so the recovered handler's events flow to both the live subject
+and the persisted event log; recovered events appear in the same
+stream after `starting_after=` reconnect.
+
+For Rows 2/3 × `stream=true`, the handler runs inside the task body;
+on crash, the task body's `mark-failed` recovery branch persists the
+`failed` marker as the only post-crash artifact. Clients reading the
+persisted stream see whatever events landed before the crash plus
+no further events.
+
+---
+
+## §4 — Conversation chain identity
+
+The framework computes a deterministic **chain id** for every request,
+and uses it for two purposes:
+
+1. **Partitioning the durable task** — every turn in a chain shares a
+   single `task_id`.
+2. **Exposing identity to handlers** — handlers that wrap a stateful
+   upstream SDK (e.g. an LLM agent SDK with its own session-resume
+   facility) use the chain id as their upstream session identifier
+   without having to allocate their own.
+
+### §4.1 — Derivation
+
+The chain id is derived from the request as follows, in priority
+order:
+
+1. If the request supplies `conversation_id`, return it.
+2. Else if the request supplies `previous_response_id`:
+   - If `steerable_conversations=true`, return `previous_response_id`
+     (so every turn in a steerable chain returns the same value).
+   - If `steerable_conversations=false`, return `response_id` (each
+     fork gets its own chain id).
+3. Else, return `response_id` (so first-turn handlers always get a
+   non-`None` identity).
+
+This rule is normative. A port MUST exhibit the same priority order
+and the same steerable / non-steerable disambiguation for `previous_response_id`.
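The priority order can be sketched as follows. The function name and keyword arguments are assumptions made for illustration; only the ordering and the steerable / non-steerable split come from the rule above.

```python
from typing import Optional


def derive_chain_id(response_id: str,
                    conversation_id: Optional[str] = None,
                    previous_response_id: Optional[str] = None,
                    steerable_conversations: bool = False) -> str:
    """Hypothetical helper mirroring the three-step priority order."""
    if conversation_id is not None:
        return conversation_id                   # rule 1
    if previous_response_id is not None:
        if steerable_conversations:
            return previous_response_id          # rule 2, steerable
        return response_id                       # rule 2, fork
    return response_id                           # rule 3, never None
```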
+
+### §4.2 — The `task_id`
+
+The durable task is keyed on a deterministic `task_id` derived from the
+chain id plus an agent / session salt:
+
+```
+chain_id = derive_chain_id(...)
+partition_key = {
+  "conv:"   if conversation_id was used,
+  "chain:"  if previous_response_id + steerable=true,
+  "fork:"   if previous_response_id + steerable=false,
+  "resp:"   if response_id was used (fallback)
+} + chain_id
+
+composite = "{agent_name}:{session_id}:{partition_key}"
+task_id = "durable-resp-" + sha256(composite).hex()[:32]
+```
+
+The `agent_name` and `session_id` salt prevents cross-agent and
+cross-session task collisions. The `partition_key` prefix is
+diagnostic only — it preserves the derivation in the hash input so
+two chains with different provenance but identical chain id values
+produce different `task_id`s.
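A runnable rendering of the pseudocode above, as a sketch. The UTF-8 encoding of the composite string and the `derive_task_id` name are assumptions; the pseudocode does not state an encoding.

```python
import hashlib


def derive_task_id(agent_name: str, session_id: str,
                   prefix: str, chain_id: str) -> str:
    """prefix is one of "conv:", "chain:", "fork:", "resp:"."""
    partition_key = prefix + chain_id
    composite = f"{agent_name}:{session_id}:{partition_key}"
    # Assumption: the composite is hashed as UTF-8 bytes.
    digest = hashlib.sha256(composite.encode("utf-8")).hexdigest()
    return "durable-resp-" + digest[:32]
```

Because the prefix is part of the hash input, `derive_task_id("a", "s", "conv:", "x")` and `derive_task_id("a", "s", "resp:", "x")` differ even though the chain id value is the same.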
+
+### §4.3 — Public surface
+
+The chain id is exposed to handlers as `context.conversation_chain_id`
+(a `str`, never `None`). Handlers wrapping a stateful upstream SDK
+SHOULD use this as their upstream session id rather than allocating a
+fresh UUID. The value is stable across all attempts (fresh, recovered,
+multiply-recovered) of every turn in the chain.
+
+---
+
+## §5 — Reserved framework metadata namespace
+
+The framework persists its own control state alongside the handler's
+`metadata` checkpoint store. The two are isolated by namespace prefix:
+
+- The default namespace and any developer-named namespace MUST NOT
+  start with `_`.
+- The framework reserves namespaces starting with `_`. The responses
+  layer specifically uses **`_responses`**.
+
+The handler-facing `metadata` API MUST raise `ValueError` if a
+developer attempts to set, get, or open a namespace whose name starts
+with `_`. Framework code (the orchestrator) reaches `_responses` via
+the underlying task primitive directly, bypassing the handler-facing
+wrapper.
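A minimal sketch of the isolation rule, assuming namespaces are plain dictionaries held by the task primitive. `HandlerMetadata`, its `namespace()` method, and the `"default"` namespace name are hypothetical stand-ins, not the library's API.

```python
from typing import Any, Dict


class HandlerMetadata:
    """Hypothetical handler-facing wrapper over the primitive's raw storage."""

    def __init__(self, raw: Dict[str, Dict[str, Any]]) -> None:
        self._raw = raw

    def namespace(self, name: str = "default") -> Dict[str, Any]:
        # Reserved prefix: handlers may not open framework namespaces.
        if name.startswith("_"):
            raise ValueError(f"namespace {name!r} is reserved for the framework")
        return self._raw.setdefault(name, {})


# Framework code bypasses the wrapper and writes to the raw storage directly.
raw_storage: Dict[str, Dict[str, Any]] = {}
raw_storage.setdefault("_responses", {})["disposition"] = "re-invoke"

handler_view = HandlerMetadata(raw_storage)
handler_view.namespace("progress")["last_step"] = 3
```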
+
+### §5.1 — Keys in `_responses`
+
+| Key | Value | Written by | Read by |
+|---|---|---|---|
+| `response_id` | The chain's response id stamp (informational; useful for operator triage) | First entry of the task body | Operators (logs / dumps) |
+| `background` | The original `background` request flag at first entry | First entry of the task body | Recovery dispatch (secondary signal; `disposition` is primary) |
+| `disposition` | `"re-invoke"` (Row 1) or `"mark-failed"` (Rows 2, 3) | First entry of the task body, flushed durably before any subsequent await | Recovery dispatch (§7) |
+
+A port MAY add additional reserved keys under `_responses` provided
+they do not collide with the three above and are documented as
+framework-internal.
+
+> **Note — no `last_sequence_number` key.** Earlier drafts reserved a
+> `_responses.last_sequence_number` metadata watermark for streaming
+> reconnection bookkeeping. The implementation does **not** maintain it:
+> the highest persisted sequence number is derived directly from the
+> durable **stream event store's cursor** (`last_cursor()`), which is the
+> single source of truth — a separate metadata watermark could diverge
+> from the events actually persisted. See §9.1.
+
+### §5.2 — Persistence ordering rule
+
+The `disposition` key MUST be flushed durably before the task body
+performs any await that could be interrupted by a crash. Without this
+ordering, a recovered task with no `disposition` defaults to
+`re-invoke` and skips the `mark-failed` branch — losing the
+recovery-marker semantics for Rows 2/3.
+
+The same rule applies to any future key that affects recovery
+dispatch.
+
+---
+
+## §6 — The perpetual conversation-scoped task
+
+For every `store=true` request, the framework engages a durable
+task. The task is **perpetual**: it represents the conversation
+chain's execution loop, not a single response.
+
+**One architecture — unified handler-in-task-body.** The handler
+ALWAYS runs inside the durable task body, for every `store=true`
+row. The "bookkeeping pattern" (where the handler ran
+outside the body for Rows 2/3 and a separate task waited for a
+completion signal) has been deleted. Recovery behaviour is selected
+by the `disposition` written into framework metadata on the first
+entry: `re-invoke` means the recovery scanner re-fires the handler;
+`mark-failed` means the recovery scanner persists `failed` and
+returns without re-invoking.
+
+Internally, the responses layer picks one of two underlying task
+primitives per request based on the `(store, conversation_id,
+previous_response_id, steerable_conversations)` tuple. Single-turn
+requests use a one-shot primitive; multi-turn requests use a chain
+primitive. The choice is invisible to handlers (the flat recovery +
+steering surface — `is_recovery`, `is_steered_turn`,
+`pending_input_count`, `conversation_chain_metadata` — looks the same regardless)
+and to clients (the HTTP/SSE contract is identical). The full table
+is in §6.4.
+
+### §6.1 — Lifecycle (Row 1 — `durable_background=true`, bg+store)
+
+For Row 1 with `steerable_conversations=true`:
+
+1. **First turn** — `start(task_id, input=params, input_id=response_id_1)`
+   creates the task. Task body runs the handler for turn 1.
+2. **Handler returns** — the task body returns `None` (the framework's
+   implicit-suspend signal for multi-turn primitives), keeping the
+   task alive for the next turn.
+3. **Subsequent turn** — `start(task_id, input=params, input_id=response_id_2,
+   if_last_input_id=response_id_1)` resumes the task. The framework's
+   input-precondition primitive enforces sequential chain extension
+   (see §11.2). Task body runs the handler for turn 2.
+4. **Crash mid-handler** — task stays `in_progress` until the
+   recovery scanner re-fires it. The recovered entry runs the handler
+   again with `context.is_recovery == True`. Disposition is `re-invoke`.
+
+For Row 1 with `steerable_conversations=false`, each turn (whether
+forked or sequential) maps to a distinct `task_id` (the `fork:` /
+`resp:` partition disambiguates), so no suspend-and-resume loop is
+needed; each task is one-shot.
+
+### §6.2 — Lifecycle (Rows 2/3 — `durable_background=false` and foreground+store)
+
+Same shape as §6.1: the handler runs inside the durable task body.
+The only differences are:
+
+1. **Disposition is `mark-failed`** — written to framework metadata on
+   first entry, so recovery does NOT re-invoke the handler.
+2. **HTTP request coupling** — for Row 3 (foreground), the HTTP
+   request awaits the task body's terminal via the framework's
+   `TaskRun.result()` API. For Row 2 (background, non-durable
+   recovery), the HTTP request returns immediately after the
+   `response.created` event is observed.
+3. **Crash mid-handler** — task stays `in_progress`. The recovery
+   scanner re-fires it; the recovered entry takes the `mark-failed`
+   branch and persists `failed` (server_error,
+   shutdown_reason=crash_recovery) idempotently. (The idempotency
+   check skips the overwrite if the response is already terminal —
+   see §7.2.) The handler is NOT re-invoked.
+
+### §6.3 — Lifecycle (Row 4 — `store=false`)
+
+No durable task. The handler runs inline (foreground) or via a
+detached background task (background). The graceful-shutdown path
+MAY make a best-effort attempt to persist a `failed` marker in
+whatever transient response store is in use — but this is
+best-effort only and not durable. On SIGKILL there is no recovery.
+
+### §6.4 — Primitive selection (per-request dispatch matrix)
+
+The responses layer dispatches each `store=true` request to one of two
+underlying durable-task primitives, based on the request shape and the
+deployment's `steerable_conversations` option. This is a refinement of
+the top-level 4-row matrix in §3 — Rows 1, 2, and 3 (all `store=true`
+rows) split into sub-rows here according to whether the request
+identifies a multi-turn chain.
+
+| `conversation_id` | `previous_response_id` | `steerable_conversations` | Primitive | Rationale |
+|---|---|---|---|---|
+| absent | absent | (any) | one-shot (`@task`) | Single request, no chain — the task_id is unique per request; auto-deleted on terminal exit. |
+| absent | present | `false` | one-shot (`@task`) | Fork-style: each request gets its own task_id (the `fork:` partition), so no chain semantics needed. |
+| absent | present | `true` | multi-turn (`@multi_turn_task(steerable=true)`) | Steerable chain extension: turns share a task_id (the `chain:` partition); the framework suspends between turns and queues mid-turn inputs. |
+| present | (any) | `false` | multi-turn (`@multi_turn_task(steerable=false)`) | Conversation-scoped chain: turns share a task_id (the `conv:` partition); chain suspends between turns. Concurrent overlap returns 409 `conversation_locked` (no queueing). |
+| present | (any) | `true` | multi-turn (`@multi_turn_task(steerable=true)`) | Same conversation-scoped chain, with mid-turn inputs queued instead of rejected. |
+
+The primitive choice MUST be made at request-dispatch time (not at
+deployment-config time) because the same deployment serves both
+single-turn requests (one-shot primitive) and multi-turn requests
+(multi-turn primitive) — the deployment's `steerable_conversations`
+flag only controls the multi-turn primitive's mid-turn-input behaviour.
+
+The choice is invisible to handlers: the recovery + steering context (flat fields on the response context) looks
+identical regardless of which primitive carries the body. The choice
+is invisible to clients — the HTTP/SSE contract on `POST /v1/responses`
+and `GET /responses/{id}` is independent of the underlying primitive.
+
+The task_id derivation (§4.2) is also independent of the primitive
+choice — the `conv:` / `chain:` / `fork:` / `resp:` partition prefix
+in the hash input ensures requests routed to different primitives
+also get distinct task_ids when they should.
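The five rows of the primitive-selection table reduce to a short decision function. This is a sketch for `store=true` requests only; `select_primitive` and its tuple return shape are illustrative assumptions.

```python
from typing import Optional, Tuple


def select_primitive(conversation_id: Optional[str],
                     previous_response_id: Optional[str],
                     steerable_conversations: bool) -> Tuple[str, Optional[bool]]:
    """Return (primitive kind, steerable flag); the flag is None for one-shot."""
    if conversation_id is not None:
        # Conversation-scoped chain; the option decides queue vs 409.
        return ("multi-turn", steerable_conversations)
    if previous_response_id is not None and steerable_conversations:
        return ("multi-turn", True)    # steerable chain extension
    return ("one-shot", None)          # single request or fork
```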
+
+---
+
+## §7 — Recovery dispatch
+
+> **Normative source:** the per-row recovery dispositions and the
+> recovered-entry precondition (drop when the response was never durably
+> created) are owned by [`durability-contract.md`](durability-contract.md)
+> (§ Recovered entry, Per-row contracts). This section is the design detail.
+
+The recovered entry of any durable task body inspects the
+`_responses.disposition` key and routes:
+
+### §7.1 — `disposition == "re-invoke"` (Row 1)
+
+The handler is invoked again with `context.is_recovery == True`. The
+handler is responsible for building a resumption response and emitting
+a reset `response.in_progress` event (§8). The framework does NOT
+re-execute the handler from a checkpoint; it re-invokes the whole
+handler body.
+
+**Recovery precondition — the response must have been durably created.**
+Before re-invoking, the framework reads the response from the response
+store. If the response is **definitively absent** (a typed not-found:
+`KeyError` from the in-memory / file providers, `FoundryResourceNotFoundError`
+mapped from the hosted store's HTTP 404), the original `POST /responses`
+disconnected before any `response.created` was persisted, so no client ever
+received a response id to fetch or poll. The framework MUST **drop** the
+recovery — do NOT re-invoke the handler, emit no `response.*` events, write
+no terminal — and settle the task so the recovery scanner does not re-select
+it. This gate applies to **both `stream=false` and `stream=true`** durable
+background recovery: it runs on the shared recovered-entry path *before* the
+stream-vs-non-stream dispatch, so a non-streaming response with no persisted
+snapshot is dropped identically to a streaming one. A transient/ambiguous
+store error (`FoundryBadRequestError`, `FoundryApiError`,
+`ServiceRequestError` / `ServiceResponseError` / `OSError`, or any other
+class) is NOT a definitive absence and MUST NOT trigger a drop — recovery
+proceeds with `persisted_response = None`.
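The drop-versus-proceed decision can be sketched as below. `StoreNotFound` is a hypothetical stand-in for a provider's typed not-found error (the text names `FoundryResourceNotFoundError` for the hosted store), and `recovery_gate` is an illustrative name.

```python
from typing import Any, Callable, Optional, Tuple


class StoreNotFound(Exception):
    """Stand-in for a provider's typed not-found error (hypothetical)."""


def recovery_gate(read_response: Callable[[], Any]) -> Tuple[str, Optional[Any]]:
    """Return ("drop", None) or ("re-invoke", persisted_response)."""
    try:
        persisted = read_response()
    except (KeyError, StoreNotFound):
        # Definitive absence: response.created was never persisted.
        return ("drop", None)
    except Exception:
        # Transient or ambiguous error: not a definitive absence.
        return ("re-invoke", None)
    return ("re-invoke", persisted)
```

Only the typed not-found branch drops the recovery; every other failure proceeds with `persisted_response = None`.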
+
+The handler-facing `context.conversation_chain_metadata` carries whatever
+watermarks the previous attempt persisted (the framework auto-flushes
+the metadata namespaces it owns at lifecycle boundaries — start /
+suspend / complete / fail / cancel / terminate — so values written
+and forgotten are still visible after a clean recovery; the fence for
+at-most-once side-effect patterns is the handler's explicit
+`conversation_chain_metadata.flush()` call).
+
+### §7.2 — `disposition == "mark-failed"` (Rows 2, 3)
+
+On recovery, the task body:
+
+1. Looks up the response in the response store.
+2. If the response is already terminal (`completed`, `failed`,
+   `cancelled`, `incomplete`), returns without overwriting — the
+   crash happened after terminal persistence and before the
+   task body could complete.
+3. Otherwise, persists a `failed` response with
+   `error.code="server_error"`,
+   `error.additionalInfo.shutdown_reason="crash_recovery"`,
+   `output=[]`.
+4. Returns cleanly. Task → `completed`. The handler is NOT invoked.
+
+For steerable chains (`steerable_conversations=true`), the body
+returns `None` rather than raising an explicit suspend — the framework
+records the implicit-suspend transition for multi-turn primitives
+automatically. The response store's `failed` terminal that step 3
+persisted is the authoritative failure record; the in-process result
+of the body's `return None` is consistent with that. For non-steerable
+chains, returning is correct.
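Steps 1 to 3 can be sketched against an in-memory dictionary standing in for the response store. The sketch assumes the response record exists (handling of a never-created response is covered by the recovered-entry precondition in the contract) and writes only the fields step 3 names; `mark_failed_on_recovery` is a hypothetical helper.

```python
from typing import Any, Dict

TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled", "incomplete"})


def mark_failed_on_recovery(response_store: Dict[str, Dict[str, Any]],
                            response_id: str) -> bool:
    """Return True if the failed marker was written, False if skipped."""
    current = response_store[response_id]            # step 1: look up
    if current["status"] in TERMINAL_STATUSES:       # step 2: idempotency check
        return False
    response_store[response_id] = {                  # step 3: persist failed
        **current,
        "status": "failed",
        "output": [],
        "error": {
            "code": "server_error",
            "additionalInfo": {"shutdown_reason": "crash_recovery"},
        },
    }
    return True
```

Calling it twice for the same response writes once: the second call sees the `failed` terminal and returns without overwriting.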
+
+### §7.3 — The `server_error` payload
+
+Every framework-emitted recovery / shutdown marker uses this
+exact shape:
+
+```json
+{
+  "id": "<response_id>",
+  "object": "response",
+  "status": "failed",
+  "output": [],
+  "error": {
+    "type": "server_error",
+    "code": "server_error",
+    "message": "<path-specific human-readable cause>",
+    "additionalInfo": {
+      "shutdown_reason": "crash_recovery" | "grace_exhausted"
+    }
+  }
+}
+```
+
+- `type` and `code` are always `"server_error"` — the user-facing
+  error class is generic.
+- `shutdown_reason` is operator-facing and distinguishes path B
+  (`grace_exhausted` — in-process marker fired) from path C
+  (`crash_recovery` — next-lifetime recovery scanner marker).
+- `message` is human-readable and SHOULD encode the path-specific
+  cause ("Server interrupted before completing this response" /
+  "Server stopped before this response completed"). Ports MAY
+  localise; the structure is what is normative.
+
+---
+
+## §8 — The recovery contract (handler-side)
+
+The handler receives recovery + steering state via flat fields on
+the response context:
+
+| Property | Type | Meaning |
+|---|---|---|
+| `is_recovery` | `Bool` | True when this invocation is a re-entry after a crash; False on every other entry (including new turns in a multi-turn chain). |
+| `is_steered_turn` | `Bool` | True only on the drain re-entry that follows steering pressure — set when the queued steering input is being executed as its own turn. NOT set on the cancelled current turn that produced the steering pressure. |
+| `pending_input_count` | `Int` | Number of queued steering inputs visible to the handler (live count — decreases as the framework drains the queue). |
+| `conversation_chain_metadata` | Mapping + Callable | Cross-turn developer checkpoint store; see §8.1. Typed via the public `ConversationChainMetadataNamespace` Protocol. |
+| `persisted_response` | `ResponseObject` \| `None` | Entry-only — the last durably-persisted snapshot (last `stream.checkpoint()`, or `response.created`), or `None` if nothing persisted before the crash. See §8.4. |
+
+These fields are always present on the response context. For
+`store=true` rows the framework populates them from the underlying
+durable task primitive; for `store=false` (Row 4) the fields
+default to a fresh, non-recovered, non-steered shape with an
+in-memory metadata backing (writes succeed at runtime but evaporate
+on restart).
+
+### §8.1 — `conversation_chain_metadata` semantics
+
+- **Default namespace** — `context.conversation_chain_metadata["key"] = value`.
+- **Named namespace** — `context.conversation_chain_metadata("name")["key"] = value`.
+- **Reserved prefix** — keys and namespace names starting with `_` MUST
+  raise `ValueError` from the handler-facing wrapper.
+- **Persistence** — writes are durable within the namespace's dirty
+  buffer. `await context.conversation_chain_metadata.flush()` (or the
+  namespace's `flush()`) is the at-most-once fence for side effects.
+  The framework auto-flushes at lifecycle boundaries (start, suspend,
+  complete, fail, cancel, terminate); a handler that never flushes
+  still sees its writes on a clean recovery — the fence is only for
+  side effects you cannot afford to repeat.
+- **Size discipline** — `conversation_chain_metadata` is a small key-value store
+  for *references and watermarks*, not a checkpoint *store*. Bulk
+  application state belongs in the handler's own upstream framework
+  (LLM-SDK session JSONL, checkpoint DB, files on disk).
+  Implementations MAY enforce a size cap on the durable task payload.
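The at-most-once fence can be illustrated with a stand-in metadata object. `InMemoryChainMetadata` and `run_once` are hypothetical; the stand-in models "buffered until flushed" with two dictionaries and is not the real `ConversationChainMetadataNamespace`.

```python
import asyncio
from typing import Any, Callable, Dict


class InMemoryChainMetadata(dict):
    """Stand-in: writes are buffered locally; flush() copies them to `durable`."""

    def __init__(self, durable: Dict[str, Any]) -> None:
        super().__init__(durable)      # a new lifetime starts from durable state
        self._durable = durable

    async def flush(self) -> None:
        self._durable.clear()
        self._durable.update(self)


async def run_once(metadata: InMemoryChainMetadata, key: str,
                   side_effect: Callable[[], None]) -> bool:
    """Run side_effect at most once across attempts; return True if it ran."""
    if metadata.get(key):
        return False                   # an earlier attempt passed the fence
    metadata[key] = True
    await metadata.flush()             # fence: durable BEFORE the side effect
    side_effect()
    return True


durable_state: Dict[str, Any] = {}
calls = []
first = asyncio.run(run_once(InMemoryChainMetadata(durable_state),
                             "email_sent", lambda: calls.append("send")))
# Simulated recovery: a fresh metadata view seeded from the durable state.
second = asyncio.run(run_once(InMemoryChainMetadata(durable_state),
                              "email_sent", lambda: calls.append("send")))
```

Setting the watermark before the call trades a possible skipped side effect (crash between flush and call) for a guarantee that it never runs twice.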
+
+### §8.2 — The recovery model
+
+The recovery contract has three actors:
+
+1. **Framework** — re-invokes the handler with
+   `context.is_recovery == True`. Persists every SSE event
+   in order (no dedup, except that a recovered handler's re-emitted
+   `response.created` is not re-appended to a non-empty durable stream —
+   see §8.3). Persists the response **envelope** at the first attempt's
+   `response.created`, at **each successful `stream.checkpoint()`**, and at
+   the terminal event. The `response.created` and terminal writes are
+   **deduplicated** across recovery attempts keyed on `response_id` (§9.4);
+   the last persisted envelope is exposed on re-entry as
+   `context.persisted_response` (§8.4).
+2. **Handler** — computes a **resumption point** and resumes from it. Two
+   shipping models (the handler picks based on where its durable progress
+   state lives, and they compose):
+   - **Framework-checkpoint**: emit one `OutputItem` per phase +
+     `stream.checkpoint()` at each boundary; on recovery seed
+     `ResponseEventStream(response=context.persisted_response)` and resume
+     from `len(stream.response.output)`. The persisted snapshot is the
+     watermark — no separate metadata bookkeeping is required when it is the
+     only durable progress/side-effect boundary.
+   - **Upstream-owned**: query an upstream framework/store + own metadata
+     watermarks; build a resumption `ResponseObject` from that state;
+     construct `ResponseEventStream(response=resumption_response)`.
+   Either way the handler emits a `response.in_progress` event carrying the
+   resumption response and continues from the resumption point. Metadata
+   watermarks set BEFORE non-idempotent side-effecting calls protect against
+   duplicate side effects across attempts (a composable overlay on either
+   model).
+3. **Client** — observes the reset-on-`in_progress` rule (§9.3);
+   redraws its local response view from the reset event's payload.
+
+### §8.3 — Naive fallback
+
+A handler that does nothing recovery-specific MUST still produce a
+correct response. The fallback shape is:
+
+1. Handler runs from scratch on every recovery.
+2. Emits `response.created`. On a recovered entry the framework does NOT
+   re-append `response.created` to the durable stream — it appends it only
+   when the stream is empty, and a recovered stream already carries the
+   pre-crash `response.created`. The re-emitted event still seeds the
+   handler's in-memory stream and satisfies the first-event validator, but a
+   reconnecting/replaying client observes `response.created` exactly once.
+3. Emits `response.in_progress` with an empty `response.output` (this
+   serves as the implicit snapshot reset for clients, and is the first
+   stream-visible event of the recovered lifetime).
+4. Re-streams the whole turn.
+5. Emits its terminal event (the framework deduplicates against the
+   first terminal that lands).
+
+The final response is correct. The client UX is jarring (full re-stream
+on every recovery) but consistent.
+
+The naive opt-out is unsafe ONLY when the handler makes upstream
+side-effecting calls without watermarks — duplicate side effects
+(double-sending user input, double-debiting a credit balance, etc.)
+are the handler's responsibility to prevent.
+
+### §8.4 — Checkpoint-driven recovery (`stream.checkpoint()`, `persisted_response`, `internal_metadata`)
+
+Between the naive full-re-stream fallback (§8.3) and hand-rolled
+metadata watermarks, the framework offers a **developer checkpoint write
+point** so a recovered handler can resume from durably-persisted output
+rather than re-running the whole turn.
+
+**`stream.checkpoint()`** — a yielded stream event:
+
+```
+yield stream.checkpoint()
+```
+
+Yielding it durably persists the current `stream.response` snapshot (every
+output item finished so far) via `provider.update_response`. It is a third
+write point alongside `response.created` and the terminal write (§9.1).
+Properties:
+
+- **Deterministic + developer-driven** — checkpoints happen only where the
+  handler yields one. There are NO periodic, timer, or implicit checkpoints.
+- **Backpressured** — because the handler is an async generator consumed
+  in lockstep, the provider write completes before control returns from the
+  `yield`. "I checkpointed" means "it is durable now".
+- **Durable-background-gated** — the write happens ONLY for a
+  `durable_background=True`, `background=true` (hence `store=true`) request —
+  the only configuration with a crash-recovery re-invocation path. In every
+  other case the event is dropped (no write), so a handler MAY yield it
+  unconditionally.
+- **Idempotent** — a snapshot byte-identical to the last persisted one is
+  skipped.
+- **Failures swallowed** — a provider error is logged and ignored; recovery
+  falls back to the previously-persisted snapshot.
+- **After terminal** — a checkpoint yielded after a terminal event is dropped
+  (the terminal write is authoritative); no exception.
+- **Deferral preserves the checkpoint** — when a handler defers via
+  `await context.exit_for_recovery()`, the framework MUST NOT overwrite the
+  last checkpoint snapshot with a pre-terminal record; the checkpoint remains
+  authoritative for the next lifetime.
+
+**`context.persisted_response`** — on a recovered entry, the last
+durably-persisted `ResponseObject` snapshot (the last checkpoint, or the
+`response.created` snapshot if none ran), or `None` if nothing persisted
+before the crash. Entry-only: read it at the start of the recovered
+invocation to decide the resume point; it is not refreshed mid-execution.
+
+**The one-OutputItem-per-phase pattern.** Emit one output item per logical
+phase and `yield stream.checkpoint()` at each boundary. On recovery, **seed
+the stream** with `context.persisted_response` and resume from
+`len(stream.response.output)`: a phase whose `output_item.done` + checkpoint
+completed is already present in the seeded output (it survives); a phase
+interrupted before its checkpoint is re-run — correct by construction. The
+recovered handler `yield stream.emit_created()` exactly as on a fresh entry;
+the framework recognises the recovered entry and accepts the seeded output
+(deduping the response-store write). It then emits only the remaining phases
+via builder events — the persisted response is the watermark, so there is no
+replay or breadcrumb reconstruction. The per-row × per-path conformance for
+this write point is **Row 11** in
+[`durability-contract.md`](durability-contract.md).
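+
+A minimal sketch of the resume arithmetic, assuming each phase produces
+exactly one output item. `run_phases` and its parameters are hypothetical;
+`persisted_output` stands in for `context.persisted_response.output` and
+the `checkpoint` callable for `yield stream.checkpoint()`.
+
+```python
+from typing import Callable, List, Optional
+
+
+def run_phases(
+    phases: List[Callable[[], str]],
+    persisted_output: Optional[List[str]],
+    checkpoint: Callable[[List[str]], None],
+) -> List[str]:
+    """Run phases in order, skipping those already present in the persisted output."""
+    # Seed from the persisted snapshot (None means nothing was persisted).
+    output = list(persisted_output or [])
+    # len(output) is the resume point: earlier phases already survived.
+    for index in range(len(output), len(phases)):
+        output.append(phases[index]())  # one output item per phase
+        checkpoint(list(output))        # stand-in for `yield stream.checkpoint()`
+    return output
+```
+
+A phase interrupted before its checkpoint is absent from the seeded
+output, so the loop re-runs it without any extra bookkeeping.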
+
+**`internal_metadata`** — a single-turn, platform-internal key/value bag on
+each output item and on the response (via `stream.internal_metadata` /
+`item.internal_metadata`, both live `MutableMapping[str, Any]` views). It is
+persisted wherever the response is persisted (`response.created`, every
+`stream.checkpoint()`, terminal) and is **always stripped before any
+client-facing HTTP/SSE payload** — and symmetrically stripped on ingress, so
+clients can neither read nor inject it. Use it for lightweight per-turn
+watermarks, id mappings (upstream message id ↔ emitted item), or in-turn
+stale-message detection; read it back on recovery via
+`context.persisted_response`. It is distinct from the *public*
+`ResponseObject.metadata` (the client's own metadata, never stripped) and
+from `context.conversation_chain_metadata` (cross-turn, named-scope,
+flush-controlled — §8.1). Rule of thumb: cross-turn state →
+`conversation_chain_metadata`; reconstruct *this* response on crash →
+`internal_metadata` + `stream.checkpoint()`.
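+
+The egress stripping rule can be illustrated on plain dictionaries. This
+is a sketch under assumptions: the helper name and the dict-shaped
+response are hypothetical, and the framework's actual serializer is not
+shown in this document.
+
+```python
+import copy
+from typing import Any, Dict
+
+
+def strip_internal_metadata(response: Dict[str, Any]) -> Dict[str, Any]:
+    """Return a client-facing copy with every internal_metadata bag removed."""
+    public = copy.deepcopy(response)           # never mutate the persisted snapshot
+    public.pop("internal_metadata", None)      # response-level bag
+    for item in public.get("output", []):
+        if isinstance(item, dict):
+            item.pop("internal_metadata", None)  # per-item bag
+    return public                              # public `metadata` is left untouched
+```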
+
+---
+
+## §9 — Stream contract
+
+> **Normative source:** the streaming sub-contract — event-persistence
+> ordering, `starting_after=` reconnect, the single-`response.created`
+> per-stream rule, and the `response.in_progress` reset — is owned by
+> [`durability-contract.md` § Streaming sub-contract](durability-contract.md).
+> This section is the design detail; the contract is authoritative.
+
+For every `stream=true` request with `store=true`:
+
+### §9.1 — Persistence ordering
+
+The framework MUST persist each SSE event to the stream event store
+in the order the handler emits it, and MUST assign a strictly
+monotonic `sequence_number` per event within a single
+`response_id`'s log. The framework MUST NOT deduplicate events across
+recovery attempts: if the handler emits `output_item.added(idx=0)`
+twice (once in the pre-crash attempt, once in the recovered attempt),
+both events are persisted, both have distinct sequence numbers, both
+are delivered to reconnecting clients.
+
+On a recovered entry the framework MUST seed the next sequence number
+from the durable stream event store's cursor — `next_seq = last_cursor() + 1`
+(or `0` when the log is empty) — so the recovered attempt's events
+carry sequence numbers strictly succeeding the pre-crash events. The
+stream-store cursor is the single source of truth for "how far the
+stream got"; the framework MUST NOT maintain a parallel
+`last_sequence_number` watermark in task metadata (which could diverge
+from the events actually persisted).
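+
+The cursor-seeding rule can be sketched with an in-memory log.
+`StreamEventLog` is a hypothetical stand-in for the durable stream event
+store; only the `last_cursor() + 1` arithmetic comes from the text above.
+
+```python
+from typing import List, Optional, Tuple
+
+
+class StreamEventLog:
+    """Hypothetical in-memory stand-in for the durable stream event store."""
+
+    def __init__(self) -> None:
+        self.events: List[Tuple[int, str]] = []
+
+    def last_cursor(self) -> Optional[int]:
+        return self.events[-1][0] if self.events else None
+
+    def next_seq(self) -> int:
+        # Seed from the store itself: 0 for an empty log, else cursor + 1.
+        cursor = self.last_cursor()
+        return 0 if cursor is None else cursor + 1
+
+    def append(self, event_type: str) -> int:
+        # No dedup: a repeated event type still gets its own sequence number.
+        seq = self.next_seq()
+        self.events.append((seq, event_type))
+        return seq
+```
+
+Because the next number is always derived from the log, a recovered
+attempt that reuses the same store continues the sequence without a
+separate watermark.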
+
+### §9.2 — Reconnection (`starting_after=`)
+
+`GET /responses/{id}?stream=true&starting_after=N` returns only events
+with `sequence_number > N`. The reconnection is transparent — clients
+do not need an out-of-band signal that "this is a recovered stream";
+the reset event in the stream is sufficient (§9.3).
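+
+As an illustration, the replay filter reduces to a strict comparison on
+`sequence_number`. The function name and the tuple-shaped events are
+assumptions made for this sketch.
+
+```python
+from typing import List, Tuple
+
+
+def events_after(events: List[Tuple[int, str]], starting_after: int) -> List[Tuple[int, str]]:
+    """Return only the events whose sequence number is strictly greater than N."""
+    return [event for event in events if event[0] > starting_after]
+```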
+
+### §9.3 — The reset-on-`in_progress` rule
+
+Clients MUST treat the **second or later** `response.in_progress`
+event in a stream as a snapshot reset:
+
+> Replace the local `response.output` with the event's `response.output`.
+> Discard any partial in-flight item content accumulated since the
+> previous snapshot. Treat subsequent events as additive on top of the
+> new snapshot.
+
+This rule applies whether the client is reading the live SSE feed or
+replaying via `starting_after=`.
+
+The framework's persisted-response-state machine MUST observe the
+same rule: a second-or-later `response.in_progress` REPLACES the
+persisted response's `output` array; subsequent `output_item.added`
+at indexes already present REPLACES the slot rather than appends.
+
+### §9.4 — Idempotent `response.created` and terminal
+
+The framework MUST tolerate a duplicate `response.created` event from
+a recovery-aware handler that emits it idempotently; only the first
+is authoritative for response-store persistence, subsequent ones are
+no-ops at the persistence layer (but ARE persisted to the event
+stream — see §9.1).
+
+The framework MUST be idempotent against duplicate terminal events. A
+second `response.completed` (or `response.failed`) after one has
+already been persisted to the response store is a no-op at the
+persistence layer.
+
+The response store MUST raise `ResponseAlreadyExistsError` from
+`create_response()` when called for a `response_id` that already has
+a non-deleted entry. Callers MUST swallow this error on recovery
+attempts (log at INFO, treat as already-persisted, proceed to the
+terminal `update_response()` path).
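+
+A sketch of the caller-side handling, assuming an in-memory store. The
+`ResponseAlreadyExistsError` class below is a local stand-in for the real
+exception, and `InMemoryStore` / `persist_created` are hypothetical names.
+
+```python
+import logging
+from typing import Any, Dict
+
+log = logging.getLogger("responses-sketch")
+
+
+class ResponseAlreadyExistsError(Exception):
+    """Local stand-in for the store's real exception type."""
+
+
+class InMemoryStore:
+    def __init__(self) -> None:
+        self.rows: Dict[str, Dict[str, Any]] = {}
+
+    def create_response(self, response: Dict[str, Any]) -> None:
+        if response["id"] in self.rows:
+            raise ResponseAlreadyExistsError(response["id"])
+        self.rows[response["id"]] = dict(response)
+
+    def update_response(self, response: Dict[str, Any]) -> None:
+        self.rows[response["id"]] = dict(response)
+
+
+def persist_created(store: InMemoryStore, response: Dict[str, Any], is_recovery: bool) -> bool:
+    """Return True if this call created the row, False if it already existed."""
+    try:
+        store.create_response(response)
+        return True
+    except ResponseAlreadyExistsError:
+        if not is_recovery:
+            raise  # a duplicate on a fresh entry is a genuine error
+        log.info("response %s already persisted; continuing", response["id"])
+        return False
+```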
+
+### §9.5 — Output index re-use
+
+After a snapshot reset, the handler MAY re-use `output_index` values
+that appeared before the reset. The framework MUST allow this. Clients
+MUST treat `output_index` as a slot identifier (not a monotonic
+counter):
+
+- `output_item.added` at an index already present in the snapshot →
+  REPLACE the slot.
+- `output_item.added` at a new index → APPEND a slot.
+- Subsequent `output_item.delta` / `output_item.done` apply to the
+  slot identified by `output_index`.
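+
+The reset rule (§9.3) and the slot rules above can be combined into one
+client-side reducer. This is a sketch under assumptions: the event
+dictionaries, the `text` field on items, and the function names are
+hypothetical shapes chosen for illustration.
+
+```python
+from typing import Any, Dict
+
+
+def new_client_state() -> Dict[str, Any]:
+    return {"output": [], "in_progress_seen": 0}
+
+
+def apply_event(state: Dict[str, Any], event: Dict[str, Any]) -> None:
+    kind = event["type"]
+    if kind == "response.in_progress":
+        state["in_progress_seen"] += 1
+        if state["in_progress_seen"] >= 2:
+            # Snapshot reset: replace local output, dropping partial content.
+            state["output"] = [dict(item) for item in event["response"]["output"]]
+    elif kind == "output_item.added":
+        index = event["output_index"]
+        if index < len(state["output"]):
+            state["output"][index] = dict(event["item"])  # REPLACE the slot
+        else:
+            state["output"].append(dict(event["item"]))   # APPEND a slot
+    elif kind == "output_item.delta":
+        # Deltas apply to the slot named by output_index.
+        state["output"][event["output_index"]]["text"] += event["delta"]
+```
+
+The same reducer serves the live feed and a `starting_after=` replay,
+since it depends only on the order of events it is given.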
+
+### §9.6 — `ResponseEventStream` seeding
+
+`ResponseEventStream(response=resumption_response)` MUST seed the
+stream's internal `_output_index` counter past the highest index
+present in `resumption_response.output`, so the next
+`add_output_item_*` allocates a non-colliding index by default. The
+handler MAY still re-use prior indexes deliberately.
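+
+A minimal sketch of the seeding rule, assuming output items occupy
+contiguous indexes starting at 0. `OutputIndexAllocator` is a
+hypothetical name; it does not reproduce the real `ResponseEventStream`.
+
+```python
+from typing import Any, Sequence
+
+
+class OutputIndexAllocator:
+    """Allocates output indexes past whatever the resumption response holds."""
+
+    def __init__(self, existing_output: Sequence[Any] = ()) -> None:
+        # Highest existing index is len - 1, so the next free index is len.
+        self._next = len(existing_output)
+
+    def allocate(self) -> int:
+        index = self._next
+        self._next += 1
+        return index
+```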
+
+### §9.7 — Recovery `response.in_progress` is the reset point
+
+In the recovery model, the handler's emitted `response.in_progress`
+carrying the resumption response IS the client-visible reset point.
+The framework MUST NOT synthesise a reset event of its own; the
+client-side reset rule (§9.3) is the only mechanism. If a naive
+handler emits `response.in_progress` with empty `output`, that empty
+payload IS the reset to "nothing was persisted last time"; clients
+process it identically.
+
+---
+
+## §10 — Cancellation
+
+A handler running inside the durable task body observes cancellation
+via two **distinct** surfaces and a cause-flag boolean:
+
+- **`cancellation_signal`** (3rd positional handler arg,
+  `asyncio.Event`) — set when the request itself is being cancelled
+  (`POST /v1/responses/{id}/cancel`, non-bg POST disconnect, or
+  steering pressure). This is the wake-up signal handlers await /
+  poll on inside their work loop.
+- **`context.shutdown: Event`** — set when the server is shutting
+  down (e.g. SIGTERM). This is a **separate** surface — shutdown
+  does NOT fire the cancellation signal. Handler expectations differ:
+  shutdown demands `await context.exit_for_recovery()` (durable+bg)
+  or a quick failed/incomplete terminal (others), while cancellation
+  demands a graceful finish or status-aware terminal. Handlers that
+  care about both surfaces MUST inspect each independently.
+- **`context.client_cancelled: bool`** — cause flag stamped at the
+  HTTP boundary when the cancellation cause was explicit client
+  cancellation (the `/cancel` endpoint OR a non-bg POST disconnect).
+  When `cancellation_signal` fires but `client_cancelled` is False
+  and `context.shutdown` is not set, the cause is steering pressure.
+
+Cause matrix:
+
+| Trigger | `cancellation_signal` (3rd positional handler arg) | `context.shutdown` | `context.client_cancelled` |
+|---|---|---|---|
+| Steering (new turn queued) | set | not set | False |
+| Client `POST /responses/{id}/cancel` | set | not set | True |
+| Non-bg POST disconnect | set | not set | True |
+| Graceful shutdown (`SIGTERM`) | not set | set | False |
+| Race: client cancel + concurrent shutdown | set | set | True |
+| No cancellation has occurred | not set | not set | False |
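+
+The matrix can be read as a small decision function. This sketch is
+illustrative: `cancellation_causes` and its string labels are
+hypothetical, and treating "signal set, flag False, shutdown also set" as
+steering plus shutdown is an assumption drawn from the rule that shutdown
+never fires the cancellation signal.
+
+```python
+import asyncio
+from typing import List
+
+
+def cancellation_causes(
+    cancellation_signal: asyncio.Event,
+    shutdown: asyncio.Event,
+    client_cancelled: bool,
+) -> List[str]:
+    """Return the causes currently in effect, inspecting each surface independently."""
+    causes: List[str] = []
+    if cancellation_signal.is_set():
+        # The cause flag separates explicit client cancellation from steering.
+        causes.append("client_cancel" if client_cancelled else "steering")
+    if shutdown.is_set():
+        causes.append("shutdown")
+    return causes
+```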
+
+**Recovery exit primitive.** Handlers request the graceful-shutdown
+re-entry path explicitly with a single uniform call:
+
+```
+await context.exit_for_recovery()
+```
+
+It **raises** `ResponseExitForRecovery` internally (it never returns), so
+the same line works in every handler shape — coroutine, async generator,
+or sync. The framework catches the signal at the durable task boundary and
+leaves the response `in_progress` so the next-lifetime recovery scanner can
+resume it. For `durable_background=True` responses (Row 1) the handler is
+re-invoked on the next process startup. For `store=false` / non-durable
+requests there is no task to defer, so the call raises `RuntimeError`
+(surfacing as a `failed` response — the documented non-durable shutdown
+disposition). `ResponseExitForRecovery` subclasses `BaseException` (not
+`Exception`), so a handler's broad `except Exception` cannot swallow the
+recovery signal; `try/finally` cleanup still runs.
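+
+The `BaseException` point can be demonstrated in isolation. The classes
+and functions below are local stand-ins written for this sketch, not the
+framework's own definitions.
+
+```python
+import asyncio
+from typing import List
+
+
+class ResponseExitForRecovery(BaseException):
+    """Local stand-in; subclasses BaseException as described above."""
+
+
+async def exit_for_recovery() -> None:
+    raise ResponseExitForRecovery()  # never returns
+
+
+async def handler(trace: List[str]) -> None:
+    try:
+        try:
+            await exit_for_recovery()
+        except Exception:
+            trace.append("swallowed")  # not reached: the signal is not an Exception
+    finally:
+        trace.append("cleanup")        # try/finally cleanup still runs
+
+
+def run_task(trace: List[str]) -> str:
+    """Stand-in for the durable task boundary that catches the signal."""
+    try:
+        asyncio.run(handler(trace))
+    except ResponseExitForRecovery:
+        return "in_progress"  # left for the next lifetime
+    return "terminal"
+```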
+
+The cancellation contract for the handler:
+
+- **Default pattern** (most handlers) — observe BOTH surfaces in the
+  work loop. On `cancellation_signal.is_set()`, break and emit
+  `response.completed` with the current partial output (the framework
+  overrides this to `cancelled` when `context.client_cancelled` is
+  True). On `context.shutdown.is_set()`, call
+  `await context.exit_for_recovery()` (durable+bg Row 1) or emit a quick
+  terminal (others). For steering pressure (cancel set but no cause
+  flag), the handler's `completed` terminal is correct — the
+  steered-out turn really did complete with whatever output it
+  managed to emit before the steer.
+- **Hard rule** — every async-generator handler MUST emit
+  `response.created` before any early return; framework forces
+  `failed` if it does not. Every handler MUST emit a terminal event
+  (`completed`, `incomplete`, `failed`) or the framework forces
+  `failed`. To defer to recovery without a terminal, call
+  `await context.exit_for_recovery()` — because it raises rather than
+  returns a value, it works uniformly in async-generator and coroutine
+  handlers alike (no `return <value>` generator-syntax constraint).
+- **No `cancelled` from steering or shutdown** — the handler MUST
+  NOT emit `response.cancelled` for steering pressure or shutdown;
+  that terminal is reserved for `context.client_cancelled=True`.
+- **Cooperation model** — steering pressure and client cancel wait
+  indefinitely for the handler to honour the signal. Shutdown has a
+  bounded grace window; if the handler does not return within the
+  window, the framework moves to Path B / Path C handling.
+
+### §10.1 — Cancellation × recovery composition
+
+Recovery composes with cancellation as follows:
+
+| Pre-crash trigger | Recovery behaviour |
+|---|---|
+| Steering pressure (during recovery) | Recovered entry sees `cancellation_signal.is_set()` with no cause flag. Handler honours the signal as in the fresh case. |
+| Client cancel (during recovery) | Recovered entry sees `cancellation_signal.is_set()` and `context.client_cancelled=True`. Handler honours the signal; framework finalises with `cancelled` terminal. |
+| Shutdown (during recovery) | If `context.shutdown.is_set()`, the handler calls `await context.exit_for_recovery()` (or returns without a terminal — the implicit fallback); the framework leaves the task `in_progress` for the next lifetime. |
+
+The cancellation surface is unchanged across fresh and recovered
+entries — handlers do not need a separate branch for "I'm in
+recovery AND cancelled".
+
+---
+
+## §11 — Steering
+
+`steerable_conversations=True` enables multi-turn steering on top of
+Rows 1, 2, or 3 (i.e. any `store=true` row). With steering enabled:
+
+- Every turn in a conversation chain shares the same durable `task_id`
+  (the chain partitioning rule in §4.2 collapses them).
+- A new turn submitted while a prior turn's handler is still running
+  is **queued** into the underlying task primitive's steering queue.
+  The queued turn's HTTP caller synchronously receives a queued
+  response (status `"queued"`) produced by the acceptance hook
+  (§11.3).
+- When the queued turn moves to the front of the queue, the
+  framework signals the running handler by setting `cancellation_signal`
+  (the 3rd positional handler arg) with no cause flag, which the handler
+  reads as steering pressure (§10). Once the running handler
+  reaches terminal, the framework drains the queue and the queued
+  turn's handler is invoked with `is_steered_turn=True`.
+
+### §11.1 — `steerable_conversations=False` semantics
+
+For `store=true` Rows 1/2/3 with `steerable_conversations=False`:
+
+- Each turn that uses `previous_response_id` (without
+  `conversation_id`) maps to its own `task_id` (the `fork:` partition;
+  §4.2). This makes parallel forks possible (sequential turns also
+  work — each turn is just its own one-shot task).
+- Each turn that uses `conversation_id` maps to a SHARED `task_id`
+  (the `conv:` partition) regardless of `steerable_conversations`.
+  The chain transitions to `suspended` between turns, so sequential
+  turns successfully extend the chain. Only **concurrent overlap**
+  (a new turn arriving while a prior turn's handler is still
+  `in_progress`) raises `TaskConflictError`; the framework MUST
+  translate this to HTTP 409:
+
+  ```json
+  {
+    "error": {
+      "message": "Conversation is locked — task is in_progress",
+      "type": "conflict",
+      "code": "conversation_locked",
+      "param": null
+    }
+  }
+  ```
+
+  Clarifier: _in progress_ here means the underlying task is
+  `status="in_progress"` (a handler is actively executing). A
+  `suspended` chain between turns of a `conversation_id` +
+  `steerable_conversations=False` deployment is NOT locked — sequential
+  turns extend the chain. Only overlapping turns conflict.
+
+  (Implementation note: `TaskConflictError` carries only
+  `current_status` on this implementation's narrow surface — the
+  human-readable status is included in the error body to give the
+  client a clue about why the conflict fired.)
+
+### §11.2 — Fork rejection (no branching of a steerable chain)
+
+When `steerable_conversations=true`, each turn after the first MUST
+reference the immediately-prior turn's `response_id` via
+`previous_response_id`. The framework enforces this via the
+underlying task primitive's **input-precondition primitive**:
+
+- The responses layer passes `input_id=response_id` and
+  `if_last_input_id=previous_response_id` to `start()`.
+- The primitive stores `last_input_id` in a framework-reserved
+  payload namespace (typically `_framework.last_input_id`) and
+  rejects a `start()` whose `if_last_input_id` does not match the
+  stored value.
+- On rejection, the primitive raises `LastInputIdPreconditionFailed`
+  (a typed subclass of `TaskPreconditionFailed`).
+
+The framework MUST translate `LastInputIdPreconditionFailed` to HTTP
+409 with body:
+
+```json
+{
+  "error": {
+    "message": "This agent does not support conversation forking. previous_response_id must reference the most recent response in the conversation.",
+    "type": "conflict",
+    "code": "conversation_fork_not_supported",
+    "param": "previous_response_id"
+  }
+}
+```
+
+This covers both stale-predecessor cases ("you sent a `previous_response_id`
+that refers to a turn other than the most recent one") and concurrent
+races (two POSTs arrive together with the same `previous_response_id`
+— exactly one wins by atomic precondition CAS; the other gets the
+409). There is no soft path through.
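+
+The precondition check can be sketched as a compare-and-set under a lock.
+The exception classes here are local stand-ins for the ones named above,
+and `ChainHead` is a hypothetical name for whatever holds `last_input_id`.
+
+```python
+import threading
+from typing import Optional
+
+
+class TaskPreconditionFailed(Exception):
+    """Local stand-in for the primitive's base precondition error."""
+
+
+class LastInputIdPreconditionFailed(TaskPreconditionFailed):
+    """Local stand-in for the typed subclass."""
+
+
+class ChainHead:
+    def __init__(self) -> None:
+        self._lock = threading.Lock()
+        self.last_input_id: Optional[str] = None
+
+    def advance(self, input_id: str, if_last_input_id: Optional[str]) -> None:
+        # Compare and set atomically so that exactly one racer can win.
+        with self._lock:
+            if self.last_input_id != if_last_input_id:
+                raise LastInputIdPreconditionFailed(
+                    f"expected {if_last_input_id!r}, head is {self.last_input_id!r}"
+                )
+            self.last_input_id = input_id
+```
+
+A caller that catches `LastInputIdPreconditionFailed` would then build the
+409 `conversation_fork_not_supported` body shown above.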
+
+### §11.3 — Acceptance hook
+
+When a new turn arrives for an already-active steerable task, the
+running handler cannot produce the response object for the queued
+turn (it is busy with the prior turn). The acceptance hook fills
+that gap: it runs synchronously during HTTP request handling and
+produces the initial response object the HTTP caller sees.
+
+| Property | Rule |
+|---|---|
+| **When invoked** | ONLY for steered turns (turn N where N ≥ 2 and the handler for turn N-1 is still running). NEVER for first-turn requests. |
+| **Synchronous** | Runs in the request handler; MUST NOT make LLM calls or perform heavy I/O. |
+| **Registration** | Via `@app.response_acceptor` decorator (or equivalent registration API). Optional. |
+| **Default** | If unregistered or raises, framework returns a default queued response: `{ "id": <response_id>, "object": "response", "status": "queued", "model": <model>, "output": [] }`. |
+| **Override status** | If the hook returns a dict without `status`, framework sets `status="queued"`. |
+| **First turn** | The acceptance hook is NEVER invoked for the first turn of a chain (no prior handler is running). The first turn's `response.created` comes from the handler itself. |
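+
+The default and status-fill rules in the table can be sketched as one
+function. `queued_response` and the `Hook` alias are hypothetical; the
+real registration goes through `@app.response_acceptor`, whose signature
+is not shown here.
+
+```python
+from typing import Any, Callable, Dict, Optional
+
+Hook = Callable[[Dict[str, Any]], Dict[str, Any]]
+
+
+def queued_response(hook: Optional[Hook], request: Dict[str, Any], response_id: str) -> Dict[str, Any]:
+    """Build the envelope the HTTP caller of a steered turn sees."""
+    default = {
+        "id": response_id,
+        "object": "response",
+        "status": "queued",
+        "model": request.get("model"),
+        "output": [],
+    }
+    if hook is None:
+        return default          # unregistered hook
+    try:
+        result = dict(hook(request))
+    except Exception:
+        return default          # a raising hook falls back to the default
+    result.setdefault("status", "queued")  # fill in a missing status
+    return result
+```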
+
+### §11.4 — Steering queue semantics
+
+The framework MUST guarantee:
+
+- **Sequential delivery within a chain** — for `steerable_conversations=true`,
+  queued turns drain in FIFO order; no two handlers for the same
+  chain ever execute concurrently.
+- **`is_steered_turn=True` for queued turns** — the second-and-later
+  turns of a chain (any turn invoked by drain rather than by initial
+  start) MUST observe `context.is_steered_turn == True`.
+- **`pending_input_count` is post-this** — the count of inputs queued
+  *after* the currently-being-invoked one. A handler observing
+  `pending_input_count == 0` is the most recent queued turn.
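+
+A sketch of the drain order and the post-this count, assuming a plain
+in-memory FIFO. `SteeringQueue` and the tuple it returns are hypothetical
+shapes used only for illustration.
+
+```python
+from collections import deque
+from typing import Deque, Tuple
+
+
+class SteeringQueue:
+    def __init__(self) -> None:
+        self._queue: Deque[str] = deque()
+
+    def enqueue(self, input_id: str) -> None:
+        self._queue.append(input_id)
+
+    def drain_next(self) -> Tuple[str, int, bool]:
+        """Pop the oldest input; return (input_id, pending_input_count, is_steered_turn)."""
+        input_id = self._queue.popleft()  # FIFO order
+        pending = len(self._queue)        # inputs queued after this one
+        return input_id, pending, True    # drained turns are steered turns
+```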
+
+### §11.5 — Steering × recovery
+
+If the process crashes mid-steering-drain, the recovered entry is
+given the mid-drain input as its `context.input` (or equivalent —
+the primitive's race-recovery contract supplies the in-flight input).
+The handler honours it as a normal turn invocation. If the prior turn's
+handler was already cancelled at crash time, the framework sets
+`cancellation_signal` with no cause flag (steering pressure; see §10).
+
+---
+
+## §12 — The acceptance flow (worked sequence)
+
+The two-phase steerable-conversation accept flow:
+
+```
+       (turn 1, fresh)
+HTTP   ──► POST /v1/responses { input: "...", store, background } ────────┐
+                                                                          │
+       framework: derive_task_id → "durable-resp-AB12..."                 │
+       framework: task_fn.start(task_id, input=params,                    │
+                                input_id=resp_1,                          │
+                                if_last_input_id=None)                    │
+       framework: task body schedules; handler invoked                    │
+       handler:   emit response.created (response_id=resp_1)              │
+       framework: persist response envelope → response store              │
+                                                                          │
+       HTTP    ◄── 200 { id: resp_1, status: in_progress, ... } ──────────┘
+                                                                          
+       (turn 2 arrives while turn 1's handler is still running)
+HTTP   ──► POST /v1/responses { input: "...", previous_response_id: resp_1 } ──┐
+                                                                                │
+       framework: derive_task_id → SAME "durable-resp-AB12..." (chain)         │
+       framework: task_fn.start(task_id, input=params2,                        │
+                                input_id=resp_2,                               │
+                                if_last_input_id=resp_1)                       │
+       primitive: task already in_progress → queue input                       │
+       primitive: precondition holds → advance last_input_id to resp_2         │
+       primitive: signal turn-1 handler's ctx.cancel (steering)                │
+       framework: acceptance_hook(parsed, context) → queued envelope           │
+                                                                                │
+       HTTP    ◄── 200 { id: resp_2, status: queued, ... } ────────────────────┘
+                                                                          
+       (turn 1's handler honours the steer, emits terminal, returns)
+       framework: persist terminal for resp_1
+       primitive: drain queue → invoke handler again for resp_2
+                  with is_steered_turn=True
+       handler:   emit response.created (response_id=resp_2)
+       framework: persist response envelope → response store
+       ...
+```
+
+If a third POST arrives with `previous_response_id=resp_1` (the now-stale
+prior head), the precondition fails and the third caller receives 409
+`conversation_fork_not_supported`.
+
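+If `steerable_conversations=False` instead and both turns address the
+same `conversation_id`, the second POST receives 409
+`conversation_locked` (turn 1's task is `in_progress`, so the overlapping
+turn 2 cannot extend the chain). With `previous_response_id` alone, each
+turn maps to its own `fork:` task and no lock applies (§11.1).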
+
+---
+
+## §13 — The recovery flow (worked sequence)
+
+### §13.1 — Row 1 (`durable_background=True`) × `stream=True`, crash before terminal
+
+```
+       (turn 1, fresh)
+HTTP   ──► POST /v1/responses { stream: true, store, background } ────────┐
+                                                                          │
+       framework: task_fn.start(task_id, input=params)                    │
+       framework: stamp _responses.disposition="re-invoke" in metadata    │
+                  (durably flushed before any await)                      │
+       framework: schedule task body; handler invoked                     │
+       handler:   emit response.created (seq=1)                           │
+       framework: persist response envelope → response store              │
+       handler:   emit response.in_progress (seq=2)                       │
+       framework: ...stream events... emit output_item.added(idx=0) (seq=3)│
+       framework: emit output_item.delta(idx=0, "Hel") (seq=4)            │
+                                                                          │
+       HTTP    ◄── live SSE events ────────────────────────────────────────┘
+       
+       ════════════ SIGKILL ════════════
+       
+       (next lifetime — recovery scanner re-fires task)
+       primitive: task lease expired → re-fire task body
+       framework: task body entered with context.is_recovery=True
+       framework: read _responses.disposition → "re-invoke"
+       framework: assign flat fields on response context (is_recovery=True, is_steered_turn=False, pending_input_count=0, conversation_chain_metadata=<rehydrated>)
+       framework: reconstruct ResponseExecution, ResponseContext from serialized params
+       framework: re-invoke handler with flat-field assignment on context
+       handler:   is_recovery == True
+       handler:   query upstream framework for resumption state
+       handler:   build resumption_response = ResponseObject(output=[...committed_items])
+       handler:   construct ResponseEventStream(response=resumption_response)
+       handler:   emit response.created  (seq=N, framework swallows duplicate persist)
+       handler:   emit response.in_progress(response=resumption_response)
+                  (seq=N+1, CLIENT-VISIBLE RESET POINT)
+       handler:   resume from upstream-resumption-point; emit further deltas / items
+       handler:   emit response.completed (seq=N+k)
+       framework: persist terminal → response store
+                                                                          
+       (client reconnects after recovery)
+HTTP   ──► GET /v1/responses/resp_1?stream=true&starting_after=4 ─────────┐
+       framework: stream event store returns seq=5, 6, 7, ..., N, N+1, ...│
+       HTTP    ◄── SSE events 5..N+k                                       │
+       client:   observes second response.in_progress at seq=N+1           │
+       client:   REPLACES local response.output with the event's payload   │
+       client:   processes subsequent events on top of the new snapshot    │
+                                                                          ─┘
+```
+
+### §13.2 — Row 2 (`durable_background=False`, bg+store), crash before terminal
+
+```
+       (turn 1, fresh)
+HTTP   ──► POST /v1/responses { stream: false, store, background } ───────┐
+                                                                          │
+       framework: start durable task with disposition="mark-failed"        │
+       framework: task body invokes handler (handler runs INSIDE the body) │
+       handler:   emit response.created                                    │
+       framework: persist response envelope                                │
+                                                                          │
+       HTTP    ◄── 200 { id: resp_1, status: in_progress, ... }            │
+       
+       ════════════ SIGKILL ════════════
+       
+       (next lifetime — recovery scanner re-fires the task)
+       primitive: task lease expired → re-fire task body
+       framework: task body entered with context.is_recovery=True
+       framework: read _responses.disposition → "mark-failed"
+       framework: lookup response in store: status="in_progress"
+       framework: persist failed terminal:
+                  { status: "failed",
+                    error: { code: "server_error",
+                             additionalInfo: { shutdown_reason: "crash_recovery" }}}
+       framework: task body returns → task → completed
+       
+       (client polls)
+HTTP   ──► GET /v1/responses/resp_1 ──────────────────────────────────────┐
+       framework: return persisted failed envelope                        │
+                                                                          ─┘
+```
+
+### §13.3 — Row 4 (no store), crash mid-handler
+
+No recovery. The handler dies with the process. Any HTTP caller still
+holding the connection sees a closed socket. No persisted envelope, no
+recovery scanner action.
+
+---
+
+## §14 — Conformance items
+
+Each conformance item is a normative behaviour that an implementation
+MUST exhibit. The label is for cross-reference from tests and other
+specs.
+
+### C-MATRIX — Dispatch matrix
+
+For every `POST /v1/responses`, the implementation MUST select exactly
+one of the four rows in §3 based on `(store, background, durable_background)`,
+and MUST deliver each of Termination Paths A, B, C as documented in
+§3.1.
+
+### C-CHAIN — Chain identity
+
+The chain id MUST be derived per §4.1. `task_id` MUST be derived per
+§4.2 (deterministic; partition-key-prefixed; agent+session salted;
+SHA-256 truncated). `context.conversation_chain_id` MUST expose the
+chain id to handlers per §4.3.
+
+### C-NS — Reserved namespace
+
+The handler-facing metadata API MUST reject keys and namespace names
+starting with `_` per §5. The framework's `_responses` namespace MUST
+hold at least `response_id`, `background`, and `disposition` per §5.1.
+The `disposition` write at first
+entry MUST be durably flushed before any subsequent interruptible
+await per §5.2.
+
+### C-PERPETUAL — Perpetual task
+
+For Row 1 with `steerable_conversations=true`, the durable task body
+MUST signal implicit-suspend (in this implementation: `return None`
+from a `@multi_turn_task`-decorated body) after the handler's terminal,
+keeping the task alive for subsequent turns per §6.1. For Rows 2/3,
+the task body invokes the handler directly; on graceful shutdown
+without explicit `exit_for_recovery`, the body persists the
+`shutdown_reason=grace_exhausted` failed terminal before returning.
+
+### C-DISPOSITION — Recovery dispatch
+
+On recovered entry, the task body MUST read `_responses.disposition`
+and route per §7. For `re-invoke`, the handler is re-invoked with
+`is_recovery=True`. For `mark-failed`, the handler is NOT re-invoked;
+a `server_error` terminal is persisted unless the response is
+already terminal (§7.2 idempotency check).
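+
+The routing can be sketched as a single function. The function name, its
+callables, and the returned labels are hypothetical; the failed payload
+follows the shape shown in §13.2, and the terminal status set is an
+assumption for this sketch.
+
+```python
+from typing import Any, Callable, Dict
+
+TERMINAL_STATUSES = {"completed", "failed", "incomplete", "cancelled"}
+
+
+def on_recovered_entry(
+    disposition: str,
+    stored_status: str,
+    reinvoke_handler: Callable[[bool], None],
+    persist_terminal: Callable[[Dict[str, Any]], None],
+) -> str:
+    """Route a recovered task body according to _responses.disposition."""
+    if disposition == "re-invoke":
+        reinvoke_handler(True)  # handler runs again with is_recovery=True
+        return "re-invoked"
+    if disposition == "mark-failed":
+        if stored_status in TERMINAL_STATUSES:
+            return "already-terminal"  # idempotency check: nothing to write
+        persist_terminal({
+            "status": "failed",
+            "output": [],
+            "error": {
+                "code": "server_error",
+                "additionalInfo": {"shutdown_reason": "crash_recovery"},
+            },
+        })
+        return "marked-failed"
+    raise ValueError(f"unknown disposition: {disposition!r}")
+```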
+
+### C-SERVER-ERROR — `server_error` payload
+
+Every framework-emitted shutdown/crash marker MUST conform to the
+shape in §7.3 — `type=code="server_error"`, structured
+`additionalInfo.shutdown_reason`, `output=[]`.
+
+### C-DURABILITY-CTX — Flat recovery + steering surface on `context`
+
+The handler MUST observe the flat recovery + steering fields on the
+response context: `is_recovery: bool`, `is_steered_turn: bool`,
+`pending_input_count: int`, `conversation_chain_metadata: ConversationChainMetadataNamespace`
+(see §8). `conversation_chain_metadata.flush()` MUST act as a durable-write
+fence; the framework MUST also auto-flush at lifecycle boundaries
+(§8.1). Handler keys/namespaces starting with `_` MUST raise
+`ValueError`.
+
+### C-RECOVERY-MODEL — Three-actor recovery contract
+
+The framework MUST re-invoke the handler with `is_recovery=True` per
+§8.2 (no dedup of handler-emitted SSE events; persist the envelope
+exactly-once at start and at terminal). The handler-side contract is
+specified in §8.2 / §8.3 — a naive handler MUST still produce a
+correct response (the framework MUST accept duplicate
+`response.created` and duplicate terminals, treat second-or-later
+`response.in_progress` as a reset, and tolerate output-index re-use).
+
+### C-STREAM-ORDER — Stream persistence
+
+The framework MUST persist every SSE event in emission order, MUST
+assign strictly monotonic `sequence_number` per `response_id`, MUST
+NOT deduplicate events across recovery attempts (§9.1).
+
+### C-RECONNECT — `starting_after=`
+
+`GET /responses/{id}?stream=true&starting_after=N` MUST return only
+events with `sequence_number > N`. The reconnection MUST work
+identically for fresh, recovered, and multiply-recovered streams
+(§9.2).
+
+### C-RESET — Reset on `response.in_progress`
+
+Clients MUST treat any second-or-later `response.in_progress` as a
+snapshot reset per §9.3. The framework's persisted-state machine MUST
+observe the same rule when applying events to the persisted response.
+
+### C-IDEMPOTENT — Idempotent `create` and terminal
+
+`create_response()` MUST raise `ResponseAlreadyExistsError` for an
+existing non-deleted entry per §9.4. The framework MUST swallow this
+on recovery (log INFO; proceed to `update_response()`). Duplicate
+terminal events MUST be idempotent at the persistence layer.
+
+### C-INDEX-REUSE — `output_index` slot semantics
+
+After a snapshot reset, the handler MAY re-use `output_index` values;
+the framework MUST allow it and treat re-used indexes as slot
+replacement per §9.5. `ResponseEventStream(response=...)` MUST seed
+its internal counter past the highest pre-existing index per §9.6.
+
+### C-CANCEL — Cancellation surface
+
+`cancellation_signal` (the 3rd positional handler arg), `context.shutdown`,
+and the `context.client_cancelled` cause flag MUST be populated per §10.
+The cancellation policy (no `cancelled` from
+steering or shutdown; framework forces `failed` for missing terminal;
+cooperation model) MUST be enforced per §10.
+
+### C-CANCEL-RECOVERY — Cancel × recovery composition
+
+Pre-crash cancellation triggers MUST be re-surfaced on recovered
+entry per §10.1. A recovered handler that returns without emitting
+terminal under `SHUTTING_DOWN` MUST cause the framework to raise
+`CancelledError` so the task stays `in_progress` for the next
+lifetime.
+
+### C-LOCK — Conversation lock
+
+For `store=true` with `steerable_conversations=false`, a new turn
+arriving while a prior turn for the same chain is in progress MUST
+return HTTP 409 `conversation_locked` per §11.1.
+
+### C-FORK-REJECT — No forking of steerable chains
+
+For `steerable_conversations=true`, a turn whose
+`previous_response_id` does not match the chain's `last_input_id`
+MUST return HTTP 409 `conversation_fork_not_supported` per §11.2.
+Concurrent same-`previous_response_id` POSTs MUST resolve so that
+exactly one wins; the others get the 409.
+
+### C-ACCEPT — Acceptance hook
+
+The acceptance hook MUST run only for steered turns (not first
+turns), synchronously during request handling, and MUST produce the
+HTTP-visible queued response envelope per §11.3. If the hook is
+unregistered or raises, the framework MUST emit the default queued
+envelope.
+
+### C-STEER-DELIVERY — Steering delivery order
+
+For `steerable_conversations=true`, queued turns MUST drain in FIFO
+order, with no concurrent handler executions for the same chain
+(§11.4). Drained turns MUST observe `is_steered_turn=True`.
+`pending_input_count` MUST count the turns still queued behind the
+current one.
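The delivery rule can be sketched as a single-consumer drain loop. This is an illustration under assumptions: `drain_steering_queue` and the handler's keyword arguments are hypothetical stand-ins for the framework's internal drain, and a `collections.deque` stands in for the durable steering queue.

```python
from collections import deque


def drain_steering_queue(queue, handler):
    """Run queued turns one at a time, oldest first.

    `queue` is a deque of turn inputs; `handler` is called synchronously, so
    two turns of the same chain never execute concurrently.
    """
    while queue:
        turn = queue.popleft()          # FIFO: oldest queued turn first
        handler(
            turn,
            is_steered_turn=True,       # every drained turn is a steered turn
            pending_input_count=len(queue),  # turns still queued behind this one
        )
```

Because the loop is the only consumer, FIFO order and mutual exclusion fall out of the structure rather than needing extra locks in this sketch.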
+
+### C-COMPOSE — Composition guards
+
+`durable_background=true` requires `store=true` to engage row 1; if
+`store=false`, the request falls through to row 4 regardless of
+`durable_background`. `steerable_conversations=true` requires
+`store=true` for the steering queue and acceptance hook to function;
+implementations MUST reject the combination at startup or fall
+through to non-store behaviour per their stability policy.
+
+---
+
+## §15 — Storage timeline (worked example)
+
+A `(store=true, background=true, durable_background=true, stream=true,
+steerable_conversations=true)` chain with two turns and a crash
+between them. Numbers are illustrative.
+
+```
+T=0   POST /v1/responses { input: "Hi", store: true, background: true }
+      → derive_task_id = "durable-resp-AB12..."
+      → derive_chain_id = (input was conv_id-less + prev_id-less) → resp_1
+
+T=1   primitive: task_store.create({
+        id: "durable-resp-AB12...",
+        status: "in_progress",
+        payload: { input: <serialized>, _responses: {} },
+        ...
+      })
+
+T=2   task body entered (fresh)
+      primitive: _framework.last_input_id = resp_1 (precondition stamp)
+      framework: _responses.disposition = "re-invoke", FLUSH
+      framework: _responses.response_id = resp_1
+      framework: _responses.background = true
+      handler:   emit response.created
+      framework: response_store.create({
+                   id: resp_1, status: "in_progress", ...
+                 })
+      framework: stream_store.append(seq=1, event=response.created)
+
+T=3   handler:   emit response.in_progress (seq=2)
+      handler:   emit output_item.added(idx=0)
+      framework: stream_store.append(seq=3, ...)
+      handler:   emit output_item.delta(idx=0, "Hel")
+      framework: stream_store.append(seq=4, ...)
+
+T=4   ═══════ SIGKILL ═══════
+      
+T=5   process restarts; lease scanner sees "durable-resp-AB12..."
+      with status="in_progress" and expired lease
+
+T=6   primitive: re-fire task body with ctx.context.is_recovery=True
+      framework: read _responses.disposition → "re-invoke"
+      framework: assign flat fields on response context
+                 (is_recovery=True,
+                  is_steered_turn=False,
+                  pending_input_count=0,
+                  conversation_chain_metadata=<rehydrated namespace facade>)
+      framework: reconstruct (ResponseExecution, ResponseContext)
+                 from serialized params
+      framework: re-invoke handler
+
+T=7   handler:   is_recovery == True
+      handler:   query upstream framework for committed state
+      handler:   build resumption_response (e.g., output=[] for naive
+                 handler; or output=[committed_items] for recovery-aware)
+      handler:   stream = ResponseEventStream(response=resumption_response)
+      handler:   emit response.created
+      framework: response_store.create({...}) → ResponseAlreadyExistsError
+      framework: log INFO "_persist_create dedup'd on recovery"; continue
+      framework: response.created GATED — the durable stream is non-empty
+                 (seq 1-4 survived the crash), so the provider append is
+                 SUPPRESSED (spec 026 empty-stream gate). seq=5 is consumed
+                 but never stream-visible; the recovered handler's
+                 response.in_progress (next) is its first stream event.
+
+T=8   handler:   emit response.in_progress (carries resumption_response)
+      framework: stream_store.append(seq=6, event=response.in_progress)
+                 NOTE: this is the second response.in_progress → reset event
+      framework: persisted-response logic: REPLACE response.output with
+                 resumption_response.output
+
+T=9   handler:   emit output_item.added(idx=0, content=<new attempt>)
+      framework: stream_store.append(seq=7, ...)
+      framework: persisted: REPLACE output[0] (idx already present after reset)
+      ...
+      handler:   emit response.completed (seq=K)
+      framework: response_store.update({id: resp_1, status: "completed", ...})
+      framework: stream_store.append(seq=K, event=response.completed)
+
+T=10  task body returns Suspended (steerable_conversations=true)
+      primitive: task → status="suspended", awaiting next input
+
+T=11  POST /v1/responses { input: "Now this", previous_response_id: resp_1,
+                           store: true, background: true }
+      → derive_task_id = SAME "durable-resp-AB12..." (chain inherits)
+      framework: task_fn.start(task_id, input_id=resp_2,
+                               if_last_input_id=resp_1)
+      primitive: precondition holds (_framework.last_input_id == resp_1)
+      primitive: advance _framework.last_input_id = resp_2
+      primitive: task resumes (status: suspended → in_progress)
+      ...turn 2 proceeds...
+```
+
+### §15.1 — Concurrent fork-attempt timeline
+
+```
+T=11a POST /v1/responses { previous_response_id: resp_1, ... }
+T=11b POST /v1/responses { previous_response_id: resp_1, ... }   (concurrent)
+      
+      primitive: both call start(input_id=resp_2/resp_3, if_last_input_id=resp_1)
+      primitive: atomic precondition CAS on _framework.last_input_id
+      primitive: exactly one wins (say T=11a), advances last_input_id=resp_2
+      primitive: T=11b sees stale last_input_id → LastInputIdPreconditionFailed
+      framework: T=11a → 200 (queued or in_progress)
+      framework: T=11b → 409 conversation_fork_not_supported
+```
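The fork-attempt timeline amounts to a compare-and-set on `last_input_id`. The sketch below assumes an in-process lock in place of the primitive's atomic store operation; `ChainSlot` and `post_turn` are hypothetical helpers, while `LastInputIdPreconditionFailed` and the 409 code are the names used in the timeline.

```python
import threading


class LastInputIdPreconditionFailed(Exception):
    """Stand-in for the precondition error named in the timeline."""


class ChainSlot:
    """Hypothetical CAS slot holding a chain's last accepted input id."""

    def __init__(self, last_input_id):
        self._lock = threading.Lock()
        self.last_input_id = last_input_id

    def advance(self, input_id, if_last_input_id):
        # Check and write happen under one lock, so they are atomic together.
        with self._lock:
            if self.last_input_id != if_last_input_id:
                raise LastInputIdPreconditionFailed(
                    f"expected {if_last_input_id}, found {self.last_input_id}"
                )
            self.last_input_id = input_id


def post_turn(slot, input_id, previous_response_id):
    """Map the CAS outcome onto the HTTP-visible result."""
    try:
        slot.advance(input_id, if_last_input_id=previous_response_id)
    except LastInputIdPreconditionFailed:
        return 409, "conversation_fork_not_supported"
    return 200, "accepted"
```

Whichever POST reaches the slot first advances it; the loser observes a stale `last_input_id` and is rejected, matching the "exactly one wins" requirement.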
+
+---
+
+## §16 — Storage layout
+
+The framework engages three logical stores:
+
+### §16.1 — Durable task store
+
+Owned by the underlying task primitive. Holds:
+
+- `task_id` (the §4.2 derivation)
+- `status` (one of `queued`, `in_progress`, `suspended`, `completed`,
+  `cancelled`, `failed`)
+- `payload.input` (current turn's serialized input — cleared at
+  suspend per the core spec's data-retention rule)
+- `payload._responses` (the framework-reserved namespace from §5)
+- `payload._steering` (the primitive's steering-queue state — owned by
+  the core spec)
+- `payload._framework.last_input_id` (the input-precondition primitive's
+  CAS slot from §11.2)
+- `metadata` (developer's checkpoint store, in named namespaces)
+- Lease state (owned by the primitive)
+
+### §16.2 — Response store
+
+Holds the `ResponseObject` envelope per `response_id`. Operations:
+
+| Operation | Semantics |
+|---|---|
+| `create_response` | Idempotent at the conformance layer (§9.4). Raises `ResponseAlreadyExistsError` on conflict; callers swallow on recovery. |
+| `update_response` | Updates the envelope in place. Raises `KeyError` if not present (caller falls back to `create_response` for race recovery). |
+| `get_response` | Returns the envelope. |
+| `delete_response` | Soft-delete. |
+
+Local-dev implementations (`FileResponseStore`) MUST persist envelopes
+to disk atomically (write to tempfile + `os.replace()`). Production
+implementations (Foundry) MUST translate the HTTP 409 from
+double-`POST` into `ResponseAlreadyExistsError`.
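The tempfile + `os.replace()` commit can be sketched as follows. This is an assumption-labelled illustration, not `FileResponseStore`'s actual code: `write_envelope_atomically` and the `{response_id}.json` naming are hypothetical; the stdlib calls are real.

```python
import json
import os
import tempfile


def write_envelope_atomically(root, response_id, envelope):
    """Write the envelope so readers see either the old or the new file."""
    os.makedirs(root, exist_ok=True)
    final_path = os.path.join(root, f"{response_id}.json")
    # The tempfile lives in the same directory so os.replace() stays on one
    # filesystem, which is what makes the rename atomic.
    fd, tmp_path = tempfile.mkstemp(dir=root, prefix=f".{response_id}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(envelope, handle)
            handle.flush()
            os.fsync(handle.fileno())   # push bytes to disk before the rename
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)         # never leave a half-written tempfile
        raise
    return final_path
```

A crash before `os.replace()` leaves the previous envelope untouched; a crash after it leaves the new one fully written.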
+
+#### §16.2.1 — `FileResponseStore` on-disk layout (local dev, informative)
+
+The response-store **contract** above (operations + atomic envelope
+commit) is normative. The physical file layout below is specific to the
+local-dev `FileResponseStore` and is **not** binding on other
+implementations (Foundry uses its own storage); it is documented here
+because the file provider is part of the responses durability workstream.
+
+Under the store root, each item is persisted **exactly once**; the
+response envelope and conversations hold only pointers:
+
+```
+responses/
+    {response_id}.json        # envelope. output[] entries are pointer
+                              #   stubs {"$item_ref": <item_id>} for id'd
+                              #   items; id-less items stay inline.
+    {response_id}.indexes.json # ordered {input,output,history}_item_ids —
+                              #   the single place history_item_ids is read.
+    {response_id}.deleted     # soft-delete marker
+items/
+    {item_id}.json            # THE one copy of each item's content
+conversations/
+    {conversation_id}.json    # {response_ids: [...]}
+```
+
+- `get_items` / `get_input_items` / `get_history_item_ids` resolve content
+  and id lists from `items/` + `indexes.json`; `get_response` rehydrates
+  the envelope's pointer stubs from `items/`, returning a `ResponseObject`
+  whose `output[]` is byte-equal (content and order) to the in-memory
+  provider.
+- **Crash ordering.** Writers store every referenced item under `items/`
+  **before** the atomic envelope write. Items are immutable by id (re-stores
+  are idempotent same-content), so a crash exposes either the prior or the
+  new snapshot — **never** an envelope referencing a missing or
+  mid-mutated item. An unresolvable pointer on read is treated as transient
+  corruption (a non-`KeyError` storage error), **not** as the "definitively
+  absent" not-found that triggers the §7 recovery drop.
+- There is no per-response item directory and no separate `history.json`
+  (both were redundant copies of data already in `items/` / `indexes.json`).
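The pointer-stub scheme above can be sketched with plain dicts standing in for the `items/` directory and the envelope's `output[]`. `dehydrate`, `rehydrate`, and `TransientStorageError` are hypothetical names; only the `{"$item_ref": <item_id>}` stub shape and the "unresolvable pointer is not a `KeyError`" rule come from this section.

```python
class TransientStorageError(Exception):
    """Hypothetical non-KeyError error for an unresolvable item pointer."""


def dehydrate(output):
    """Split output into (items keyed by id, envelope entries with stubs)."""
    items, entries = {}, []
    for item in output:
        if "id" in item:
            items[item["id"]] = item                  # the one stored copy
            entries.append({"$item_ref": item["id"]})
        else:
            entries.append(item)                      # id-less items stay inline
    return items, entries


def rehydrate(entries, items):
    """Resolve stubs back to content, preserving order."""
    output = []
    for entry in entries:
        ref = entry.get("$item_ref")
        if ref is None:
            output.append(entry)
        elif ref in items:
            output.append(items[ref])
        else:
            # Treated as transient corruption, not as "definitively absent".
            raise TransientStorageError(f"unresolved item pointer: {ref}")
    return output
```

A writer following the crash-ordering rule would persist everything returned in `items` before committing the envelope that holds `entries`.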
+
+### §16.3 — Stream event store
+
+Holds the ordered SSE event log per `response_id`. Operations:
+
+| Operation | Semantics |
+|---|---|
+| `append(event)` | Append with strictly monotonic `sequence_number`. No dedup across recovery attempts. |
+| `read(starting_after=N)` | Return events with `sequence_number > N`. |
+| `read(starting_after=None)` | Return the full log. |
+
+Local-dev implementations (`FileStreamProvider`) MUST persist events
+to disk in the order they are appended. Production implementations
+MUST give the same ordering guarantee. TTL-based replay cleanup
+(framework-internal, defaults to at least 10 minutes per Rule B35)
+is allowed.
+
+A reset event (§9.3) is a `response.in_progress` event with
+`sequence_number > N` where N is the previous `response.in_progress`
+event's `sequence_number` for the same `response_id`.
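The store contract and the reset definition can be sketched together. This assumes an in-memory list in place of the on-disk log; `InMemoryStreamStore` and `reset_sequence_numbers` are hypothetical names, and the dict event shape is illustrative.

```python
class InMemoryStreamStore:
    """Hypothetical per-response event log with monotonic sequence numbers."""

    def __init__(self):
        self._events = []
        self._next_seq = 1

    def append(self, event):
        # No deduplication: a replayed event gets a fresh, higher number.
        seq = self._next_seq
        self._next_seq += 1
        self._events.append({**event, "sequence_number": seq})
        return seq

    def read(self, starting_after=None):
        if starting_after is None:
            return list(self._events)
        return [e for e in self._events if e["sequence_number"] > starting_after]


def reset_sequence_numbers(events):
    """Return sequence numbers of every second-or-later response.in_progress."""
    resets, seen_first = [], False
    for event in events:
        if event["type"] == "response.in_progress":
            if seen_first:
                resets.append(event["sequence_number"])
            seen_first = True
    return resets
```

A reconnecting client that passes its last seen number as `starting_after` receives only newer events, including any reset event that followed.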
+
+---
+
+## §17 — Composition constraints
+
+### §17.1 — `durable_background=true` requires `store=true`
+
+If `store=false`, the request falls through to Row 4 regardless of
+`durable_background`. There is no persistent record to recover from;
+the durable orchestrator is bypassed. The implementation MUST NOT
+silently fail; the row-4 best-effort marker fires per §6.3.
+
+### §17.2 — `steerable_conversations=true` requires `store=true`
+
+The steering queue, the conversation lock, and the acceptance hook
+ALL depend on the durable task primitive. With `store=false`, no
+durable task is created; there is no queue to enqueue into; the
+acceptance hook is not invoked. Implementations MUST either reject the
+combination at startup or document the no-op fall-through clearly.
+
+### §17.3 — `steerable_conversations=true` × `durable_background=false`
+
+This combination is supported (composition guard relaxed). The Row 2
+task still provides the conversation lock and the acceptance hook; the
+handler runs inside the task body just like Row 1. The only difference
+from Row 1 is the recovery disposition: `mark-failed` instead of
+`re-invoke`. The crash-recovery branch persists `failed` per §7.2
+instead of re-invoking the handler.
+
+### §17.4 — `background=false` + steerable
+
+This is Row 3. The handler runs inside the durable task body; the
+HTTP request awaits the task body's terminal via the framework's
+`TaskRun.result()` API. A new turn arriving mid-handler still goes
+through the queue / lock / acceptance hook per §11. (Note:
+`background=false` + steering means the original HTTP caller's
+connection is open while the handler runs to completion; a steered
+turn arriving from a different client connection gets queued.)
+
+---
+
+## §18 — What this spec does NOT cover
+
+- The underlying durable-task primitive's own contract (lease,
+  heartbeat, suspend/resume, steering queue, retry semantics,
+  recovery scanner): see
+  `azure-ai-agentserver-core/docs/task-and-streaming-spec.md`.
+- Multi-replica / cross-region recovery. Single-node-restart only.
+- Wire-format additions to the OpenAI Responses HTTP/SSE protocol.
+  This spec adds new HTTP error codes (`conversation_locked`,
+  `conversation_fork_not_supported`) and the recovery-time
+  `response.in_progress` reset semantics; everything else uses
+  existing OpenAI Responses event shapes.
+- Schema migrations for `metadata` shapes across SDK upgrades.
+- The OpenAI Responses input-conversion / output-rendering pipeline
+  itself.
+
+---
+
+## §19 — Cross-references
+
+| External | Topic |
+|---|---|
+| `azure-ai-agentserver-core/docs/task-and-streaming-spec.md` | Underlying durable-task primitive (lease, suspend, recovery scanner, steering queue, input-precondition primitive, streaming reconciliation). |
+| `azure-ai-agentserver-responses/docs/durable-responses-developer-guide.md` | Developer-facing guide; configuration, public API surface, common patterns. |
+| `azure-ai-agentserver-responses/docs/handler-implementation-guide.md` | Developer-facing guide; cancellation patterns, resumption response construction, framework-agnostic recovery walkthrough. |
+| `azure-ai-agentserver-responses/docs/durability-contract.md` | The per-row × per-path conformance contract matrix (rows 1–4 + Row 11 checkpoint-write); the test-facing companion to this design spec. |
+
+A change to this spec implies coordinated changes to those documents.
+A change to the durable-task primitive's recovery / streaming /
+steering surface implies a review of this spec.
+
+---
+
+## §20 — Change discipline
+
+This spec is the source of truth for the responses durability layer.
+Implementation MUST NOT diverge silently. Every change here is
+mirrored by:
+
+1. The corresponding implementation change in the chosen host
+   language (orchestrator + dispatch + endpoint layer).
+2. The two developer guides above.
+3. A conformance test under the durability-contract suite that
+   exercises the new or changed behaviour end-to-end through the
+   create-response endpoint, on the real file-based providers, with
+   a real crash harness for any recovery-relevant change.
+
+If a future change has to alter this contract (rather than extend it),
+this document MUST be updated first, the change MUST be reviewed as a
+contract change, and the implementation MUST land in a single
+coordinated commit alongside the contract update.
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/README.md b/sdk/agentserver/azure-ai-agentserver-responses/samples/README.md
index 505ab0f128ef..87422bfb8974 100644
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/README.md
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/README.md
@@ -4,42 +4,77 @@ languages:
 - python
 products:
 - azure
-name: azure-ai-agentserver-responses samples for Python
-description: Samples for the azure-ai-agentserver-responses client library.
+name: azure-ai-agentserver-responses durable samples for Python
+description: Durable Responses-API agent samples for the azure-ai-agentserver-responses preview.
 ---
 
-# azure-ai-agentserver-responses Samples
+# azure-ai-agentserver-responses — durable samples
 
-## Quick start
+This preview drop ships the **durable** Responses-API samples. Each shows a
+crash-resilient, optionally steerable handler built on the spec-025 durability
+primitives (`durable_background=True`, one `OutputItem` + `stream.checkpoint()`
+per unit of work, recovery via `context.persisted_response`).
 
-```bash
-pip install -r requirements.txt
-python sample_01_getting_started.py
-```
+## Run them locally (crash → recover)
+
+The hosted task API is currently returning 403, so the durable samples are
+exercised **locally** — the durable task store + response store are file-backed,
+no hosted dependency. A ready-to-run, verified kit lives at:
+
+> **[`durable-responses-agent-demo/local/`](durable-responses-agent-demo/local/README.md)** —
+> `./setup.sh` then `./run.sh` for an automated stream → crash → recover → verify
+> run, or `./serve.sh` to drive the agent yourself.
+
+The same pattern (`AGENTSERVER_TASKS_BACKEND=local` +
+`AGENTSERVER_DURABLE_ROOT=<dir>`, restart the process to recover) applies to
+every sample below.
 
 ## Samples index
 
 | # | Sample | Pattern | Description |
 |---|--------|---------|-------------|
-| 01 | [Getting Started](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_01_getting_started.py) | `TextResponse` | Echo handler — simplest async handler that echoes user input |
-| 02 | [Streaming Text Deltas](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_02_streaming_text_deltas.py) | `TextResponse` + `text=iterable` | Token-by-token streaming via async iterable, with `configure` callback |
-| 03 | [Full Control](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_03_full_control.py) | `ResponseEventStream` | Convenience, streaming, and builder — three ways to emit the same output |
-| 04 | [Function Calling](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_04_function_calling.py) | `ResponseEventStream` | Two-turn function calling with convenience and builder variants |
-| 05 | [Conversation History](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_05_conversation_history.py) | `TextResponse` + `text=callable` | Study tutor with `context.get_history()` and `ResponsesServerOptions` |
-| 06 | [Multi-Output](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_06_multi_output.py) | `ResponseEventStream` | Math solver: reasoning + message, convenience and builder variants |
-| 07 | [Customization](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_07_customization.py) | `TextResponse` | Custom `ResponsesServerOptions`, default model, debug logging |
-| 08 | [Mixin Composition](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_08_mixin_composition.py) | `TextResponse` | Multi-protocol server via cooperative mixin inheritance |
-| 09 | [Self-Hosting](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_09_self_hosting.py) | `TextResponse` | Mount responses into an existing Starlette app under `/api` |
-| 10 | [Streaming Upstream](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_10_streaming_upstream.py) | Raw events | Forward to upstream streaming LLM via `openai` SDK, relay SSE events |
-| 11 | [Non-Streaming Upstream](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_11_non_streaming_upstream.py) | `ResponseEventStream` | Forward to upstream non-streaming LLM via `openai` SDK, emit items |
-| 12 | [Image Generation](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_12_image_generation.py) | `ResponseEventStream` | Image gen convenience, streaming partials, and full-control builder |
-| 13 | [Image Input](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_13_image_input.py) | `ResponseContext` | Receive images via URL, base64 data URL, or file ID |
-| 14 | [File Inputs](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_14_file_inputs.py) | `ResponseContext` | Receive files via base64 data URL, URL, or file ID |
-| 15 | [Annotations](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_15_annotations.py) | `ResponseEventStream` | Attach file_path, file_citation, and url_citation annotations to messages |
-| 16 | [Structured Outputs](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_16_structured_outputs.py) | `ResponseEventStream` | Return structured JSON as a `structured_outputs` item |
-
-### When to use which
-
-- **`TextResponse`** — Use for text-only responses (samples 1, 2, 5, 7–9). Handles the full SSE lifecycle automatically.
-- **`ResponseEventStream`** — Use when you need function calls, reasoning items, multiple output types, image generation, structured outputs, annotations, upstream proxying, or fine-grained event control (samples 3, 4, 6, 10–12, 15, 16).
-- **`ResponseContext`** — Use `get_input_items()` to inspect incoming images and files (samples 13, 14).
\ No newline at end of file
+| 18 | [Durable Copilot](sample_18_durable_copilot.py) | Durable + steerable | GitHub Copilot SDK with `durable_background=True, steerable_conversations=True` — `create_session` / `resume_session` flow with live delta forwarding |
+| 19 | [Durable Streaming](sample_19_durable_streaming.py) | Durable | Three-phase streaming handler with `durable_background=True` — uses `context.conversation_chain_metadata` watermarks to skip phases that already completed on recovery |
+| 20 | [Durable Steering](sample_20_durable_steering.py) | Durable + steerable | Demonstrates `context.is_steered_turn` on the drain re-entry with `durable_background=True, steerable_conversations=True` |
+| 21 | [Durable LangGraph](sample_21_durable_langgraph.py) | Durable + steerable | LangGraph upstream framework integration — `context.conversation_chain_id` as the LangGraph thread id |
+| 22 | [Durable Multiturn](sample_22_durable_multiturn.py) | Durable | Multi-turn conversation with `durable_background=True, steerable_conversations=False` — `context.conversation_chain_metadata` tracks per-turn counters |
+
+The flagship end-to-end demo (15-phase × 4-subcall research agent, one
+checkpoint per sub-call, azd-deployable + locally runnable) is
+[`durable-responses-agent-demo/`](durable-responses-agent-demo/).
+
+## Key durable APIs
+
+Use these from a durable handler (`ResponseContext`):
+
+- `context.is_recovery` / `context.persisted_response` — seed the stream from the
+  persisted snapshot and resume at the first un-checkpointed item.
+- `context.is_steered_turn` / `context.pending_input_count` — observe and drain
+  mid-turn steering inputs.
+- `context.conversation_chain_metadata` / `context.conversation_chain_id` —
+  per-conversation durable metadata and the stable chain id.
+- `await context.exit_for_recovery()` — graceful-shutdown primitive that leaves
+  the response `in_progress` for next-lifetime recovery (works in every handler
+  shape).
+
+## Enabling durability and steering
+
+Durable + steerable behaviour is **opt-in** via `ResponsesServerOptions` — the
+defaults are both `False`:
+
+```python
+from azure.ai.agentserver.responses import ResponsesAgentServerHost, ResponsesServerOptions
+
+app = ResponsesAgentServerHost(
+    options=ResponsesServerOptions(
+        durable_background=True,             # opt-in to crash recovery
+        steerable_conversations=True,        # opt-in to mid-turn steering
+    ),
+)
+```
+
+Without `durable_background=True`, a crash mid-handler leaves the response in the
+"crash-failed" state (the next process lifetime marks it `failed` instead of
+re-invoking the handler). Without `steerable_conversations=True`, concurrent
+multi-turn requests for the same conversation return `409 conversation_locked`
+instead of queueing.
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/.gitignore b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/.gitignore
new file mode 100644
index 000000000000..290ba6d930d3
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/.gitignore
@@ -0,0 +1,11 @@
+# azd environment
+.azure/*/state/
+.azure/*/*.env.bak
+
+# Demo client runtime
+.demo-session
+
+# Docker-build staging dir — populated by ./build.sh which copies
+# the checked-in wheels from sdk/agentserver/wheels/ into here. Never
+# committed: source of truth is the central wheels directory.
+src/durable-responses-agent-demo/wheels/
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/README.md b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/README.md
new file mode 100644
index 000000000000..7ef57e4eb510
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/README.md
@@ -0,0 +1,193 @@
+# Durable Responses Research Agent — Demo
+
+> **▶ Deploy it (hosted, recommended):** `azd deploy` this sample and drive it
+> against the hosted Foundry deployment — durable stream → reconnect → recover
+> works against the hosted task API. Prefer an offline run? The verified local
+> kit in **[`local/`](local/README.md)** exercises the same
+> stream → crash → recover → verify flow file-backed on your machine
+> (`cd local && ./setup.sh && ./run.sh`).
+
+A `ResponsesAgentServerHost`-decorated long-running research agent
+that demonstrates four platform capabilities of the Azure AI Hosted
+Agent + the `azure-ai-agentserver-responses` package:
+
+1. **Long-running responses run uninterrupted past the platform's
+   sandbox-eviction window.** The underlying `@multi_turn_task`
+   primitive's PATCH lease-renewal cycle (every ~30s, half of the 60s
+   lease) refreshes the platform's sandbox idle-reclaim timer. The
+   demo runs for several minutes with **zero client-side keepalive
+   ingress** and the sandbox stays warm the whole time.
+
+2. **Recovery from container crashes.** When the agent container
+   dies (intentional crash or OOM), the platform's nanny worker
+   brings it back within ~1 min **without any new client ingress**.
+   The durable response automatically resumes with
+   `context.is_recovery is True`. Recovery uses the
+   **one-OutputItem-per-subcall** pattern: the persisted response *is*
+   the watermark — the handler seeds its stream from
+   `context.persisted_response` and resumes at
+   `len(stream.response.output)`, re-emitting `response.in_progress`
+   as the client-visible reset. User-visible: any reconnect attempt
+   picks up the recovered run.
+
+3. **Steering.** POSTing a follow-up turn (with `previous_response_id`
+   pointing at the still-running response) queues the input as a
+   steering input. The agent observes
+   `cancellation_signal.is_set() and context.pending_input_count > 0`,
+   winds down at the next phase boundary, and re-enters with
+   `context.is_steered_turn is True` carrying the new input.
+
+4. **Operator cancel.** `POST /responses/{id}/cancel` fires
+   `cancellation_signal` + stamps `context.client_cancelled`; the
+   framework forces the response to `status="cancelled"` regardless
+   of what the handler emits (B11 contract).
+
+## Compared to the invocations demo
+
+This demo is intentionally **much thinner** than its sibling
+`durable-agent-demo` (which is built on the invocations protocol).
+The reason: the responses package wraps the OpenAI Responses API
+wire protocol, so the framework owns everything the invocations demo
+had to wire by hand:
+
+| Concern | Invocations demo | Responses demo |
+|---|---|---|
+| Wire protocol | Custom JSON shape; handler writes the SSE format | OpenAI Responses API SSE event taxonomy; emitted via `ResponseEventStream` builders |
+| Cancellation route | Custom `@app.cancel_invocation_handler` that looks up the task and calls `run.cancel()` | Built-in `POST /responses/{id}/cancel` route handled by the framework |
+| Stream replay route | Custom `@app.get_invocation_handler` that subscribes to the per-invocation stream | Built-in `GET /responses/{id}?stream=true&starting_after=N` |
+| Durability + steering | Compose `@multi_turn_task(steerable=True)` directly; map `task_id`/`input_id` to session/invocation | Opt-in via `ResponsesServerOptions(durable_background=True, steerable_conversations=True)` — framework handles the rest |
+| Recovery surface | Read `ctx.entry_mode == "recovered"` + `ctx.metadata` | Read `context.is_recovery` + seed from `context.persisted_response`; same recovery primitive underneath |
+
+`main.py` here is ~250 lines (mostly the subcall-streaming logic);
+`agent.py` + `app.py` for the invocations demo total ~700 lines.
+
+What the agent actually does: a faithful port of the invocations
+`durable-agent-demo` — **15 research phases × 4 chained subcalls each**
+(research → critique → refine → synthesize, ~1500 tokens/subcall via a
+real `gpt-4.1-mini` call), with intra-phase and inter-phase cooldowns so
+a run spans ~33 min (~2x the sandbox-eviction window). Each subcall is
+**one OutputItem** with its own `yield stream.checkpoint()`, so the
+persisted response is a per-subcall watermark: a crash recovers at the
+next un-finished subcall (the actively-streaming item was never closed,
+so it never entered the snapshot and is re-run cleanly — at most one
+wasted subcall). Same env knobs as the invocations demo
+(`NUM_PHASES`, `CALLS_PER_PHASE`, `TARGET_OUTPUT_TOKENS`,
+`INTRA_PHASE_COOLDOWN_SEC`, `INTER_PHASE_COOLDOWN_SEC`, `DEMO_MODE`).
+
+Between phases the agent sleeps for `INTER_PHASE_COOLDOWN_SEC` (30s in
+the hosted defaults) so a single demo run spans the sandbox-eviction
+window and exercises the lease keep-alive path.
+
+## Prerequisites
+
+- Python 3.11+
+- Azure subscription with AI Foundry access
+- [Azure Developer CLI](https://learn.microsoft.com/azure/developer/azure-developer-cli/install-azd)
+- `azd` AI agents extension: `azd extension install azure.ai.agents`
+
+## Deploy
+
+```bash
+# 1. Stage the checked-in agentserver preview wheels into the docker
+#    build context (build.sh copies sdk/agentserver/wheels/*.whl into
+#    a per-sample gitignored staging dir — no compilation, no PyPI fetch).
+./build.sh
+
+# 2. Login + deploy.
+azd auth login
+azd up
+```
+
+The deploy provisions infra + ships the container image and prints
+the responses endpoint. Point `demo-client.sh` at your deployment by
+setting the `ENDPOINT=` env var when invoking (or editing the default
+near the top of the script).
+
+> The `azure-ai-agentserver-responses` package's durable + steerable
+> surface is in **private preview** and is not on PyPI yet. It ships
+> as the pre-release wheels checked into
+> [`sdk/agentserver/wheels/`](../../../../wheels). See
+> [`sdk/agentserver/wheels/README.md`](../../../../wheels/README.md)
+> for the consumption workflow in your own project.
+
+## demo-client.sh — command reference
+
+| Command | What it does |
+|---|---|
+| `./demo-client.sh start "<topic>"` | Dispatches `POST /responses` with `{stream: true, store: true, background: true}` and the topic, then renders the SSE stream returned by that same POST. Captures the new `response_id` from the first lifecycle event and writes it to `.demo-session`. |
+| `./demo-client.sh stream` | Reuses the `response_id` + `last_sequence_number` from `.demo-session` and reattaches via `GET /responses/{id}?stream=true&starting_after=N` (`starting_after` is omitted while the saved sequence number is still `0`). The server skips events you've already seen. |
+| `./demo-client.sh steer "<topic>"` | POSTs a new response with `previous_response_id` pointing at the active one. With `steerable_conversations=True` the framework queues it as a steering input on the active conversation; the agent winds down the current turn at its next phase boundary and re-enters with the new topic. |
+| `./demo-client.sh cancel` | `POST /responses/{id}/cancel` on the active response. The framework fires `cancellation_signal` + stamps `context.client_cancelled`; the response transitions to `status=cancelled`. |
+| `./demo-client.sh crash` | POSTs `{"input": "crash"}`. The agent (gated by `DEMO_MODE=1`) calls `os._exit(137)`. The platform's nanny worker brings the container back within ~1 min; `./demo-client.sh stream` after will pick up the recovered run. |
+| `./demo-client.sh delete` | `DELETE /responses/{id}`. Cleans up the persisted snapshot + per-response stream. |
+| `./demo-client.sh status` | Prints the local session state (`RESPONSE_ID`, `PREV_RESPONSE_ID`, `LAST_SEQUENCE_NUMBER`) + the server's current snapshot of the response. |
+| `./demo-client.sh logs` | Tails the agent container's stdout/stderr via `azd ai agent monitor --follow`. |
+| `./demo-client.sh reset` | Deletes `.demo-session`. The next `start` allocates a fresh response. |
+
+### Session-state lifecycle
+
+The client tracks one active response per `.demo-session` file:
+
+```
+./demo-client.sh start "<topic>"
+        │
+        ├─ RESPONSE_ID          = caresp_...   ← assigned by the platform
+        ├─ LAST_SEQUENCE_NUMBER = 0            ← bumps as events stream
+        └─ written to .demo-session
+                │
+                ▼  these commands REUSE the same response_id:
+        ./demo-client.sh stream     (resumes from LAST_SEQUENCE_NUMBER)
+        ./demo-client.sh steer "<new topic>"  (creates a new response steered on the prior)
+        ./demo-client.sh crash
+        ./demo-client.sh cancel
+        ./demo-client.sh delete
+        ./demo-client.sh logs
+        ./demo-client.sh status
+
+To start over with a brand-new response:
+        ./demo-client.sh reset            # clears .demo-session
+        ./demo-client.sh start "<topic>"
+```
+
+`steer` is the only command that bumps `RESPONSE_ID` — the steered
+turn is technically a new response (with a new `response_id`) whose
+`previous_response_id` points at the prior one. The client tracks the
+prior id in `PREV_RESPONSE_ID` for convenience.
+
+## Local iteration
+
+The **[`local/`](local/README.md)** kit runs this agent fully on your machine —
+a file-backed durable store (no hosted task API), with one command for the
+automated crash → recover demo and another to serve the agent for manual
+exploration:
+
+```bash
+cd local
+./setup.sh        # builds a venv from ../../../../wheels + deps
+
+az login
+export FOUNDRY_PROJECT_ENDPOINT="https://<account>.services.ai.azure.com/api/projects/<project>"
+export AZURE_AI_MODEL_DEPLOYMENT_NAME="gpt-4o"
+
+./run.sh          # automated: stream -> crash -> restart -> recover -> verify
+./serve.sh        # or drive it yourself: curl http://localhost:8088/responses
+```
+
+See [`local/README.md`](local/README.md) for the manual curl recipe (stream →
+crash → reconnect) and how the local durable backend works
+(`AGENTSERVER_TASKS_BACKEND=local` + `AGENTSERVER_DURABLE_ROOT`).
+
+## Configuration
+
+All knobs are env vars read at startup. Hosted defaults are tuned for
+the demo's "span the eviction window" narrative; override for local
+iteration.
+
+| Var | Default | Description |
+|---|---|---|
+| `FOUNDRY_PROJECT_ENDPOINT` | (required) | Foundry project endpoint for the upstream `gpt-4.1-mini` calls. Platform-injected in hosted; set manually locally. |
+| `AZURE_AI_MODEL_DEPLOYMENT_NAME` | `gpt-4.1-mini` | Responses-API model deployment name. |
+| `NUM_PHASES` | `5` | Logical research phases per run. |
+| `TARGET_OUTPUT_TOKENS` | `200` | `max_output_tokens` per phase's upstream call. |
+| `INTER_PHASE_COOLDOWN_SEC` | `30` | Sleep between phases. Set to `0` for local iteration. |
+| `DEMO_MODE` | unset | When `1`, the input `"crash"` triggers `os._exit(137)`. Production deployments should leave this off. |
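Review note on the README above (not part of the patch): the per-subcall watermark it describes can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the SDK's implementation: `run`, `demo`, `SimulatedCrash`, and the plain `snapshot` list are hypothetical names, and the four phase names stand in for individual subcalls. The only behaviour taken from the README is that an item enters the persisted snapshot after it has finished, so recovery re-runs at most the one subcall that was in flight.

```python
# Hypothetical illustration only; none of these names come from the SDK.
from typing import Callable, List


class SimulatedCrash(Exception):
    """Stands in for the container dying in the middle of a subcall."""


def run(subcalls: List[str], snapshot: List[str],
        do_subcall: Callable[[str], str]) -> None:
    """Run subcalls in order, skipping the ones already in the snapshot.

    An output is appended to `snapshot` only after its subcall finished,
    which models "checkpoint after each closed output item".
    """
    for name in subcalls[len(snapshot):]:
        output = do_subcall(name)  # may raise: nothing is persisted for it
        snapshot.append(output)    # checkpoint: the watermark advances by one


def demo() -> tuple:
    subcalls = ["research", "critique", "refine", "synthesize"]
    snapshot: List[str] = []  # stands in for the persisted response
    attempts: List[str] = []  # every subcall invocation, including re-runs
    crashed = False

    def flaky(name: str) -> str:
        nonlocal crashed
        attempts.append(name)
        if name == "refine" and not crashed:
            crashed = True
            raise SimulatedCrash(name)
        return f"{name}: done"

    try:
        run(subcalls, snapshot, flaky)
    except SimulatedCrash:
        pass  # the "container" restarts here
    run(subcalls, snapshot, flaky)  # recovery resumes from the watermark
    return snapshot, attempts
```

Calling `demo()` shows the crashed subcall being attempted twice while every completed subcall runs exactly once, which is the "at most one wasted subcall" property.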
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/azure.yaml b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/azure.yaml
new file mode 100644
index 000000000000..fd0f5a0b6ccc
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/azure.yaml
@@ -0,0 +1,31 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/Azure/azure-dev/main/schemas/v1.0/azure.yaml.json
+
+requiredVersions:
+    extensions:
+        azure.ai.agents: '>=0.1.0-preview'
+name: ai-foundry-starter-basic
+services:
+    durable-responses-agent-demo:
+        project: src/durable-responses-agent-demo
+        host: azure.ai.agent
+        language: docker
+        docker:
+            remoteBuild: true
+        config:
+            container:
+                resources:
+                    cpu: "1"
+                    memory: 2Gi
+            deployments:
+                - model:
+                    format: OpenAI
+                    name: gpt-4.1-mini
+                    version: "2025-04-14"
+                  name: gpt-4.1-mini
+                  sku:
+                    capacity: 1053
+                    name: GlobalStandard
+            startupCommand: python main.py
+infra:
+    provider: bicep
+    path: ./infra
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/build.sh b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/build.sh
new file mode 100755
index 000000000000..707bf2a1ee66
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/build.sh
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+# Stage agentserver preview wheels into the docker build context.
+# Run this BEFORE 'azd up' or 'docker build'.
+#
+# Wheels are checked into the repo at sdk/agentserver/wheels/ — this
+# script just copies them into a per-sample docker-build staging dir
+# (src/durable-responses-agent-demo/wheels/, gitignored) so the
+# Dockerfile's `COPY wheels/ /tmp/wheels/` finds them at build time.
+#
+# Bundles all three preview packages (core, invocations, responses) so
+# a single `pip install /tmp/wheels/*.whl` gives the container the
+# full surface.
+#
+# To refresh the source wheels (maintainer-only — devs shouldn't need
+# to do this), see ../../../../wheels/README.md.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../.." && pwd)"
+CENTRAL_WHEELS="$REPO_ROOT/sdk/agentserver/wheels"
+STAGING_DIR="$SCRIPT_DIR/src/durable-responses-agent-demo/wheels"
+
+if [[ ! -d "$CENTRAL_WHEELS" ]] || ! ls "$CENTRAL_WHEELS"/*.whl >/dev/null 2>&1; then
+    echo "ERROR: no checked-in wheels found at $CENTRAL_WHEELS" >&2
+    echo "       Did you pull the latest from feature/agentserver-durable-agent-demo?" >&2
+    exit 1
+fi
+
+echo "==> Staging checked-in preview wheels into docker build context"
+echo "    src:  $CENTRAL_WHEELS"
+echo "    dst:  $STAGING_DIR"
+rm -rf "$STAGING_DIR"
+mkdir -p "$STAGING_DIR"
+cp "$CENTRAL_WHEELS"/*.whl "$STAGING_DIR"/
+ls -la "$STAGING_DIR"/*.whl
+
+echo ""
+echo "Done. Now run: azd up   (or docker build)"
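Review note on `build.sh` above (not part of the patch): for readers who prefer to drive the staging step from Python, the same check-then-copy flow might look roughly like the sketch below. `stage_wheels` is a hypothetical helper, not something the sample ships; it mirrors the script's order of operations (fail if no wheels exist, recreate the staging directory, copy every `*.whl`).

```python
import shutil
from pathlib import Path


def stage_wheels(central: Path, staging: Path) -> list:
    """Copy every *.whl from `central` into a freshly recreated `staging` dir."""
    wheels = sorted(central.glob("*.whl")) if central.is_dir() else []
    if not wheels:
        raise FileNotFoundError(f"no checked-in wheels found at {central}")
    shutil.rmtree(staging, ignore_errors=True)  # drop stale wheels from earlier runs
    staging.mkdir(parents=True)
    # shutil.copy returns the destination path; report just the file names.
    return [Path(shutil.copy(w, staging / w.name)).name for w in wheels]
```

Recreating the staging directory on every run keeps a wheel from an older refresh out of the docker build context.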
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/demo-client.sh b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/demo-client.sh
new file mode 100755
index 000000000000..8232d6ce8132
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/demo-client.sh
@@ -0,0 +1,464 @@
+#!/usr/bin/env bash
+# ─────────────────────────────────────────────────────────────────────────────
+# Durable Responses Research Agent — Demo Client
+#
+# Showcases four platform capabilities of the responses package
+# (all empirically validated against a Foundry hosted deployment):
+#   1. LONG-RUNNING RESPONSES — the underlying @multi_turn_task lease
+#      renewals (every ~30s) keep the platform's sandbox idle-reclaim
+#      timer fresh, so a single response stays warm well past the
+#      eviction window without any client-side keepalive ingress.
+#   2. CRASH RECOVERY — when the container dies, the platform's nanny
+#      worker restarts it within ~1 min on its own (no new ingress
+#      needed); the durable response auto-resumes with
+#      `context.is_recovery is True` from its last completed phase.
+#   3. STEERING — sending a follow-up turn while one is still running
+#      (POST with `previous_response_id`) queues the input; the agent
+#      winds down at the next phase boundary and re-enters with the
+#      new input as a fresh steered turn (`context.is_steered_turn`).
+#   4. OPERATOR CANCEL — POST /responses/{id}/cancel forces the
+#      response to `status=cancelled` regardless of what the handler
+#      emits (B11 contract).
+#
+# Commands:
+#   ./demo-client.sh start "<topic>"   Dispatch + stream a fresh response (bg+stream)
+#   ./demo-client.sh stream            Reconnect to the active response (no fresh POST)
+#   ./demo-client.sh steer "<topic>"   Queue a follow-up turn — agent winds down
+#                                      current turn at next checkpoint and switches
+#   ./demo-client.sh cancel            Operator cancel of the active response
+#   ./demo-client.sh crash             Trigger demo-mode container crash
+#   ./demo-client.sh delete            DELETE /responses/{id}
+#   ./demo-client.sh status            Show local session info + server snapshot
+#   ./demo-client.sh logs              Stream container stdout/stderr via azd
+#   ./demo-client.sh reset             Clear local session state
+# ─────────────────────────────────────────────────────────────────────────────
+
+set -uo pipefail
+
+# ── Config ────────────────────────────────────────────────────────────────────
+
+# Point at your own hosted deployment. After `azd up`, the
+# endpoint is printed in the deploy output (…/agents/<name>/endpoint/protocols),
+# or read it from your azd env (AGENT_*_RESPONSES_ENDPOINT). Override via
+# the ENDPOINT env var instead of editing this default.
+ENDPOINT="${ENDPOINT:-https://<account>.services.ai.azure.com/api/projects/<project>/agents/durable-responses-agent-demo/endpoint/protocols}"
+API_VERSION="${API_VERSION:-v1}"
+MODEL="${MODEL:-gpt-4.1-mini}"
+SESSION_FILE=".demo-session"
+
+# ── Colors ────────────────────────────────────────────────────────────────────
+
+BOLD='\033[1m'
+DIM='\033[2m'
+GREEN='\033[32m'
+YELLOW='\033[33m'
+RED='\033[31m'
+CYAN='\033[36m'
+RESET='\033[0m'
+
+# ── Session state ─────────────────────────────────────────────────────────────
+
+load_session() {
+    if [[ -f "$SESSION_FILE" ]]; then
+        # shellcheck disable=SC1090
+        source "$SESSION_FILE"
+    fi
+}
+
+save_session() {
+    {
+        echo "RESPONSE_ID=\"${RESPONSE_ID:-}\""
+        echo "PREV_RESPONSE_ID=\"${PREV_RESPONSE_ID:-}\""
+        echo "LAST_SEQUENCE_NUMBER=\"${LAST_SEQUENCE_NUMBER:-0}\""
+    } > "$SESSION_FILE"
+}
+
+ensure_token() {
+    if [[ "${LOCAL_NOAUTH:-0}" == "1" ]]; then
+        TOKEN="local-noauth"
+        return
+    fi
+    if [[ -z "${TOKEN:-}" ]]; then
+        TOKEN=$(az account get-access-token --resource https://ai.azure.com --query accessToken -o tsv 2>/dev/null)
+        if [[ -z "$TOKEN" ]]; then
+            echo -e "${RED}Failed to get Azure token. Run 'az login' first.${RESET}" >&2
+            exit 1
+        fi
+    fi
+}
+
+# Extract a top-level JSON field. Returns empty string on missing/null.
+_jq() {
+    local json="$1"
+    local key="$2"
+    echo "$json" | python3 -c "
+import sys, json
+try:
+    d = json.loads(sys.stdin.read())
+    v = d.get(sys.argv[1])  # key arrives via argv, so quotes in it cannot break the script
+    print('' if v is None else v)
+except Exception:
+    print('')
+" "$key" 2>/dev/null
+}
+
+# ── SSE stream renderer (Python — see comment) ───────────────────────────────
+
+# Why a python renderer instead of bash:
+#  - At LLM emit rate (50-100 tok/s) a bash 'while read | printf' loop
+#    makes the real interactive terminal the bottleneck — one printf-per-
+#    token causes syscall thrash. The python renderer batches writes per
+#    SSE event, keeping the terminal responsive even on slow links.
+#  - We also need a single place to persist LAST_SEQUENCE_NUMBER for
+#    later reconnects.
+
+stream_sse() {
+    local url="$1"
+    local extra_header="${2:-}"
+    local method="${3:-GET}"
+    local post_body="${4:-}"
+    ensure_token
+
+    local hdrs=(-H "Authorization: Bearer $TOKEN"
+                -H "Accept: text/event-stream"
+                -H "Foundry-Features: HostedAgents=V1Preview")
+    if [[ -n "$extra_header" ]]; then
+        hdrs+=(-H "$extra_header")
+    fi
+
+    # Use a pipe + python to render; on exit (Ctrl-C or stream end) the
+    # renderer prints the last sequence number AND the discovered response
+    # id (if the stream came from POST /responses) to sidecar files we read
+    # back into LAST_SEQUENCE_NUMBER / RESPONSE_ID.
+    local seq_file=".demo-session.lastseq"
+    local id_file=".demo-session.rid"
+    rm -f "$seq_file" "$id_file"
+
+    STREAM_RESULT="ok"
+    local curl_args=("${hdrs[@]}")
+    if [[ "$method" == "POST" ]]; then
+        curl_args+=(-X POST -H "Content-Type: application/json" --data "$post_body")
+    fi
+    curl -sS -N "${curl_args[@]}" "$url" 2>/dev/null | python3 -u -c "
+import json, sys, os, signal
+
+SEQ_FILE = '$seq_file'
+ID_FILE = '$id_file'
+
+def _save_seq(n):
+    try:
+        with open(SEQ_FILE, 'w') as f:
+            f.write(str(n))
+    except Exception:
+        pass
+
+def _save_id(rid):
+    try:
+        with open(ID_FILE, 'w') as f:
+            f.write(str(rid))
+    except Exception:
+        pass
+
+_last = 0
+_id_saved = False
+
+def _handle_sigint(*_):
+    _save_seq(_last)
+    sys.exit(0)
+
+signal.signal(signal.SIGINT, _handle_sigint)
+
+current_event = None
+current_data = []
+
+for raw in sys.stdin:
+    line = raw.rstrip('\n')
+    if not line:
+        if current_event and current_data:
+            data = '\n'.join(current_data)
+            try:
+                payload = json.loads(data)
+            except Exception:
+                payload = {'_raw': data}
+            seq = payload.get('sequence_number')
+            if isinstance(seq, int):
+                _last = seq
+            # Extract response id from the first lifecycle event we see.
+            if not _id_saved:
+                resp = payload.get('response') or {}
+                rid = resp.get('id')
+                if rid:
+                    _save_id(rid)
+                    _id_saved = True
+            t = payload.get('type', current_event)
+            if t == 'response.output_text.delta':
+                sys.stdout.write(payload.get('delta', ''))
+                sys.stdout.flush()
+            elif t in ('response.created', 'response.in_progress', 'response.completed',
+                       'response.failed', 'response.cancelled', 'response.incomplete'):
+                resp = payload.get('response') or {}
+                status = resp.get('status') or t.split('.')[-1]
+                sys.stdout.write('\n\033[2m[' + t + ' status=' + str(status) + ']\033[0m\n')
+                sys.stdout.flush()
+        current_event = None
+        current_data = []
+        continue
+    if line.startswith('event:'):
+        current_event = line.split(':', 1)[1].strip()
+    elif line.startswith('data:'):
+        current_data.append(line.split(':', 1)[1].lstrip())
+
+_save_seq(_last)
+print()
+"
+    local rc=$?
+    if [[ -f "$id_file" ]]; then
+        local new_id
+        new_id=$(cat "$id_file" 2>/dev/null || echo "")
+        if [[ -n "$new_id" ]]; then
+            RESPONSE_ID="$new_id"
+        fi
+        rm -f "$id_file"
+    fi
+    if [[ -f "$seq_file" ]]; then
+        LAST_SEQUENCE_NUMBER=$(cat "$seq_file" 2>/dev/null || echo "0")
+        rm -f "$seq_file"
+    fi
+    save_session
+    if [[ "$rc" -ne 0 && "$rc" -ne 130 ]]; then
+        STREAM_RESULT="error"
+    fi
+}
+
+# ── Commands ──────────────────────────────────────────────────────────────────
+
+cmd_start() {
+    local topic="${1:-Research the future of quantum computing}"
+    RESPONSE_ID=""
+    PREV_RESPONSE_ID=""
+    LAST_SEQUENCE_NUMBER="0"
+    save_session
+    ensure_token
+
+    echo -e "${GREEN}Starting a fresh research response${RESET}"
+    echo -e "${DIM}Topic: ${topic}${RESET}"
+
+    local body
+    body=$(python3 -c "
+import json, sys
+print(json.dumps({
+    'model': '$MODEL',
+    'input': sys.argv[1],
+    'stream': True,
+    'store': True,
+    'background': True,
+}))
+" "$topic")
+
+    # POST with stream=true returns SSE directly (no separate GET). stream_sse
+    # extracts response_id from the first lifecycle event that carries a
+    # `response` object (normally response.created), renders the rest, and
+    # persists LAST_SEQUENCE_NUMBER on exit.
+    echo ""
+    echo -e "${BOLD}Streaming. ${DIM}Use Ctrl-C to detach; reconnect later with './demo-client.sh stream'.${RESET}"
+    stream_sse "${ENDPOINT}/responses?api-version=${API_VERSION}" "" POST "$body"
+    if [[ -z "${RESPONSE_ID:-}" ]]; then
+        echo -e "${RED}Failed to dispatch (no response.id captured from SSE).${RESET}"
+        exit 1
+    fi
+    echo -e "${DIM}Dispatched: response_id=${RESPONSE_ID}${RESET}"
+    _report_stream_result
+}
+
+cmd_stream() {
+    load_session
+    if [[ -z "${RESPONSE_ID:-}" ]]; then
+        echo -e "${RED}No active response. Run './demo-client.sh start \"<topic>\"' first.${RESET}" >&2
+        exit 1
+    fi
+    ensure_token
+
+    echo -e "${DIM}Reconnecting to response ${RESPONSE_ID}${RESET}"
+    local url="${ENDPOINT}/responses/${RESPONSE_ID}?stream=true&api-version=${API_VERSION}"
+    if [[ "${LAST_SEQUENCE_NUMBER:-0}" != "0" ]]; then
+        url="${url}&starting_after=${LAST_SEQUENCE_NUMBER}"
+        echo -e "${DIM}Resuming from sequence_number ${LAST_SEQUENCE_NUMBER}${RESET}"
+    fi
+    stream_sse "$url"
+    _report_stream_result
+}
+
+cmd_steer() {
+    local topic="${1:-}"
+    if [[ -z "$topic" ]]; then
+        echo -e "${RED}Usage: ./demo-client.sh steer \"<new topic>\"${RESET}" >&2
+        exit 1
+    fi
+    load_session
+    if [[ -z "${RESPONSE_ID:-}" ]]; then
+        echo -e "${RED}No active response to steer. Run './demo-client.sh start' first.${RESET}" >&2
+        exit 1
+    fi
+    ensure_token
+
+    echo -e "${YELLOW}Steering: queuing follow-up turn on response ${RESPONSE_ID}${RESET}"
+    echo -e "${DIM}New topic: ${topic}${RESET}"
+
+    local body
+    body=$(python3 -c "
+import json, sys
+print(json.dumps({
+    'model': '$MODEL',
+    'input': sys.argv[1],
+    'previous_response_id': sys.argv[2],
+    'stream': True,
+    'store': True,
+    'background': True,
+}))
+" "$topic" "$RESPONSE_ID")
+
+    PREV_RESPONSE_ID="$RESPONSE_ID"
+    RESPONSE_ID=""
+    LAST_SEQUENCE_NUMBER="0"
+    save_session
+
+    echo ""
+    echo -e "${BOLD}Streaming the steered turn.${RESET}"
+    # POST returns SSE (stream=true) — stream_sse captures the new
+    # response_id from the first response.created event.
+    stream_sse "${ENDPOINT}/responses?api-version=${API_VERSION}" "" POST "$body"
+    if [[ -z "${RESPONSE_ID:-}" ]]; then
+        echo -e "${RED}Failed to steer (no response.id captured from SSE).${RESET}"
+        RESPONSE_ID="$PREV_RESPONSE_ID"
+        save_session
+        exit 1
+    fi
+    echo -e "${DIM}New response_id=${RESPONSE_ID} (steered after ${PREV_RESPONSE_ID})${RESET}"
+    _report_stream_result
+}
+
+cmd_cancel() {
+    load_session
+    if [[ -z "${RESPONSE_ID:-}" ]]; then
+        echo -e "${RED}No active response.${RESET}" >&2
+        exit 1
+    fi
+    ensure_token
+
+    echo -e "${YELLOW}Cancelling response ${RESPONSE_ID}${RESET}"
+    curl -sS -X POST \
+        -H "Authorization: Bearer $TOKEN" \
+        -H "Foundry-Features: HostedAgents=V1Preview" \
+        "${ENDPOINT}/responses/${RESPONSE_ID}/cancel?api-version=${API_VERSION}" | python3 -m json.tool
+}
+
+cmd_crash() {
+    load_session
+    ensure_token
+
+    echo -e "${RED}Triggering container crash via input=\"crash\"${RESET}"
+    echo -e "${DIM}(requires DEMO_MODE=1 on the server)${RESET}"
+
+    local body='{"model": "'"$MODEL"'", "input": "crash", "stream": true, "store": true, "background": true}'
+    # The crash POST may emit response.created/response.failed before the
+    # container dies; stream_sse would then overwrite the saved session, so restore it.
+    local saved_id="${RESPONSE_ID:-}" saved_seq="${LAST_SEQUENCE_NUMBER:-0}"
+    stream_sse "${ENDPOINT}/responses?api-version=${API_VERSION}" "" POST "$body"
+    RESPONSE_ID="$saved_id"; LAST_SEQUENCE_NUMBER="$saved_seq"; save_session
+
+    echo ""
+    echo -e "${DIM}Container will exit shortly. Platform nanny restarts within ~1 min.${RESET}"
+    echo -e "${DIM}If you had an active response, './demo-client.sh stream' after restart will${RESET}"
+    echo -e "${DIM}reconnect and resume from the last completed phase.${RESET}"
+}
+
+cmd_delete() {
+    load_session
+    if [[ -z "${RESPONSE_ID:-}" ]]; then
+        echo -e "${RED}No active response.${RESET}" >&2
+        exit 1
+    fi
+    ensure_token
+
+    echo -e "${YELLOW}Deleting response ${RESPONSE_ID}${RESET}"
+    curl -sS -X DELETE \
+        -H "Authorization: Bearer $TOKEN" \
+        -H "Foundry-Features: HostedAgents=V1Preview" \
+        "${ENDPOINT}/responses/${RESPONSE_ID}?api-version=${API_VERSION}" | python3 -m json.tool
+}
+
+cmd_status() {
+    load_session
+    echo -e "${BOLD}Local session state${RESET} ${DIM}(${SESSION_FILE})${RESET}"
+    echo "  RESPONSE_ID:          ${RESPONSE_ID:-<none>}"
+    echo "  PREV_RESPONSE_ID:     ${PREV_RESPONSE_ID:-<none>}"
+    echo "  LAST_SEQUENCE_NUMBER: ${LAST_SEQUENCE_NUMBER:-0}"
+    echo ""
+    if [[ -n "${RESPONSE_ID:-}" ]]; then
+        ensure_token
+        echo -e "${BOLD}Server-side snapshot${RESET}"
+        curl -sS \
+            -H "Authorization: Bearer $TOKEN" \
+            -H "Foundry-Features: HostedAgents=V1Preview" \
+            "${ENDPOINT}/responses/${RESPONSE_ID}?api-version=${API_VERSION}" | python3 -m json.tool
+    fi
+}
+
+cmd_logs() {
+    azd ai agent monitor durable-responses-agent-demo --follow "$@"
+}
+
+cmd_reset() {
+    rm -f "$SESSION_FILE"
+    echo -e "${DIM}Cleared ${SESSION_FILE}.${RESET}"
+}
+
+_report_stream_result() {
+    case "$STREAM_RESULT" in
+        ok)    : ;;
+        error) echo -e "${RED}Stream errored; try './demo-client.sh stream' to reconnect.${RESET}" >&2 ;;
+    esac
+}
+
+usage() {
+    cat <<'USAGE'
+Durable Responses Research Agent — Demo Client
+
+Usage:
+  ./demo-client.sh start "<topic>"   Dispatch + stream a fresh research response
+  ./demo-client.sh stream            Reconnect to the active response (no fresh POST)
+  ./demo-client.sh steer "<topic>"   Queue a follow-up turn — agent winds down
+                                     current turn at next checkpoint and switches
+  ./demo-client.sh cancel            Operator cancel of the active response
+  ./demo-client.sh crash             Trigger demo-mode container crash
+  ./demo-client.sh delete            DELETE /responses/{id}
+  ./demo-client.sh status            Show local session info + server snapshot
+  ./demo-client.sh logs              Stream container stdout/stderr via azd
+  ./demo-client.sh reset             Clear local session state
+
+Environment overrides:
+  ENDPOINT     Foundry agent protocols endpoint (set to your deployment).
+  API_VERSION  Default: v1.
+  MODEL        Default: gpt-4.1-mini.
+USAGE
+}
+
+# ── Dispatch ──────────────────────────────────────────────────────────────────
+
+case "${1:-}" in
+    start)   shift; cmd_start "${1:-}" ;;
+    stream)  cmd_stream ;;
+    steer)   shift; cmd_steer "${1:-}" ;;
+    cancel)  cmd_cancel ;;
+    crash)   cmd_crash ;;
+    delete)  cmd_delete ;;
+    status)  cmd_status ;;
+    logs)    shift; cmd_logs "$@" ;;
+    reset)   cmd_reset ;;
+    -h|--help|help|"") usage ;;
+    *)
+        echo -e "\033[31mUnknown command: $1\033[0m" >&2
+        usage
+        exit 1
+        ;;
+esac
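Review note on `demo-client.sh` above (not part of the patch): the renderer embedded in `stream_sse` and the reconnect logic in `cmd_stream` could be factored into a small module that is testable without a live endpoint. The sketch below is one possible shape, with hypothetical function names (`iter_sse_events`, `track`, `resume_url`). It deliberately differs from the script in one respect: it also accepts events that carry only `data:` lines, since the `event:` field is optional in the SSE format, whereas the script requires both.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], dict]]:
    """Yield (event_name, payload) for each blank-line-terminated SSE event."""
    event: Optional[str] = None
    data: list = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line:
            if line.startswith("event:"):
                event = line.split(":", 1)[1].strip()
            elif line.startswith("data:"):
                data.append(line.split(":", 1)[1].lstrip())
            continue
        if data:  # a blank line closes the event
            text = "\n".join(data)
            try:
                payload = json.loads(text)
            except ValueError:
                payload = {"_raw": text}
            if not isinstance(payload, dict):
                payload = {"_raw": text}
            yield event, payload
        event, data = None, []


def track(lines: Iterable[str]) -> Tuple[Optional[str], int, str]:
    """Return (response_id, last_sequence_number, streamed_text) for a stream."""
    response_id, last_seq, chunks = None, 0, []
    for event, payload in iter_sse_events(lines):
        seq = payload.get("sequence_number")
        if isinstance(seq, int):
            last_seq = seq
        if response_id is None:
            response_id = (payload.get("response") or {}).get("id")
        if payload.get("type", event) == "response.output_text.delta":
            chunks.append(payload.get("delta", ""))
    return response_id, last_seq, "".join(chunks)


def resume_url(endpoint: str, response_id: str, last_seq: int,
               api_version: str = "v1") -> str:
    """Build the reconnect URL; starting_after is added only once events were seen."""
    url = f"{endpoint}/responses/{response_id}?stream=true&api-version={api_version}"
    if last_seq:
        url += f"&starting_after={last_seq}"
    return url
```

Feeding `track` the lines of a captured stream yields the same three pieces of state the shell client persists or prints: the response id, the last sequence number, and the rendered text.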
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/abbreviations.json b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/abbreviations.json
new file mode 100644
index 000000000000..879b2a9507b1
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/abbreviations.json
@@ -0,0 +1,137 @@
+{
+    "aiFoundryAccounts": "aif",
+    "analysisServicesServers": "as",
+    "apiManagementService": "apim-",
+    "appConfigurationStores": "appcs-",
+    "appManagedEnvironments": "cae-",
+    "appContainerApps": "ca-",
+    "authorizationPolicyDefinitions": "policy-",
+    "automationAutomationAccounts": "aa-",
+    "blueprintBlueprints": "bp-",
+    "blueprintBlueprintsArtifacts": "bpa-",
+    "cacheRedis": "redis-",
+    "cdnProfiles": "cdnp-",
+    "cdnProfilesEndpoints": "cdne-",
+    "cognitiveServicesAccounts": "cog-",
+    "cognitiveServicesFormRecognizer": "cog-fr-",
+    "cognitiveServicesTextAnalytics": "cog-ta-",
+    "computeAvailabilitySets": "avail-",
+    "computeCloudServices": "cld-",
+    "computeDiskEncryptionSets": "des",
+    "computeDisks": "disk",
+    "computeDisksOs": "osdisk",
+    "computeGalleries": "gal",
+    "computeSnapshots": "snap-",
+    "computeVirtualMachines": "vm",
+    "computeVirtualMachineScaleSets": "vmss-",
+    "containerInstanceContainerGroups": "ci",
+    "containerRegistryRegistries": "cr",
+    "containerServiceManagedClusters": "aks-",
+    "databricksWorkspaces": "dbw-",
+    "dataFactoryFactories": "adf-",
+    "dataLakeAnalyticsAccounts": "dla",
+    "dataLakeStoreAccounts": "dls",
+    "dataMigrationServices": "dms-",
+    "dBforMySQLServers": "mysql-",
+    "dBforPostgreSQLServers": "psql-",
+    "devicesIotHubs": "iot-",
+    "devicesProvisioningServices": "provs-",
+    "devicesProvisioningServicesCertificates": "pcert-",
+    "documentDBDatabaseAccounts": "cosmos-",
+    "documentDBMongoDatabaseAccounts": "cosmon-",
+    "eventGridDomains": "evgd-",
+    "eventGridDomainsTopics": "evgt-",
+    "eventGridEventSubscriptions": "evgs-",
+    "eventHubNamespaces": "evhns-",
+    "eventHubNamespacesEventHubs": "evh-",
+    "hdInsightClustersHadoop": "hadoop-",
+    "hdInsightClustersHbase": "hbase-",
+    "hdInsightClustersKafka": "kafka-",
+    "hdInsightClustersMl": "mls-",
+    "hdInsightClustersSpark": "spark-",
+    "hdInsightClustersStorm": "storm-",
+    "hybridComputeMachines": "arcs-",
+    "insightsActionGroups": "ag-",
+    "insightsComponents": "appi-",
+    "keyVaultVaults": "kv-",
+    "kubernetesConnectedClusters": "arck",
+    "kustoClusters": "dec",
+    "kustoClustersDatabases": "dedb",
+    "logicIntegrationAccounts": "ia-",
+    "logicWorkflows": "logic-",
+    "machineLearningServicesWorkspaces": "mlw-",
+    "managedIdentityUserAssignedIdentities": "id-",
+    "managementManagementGroups": "mg-",
+    "migrateAssessmentProjects": "migr-",
+    "networkApplicationGateways": "agw-",
+    "networkApplicationSecurityGroups": "asg-",
+    "networkAzureFirewalls": "afw-",
+    "networkBastionHosts": "bas-",
+    "networkConnections": "con-",
+    "networkDnsZones": "dnsz-",
+    "networkExpressRouteCircuits": "erc-",
+    "networkFirewallPolicies": "afwp-",
+    "networkFirewallPoliciesWebApplication": "waf",
+    "networkFirewallPoliciesRuleGroups": "wafrg",
+    "networkFrontDoors": "fd-",
+    "networkFrontdoorWebApplicationFirewallPolicies": "fdfp-",
+    "networkLoadBalancersExternal": "lbe-",
+    "networkLoadBalancersInternal": "lbi-",
+    "networkLoadBalancersInboundNatRules": "rule-",
+    "networkLocalNetworkGateways": "lgw-",
+    "networkNatGateways": "ng-",
+    "networkNetworkInterfaces": "nic-",
+    "networkNetworkSecurityGroups": "nsg-",
+    "networkNetworkSecurityGroupsSecurityRules": "nsgsr-",
+    "networkNetworkWatchers": "nw-",
+    "networkPrivateDnsZones": "pdnsz-",
+    "networkPrivateLinkServices": "pl-",
+    "networkPublicIPAddresses": "pip-",
+    "networkPublicIPPrefixes": "ippre-",
+    "networkRouteFilters": "rf-",
+    "networkRouteTables": "rt-",
+    "networkRouteTablesRoutes": "udr-",
+    "networkTrafficManagerProfiles": "traf-",
+    "networkVirtualNetworkGateways": "vgw-",
+    "networkVirtualNetworks": "vnet-",
+    "networkVirtualNetworksSubnets": "snet-",
+    "networkVirtualNetworksVirtualNetworkPeerings": "peer-",
+    "networkVirtualWans": "vwan-",
+    "networkVpnGateways": "vpng-",
+    "networkVpnGatewaysVpnConnections": "vcn-",
+    "networkVpnGatewaysVpnSites": "vst-",
+    "notificationHubsNamespaces": "ntfns-",
+    "notificationHubsNamespacesNotificationHubs": "ntf-",
+    "operationalInsightsWorkspaces": "log-",
+    "portalDashboards": "dash-",
+    "powerBIDedicatedCapacities": "pbi-",
+    "purviewAccounts": "pview-",
+    "recoveryServicesVaults": "rsv-",
+    "resourcesResourceGroups": "rg-",
+    "searchSearchServices": "srch-",
+    "serviceBusNamespaces": "sb-",
+    "serviceBusNamespacesQueues": "sbq-",
+    "serviceBusNamespacesTopics": "sbt-",
+    "serviceEndPointPolicies": "se-",
+    "serviceFabricClusters": "sf-",
+    "signalRServiceSignalR": "sigr",
+    "sqlManagedInstances": "sqlmi-",
+    "sqlServers": "sql-",
+    "sqlServersDataWarehouse": "sqldw-",
+    "sqlServersDatabases": "sqldb-",
+    "sqlServersDatabasesStretch": "sqlstrdb-",
+    "storageStorageAccounts": "st",
+    "storageStorageAccountsVm": "stvm",
+    "storSimpleManagers": "ssimp",
+    "streamAnalyticsCluster": "asa-",
+    "synapseWorkspaces": "syn",
+    "synapseWorkspacesAnalyticsWorkspaces": "synw",
+    "synapseWorkspacesSqlPoolsDedicated": "syndp",
+    "synapseWorkspacesSqlPoolsSpark": "synsp",
+    "timeSeriesInsightsEnvironments": "tsi-",
+    "webServerFarms": "plan-",
+    "webSitesAppService": "app-",
+    "webSitesAppServiceEnvironment": "ase-",
+    "webSitesFunctions": "func-",
+    "webStaticSites": "stapp-"
+}
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/acr-role-assignment.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/acr-role-assignment.bicep
new file mode 100644
index 000000000000..3e0c2b218be7
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/acr-role-assignment.bicep
@@ -0,0 +1,27 @@
+targetScope = 'resourceGroup'
+
+@description('Name of the existing container registry')
+param acrName string
+
+@description('Principal ID to grant AcrPull role')
+param principalId string
+
+@description('Full resource ID of the ACR (for generating unique GUID)')
+param acrResourceId string
+
+// Reference the existing ACR in this resource group
+resource acr 'Microsoft.ContainerRegistry/registries@2023-07-01' existing = {
+  name: acrName
+}
+
+// Grant AcrPull role to the AI project's managed identity
+resource acrPullRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  scope: acr
+  name: guid(acrResourceId, principalId, '7f951dda-4ed3-4680-a7ca-43fe172d538d')
+  properties: {
+    principalId: principalId
+    principalType: 'ServicePrincipal'
+    // AcrPull role
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '7f951dda-4ed3-4680-a7ca-43fe172d538d')
+  }
+}
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/ai-project.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/ai-project.bicep
new file mode 100644
index 000000000000..31b06ad76a25
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/ai-project.bicep
@@ -0,0 +1,417 @@
+targetScope = 'resourceGroup'
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Main location for the resources')
+param location string
+
+@description('Optional salt to diversify resource names across project recreations')
+param resourceTokenSalt string = ''
+
+var resourceToken = empty(resourceTokenSalt) ? uniqueString(subscription().id, resourceGroup().id, location) : uniqueString(subscription().id, resourceGroup().id, location, resourceTokenSalt)
+
+@description('Name of the project')
+param aiFoundryProjectName string
+
+param deployments deploymentsType
+
+@description('Id of the user or app to assign application roles')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('Optional. Name of an existing AI Services account in the current resource group. If not provided, a new one will be created.')
+param existingAiAccountName string = ''
+
+@description('List of connections to provision')
+param connections array = []
+
+@secure()
+@description('Map of connection name to credentials object. Kept as @secure to prevent secrets from appearing in deployment logs. Example: { "my-conn": { "key": "secret" } }')
+param connectionCredentials object = {}
+
+@description('Also provision dependent resources and connect to the project')
+param additionalDependentResources dependentResourcesType
+
+@description('Enable monitoring via appinsights and log analytics')
+param enableMonitoring bool = true
+
+@description('Enable hosted agent deployment')
+param enableHostedAgents bool = false
+
+@description('Enable the capability host for agent conversations. When false and hosted agents are enabled, the capability host is not created (v2 hosted agents handle storage automatically).')
+param enableCapabilityHost bool = true
+
+@description('Optional. Existing container registry resource ID. If provided, a connection will be created to this ACR instead of creating a new one.')
+param existingContainerRegistryResourceId string = ''
+
+@description('Optional. Existing container registry login server (e.g., myregistry.azurecr.io). Required if existingContainerRegistryResourceId is provided.')
+param existingContainerRegistryEndpoint string = ''
+
+@description('Optional. Name of an existing ACR connection on the Foundry project. If provided, no new ACR or connection will be created.')
+param existingAcrConnectionName string = ''
+
+@description('Optional. Existing Application Insights connection string. If provided, a connection will be created but no new App Insights resource.')
+param existingApplicationInsightsConnectionString string = ''
+
+@description('Optional. Existing Application Insights resource ID. Used for connection metadata when providing an existing App Insights.')
+param existingApplicationInsightsResourceId string = ''
+
+@description('Optional. Name of an existing Application Insights connection on the Foundry project. If provided, no new App Insights or connection will be created.')
+param existingAppInsightsConnectionName string = ''
+
+// Load abbreviations
+var abbrs = loadJsonContent('../../abbreviations.json')
+
+// Determine which resources to create based on connections
+var hasStorageConnection = length(filter(additionalDependentResources, conn => conn.resource == 'storage')) > 0
+var hasAcrConnection = length(filter(additionalDependentResources, conn => conn.resource == 'registry')) > 0
+var hasExistingAcr = !empty(existingContainerRegistryResourceId)
+var hasExistingAcrConnection = !empty(existingAcrConnectionName)
+var hasExistingAppInsightsConnection = !empty(existingAppInsightsConnectionName)
+var hasExistingAppInsightsConnectionString = !empty(existingApplicationInsightsConnectionString)
+// Only create new App Insights resources if monitoring enabled and no existing connection/connection string
+var shouldCreateAppInsights = enableMonitoring && !hasExistingAppInsightsConnection && !hasExistingAppInsightsConnectionString
+var hasSearchConnection = length(filter(additionalDependentResources, conn => conn.resource == 'azure_ai_search')) > 0
+var hasBingConnection = length(filter(additionalDependentResources, conn => conn.resource == 'bing_grounding')) > 0
+var hasBingCustomConnection = length(filter(additionalDependentResources, conn => conn.resource == 'bing_custom_grounding')) > 0
+
+// Extract connection names from ai.yaml for each resource type
+var storageConnectionName = hasStorageConnection ? filter(additionalDependentResources, conn => conn.resource == 'storage')[0].connectionName : ''
+var acrConnectionName = hasAcrConnection ? filter(additionalDependentResources, conn => conn.resource == 'registry')[0].connectionName : ''
+var searchConnectionName = hasSearchConnection ? filter(additionalDependentResources, conn => conn.resource == 'azure_ai_search')[0].connectionName : ''
+var bingConnectionName = hasBingConnection ? filter(additionalDependentResources, conn => conn.resource == 'bing_grounding')[0].connectionName : ''
+var bingCustomConnectionName = hasBingCustomConnection ? filter(additionalDependentResources, conn => conn.resource == 'bing_custom_grounding')[0].connectionName : ''
+
+// Enable monitoring via Log Analytics and Application Insights
+module logAnalytics '../monitor/loganalytics.bicep' = if (shouldCreateAppInsights) {
+  name: 'logAnalytics'
+  params: {
+    location: location
+    tags: tags
+    name: 'logs-${resourceToken}'
+  }
+}
+
+module applicationInsights '../monitor/applicationinsights.bicep' = if (shouldCreateAppInsights) {
+  name: 'applicationInsights'
+  params: {
+    location: location
+    tags: tags
+    name: 'appi-${resourceToken}'
+    logAnalyticsWorkspaceId: logAnalytics.outputs.id
+    projectMIPrincipalId: aiAccount::project.identity.principalId
+  }
+}
+
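+// Always declare the AI account as a managed resource for now (simplified approach); if existingAiAccountName is set, that name is reused here
+// TODO: Reference existing accounts with the `existing` keyword in a future version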
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-06-01' = {
+  name: !empty(existingAiAccountName) ? existingAiAccountName : 'ai-account-${resourceToken}'
+  location: location
+  tags: tags
+  sku: {
+    name: 'S0'
+  }
+  kind: 'AIServices'
+  identity: {
+    type: 'SystemAssigned'
+  }
+  properties: {
+    allowProjectManagement: true
+    customSubDomainName: !empty(existingAiAccountName) ? existingAiAccountName : 'ai-account-${resourceToken}'
+    networkAcls: {
+      defaultAction: 'Allow'
+      virtualNetworkRules: []
+      ipRules: []
+    }
+    publicNetworkAccess: 'Enabled'
+    disableLocalAuth: true
+  }
+  
+  @batchSize(1)
+  resource seqDeployments 'deployments' = [
+    for dep in (deployments ?? []): {
+      name: dep.name
+      properties: {
+        model: dep.model
+      }
+      sku: dep.sku
+    }
+  ]
+
+  resource project 'projects' = {
+    name: aiFoundryProjectName
+    location: location
+    identity: {
+      type: 'SystemAssigned'
+    }
+    properties: {
+      description: '${aiFoundryProjectName} Project'
+      displayName: '${aiFoundryProjectName}Project'
+    }
+    dependsOn: [
+      seqDeployments
+    ]
+  }
+
+  resource aiFoundryAccountCapabilityHost 'capabilityHosts@2025-10-01-preview' = if (enableHostedAgents && enableCapabilityHost) {
+    name: 'agents'
+    properties: {
+      capabilityHostKind: 'Agents'
+      // IMPORTANT: this is required to enable hosted agents deployment
+      // if no BYO Net is provided
+      enablePublicHostingEnvironment: true
+    }
+  }
+}
+
+
+// Create the project connection to Application Insights:
+// - when we create a new App Insights resource, OR
+// - when the user provided an existing App Insights connection string + resource ID but no existing connection name
+// Both cases are merged into a single resource to avoid duplicate ARM resource definitions (which fail deployment).
+var shouldCreateExistingAppInsightsConnection = enableMonitoring && hasExistingAppInsightsConnectionString && !hasExistingAppInsightsConnection && !empty(existingApplicationInsightsResourceId)
+var shouldCreateAppInsightsConnection = shouldCreateAppInsights || shouldCreateExistingAppInsightsConnection
+
+resource appInsightConnection 'Microsoft.CognitiveServices/accounts/projects/connections@2025-04-01-preview' = if (shouldCreateAppInsightsConnection) {
+  parent: aiAccount::project
+  name: 'appi-${resourceToken}'
+  properties: {
+    category: 'AppInsights'
+    target: shouldCreateAppInsights ? applicationInsights.outputs.id : existingApplicationInsightsResourceId
+    authType: 'ApiKey'
+    isSharedToAll: true
+    credentials: {
+      key: shouldCreateAppInsights ? applicationInsights.outputs.connectionString : existingApplicationInsightsConnectionString
+    }
+    metadata: {
+      ApiType: 'Azure'
+      ResourceId: shouldCreateAppInsights ? applicationInsights.outputs.id : existingApplicationInsightsResourceId
+    }
+  }
+}
+
+// Create additional connections from ai.yaml configuration
+module aiConnections './connection.bicep' = [for (connection, index) in connections: {
+  name: 'connection-${connection.name}'
+  params: {
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+    connectionConfig: connection
+    credentials: connectionCredentials[?connection.name] ?? {}
+  }
+}]
+
+// Azure AI User for the developer, scoped to the Foundry Project.
+// Project scope is sufficient for creating/running agents and calling models via the project endpoint.
+resource localUserAzureAIUserRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  scope: aiAccount::project
+  name: guid(subscription().id, resourceGroup().id, principalId, '53ca6127-db72-4b80-b1b0-d745d6d5456d')
+  properties: {
+    principalId: principalId
+    principalType: principalType
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '53ca6127-db72-4b80-b1b0-d745d6d5456d')
+  }
+}
+
+
+// All connections are now created directly within their respective resource modules
+// using the centralized ./connection.bicep module
+
+// Storage module - deploy if storage connection is defined in ai.yaml
+module storage '../storage/storage.bicep' = if (hasStorageConnection) {
+  name: 'storage'
+  params: {
+    location: location
+    tags: tags
+    resourceName: 'st${resourceToken}'
+    connectionName: storageConnectionName
+    principalId: principalId
+    principalType: principalType
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+  }
+}
+
+// Azure Container Registry module - deploy if ACR connection is defined in ai.yaml
+module acr '../host/acr.bicep' = if (hasAcrConnection) {
+  name: 'acr'
+  params: {
+    location: location
+    tags: tags
+    resourceName: '${abbrs.containerRegistryRegistries}${resourceToken}'
+    connectionName: acrConnectionName
+    principalId: principalId
+    principalType: principalType
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+  }
+}
+
+// Connection for existing ACR - create if user provided an existing ACR resource ID but no existing connection
+module existingAcrConnection './connection.bicep' = if (hasExistingAcr && !hasExistingAcrConnection) {
+  name: 'existing-acr-connection'
+  params: {
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+    connectionConfig: {
+      name: 'acr-${resourceToken}'
+      category: 'ContainerRegistry'
+      target: existingContainerRegistryEndpoint
+      authType: 'ManagedIdentity'
+      isSharedToAll: true
+      metadata: {
+        ResourceId: existingContainerRegistryResourceId
+      }
+    }
+    credentials: {
+      clientId: aiAccount::project.identity.principalId
+      resourceId: existingContainerRegistryResourceId
+    }
+  }
+}
+
+// Extract resource group name from the existing ACR resource ID
+// Resource ID format: /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.ContainerRegistry/registries/{name}
+var existingAcrResourceGroup = hasExistingAcr ? split(existingContainerRegistryResourceId, '/')[4] : ''
+var existingAcrName = hasExistingAcr ? last(split(existingContainerRegistryResourceId, '/')) : ''
+
+// Grant AcrPull role to the AI project's managed identity on the existing ACR
+// This allows the hosted agents to pull images from the user-provided registry
+// Note: User must have permission to assign roles on the existing ACR (Owner or User Access Administrator)
+// Using a module allows scoping to a different resource group if the ACR isn't in the same RG
+// Skip if connection already exists (role assignment should already be in place)
+module existingAcrRoleAssignment './acr-role-assignment.bicep' = if (hasExistingAcr && !hasExistingAcrConnection) {
+  name: 'existing-acr-role-assignment'
+  scope: resourceGroup(existingAcrResourceGroup)
+  params: {
+    acrName: existingAcrName
+    acrResourceId: existingContainerRegistryResourceId
+    principalId: aiAccount::project.identity.principalId
+  }
+}
+
+// Bing Search grounding module - deploy if a Bing connection is defined in ai.yaml
+module bingGrounding '../search/bing_grounding.bicep' = if (hasBingConnection) {
+  name: 'bing-grounding'
+  params: {
+    tags: tags
+    resourceName: 'bing-${resourceToken}'
+    connectionName: bingConnectionName
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+  }
+}
+
+// Bing Custom Search grounding module - deploy if a custom Bing connection is defined in ai.yaml
+module bingCustomGrounding '../search/bing_custom_grounding.bicep' = if (hasBingCustomConnection) {
+  name: 'bing-custom-grounding'
+  params: {
+    tags: tags
+    resourceName: 'bingcustom-${resourceToken}'
+    connectionName: bingCustomConnectionName
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+  }
+}
+
+// Azure AI Search module - deploy if search connection is defined in ai.yaml
+module azureAiSearch '../search/azure_ai_search.bicep' = if (hasSearchConnection) {
+  name: 'azure-ai-search'
+  params: {
+    tags: tags
+    resourceName: 'search-${resourceToken}'
+    connectionName: searchConnectionName
+    storageAccountResourceId: hasStorageConnection ? storage!.outputs.storageAccountId : ''
+    containerName: 'knowledge'
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+    principalId: principalId
+    principalType: principalType
+    location: location
+  }
+}
+
+// Outputs
+output AZURE_AI_PROJECT_ENDPOINT string = aiAccount::project.properties.endpoints['AI Foundry API']
+output FOUNDRY_PROJECT_ENDPOINT string = aiAccount::project.properties.endpoints['AI Foundry API']
+output AZURE_OPENAI_ENDPOINT string = aiAccount.properties.endpoints['OpenAI Language Model Instance API']
+output aiServicesEndpoint string = aiAccount.properties.endpoint
+output accountId string = aiAccount.id
+output projectId string = aiAccount::project.id
+output aiServicesAccountName string = aiAccount.name
+output aiServicesProjectName string = aiAccount::project.name
+output aiServicesPrincipalId string = aiAccount.identity.principalId
+output projectName string = aiAccount::project.name
+output APPLICATIONINSIGHTS_CONNECTION_STRING string = shouldCreateAppInsights ? applicationInsights.outputs.connectionString : (hasExistingAppInsightsConnectionString ? existingApplicationInsightsConnectionString : '')
+output APPLICATIONINSIGHTS_RESOURCE_ID string = shouldCreateAppInsights ? applicationInsights.outputs.id : (hasExistingAppInsightsConnectionString ? existingApplicationInsightsResourceId : '')
+
+// Connection outputs from the connections array
+output connectionIds array = [for (connection, index) in (connections ?? []): {
+  name: aiConnections[index].outputs.connectionName
+  id: aiConnections[index].outputs.connectionId
+}]
+
+// Grouped dependent resources outputs
+output dependentResources object = {
+  registry: {
+    name: hasAcrConnection ? acr!.outputs.containerRegistryName : ''
+    loginServer: hasAcrConnection ? acr!.outputs.containerRegistryLoginServer : ((hasExistingAcr || hasExistingAcrConnection) ? existingContainerRegistryEndpoint : '')
+    connectionName: hasAcrConnection ? acr!.outputs.containerRegistryConnectionName : (hasExistingAcrConnection ? existingAcrConnectionName : (hasExistingAcr ? 'acr-${resourceToken}' : ''))
+  }
+  bing_grounding: {
+    name: (hasBingConnection) ? bingGrounding!.outputs.bingGroundingName : ''
+    connectionName: (hasBingConnection) ? bingGrounding!.outputs.bingGroundingConnectionName : ''
+    connectionId: (hasBingConnection) ? bingGrounding!.outputs.bingGroundingConnectionId : ''
+  }
+  bing_custom_grounding: {
+    name: (hasBingCustomConnection) ? bingCustomGrounding!.outputs.bingCustomGroundingName : ''
+    connectionName: (hasBingCustomConnection) ? bingCustomGrounding!.outputs.bingCustomGroundingConnectionName : ''
+    connectionId: (hasBingCustomConnection) ? bingCustomGrounding!.outputs.bingCustomGroundingConnectionId : ''
+  }
+  search: {
+    serviceName: hasSearchConnection ? azureAiSearch!.outputs.searchServiceName : ''
+    connectionName: hasSearchConnection ? azureAiSearch!.outputs.searchConnectionName : ''
+  }
+  storage: {
+    accountName: hasStorageConnection ? storage!.outputs.storageAccountName : ''
+    connectionName: hasStorageConnection ? storage!.outputs.storageConnectionName : ''
+  }
+}
+
+type deploymentsType = {
+  @description('Specify the name of cognitive service account deployment.')
+  name: string
+
+  @description('Required. Properties of Cognitive Services account deployment model.')
+  model: {
+    @description('Required. The name of Cognitive Services account deployment model.')
+    name: string
+
+    @description('Required. The format of Cognitive Services account deployment model.')
+    format: string
+
+    @description('Required. The version of Cognitive Services account deployment model.')
+    version: string
+  }
+
+  @description('The resource model definition representing SKU.')
+  sku: {
+    @description('Required. The name of the resource model definition representing SKU.')
+    name: string
+
+    @description('The capacity of the resource model definition representing SKU.')
+    capacity: int
+  }
+}[]?
+
+type dependentResourcesType = {
+  @description('The type of dependent resource to create')
+  resource: 'storage' | 'registry' | 'azure_ai_search' | 'bing_grounding' | 'bing_custom_grounding'
+  
+  @description('The connection name for this resource')
+  connectionName: string
+}[]
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/connection.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/connection.bicep
new file mode 100644
index 000000000000..a08726645243
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/connection.bicep
@@ -0,0 +1,112 @@
+targetScope = 'resourceGroup'
+
+@description('AI Services account name')
+param aiServicesAccountName string
+
+@description('AI project name')
+param aiProjectName string
+
+// Connection configuration type definition
+type ConnectionConfig = {
+  @description('Name of the connection')
+  name: string
+
+  @description('Category of the connection (e.g., ContainerRegistry, AzureStorageAccount, CognitiveSearch, AzureOpenAI)')
+  category: string
+
+  @description('Target endpoint or URL for the connection')
+  target: string
+
+  @description('Authentication type')
+  authType: 'AAD' | 'AccessKey' | 'AccountKey' | 'AgenticIdentity' | 'ApiKey' | 'CustomKeys' | 'ManagedIdentity' | 'None' | 'OAuth2' | 'PAT' | 'SAS' | 'ServicePrincipal' | 'UsernamePassword' | 'UserEntraToken' | 'ProjectManagedIdentity'
+
+  @description('Whether the connection is shared to all users (optional, defaults to true)')
+  isSharedToAll: bool?
+
+  @description('Additional metadata for the connection (optional)')
+  metadata: object?
+
+  @description('Error message if the connection fails (optional)')
+  error: string?
+
+  @description('Expiry time for the connection (optional)')
+  expiryTime: string?
+
+  @description('Private endpoint requirement: Required, NotRequired, or NotApplicable (optional)')
+  peRequirement: ('NotApplicable' | 'NotRequired' | 'Required')?
+
+  @description('Private endpoint status: Active, Inactive, or NotApplicable (optional)')
+  peStatus: ('Active' | 'Inactive' | 'NotApplicable')?
+
+  @description('List of users to share the connection with (optional, alternative to isSharedToAll)')
+  sharedUserList: string[]?
+
+  @description('Whether to use workspace managed identity (optional)')
+  useWorkspaceManagedIdentity: bool?
+
+  @description('OAuth2 authorization endpoint URL (optional, OAuth2 authType only)')
+  authorizationUrl: string?
+
+  @description('OAuth2 token endpoint URL (optional, OAuth2 authType only)')
+  tokenUrl: string?
+
+  @description('OAuth2 refresh token endpoint URL (optional, OAuth2 authType only)')
+  refreshUrl: string?
+
+  @description('OAuth2 scopes to request (optional, OAuth2 authType only)')
+  scopes: string[]?
+
+  @description('Token audience for UserEntraToken / AgenticIdentity auth types (optional)')
+  audience: string?
+
+  @description('Managed connector name for OAuth2 managed connectors (optional)')
+  connectorName: string?
+}
+
+@description('Connection configuration')
+param connectionConfig ConnectionConfig
+
+@secure()
+@description('Credentials for the connection. Kept as a separate @secure parameter to prevent secrets from appearing in deployment logs. Shape depends on authType — e.g. { key: "..." } for ApiKey, { clientId: "...", clientSecret: "..." } for OAuth2/ServicePrincipal.')
+param credentials object = {}
+
+
+// Get reference to the AI Services account and project
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = {
+  name: aiServicesAccountName
+
+  resource project 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Create the connection
+resource connection 'Microsoft.CognitiveServices/accounts/projects/connections@2025-04-01-preview' = {
+  parent: aiAccount::project
+  name: connectionConfig.name
+  properties: {
+    category: connectionConfig.category
+    target: connectionConfig.target
+    authType: connectionConfig.authType
+    isSharedToAll: connectionConfig.?isSharedToAll ?? true
+    credentials: !empty(credentials) ? credentials : null
+    metadata: connectionConfig.?metadata
+    // Only include if they appear in the connectionConfig
+    ...connectionConfig.?error != null ? { error: connectionConfig.?error  } : {}
+    ...connectionConfig.?expiryTime != null ? { expiryTime: connectionConfig.?expiryTime  } : {}
+    ...connectionConfig.?peRequirement != null ? { peRequirement: connectionConfig.?peRequirement  } : {}
+    ...connectionConfig.?peStatus != null ? { peStatus: connectionConfig.?peStatus  } : {}
+    ...connectionConfig.?sharedUserList != null ? { sharedUserList: connectionConfig.?sharedUserList  } : {}
+    ...connectionConfig.?useWorkspaceManagedIdentity != null ? { useWorkspaceManagedIdentity: connectionConfig.?useWorkspaceManagedIdentity  } : {}
+    ...connectionConfig.?authorizationUrl != null ? { authorizationUrl: connectionConfig.?authorizationUrl } : {}
+    ...connectionConfig.?tokenUrl != null ? { tokenUrl: connectionConfig.?tokenUrl } : {}
+    ...connectionConfig.?refreshUrl != null ? { refreshUrl: connectionConfig.?refreshUrl } : {}
+    ...connectionConfig.?scopes != null ? { scopes: connectionConfig.?scopes } : {}
+    ...connectionConfig.?audience != null ? { audience: connectionConfig.?audience } : {}
+    ...connectionConfig.?connectorName != null ? { connectorName: connectionConfig.?connectorName } : {}
+  }
+}
+
+// Outputs
+output connectionName string = connection.name
+output connectionId string = connection.id
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/existing-ai-project.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/existing-ai-project.bicep
new file mode 100644
index 000000000000..12e5a1217b2f
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/ai/existing-ai-project.bicep
@@ -0,0 +1,140 @@
+targetScope = 'resourceGroup'
+
+@description('Name of the existing AI Services account')
+param aiServicesAccountName string
+
+@description('Name of the existing AI Foundry project')
+param aiFoundryProjectName string
+
+@description('Existing ACR connection name (already set in the environment)')
+param existingAcrConnectionName string = ''
+
+@description('Existing container registry endpoint (already set in the environment)')
+param existingContainerRegistryEndpoint string = ''
+
+@description('Existing Application Insights connection string (already set in the environment)')
+param existingApplicationInsightsConnectionString string = ''
+
+@description('Existing Application Insights resource ID (already set in the environment)')
+param existingApplicationInsightsResourceId string = ''
+
+@description('Model deployments to create on the existing AI Services account')
+param deployments deploymentsType
+
+@description('List of connections to provision on the existing project')
+param connections array = []
+
+@secure()
+@description('Map of connection name to credentials object. Kept as @secure to prevent secrets from appearing in deployment logs. Example: { "my-conn": { "key": "secret" } }')
+param connectionCredentials object = {}
+
+// Reference the existing account and project. Both are read-only here; this module
+// only adds the model deployments and the agent-manifest connections provisioned below.
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-06-01' existing = {
+  name: aiServicesAccountName
+
+  resource project 'projects' existing = {
+    name: aiFoundryProjectName
+  }
+}
+
+// Create model deployments on the existing AI Services account.
+// Uses @batchSize(1) to avoid concurrent deployment conflicts (same as ai-project.bicep).
+@batchSize(1)
+resource seqDeployments 'Microsoft.CognitiveServices/accounts/deployments@2025-06-01' = [
+  for dep in (deployments ?? []): {
+    parent: aiAccount
+    name: dep.name
+    properties: {
+      model: dep.model
+    }
+    sku: dep.sku
+  }
+]
+
+// Create additional connections from ai.yaml / agent manifest configuration on
+// the existing project. Mirrors the loop in ai-project.bicep so manifest-declared
+// connections are provisioned regardless of whether the project itself is new or
+// pre-existing.
+module aiConnections './connection.bicep' = [for (connection, index) in connections: {
+  name: 'existing-connection-${connection.name}'
+  params: {
+    aiServicesAccountName: aiAccount.name
+    aiProjectName: aiAccount::project.name
+    connectionConfig: connection
+    credentials: connectionCredentials[?connection.name] ?? {}
+  }
+}]
+
+// Outputs — same shape as ai-project.bicep so main.bicep can use either interchangeably
+output AZURE_AI_PROJECT_ENDPOINT string = aiAccount::project.properties.endpoints['AI Foundry API']
+output FOUNDRY_PROJECT_ENDPOINT string = aiAccount::project.properties.endpoints['AI Foundry API']
+output AZURE_OPENAI_ENDPOINT string = aiAccount.properties.endpoints['OpenAI Language Model Instance API']
+output aiServicesEndpoint string = aiAccount.properties.endpoint
+output accountId string = aiAccount.id
+output projectId string = aiAccount::project.id
+output aiServicesAccountName string = aiAccount.name
+output aiServicesProjectName string = aiAccount::project.name
+output aiServicesPrincipalId string = aiAccount.identity.principalId
+output projectName string = aiAccount::project.name
+output APPLICATIONINSIGHTS_CONNECTION_STRING string = existingApplicationInsightsConnectionString
+output APPLICATIONINSIGHTS_RESOURCE_ID string = existingApplicationInsightsResourceId
+
+// Connection outputs from the connections array (provisioned above).
+// Unlike the dependentResources placeholders below, these are populated from the aiConnections module outputs.
+output connectionIds array = [for (connection, index) in (connections ?? []): {
+  name: aiConnections[index].outputs.connectionName
+  id: aiConnections[index].outputs.connectionId
+}]
+
+output dependentResources object = {
+  registry: {
+    name: ''
+    loginServer: existingContainerRegistryEndpoint
+    connectionName: existingAcrConnectionName
+  }
+  bing_grounding: {
+    name: ''
+    connectionName: ''
+    connectionId: ''
+  }
+  bing_custom_grounding: {
+    name: ''
+    connectionName: ''
+    connectionId: ''
+  }
+  search: {
+    serviceName: ''
+    connectionName: ''
+  }
+  storage: {
+    accountName: ''
+    connectionName: ''
+  }
+}
+
+type deploymentsType = {
+  @description('Specify the name of cognitive service account deployment.')
+  name: string
+
+  @description('Required. Properties of Cognitive Services account deployment model.')
+  model: {
+    @description('Required. The name of Cognitive Services account deployment model.')
+    name: string
+
+    @description('Required. The format of Cognitive Services account deployment model.')
+    format: string
+
+    @description('Required. The version of Cognitive Services account deployment model.')
+    version: string
+  }
+
+  @description('The resource model definition representing SKU.')
+  sku: {
+    @description('Required. The name of the resource model definition representing SKU.')
+    name: string
+
+    @description('The capacity of the resource model definition representing SKU.')
+    capacity: int
+  }
+}[]?
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/host/acr.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/host/acr.bicep
new file mode 100644
index 000000000000..f1893d8ff312
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/host/acr.bicep
@@ -0,0 +1,88 @@
+targetScope = 'resourceGroup'
+
+@description('The location used for all deployed resources')
+param location string = resourceGroup().location
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Resource name for the container registry')
+param resourceName string
+
+@description('Id of the user or app to assign application roles')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('Name for the AI Foundry ACR connection')
+param connectionName string
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Create the Container Registry
+module containerRegistry 'br/public:avm/res/container-registry/registry:0.1.1' = {
+  name: 'registry'
+  params: {
+    name: resourceName
+    location: location
+    tags: tags
+    publicNetworkAccess: 'Enabled'
+    roleAssignments: [
+      {
+        principalId: principalId
+        principalType: principalType
+        // Container Registry Tasks Contributor — build images with ACR tasks and push container images
+        roleDefinitionIdOrName: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'fb382eab-e894-4461-af04-94435c366c3f')
+      }
+      // TODO SEPARATELY
+      {
+        // AcrPull: the Foundry project's managed identity can pull images from the ACR
+        principalId: aiAccount::aiProject.identity.principalId
+        principalType: 'ServicePrincipal'
+        roleDefinitionIdOrName: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '7f951dda-4ed3-4680-a7ca-43fe172d538d')
+      }
+    ]
+  }
+}
+
+// Create the ACR connection using the centralized connection module
+module acrConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'acr-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'ContainerRegistry'
+      target: containerRegistry.outputs.loginServer
+      authType: 'ManagedIdentity'
+      isSharedToAll: true
+      metadata: {
+        ResourceId: containerRegistry.outputs.resourceId
+      }
+    }
+    credentials: {
+      clientId: aiAccount::aiProject.identity.principalId
+      resourceId: containerRegistry.outputs.resourceId
+    }
+  }
+}
+
+output containerRegistryName string = containerRegistry.outputs.name
+output containerRegistryLoginServer string = containerRegistry.outputs.loginServer
+output containerRegistryResourceId string = containerRegistry.outputs.resourceId
+output containerRegistryConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? acrConnection.outputs.connectionName : ''
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/monitor/applicationinsights-dashboard.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/monitor/applicationinsights-dashboard.bicep
new file mode 100644
index 000000000000..d082e668ed9f
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/monitor/applicationinsights-dashboard.bicep
@@ -0,0 +1,1236 @@
+metadata description = 'Creates a dashboard for an Application Insights instance.'
+param name string
+param applicationInsightsName string
+param location string = resourceGroup().location
+param tags object = {}
+
+// 2020-09-01-preview because that is the latest valid version
+resource applicationInsightsDashboard 'Microsoft.Portal/dashboards@2020-09-01-preview' = {
+  name: name
+  location: location
+  tags: tags
+  properties: {
+    lenses: [
+      {
+        order: 0
+        parts: [
+          {
+            position: {
+              x: 0
+              y: 0
+              colSpan: 2
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'id'
+                  value: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                }
+                {
+                  name: 'Version'
+                  value: '1.0'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/AspNetOverviewPinnedPart'
+              asset: {
+                idInputName: 'id'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'overview'
+            }
+          }
+          {
+            position: {
+              x: 2
+              y: 0
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'Version'
+                  value: '1.0'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/ProactiveDetectionAsyncPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'ProactiveDetection'
+            }
+          }
+          {
+            position: {
+              x: 3
+              y: 0
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'ResourceId'
+                  value: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/QuickPulseButtonSmallPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 0
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'TimeContext'
+                  value: {
+                    durationMs: 86400000
+                    endTime: null
+                    createdTime: '2018-05-04T01:20:33.345Z'
+                    isInitialTime: true
+                    grain: 1
+                    useDashboardTimeRange: false
+                  }
+                }
+                {
+                  name: 'Version'
+                  value: '1.0'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/AvailabilityNavButtonPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+            }
+          }
+          {
+            position: {
+              x: 5
+              y: 0
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'TimeContext'
+                  value: {
+                    durationMs: 86400000
+                    endTime: null
+                    createdTime: '2018-05-08T18:47:35.237Z'
+                    isInitialTime: true
+                    grain: 1
+                    useDashboardTimeRange: false
+                  }
+                }
+                {
+                  name: 'ConfigurationId'
+                  value: '78ce933e-e864-4b05-a27b-71fd55a6afad'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/AppMapButtonPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+            }
+          }
+          {
+            position: {
+              x: 0
+              y: 1
+              colSpan: 3
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: []
+              type: 'Extension/HubsExtension/PartType/MarkdownPart'
+              settings: {
+                content: {
+                  settings: {
+                    content: '# Usage'
+                    title: ''
+                    subtitle: ''
+                  }
+                }
+              }
+            }
+          }
+          {
+            position: {
+              x: 3
+              y: 1
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'TimeContext'
+                  value: {
+                    durationMs: 86400000
+                    endTime: null
+                    createdTime: '2018-05-04T01:22:35.782Z'
+                    isInitialTime: true
+                    grain: 1
+                    useDashboardTimeRange: false
+                  }
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/UsageUsersOverviewPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 1
+              colSpan: 3
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: []
+              type: 'Extension/HubsExtension/PartType/MarkdownPart'
+              settings: {
+                content: {
+                  settings: {
+                    content: '# Reliability'
+                    title: ''
+                    subtitle: ''
+                  }
+                }
+              }
+            }
+          }
+          {
+            position: {
+              x: 7
+              y: 1
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ResourceId'
+                  value: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                }
+                {
+                  name: 'DataModel'
+                  value: {
+                    version: '1.0.0'
+                    timeContext: {
+                      durationMs: 86400000
+                      createdTime: '2018-05-04T23:42:40.072Z'
+                      isInitialTime: false
+                      grain: 1
+                      useDashboardTimeRange: false
+                    }
+                  }
+                  isOptional: true
+                }
+                {
+                  name: 'ConfigurationId'
+                  value: '8a02f7bf-ac0f-40e1-afe9-f0e72cfee77f'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/CuratedBladeFailuresPinnedPart'
+              isAdapter: true
+              asset: {
+                idInputName: 'ResourceId'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'failures'
+            }
+          }
+          {
+            position: {
+              x: 8
+              y: 1
+              colSpan: 3
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: []
+              type: 'Extension/HubsExtension/PartType/MarkdownPart'
+              settings: {
+                content: {
+                  settings: {
+                    content: '# Responsiveness\r\n'
+                    title: ''
+                    subtitle: ''
+                  }
+                }
+              }
+            }
+          }
+          {
+            position: {
+              x: 11
+              y: 1
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ResourceId'
+                  value: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                }
+                {
+                  name: 'DataModel'
+                  value: {
+                    version: '1.0.0'
+                    timeContext: {
+                      durationMs: 86400000
+                      createdTime: '2018-05-04T23:43:37.804Z'
+                      isInitialTime: false
+                      grain: 1
+                      useDashboardTimeRange: false
+                    }
+                  }
+                  isOptional: true
+                }
+                {
+                  name: 'ConfigurationId'
+                  value: '2a8ede4f-2bee-4b9c-aed9-2db0e8a01865'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/CuratedBladePerformancePinnedPart'
+              isAdapter: true
+              asset: {
+                idInputName: 'ResourceId'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'performance'
+            }
+          }
+          {
+            position: {
+              x: 12
+              y: 1
+              colSpan: 3
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: []
+              type: 'Extension/HubsExtension/PartType/MarkdownPart'
+              settings: {
+                content: {
+                  settings: {
+                    content: '# Browser'
+                    title: ''
+                    subtitle: ''
+                  }
+                }
+              }
+            }
+          }
+          {
+            position: {
+              x: 15
+              y: 1
+              colSpan: 1
+              rowSpan: 1
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'ComponentId'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'MetricsExplorerJsonDefinitionId'
+                  value: 'BrowserPerformanceTimelineMetrics'
+                }
+                {
+                  name: 'TimeContext'
+                  value: {
+                    durationMs: 86400000
+                    createdTime: '2018-05-08T12:16:27.534Z'
+                    isInitialTime: false
+                    grain: 1
+                    useDashboardTimeRange: false
+                  }
+                }
+                {
+                  name: 'CurrentFilter'
+                  value: {
+                    eventTypes: [
+                      4
+                      1
+                      3
+                      5
+                      2
+                      6
+                      13
+                    ]
+                    typeFacets: {}
+                    isPermissive: false
+                  }
+                }
+                {
+                  name: 'id'
+                  value: {
+                    Name: applicationInsights.name
+                    SubscriptionId: subscription().subscriptionId
+                    ResourceGroup: resourceGroup().name
+                  }
+                }
+                {
+                  name: 'Version'
+                  value: '1.0'
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/AppInsightsExtension/PartType/MetricsExplorerBladePinnedPart'
+              asset: {
+                idInputName: 'ComponentId'
+                type: 'ApplicationInsights'
+              }
+              defaultMenuItemId: 'browser'
+            }
+          }
+          {
+            position: {
+              x: 0
+              y: 2
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'sessions/count'
+                          aggregationType: 5
+                          namespace: 'microsoft.insights/components/kusto'
+                          metricVisualization: {
+                            displayName: 'Sessions'
+                            color: '#47BDF5'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'users/count'
+                          aggregationType: 5
+                          namespace: 'microsoft.insights/components/kusto'
+                          metricVisualization: {
+                            displayName: 'Users'
+                            color: '#7E58FF'
+                          }
+                        }
+                      ]
+                      title: 'Unique sessions and users'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                      openBladeOnClick: {
+                        openBlade: true
+                        destinationBlade: {
+                          extensionName: 'HubsExtension'
+                          bladeName: 'ResourceMenuBlade'
+                          parameters: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                            menuid: 'segmentationUsers'
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 2
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'requests/failed'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Failed requests'
+                            color: '#EC008C'
+                          }
+                        }
+                      ]
+                      title: 'Failed requests'
+                      visualization: {
+                        chartType: 3
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                      openBladeOnClick: {
+                        openBlade: true
+                        destinationBlade: {
+                          extensionName: 'HubsExtension'
+                          bladeName: 'ResourceMenuBlade'
+                          parameters: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                            menuid: 'failures'
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 8
+              y: 2
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'requests/duration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Server response time'
+                            color: '#00BCF2'
+                          }
+                        }
+                      ]
+                      title: 'Server response time'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                      openBladeOnClick: {
+                        openBlade: true
+                        destinationBlade: {
+                          extensionName: 'HubsExtension'
+                          bladeName: 'ResourceMenuBlade'
+                          parameters: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                            menuid: 'performance'
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 12
+              y: 2
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'browserTimings/networkDuration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Page load network connect time'
+                            color: '#7E58FF'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'browserTimings/processingDuration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Client processing time'
+                            color: '#44F1C8'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'browserTimings/sendDuration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Send request time'
+                            color: '#EB9371'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'browserTimings/receiveDuration'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Receiving response time'
+                            color: '#0672F1'
+                          }
+                        }
+                      ]
+                      title: 'Average page load time breakdown'
+                      visualization: {
+                        chartType: 3
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 0
+              y: 5
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'availabilityResults/availabilityPercentage'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Availability'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Average availability'
+                      visualization: {
+                        chartType: 3
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                      openBladeOnClick: {
+                        openBlade: true
+                        destinationBlade: {
+                          extensionName: 'HubsExtension'
+                          bladeName: 'ResourceMenuBlade'
+                          parameters: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                            menuid: 'availability'
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 5
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'exceptions/server'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Server exceptions'
+                            color: '#47BDF5'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'dependencies/failed'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Dependency failures'
+                            color: '#7E58FF'
+                          }
+                        }
+                      ]
+                      title: 'Server exceptions and Dependency failures'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 8
+              y: 5
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'performanceCounters/processorCpuPercentage'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Processor time'
+                            color: '#47BDF5'
+                          }
+                        }
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'performanceCounters/processCpuPercentage'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Process CPU'
+                            color: '#7E58FF'
+                          }
+                        }
+                      ]
+                      title: 'Average processor and process CPU utilization'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 12
+              y: 5
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'exceptions/browser'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Browser exceptions'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Browser exceptions'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 0
+              y: 8
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'availabilityResults/count'
+                          aggregationType: 7
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Availability test results count'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Availability test results count'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 4
+              y: 8
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'performanceCounters/processIOBytesPerSecond'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Process IO rate'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Average process I/O rate'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+          {
+            position: {
+              x: 8
+              y: 8
+              colSpan: 4
+              rowSpan: 3
+            }
+            metadata: {
+              inputs: [
+                {
+                  name: 'options'
+                  value: {
+                    chart: {
+                      metrics: [
+                        {
+                          resourceMetadata: {
+                            id: '/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Insights/components/${applicationInsights.name}'
+                          }
+                          name: 'performanceCounters/memoryAvailableBytes'
+                          aggregationType: 4
+                          namespace: 'microsoft.insights/components'
+                          metricVisualization: {
+                            displayName: 'Available memory'
+                            color: '#47BDF5'
+                          }
+                        }
+                      ]
+                      title: 'Average available memory'
+                      visualization: {
+                        chartType: 2
+                        legendVisualization: {
+                          isVisible: true
+                          position: 2
+                          hideSubtitle: false
+                        }
+                        axisVisualization: {
+                          x: {
+                            isVisible: true
+                            axisType: 2
+                          }
+                          y: {
+                            isVisible: true
+                            axisType: 1
+                          }
+                        }
+                      }
+                    }
+                  }
+                }
+                {
+                  name: 'sharedTimeRange'
+                  isOptional: true
+                }
+              ]
+              #disable-next-line BCP036
+              type: 'Extension/HubsExtension/PartType/MonitorChartPart'
+              settings: {}
+            }
+          }
+        ]
+      }
+    ]
+  }
+}
+
+resource applicationInsights 'Microsoft.Insights/components@2020-02-02' existing = {
+  name: applicationInsightsName
+}
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/monitor/applicationinsights.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/monitor/applicationinsights.bicep
new file mode 100644
index 000000000000..73240d1b1c9a
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/monitor/applicationinsights.bicep
@@ -0,0 +1,47 @@
+metadata description = 'Creates an Application Insights instance based on an existing Log Analytics workspace.'
+param name string
+param dashboardName string = ''
+param location string = resourceGroup().location
+param tags object = {}
+param logAnalyticsWorkspaceId string
+
+@description('Optional. Principal ID of the Foundry Project managed identity to grant Log Analytics Reader.')
+param projectMIPrincipalId string = ''
+
+resource applicationInsights 'Microsoft.Insights/components@2020-02-02' = {
+  name: name
+  location: location
+  tags: tags
+  kind: 'web'
+  properties: {
+    Application_Type: 'web'
+    WorkspaceResourceId: logAnalyticsWorkspaceId
+  }
+}
+
+module applicationInsightsDashboard 'applicationinsights-dashboard.bicep' = if (!empty(dashboardName)) {
+  name: 'application-insights-dashboard'
+  params: {
+    name: dashboardName
+    location: location
+    applicationInsightsName: applicationInsights.name
+  }
+}
+
+// Log Analytics Reader for the Foundry Project managed identity.
+// Required for running evaluations on traces generated by agents.
+resource logAnalyticsReaderRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(projectMIPrincipalId)) {
+  scope: applicationInsights
+  name: guid(applicationInsights.id, projectMIPrincipalId, '73c42c96-874c-492b-b04d-ab87d138a893')
+  properties: {
+    principalId: projectMIPrincipalId
+    principalType: 'ServicePrincipal'
+    // Log Analytics Reader
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '73c42c96-874c-492b-b04d-ab87d138a893')
+  }
+}
+
+output connectionString string = applicationInsights.properties.ConnectionString
+output id string = applicationInsights.id
+output instrumentationKey string = applicationInsights.properties.InstrumentationKey
+output name string = applicationInsights.name
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/monitor/loganalytics.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/monitor/loganalytics.bicep
new file mode 100644
index 000000000000..33f9dc29443a
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/monitor/loganalytics.bicep
@@ -0,0 +1,22 @@
+metadata description = 'Creates a Log Analytics workspace.'
+param name string
+param location string = resourceGroup().location
+param tags object = {}
+
+resource logAnalytics 'Microsoft.OperationalInsights/workspaces@2021-12-01-preview' = {
+  name: name
+  location: location
+  tags: tags
+  properties: any({
+    retentionInDays: 30
+    features: {
+      searchVersion: 1
+    }
+    sku: {
+      name: 'PerGB2018'
+    }
+  })
+}
+
+output id string = logAnalytics.id
+output name string = logAnalytics.name
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/search/azure_ai_search.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/search/azure_ai_search.bicep
new file mode 100644
index 000000000000..7bb8e6350025
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/search/azure_ai_search.bicep
@@ -0,0 +1,211 @@
+targetScope = 'resourceGroup'
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Azure Search resource name')
+param resourceName string
+
+@description('Azure Search SKU name')
+param azureSearchSkuName string = 'basic'
+
+@description('Azure storage account resource ID')
+param storageAccountResourceId string
+
+@description('Storage container name')
+param containerName string = 'knowledgebase'
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('ID of the user or app to assign application roles to')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('Name for the AI Foundry search connection')
+param connectionName string
+
+@description('Location for all resources')
+param location string = resourceGroup().location
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Azure Search Service
+resource searchService 'Microsoft.Search/searchServices@2024-06-01-preview' = {
+  name: resourceName
+  location: location
+  tags: tags
+  sku: {
+    name: azureSearchSkuName
+  }
+  identity: {
+    type: 'SystemAssigned'
+  }
+  properties: {
+    replicaCount: 1
+    partitionCount: 1
+    hostingMode: 'default'
+    authOptions: {
+      aadOrApiKey: {
+        aadAuthFailureMode: 'http401WithBearerChallenge'
+      }
+    }
+    disableLocalAuth: false
+    encryptionWithCmk: {
+      enforcement: 'Unspecified'
+    }
+    publicNetworkAccess: 'enabled'
+  }
+}
+
+// Reference to existing Storage Account
+resource storageAccount 'Microsoft.Storage/storageAccounts@2023-05-01' existing = {
+  name: last(split(storageAccountResourceId, '/'))
+}
+
+// Reference to existing Blob Service
+resource blobService 'Microsoft.Storage/storageAccounts/blobServices@2023-05-01' existing = {
+  parent: storageAccount
+  name: 'default'
+}
+
+// Storage Container (create if it doesn't exist)
+resource storageContainer 'Microsoft.Storage/storageAccounts/blobServices/containers@2023-05-01' = {
+  parent: blobService
+  name: containerName
+  properties: {
+    publicAccess: 'None'
+  }
+}
+
+// RBAC Assignments
+
+// Search needs to read from Storage
+resource searchToStorageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  name: guid(storageAccount.id, searchService.id, 'Storage Blob Data Reader', uniqueString(deployment().name))
+  scope: storageAccount
+  properties: {
+    // GOOD
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '2a2b9908-6ea1-4ae2-8e65-a410df84e7d1') // Storage Blob Data Reader
+    principalId: searchService.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// Search needs OpenAI access (AI Services account)
+resource searchToAIServicesRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName)) {
+  name: guid(aiServicesAccountName, searchService.id, 'Cognitive Services OpenAI User', uniqueString(deployment().name))
+  properties: {
+    // GOOD
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd') // Cognitive Services OpenAI User
+    principalId: searchService.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// AI Project needs Search access - Service Contributor
+resource aiServicesToSearchServiceRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: guid(searchService.id, aiServicesAccountName, aiProjectName, 'Search Service Contributor', uniqueString(deployment().name))
+  scope: searchService
+  properties: {
+    // GOOD
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '7ca78c08-252a-4471-8644-bb5ff32d4ba0') // Search Service Contributor
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// AI Project needs Search access - Index Data Contributor
+resource aiServicesToSearchDataRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: guid(searchService.id, aiServicesAccountName, aiProjectName, 'Search Index Data Contributor', uniqueString(deployment().name))
+  scope: searchService
+  properties: {
+    // GOOD
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '8ebe5a00-799e-43f5-93ac-243d3dce84a7') // Search Index Data Contributor
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// User permissions - Search Index Data Contributor
+resource userToSearchRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  name: guid(searchService.id, principalId, 'Search Index Data Contributor', uniqueString(deployment().name))
+  scope: searchService
+  properties: {
+    // GOOD
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '8ebe5a00-799e-43f5-93ac-243d3dce84a7') // Search Index Data Contributor
+    principalId: principalId
+    principalType: principalType
+  }
+}
+
+// // User permissions - Storage Blob Data Contributor
+// resource userToStorageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+//   name: guid(storageAccount.id, principalId, 'Storage Blob Data Contributor', uniqueString(deployment().name))
+//   scope: storageAccount
+//   properties: {
+//     roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe') // Storage Blob Data Contributor
+//     principalId: principalId
+//     principalType: principalType
+//   }
+// }
+
+// // Project needs Search access - Index Data Contributor
+// resource projectToSearchRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+//   name: guid(searchService.id, aiProjectName, 'Search Index Data Contributor', uniqueString(deployment().name))
+//   scope: searchService
+//   properties: {
+//     roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '8ebe5a00-799e-43f5-93ac-243d3dce84a7') // Search Index Data Contributor
+//     principalId: aiAccountPrincipalId // Using AI account principal ID as project identity
+//     principalType: 'ServicePrincipal'
+//   }
+// }
+
+// Create the AI Search connection using the centralized connection module
+module aiSearchConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'ai-search-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'CognitiveSearch'
+      target: 'https://${searchService.name}.search.windows.net'
+      authType: 'AAD'
+      isSharedToAll: true
+      metadata: {
+        ApiVersion: '2024-07-01'
+        ResourceId: searchService.id
+        ApiType: 'Azure'
+        type: 'azure_ai_search'
+      }
+    }
+  }
+  dependsOn: [
+    aiServicesToSearchDataRoleAssignment
+  ]
+}
+
+// Outputs
+output searchServiceName string = searchService.name
+output searchServiceId string = searchService.id
+output searchServicePrincipalId string = searchService.identity.principalId
+output storageAccountName string = storageAccount.name
+output storageAccountId string = storageAccount.id
+output containerName string = storageContainer.name
+output storageAccountPrincipalId string = storageAccount.identity.principalId
+output searchConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? aiSearchConnection!.outputs.connectionName : ''
+output searchConnectionId string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? aiSearchConnection!.outputs.connectionId : ''
+
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/search/bing_custom_grounding.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/search/bing_custom_grounding.bicep
new file mode 100644
index 000000000000..1fddea079e2e
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/search/bing_custom_grounding.bicep
@@ -0,0 +1,84 @@
+targetScope = 'resourceGroup'
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Bing custom grounding resource name')
+param resourceName string
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('Name for the AI Foundry Bing Custom Search connection')
+param connectionName string
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Bing Custom Search resource for custom grounding capability
+resource bingCustomSearch 'Microsoft.Bing/accounts@2020-06-10' = {
+  name: resourceName
+  location: 'global'
+  tags: tags
+  sku: {
+    name: 'G1'
+  }
+  properties: {
+    statisticsEnabled: false
+  }
+  kind: 'Bing.CustomGrounding'
+}
+
+// Role assignment to allow AI project to use Bing Search
+resource bingCustomSearchRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  scope: bingCustomSearch
+  name: guid(subscription().id, resourceGroup().id, 'bing-search-role', aiServicesAccountName, aiProjectName)
+  properties: {
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'a97b65f3-24c7-4388-baec-2e87135dc908') // Cognitive Services User
+  }
+}
+
+// Create the Bing Custom Search connection using the centralized connection module
+module aiSearchConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'bing-custom-search-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'GroundingWithCustomSearch'
+      target: bingCustomSearch.properties.endpoint
+      authType: 'ApiKey'
+      isSharedToAll: true
+      metadata: {
+        Location: 'global'
+        ResourceId: bingCustomSearch.id
+        ApiType: 'Azure'
+        type: 'bing_custom_search'
+      }
+    }
+    credentials: {
+      key: bingCustomSearch.listKeys().key1
+    }
+  }
+  dependsOn: [
+    bingCustomSearchRoleAssignment
+  ]
+}
+
+// Outputs
+output bingCustomGroundingName string = bingCustomSearch.name
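+output bingCustomGroundingConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? aiSearchConnection!.outputs.connectionName : ''
+output bingCustomGroundingResourceId string = bingCustomSearch.id
+output bingCustomGroundingConnectionId string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? aiSearchConnection!.outputs.connectionId : ''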
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/search/bing_grounding.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/search/bing_grounding.bicep
new file mode 100644
index 000000000000..20ea5e9f160a
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/search/bing_grounding.bicep
@@ -0,0 +1,83 @@
+targetScope = 'resourceGroup'
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Bing grounding resource name')
+param resourceName string
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('Name for the AI Foundry Bing Search connection')
+param connectionName string
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Bing Search resource for grounding capability
+resource bingSearch 'Microsoft.Bing/accounts@2020-06-10' = {
+  name: resourceName
+  location: 'global'
+  tags: tags
+  sku: {
+    name: 'G1'
+  }
+  properties: {
+    statisticsEnabled: false
+  }
+  kind: 'Bing.Grounding'
+}
+
+// Role assignment to allow AI project to use Bing Search
+resource bingSearchRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  scope: bingSearch
+  name: guid(subscription().id, resourceGroup().id, 'bing-search-role', aiServicesAccountName, aiProjectName)
+  properties: {
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'a97b65f3-24c7-4388-baec-2e87135dc908') // Cognitive Services User
+  }
+}
+
+// Create the Bing Search connection using the centralized connection module
+module bingSearchConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'bing-search-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'GroundingWithBingSearch'
+      target: bingSearch.properties.endpoint
+      authType: 'ApiKey'
+      isSharedToAll: true
+      metadata: {
+        Location: 'global'
+        ResourceId: bingSearch.id
+        ApiType: 'Azure'
+        type: 'bing_grounding'
+      }
+    }
+    credentials: {
+      key: bingSearch.listKeys().key1
+    }
+  }
+  dependsOn: [
+    bingSearchRoleAssignment
+  ]
+}
+
+output bingGroundingName string = bingSearch.name
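+output bingGroundingConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? bingSearchConnection.outputs.connectionName : ''
+output bingGroundingResourceId string = bingSearch.id
+output bingGroundingConnectionId string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? bingSearchConnection.outputs.connectionId : ''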
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/storage/storage.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/storage/storage.bicep
new file mode 100644
index 000000000000..18d9535dcd0b
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/core/storage/storage.bicep
@@ -0,0 +1,113 @@
+targetScope = 'resourceGroup'
+
+@description('The location used for all deployed resources')
+param location string = resourceGroup().location
+
+@description('Tags that will be applied to all resources')
+param tags object = {}
+
+@description('Storage account resource name')
+param resourceName string
+
+@description('Id of the user or app to assign application roles')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('AI Services account name for the project parent')
+param aiServicesAccountName string = ''
+
+@description('AI project name for creating the connection')
+param aiProjectName string = ''
+
+@description('Name for the AI Foundry storage connection')
+param connectionName string
+
+// Storage Account for the AI Services account
+resource storageAccount 'Microsoft.Storage/storageAccounts@2023-05-01' = {
+  name: resourceName
+  location: location
+  tags: tags
+  sku: {
+    name: 'Standard_LRS'
+  }
+  kind: 'StorageV2'
+  identity: {
+    type: 'SystemAssigned'
+  }
+  properties: {
+    supportsHttpsTrafficOnly: true
+    allowBlobPublicAccess: false
+    minimumTlsVersion: 'TLS1_2'
+    accessTier: 'Hot'
+    encryption: {
+      services: {
+        blob: {
+          enabled: true
+        }
+        file: {
+          enabled: true
+        }
+      }
+      keySource: 'Microsoft.Storage'
+    }
+  }
+}
+
+// Get reference to the AI Services account and project to access their managed identities
+resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-04-01-preview' existing = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: aiServicesAccountName
+
+  resource aiProject 'projects' existing = {
+    name: aiProjectName
+  }
+}
+
+// Role assignment for AI Services to access the storage account
+resource storageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: guid(storageAccount.id, aiAccount.id, 'ai-storage-contributor')
+  scope: storageAccount
+  properties: {
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe') // Storage Blob Data Contributor
+    principalId: aiAccount::aiProject.identity.principalId
+    principalType: 'ServicePrincipal'
+  }
+}
+
+// User permissions - Storage Blob Data Contributor
+resource userStorageRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
+  name: guid(storageAccount.id, principalId, 'Storage Blob Data Contributor')
+  scope: storageAccount
+  properties: {
+    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe') // Storage Blob Data Contributor
+    principalId: principalId
+    principalType: principalType
+  }
+}
+
+// Create the storage connection using the centralized connection module
+module storageConnection '../ai/connection.bicep' = if (!empty(aiServicesAccountName) && !empty(aiProjectName)) {
+  name: 'storage-connection-creation'
+  params: {
+    aiServicesAccountName: aiServicesAccountName
+    aiProjectName: aiProjectName
+    connectionConfig: {
+      name: connectionName
+      category: 'AzureStorageAccount'
+      target: storageAccount.properties.primaryEndpoints.blob
+      authType: 'AAD'
+      isSharedToAll: true
+      metadata: {
+        ApiType: 'Azure'
+        ResourceId: storageAccount.id
+        location: storageAccount.location
+      }
+    }
+  }
+}
+
+output storageAccountName string = storageAccount.name
+output storageAccountId string = storageAccount.id
+output storageAccountPrincipalId string = storageAccount.identity.principalId
+output storageConnectionName string = (!empty(aiServicesAccountName) && !empty(aiProjectName)) ? storageConnection.outputs.connectionName : ''
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/main.bicep b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/main.bicep
new file mode 100644
index 000000000000..ed4572c16225
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/main.bicep
@@ -0,0 +1,248 @@
+targetScope = 'subscription'
+// targetScope = 'resourceGroup'
+
+@minLength(1)
+@maxLength(64)
+@description('Name of the environment, used as part of the resource naming convention')
+param environmentName string
+
+@minLength(1)
+@maxLength(90)
+@description('Name of the resource group to use or create')
+param resourceGroupName string = 'rg-${environmentName}'
+
+// Restricted locations to match list from
+// https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/responses?tabs=python-key#region-availability
+@minLength(1)
+@description('Primary location for all resources')
+@allowed([
+  'australiaeast'
+  'brazilsouth'
+  'canadacentral'
+  'canadaeast'
+  'eastus'
+  'eastus2'
+  'francecentral'
+  'germanywestcentral'
+  'italynorth'
+  'japaneast'
+  'koreacentral'
+  'northcentralus'
+  'norwayeast'
+  'polandcentral'
+  'southafricanorth'
+  'southcentralus'
+  'southeastasia'
+  'southindia'
+  'spaincentral'
+  'swedencentral'
+  'switzerlandnorth'
+  'uaenorth'
+  'uksouth'
+  'westus'
+  'westus2'
+  'westus3'
+])
+param location string
+
+param aiDeploymentsLocation string = location
+
+@description('Id of the user or app to assign application roles')
+param principalId string
+
+@description('Principal type of user or app')
+param principalType string
+
+@description('Optional salt to diversify resource names across project recreations')
+param resourceTokenSalt string = ''
+
+@description('Optional. Name of an existing AI Services account within the resource group. If not provided, a new one will be created.')
+param aiFoundryResourceName string = ''
+
+@description('Optional. Name of the AI Foundry project. If not provided, a default name will be used.')
+param aiFoundryProjectName string = 'ai-project-${environmentName}'
+
+@description('List of model deployments')
+param aiProjectDeploymentsJson string = '[]'
+
+@description('List of connections')
+param aiProjectConnectionsJson string = '[]'
+
+@secure()
+@description('JSON map of connection name to credentials object. Example: {"my-conn":{"key":"secret"}}')
+param aiProjectConnectionCredentialsJson string = '{}'
+
+@description('List of resources to create and connect to the AI project')
+param aiProjectDependentResourcesJson string = '[]'
+
+var aiProjectDeployments = json(aiProjectDeploymentsJson)
+var aiProjectConnections = json(aiProjectConnectionsJson)
+var aiProjectConnectionCreds = json(aiProjectConnectionCredentialsJson)
+var aiProjectDependentResources = json(aiProjectDependentResourcesJson)
+
+@description('Enable hosted agent deployment')
+param enableHostedAgents bool
+
+@description('Enable the capability host for supporting BYO storage of agent conversations. When false and hosted agents are enabled, the capability host is not created.')
+param enableCapabilityHost bool
+
+@description('Enable monitoring for the AI project')
+param enableMonitoring bool
+
+@description('When true, skip Foundry project/role/connection provisioning and reference the existing project read-only. Use when pointing at an existing Foundry project via --project-id.')
+param useExistingAiProject bool = false
+
+@description('Optional. Existing container registry resource ID. If provided, no new ACR will be created and a connection to this ACR will be established.')
+param existingContainerRegistryResourceId string = ''
+
+@description('Optional. Existing container registry endpoint (login server). Required if existingContainerRegistryResourceId is provided.')
+param existingContainerRegistryEndpoint string = ''
+
+@description('Optional. Name of an existing ACR connection on the Foundry project. If provided, no new ACR or connection will be created.')
+param existingAcrConnectionName string = ''
+
+@description('Optional. Skip ACR creation entirely (e.g. for code-deploy scenarios where no container registry is needed). Defaults to false for backward compatibility.')
+param skipAcr bool = false
+
+@description('Optional. Existing Application Insights connection string. If provided, a connection will be created but no new App Insights resource.')
+param existingApplicationInsightsConnectionString string = ''
+
+@description('Optional. Existing Application Insights resource ID. Used for connection metadata when providing an existing App Insights.')
+param existingApplicationInsightsResourceId string = ''
+
+@description('Optional. Name of an existing Application Insights connection on the Foundry project. If provided, no new App Insights or connection will be created.')
+param existingAppInsightsConnectionName string = ''
+
+// Tags that should be applied to all resources.
+//
+// Note that 'azd-service-name' tags should be applied separately to service host resources.
+// Example usage:
+//   tags: union(tags, { 'azd-service-name': <service name in azure.yaml> })
+var tags = {
+  'azd-env-name': environmentName
+}
+
+// Create the resource group, or update it in place if it already exists
+resource rg 'Microsoft.Resources/resourceGroups@2021-04-01' = {
+  name: resourceGroupName
+  location: location
+  tags: tags
+}
+
+// Build dependent resources array conditionally
+// Check if ACR already exists in the user-provided array to avoid duplicates
+// Also skip if user provided an existing container registry endpoint or connection name
+var hasAcr = contains(map(aiProjectDependentResources, r => r.resource), 'registry')
+var shouldCreateAcr = !skipAcr && enableHostedAgents && !hasAcr && empty(existingContainerRegistryResourceId) && empty(existingAcrConnectionName)
+var dependentResources = shouldCreateAcr ? union(aiProjectDependentResources, [
+  {
+    resource: 'registry'
+    connectionName: 'acr-${uniqueString(subscription().id, resourceGroupName, location)}'
+  }
+]) : aiProjectDependentResources
+
+// AI Project module — only when creating new resources
+module aiProject 'core/ai/ai-project.bicep' = if (!useExistingAiProject) {
+  scope: rg
+  name: 'ai-project'
+  params: {
+    tags: tags
+    location: aiDeploymentsLocation
+    aiFoundryProjectName: aiFoundryProjectName
+    principalId: principalId
+    principalType: principalType
+    existingAiAccountName: aiFoundryResourceName
+    deployments: aiProjectDeployments
+    connections: aiProjectConnections
+    connectionCredentials: aiProjectConnectionCreds
+    additionalDependentResources: dependentResources
+    enableMonitoring: enableMonitoring
+    enableHostedAgents: enableHostedAgents
+    enableCapabilityHost: enableCapabilityHost
+    existingContainerRegistryResourceId: existingContainerRegistryResourceId
+    existingContainerRegistryEndpoint: existingContainerRegistryEndpoint
+    existingAcrConnectionName: existingAcrConnectionName
+    existingApplicationInsightsConnectionString: existingApplicationInsightsConnectionString
+    existingApplicationInsightsResourceId: existingApplicationInsightsResourceId
+    existingAppInsightsConnectionName: existingAppInsightsConnectionName
+    resourceTokenSalt: resourceTokenSalt
+  }
+}
+
+// Existing project module — read-only reference when reusing an existing Foundry project
+module existingAiProject 'core/ai/existing-ai-project.bicep' = if (useExistingAiProject) {
+  scope: rg
+  name: 'existing-ai-project'
+  params: {
+    aiServicesAccountName: aiFoundryResourceName
+    aiFoundryProjectName: aiFoundryProjectName
+    deployments: aiProjectDeployments
+    existingAcrConnectionName: existingAcrConnectionName
+    existingContainerRegistryEndpoint: existingContainerRegistryEndpoint
+    existingApplicationInsightsConnectionString: existingApplicationInsightsConnectionString
+    existingApplicationInsightsResourceId: existingApplicationInsightsResourceId
+    connections: aiProjectConnections
+    connectionCredentials: aiProjectConnectionCreds
+  }
+}
+
+// ACR for existing project — create when hosted agents need a registry but the existing project has none
+var shouldCreateAcrForExistingProject = useExistingAiProject && shouldCreateAcr
+var acrConnectionName = 'acr-${uniqueString(subscription().id, resourceGroupName, location)}'
+
+module acrForExistingProject 'core/host/acr.bicep' = if (shouldCreateAcrForExistingProject) {
+  scope: rg
+  name: 'acr-for-existing-project'
+  params: {
+    location: location
+    tags: tags
+    resourceName: 'cr${uniqueString(subscription().id, resourceGroupName, location)}'
+    connectionName: acrConnectionName
+    principalId: principalId
+    principalType: principalType
+    aiServicesAccountName: aiFoundryResourceName
+    aiProjectName: aiFoundryProjectName
+  }
+}
+
+// Resources
+output AZURE_RESOURCE_GROUP string = resourceGroupName
+output AZURE_AI_ACCOUNT_ID string = useExistingAiProject ? existingAiProject.outputs.accountId : aiProject.outputs.accountId
+output AZURE_AI_PROJECT_ID string = useExistingAiProject ? existingAiProject.outputs.projectId : aiProject.outputs.projectId
+output AZURE_AI_FOUNDRY_PROJECT_ID string = useExistingAiProject ? existingAiProject.outputs.projectId : aiProject.outputs.projectId
+output AZURE_AI_ACCOUNT_NAME string = useExistingAiProject ? existingAiProject.outputs.aiServicesAccountName : aiProject.outputs.aiServicesAccountName
+output AZURE_AI_PROJECT_NAME string = useExistingAiProject ? existingAiProject.outputs.projectName : aiProject.outputs.projectName
+
+// Endpoints
+output AZURE_AI_PROJECT_ENDPOINT string = useExistingAiProject ? existingAiProject.outputs.AZURE_AI_PROJECT_ENDPOINT : aiProject.outputs.AZURE_AI_PROJECT_ENDPOINT
+output FOUNDRY_PROJECT_ENDPOINT string = useExistingAiProject ? existingAiProject.outputs.FOUNDRY_PROJECT_ENDPOINT : aiProject.outputs.FOUNDRY_PROJECT_ENDPOINT
+output AZURE_OPENAI_ENDPOINT string = useExistingAiProject ? existingAiProject.outputs.AZURE_OPENAI_ENDPOINT : aiProject.outputs.AZURE_OPENAI_ENDPOINT
+output APPLICATIONINSIGHTS_CONNECTION_STRING string = useExistingAiProject ? existingAiProject.outputs.APPLICATIONINSIGHTS_CONNECTION_STRING : aiProject.outputs.APPLICATIONINSIGHTS_CONNECTION_STRING
+output APPLICATIONINSIGHTS_RESOURCE_ID string = useExistingAiProject ? existingAiProject.outputs.APPLICATIONINSIGHTS_RESOURCE_ID : aiProject.outputs.APPLICATIONINSIGHTS_RESOURCE_ID
+
+// Dependent Resources and Connections
+
+// ACR
+output AZURE_AI_PROJECT_ACR_CONNECTION_NAME string = shouldCreateAcrForExistingProject ? acrForExistingProject.outputs.containerRegistryConnectionName : (useExistingAiProject ? existingAiProject.outputs.dependentResources.registry.connectionName : aiProject.outputs.dependentResources.registry.connectionName)
+output AZURE_CONTAINER_REGISTRY_ENDPOINT string = shouldCreateAcrForExistingProject ? acrForExistingProject.outputs.containerRegistryLoginServer : (useExistingAiProject ? existingAiProject.outputs.dependentResources.registry.loginServer : aiProject.outputs.dependentResources.registry.loginServer)
+
+// Bing Search
+output BING_GROUNDING_CONNECTION_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_grounding.connectionName : aiProject.outputs.dependentResources.bing_grounding.connectionName
+output BING_GROUNDING_RESOURCE_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_grounding.name : aiProject.outputs.dependentResources.bing_grounding.name
+output BING_GROUNDING_CONNECTION_ID string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_grounding.connectionId : aiProject.outputs.dependentResources.bing_grounding.connectionId
+
+// Bing Custom Search
+output BING_CUSTOM_GROUNDING_CONNECTION_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_custom_grounding.connectionName : aiProject.outputs.dependentResources.bing_custom_grounding.connectionName
+output BING_CUSTOM_GROUNDING_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_custom_grounding.name : aiProject.outputs.dependentResources.bing_custom_grounding.name
+output BING_CUSTOM_GROUNDING_CONNECTION_ID string = useExistingAiProject ? existingAiProject.outputs.dependentResources.bing_custom_grounding.connectionId : aiProject.outputs.dependentResources.bing_custom_grounding.connectionId
+
+// Azure AI Search
+output AZURE_AI_SEARCH_CONNECTION_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.search.connectionName : aiProject.outputs.dependentResources.search.connectionName
+output AZURE_AI_SEARCH_SERVICE_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.search.serviceName : aiProject.outputs.dependentResources.search.serviceName
+
+// Azure Storage
+output AZURE_STORAGE_CONNECTION_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.storage.connectionName : aiProject.outputs.dependentResources.storage.connectionName
+output AZURE_STORAGE_ACCOUNT_NAME string = useExistingAiProject ? existingAiProject.outputs.dependentResources.storage.accountName : aiProject.outputs.dependentResources.storage.accountName
+
+// Connections
+output AI_PROJECT_CONNECTION_IDS_JSON string = useExistingAiProject ? string(existingAiProject.outputs.connectionIds) : string(aiProject.outputs.connectionIds)
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/main.parameters.json b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/main.parameters.json
new file mode 100644
index 000000000000..0d0109fe4a8f
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/infra/main.parameters.json
@@ -0,0 +1,78 @@
+{
+    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
+    "contentVersion": "1.0.0.0",
+    "parameters": {
+      "resourceGroupName": {
+        "value": "${AZURE_RESOURCE_GROUP}"
+      },
+      "environmentName": {
+        "value": "${AZURE_ENV_NAME}"
+      },
+      "location": {
+        "value": "${AZURE_LOCATION}"
+      },
+      "aiFoundryResourceName": {
+        "value": "${AZURE_AI_ACCOUNT_NAME}"
+      },
+      "aiFoundryProjectName": {
+        "value": "${AZURE_AI_PROJECT_NAME}"
+      },
+      "aiDeploymentsLocation": {
+        "value": "${AZURE_AI_DEPLOYMENTS_LOCATION}"
+      },
+      "resourceTokenSalt": {
+        "value": "${AZD_RESOURCE_TOKEN_SALT=}"
+      },
+      "principalId": {
+        "value": "${AZURE_PRINCIPAL_ID}"
+      },
+      "principalType": {
+        "value": "${AZURE_PRINCIPAL_TYPE}"
+      },
+      "aiProjectDeploymentsJson": {
+        "value": "${AI_PROJECT_DEPLOYMENTS=[]}"
+      },
+      "aiProjectConnectionsJson": {
+        "value": "${AI_PROJECT_CONNECTIONS=[]}"
+      },
+      "aiProjectConnectionCredentialsJson": {
+        "value": "${AI_PROJECT_CONNECTION_CREDENTIALS}"
+      },
+      "aiProjectDependentResourcesJson": {
+        "value": "${AI_PROJECT_DEPENDENT_RESOURCES=[]}"
+      },
+      "enableMonitoring": {
+        "value": "${ENABLE_MONITORING=true}"
+      },
+      "enableHostedAgents": {
+        "value": "${ENABLE_HOSTED_AGENTS=false}"
+      },
+      "enableCapabilityHost": {
+        "value": "${ENABLE_CAPABILITY_HOST=true}"
+      },
+      "useExistingAiProject": {
+        "value": "${USE_EXISTING_AI_PROJECT=false}"
+      },
+      "existingContainerRegistryResourceId": {
+        "value": "${AZURE_CONTAINER_REGISTRY_RESOURCE_ID=}"
+      },
+      "existingContainerRegistryEndpoint": {
+        "value": "${AZURE_CONTAINER_REGISTRY_ENDPOINT=}"
+      },
+      "existingAcrConnectionName": {
+        "value": "${AZURE_AI_PROJECT_ACR_CONNECTION_NAME=}"
+      },
+      "skipAcr": {
+        "value": "${AZD_AGENT_SKIP_ACR=false}"
+      },
+      "existingApplicationInsightsConnectionString": {
+        "value": "${APPLICATIONINSIGHTS_CONNECTION_STRING=}"
+      },
+      "existingApplicationInsightsResourceId": {
+        "value": "${APPLICATIONINSIGHTS_RESOURCE_ID=}"
+      },
+      "existingAppInsightsConnectionName": {
+        "value": "${APPLICATIONINSIGHTS_CONNECTION_NAME=}"
+      }
+    }
+}
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/.gitignore b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/.gitignore
new file mode 100644
index 000000000000..e4f4657e5654
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/.gitignore
@@ -0,0 +1,4 @@
+# Local-run artifacts — never commit these
+.venv/
+.durable/
+out/
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/README.md b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/README.md
new file mode 100644
index 000000000000..88bf7502af6c
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/README.md
@@ -0,0 +1,155 @@
+# Run the durable Responses agent locally (crash → recover)
+
+This kit runs the `durable-responses-agent-demo` **entirely on your machine** and
+demonstrates durable crash-recovery — **without** the hosted Foundry task API.
+
+> **Why local?** Durable recovery normally relies on the hosted task-store
+> `/tasks` API. That API is currently returning **403** for hosted agents, which
+> blocks deployed recovery. Off-platform, the framework auto-selects a
+> **file-backed** task store + response store, so the *exact same* recovery code
+> path runs locally with no hosted dependency. Only the LLM sub-calls go to your
+> Foundry project.
+
+## Prerequisites
+
+- Python 3.10+
+- `az login` (the LLM sub-calls use `DefaultAzureCredential`)
+- A Foundry **project endpoint** and a **model deployment** in it
+
+## Quick start (automated demo)
+
+```bash
+cd local
+./setup.sh                          # builds a venv from ../../../../wheels + deps
+
+az login
+export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o     # a deployment in that project
+
+./run.sh
+```
+
+`run.sh` drives the whole thing and prints a narrated, verified result:
+
+1. **Start** the agent as a local server (file-backed durable store).
+2. **Stream** a 3-phase research response (one durable `OutputItem` +
+   `checkpoint()` per sub-call) → `out/sse_initial.txt`.
+3. **Crash** it after 5 checkpoints (the demo's `"crash"` input forces
+   `os._exit(137)`), pinned to the same session so the right replica dies.
+4. **Restart** → the startup recovery scan reclaims the in-progress task and
+   re-invokes the handler (`context.is_recovery`), seeding from the persisted
+   response and resuming at the first un-checkpointed sub-call.
+5. **Reconnect** with `GET …?stream=true&starting_after=<seq>` →
+   `out/sse_resumed.txt`, and assert the response completes the full plan.
+
+Example tail:
+
+```
+[4/4] Reconnecting to the same response and verifying it completes across the crash
+  » first resumed event: response.created (carries 5 checkpointed item(s))
+  » terminal event: response.completed with 12 total output item(s)
+
+RESULT
+{
+  "pre_crash_checkpoints": 5,
+  "first_resumed_event": "response.created",
+  "items_seeded_on_resume": 5,
+  "terminal_event": "response.completed",
+  "final_item_count": 12,
+  "expected_item_count": 12,
+  "RECOVERED_FULL_PLAN": true
+}
+
+✓ Durable recovery succeeded — the response completed the full plan across a crash.
+```
+
+Tunables (env): `NUM_PHASES` (default 3 → 12 sub-calls), `CRASH_AFTER` (default
+5 checkpoints), `PORT` (default 8088), `TARGET_OUTPUT_TOKENS` (default 80).
+
+## Manual exploration
+
+Drive the agent yourself in two terminals (plus a third to inject the crash).
+
+**Terminal 1 — start the agent:**
+
+```bash
+cd local
+az login
+export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o
+./serve.sh
+```
+
+**Terminal 2 — stream, crash, reconnect** (`SID` pins everything to one session):
+
+```bash
+TOKEN=$(az account get-access-token --resource https://ai.azure.com --query accessToken -o tsv)
+SID=$(openssl rand -hex 16)
+
+# 1) Start a streaming, background, stored response. Note the "id" (caresp_...)
+#    and the highest "sequence_number" you see before you crash it.
+curl -N -s http://localhost:8088/responses \
+  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
+  -d "{\"model\":\"gpt-4o\",\"input\":\"renewable energy supply chains\",
+       \"stream\":true,\"store\":true,\"background\":true,\"agent_session_id\":\"$SID\"}"
+
+# 2) In a THIRD terminal, after a few `response.output_item.done` events, crash it:
+curl -s http://localhost:8088/responses \
+  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
+  -d "{\"model\":\"gpt-4o\",\"input\":\"crash\",\"stream\":false,\"store\":true,
+       \"background\":true,\"agent_session_id\":\"$SID\"}"
+
+# The server process exits (137). Restart it in Terminal 1 (./serve.sh again,
+# SAME durable root). On startup it logs "Reclaimed stale task ... Recovered
+# task ... is now active".
+
+# 3) Reconnect to the SAME response (use the id + last seq from step 1):
+curl -N -s "http://localhost:8088/responses/<caresp_id>?stream=true&starting_after=<last_seq>" \
+  -H "authorization: Bearer $TOKEN"
+# First event is response.in_progress/created carrying the already-checkpointed
+# items; the next sub-call resumes; the stream ends with response.completed.
+```
+
+> GET routes by `response_id` — you don't pass a session id on reconnect. For
+> `POST /responses`, the session id goes in the **body** (`agent_session_id`),
+> not the query string.
+
+## How it works locally
+
+`serve.sh` / `run.sh` set two env vars that flip the framework into local mode:
+
+| Env var | Effect |
+|---------|--------|
+| `AGENTSERVER_TASKS_BACKEND=local` | Use the file-backed task store instead of the hosted `/tasks` API. |
+| `AGENTSERVER_DURABLE_ROOT=<dir>` | Where the durable task store **and** response store live (`<dir>/tasks`, `<dir>/responses`, `<dir>/streams`). |
+
+Recovery works by restarting the process against the **same** `AGENTSERVER_DURABLE_ROOT`:
+the startup scan finds the stale in-progress task, reclaims its lease, and
+re-invokes the handler. `DEMO_MODE=1` enables the `"crash"` input sentinel.
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `setup.sh` | Create a venv and install the preview wheels + demo deps. |
+| `run.sh` | One-command automated crash → recover → verify demo. |
+| `serve.sh` | Start the agent locally for manual exploration. |
+| `recovery_demo.py` | The orchestrator `run.sh` invokes. |
+
+The agent handler itself is `../src/durable-responses-agent-demo/main.py`.
+
+## Other durable samples
+
+The same local pattern (`AGENTSERVER_TASKS_BACKEND=local` +
+`AGENTSERVER_DURABLE_ROOT`, restart to recover) applies to the other durable
+samples in this drop — see `../../sample_19_durable_streaming.py`,
+`sample_20_durable_steering.py`, `sample_21_durable_langgraph.py`,
+`sample_22_durable_multiturn.py`, and the invocations
+`durable_research` / `durable_multiturn` / `durable_langgraph` / `durable_copilot`
+samples.
+
+## Troubleshooting
+
+**`Address already in use` / `OSError: [Errno 98]`** — a server is still running
+on the port. `run.sh` auto-picks the next free port; for `serve.sh`, stop the
+old server (`Ctrl-C` in its terminal) or pick another port: `PORT=8090 ./serve.sh`.
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/recovery_demo.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/recovery_demo.py
new file mode 100755
index 000000000000..fc394e81704d
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/recovery_demo.py
@@ -0,0 +1,334 @@
+#!/usr/bin/env python3
+"""Local durable crash-recovery demo for the durable-responses-agent-demo.
+
+Runs the agent **entirely on your machine** — the durable task store and the
+response store are file-backed under a local directory, so you do **not** need
+the hosted Foundry task API (the one currently returning 403). Only the LLM
+sub-calls go to your Foundry project, so you need ``az login`` + a project
+endpoint + a model deployment.
+
+What it demonstrates, automatically, in one run:
+
+  1. Starts the agent as a local server (file-backed durable backend).
+  2. POSTs a streaming, background, stored response that runs a multi-phase
+     research plan, emitting one durable ``OutputItem`` + ``checkpoint()`` per
+     sub-call. The live SSE is streamed to ``out/sse_initial.txt``.
+  3. After a few checkpoints land, injects a crash (the demo's ``"crash"``
+     input forces ``os._exit(137)``) — pinned to the same session so it kills
+     the replica running our response. The stream drops mid-flight.
+  4. Restarts the server against the **same** durable root. On startup the
+     framework's recovery scan reclaims the in-progress task and re-invokes the
+     handler with ``context.is_recovery is True``; it seeds from the persisted
+     response and resumes at the first un-checkpointed sub-call.
+  5. Reconnects with ``GET /responses/{id}?stream=true&starting_after=<seq>``
+     and streams the resumed SSE to ``out/sse_resumed.txt``, then asserts the
+     response completes with the full set of output items.
+
+Run it via ``./run.sh`` (which sets up the venv + env), or directly:
+
+    FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project> \
+    AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o \
+    python recovery_demo.py
+
+Tunables (env): ``NUM_PHASES`` (default 3), ``CRASH_AFTER`` (default 5
+checkpoints), ``PORT`` (default 8088), ``DURABLE_ROOT`` (default ``./.durable``),
+``OUT_DIR`` (default ``./out``).
+"""
+from __future__ import annotations
+
+import json
+import os
+import signal
+import subprocess
+import sys
+import threading
+import time
+from pathlib import Path
+
+try:
+    import httpx
+except ImportError:  # pragma: no cover - guided setup
+    sys.exit("httpx is required. Run ./run.sh, or: pip install httpx")
+
+HERE = Path(__file__).resolve().parent
+MAIN_PY = HERE.parent / "src" / "durable-responses-agent-demo" / "main.py"
+
+PORT = int(os.environ.get("PORT", "8088"))
+
+
+def _port_is_free(port: int) -> bool:
+    import socket
+
+    s = socket.socket()
+    try:
+        s.bind(("0.0.0.0", port))
+        return True
+    except OSError:
+        return False
+    finally:
+        s.close()
+
+
+# Auto-pick a free port if the requested one is busy (e.g. a leftover server).
+_requested_port = PORT
+while not _port_is_free(PORT) and PORT < _requested_port + 25:
+    PORT += 1
+if PORT != _requested_port:
+    print(f"  » port {_requested_port} is busy; using {PORT} instead", flush=True)
+
+BASE = f"http://localhost:{PORT}"
+NUM_PHASES = int(os.environ.get("NUM_PHASES", "3"))
+CRASH_AFTER = int(os.environ.get("CRASH_AFTER", "5"))
+DURABLE_ROOT = Path(os.environ.get("DURABLE_ROOT", HERE / ".durable")).resolve()
+OUT_DIR = Path(os.environ.get("OUT_DIR", HERE / "out")).resolve()
+TOPIC = os.environ.get("TOPIC", "The impact of renewable energy adoption on global supply chains")
+
+if "FOUNDRY_PROJECT_ENDPOINT" not in os.environ:
+    sys.exit(
+        "FOUNDRY_PROJECT_ENDPOINT is required (your Foundry project endpoint for the LLM\n"
+        "sub-calls). Run `az login` first, then set it. See README.md."
+    )
+
+# Child-process env: real LLM via the project endpoint, but durability stays
+# local (file-backed task store + response store under DURABLE_ROOT).
+CHILD_ENV = {
+    **os.environ,
+    "AZURE_AI_MODEL_DEPLOYMENT_NAME": os.environ.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "gpt-4o"),
+    "DEMO_MODE": "1",  # enables the "crash" input sentinel in main.py
+    "AGENTSERVER_TASKS_BACKEND": "local",
+    "AGENTSERVER_DURABLE_ROOT": str(DURABLE_ROOT),
+    "INTRA_PHASE_COOLDOWN_SEC": os.environ.get("INTRA_PHASE_COOLDOWN_SEC", "1"),
+    "INTER_PHASE_COOLDOWN_SEC": os.environ.get("INTER_PHASE_COOLDOWN_SEC", "1"),
+    "TARGET_OUTPUT_TOKENS": os.environ.get("TARGET_OUTPUT_TOKENS", "80"),
+    "NUM_PHASES": str(NUM_PHASES),
+    "PORT": str(PORT),
+}
+
+st = {"rid": None, "max_seq": 0, "done": 0, "crashed": False}
+
+
+def log(*a: object) -> None:
+    print("  »", *a, flush=True)
+
+
+def banner(text: str) -> None:
+    print(f"\n\033[1m{text}\033[0m", flush=True)
+
+
+def wait_port(timeout: float = 45.0) -> bool:
+    t0 = time.time()
+    while time.time() - t0 < timeout:
+        try:
+            httpx.get(f"{BASE}/responses/_ping", timeout=2)
+            return True
+        except Exception:
+            time.sleep(0.5)
+    return False
+
+
+def start_server(tag: str) -> subprocess.Popen:
+    OUT_DIR.mkdir(parents=True, exist_ok=True)
+    logf = open(OUT_DIR / f"server_{tag}.log", "w")
+    proc = subprocess.Popen(
+        [sys.executable, str(MAIN_PY)],
+        env=CHILD_ENV,
+        stdout=logf,
+        stderr=subprocess.STDOUT,
+        start_new_session=True,
+    )
+    if not wait_port():
+        proc.kill()  # don't leave a half-started server holding the port
+        logf.close()
+        raise RuntimeError(f"server '{tag}' did not come up — see {OUT_DIR / f'server_{tag}.log'}")
+    log(f"server '{tag}' is up (pid {proc.pid}), logs -> out/server_{tag}.log")
+    return proc
+
+
+def parse_frame(frame: str):
+    ev = data = None
+    for line in frame.split("\n"):
+        if line.startswith("event:"):
+            ev = line[6:].strip()
+        elif line.startswith("data:"):
+            data = line[5:].strip()
+    if ev is None:
+        return None, {}
+    try:
+        return ev, (json.loads(data) if data else {})
+    except Exception:
+        return ev, {}
+
+
+def inject_crash() -> None:
+    log("injecting crash (POST input='crash', pinned to the same session) ...")
+    try:
+        httpx.post(
+            f"{BASE}/responses",
+            json={
+                "model": CHILD_ENV["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
+                "input": "crash",
+                "stream": False,
+                "store": True,
+                "background": True,
+                "agent_session_id": os.urandom(8).hex(),
+            },
+            timeout=10,
+        )
+    except Exception as exc:
+        log(f"crash request returned/disconnected (expected): {type(exc).__name__}")
+    st["crashed"] = True
+
+
+def stream_initial() -> None:
+    body = {
+        "model": CHILD_ENV["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
+        "input": TOPIC,
+        "stream": True,
+        "store": True,
+        "background": True,
+        "agent_session_id": os.urandom(16).hex(),
+    }
+    f = open(OUT_DIR / "sse_initial.txt", "w")
+    buf = ""
+    try:
+        with httpx.stream("POST", f"{BASE}/responses", json=body, timeout=None) as r:
+            log(f"initial stream opened (HTTP {r.status_code})")
+            for chunk in r.iter_text():
+                if not chunk:
+                    continue
+                f.write(chunk)
+                f.flush()
+                buf += chunk
+                while "\n\n" in buf:
+                    frame, buf = buf.split("\n\n", 1)
+                    ev, data = parse_frame(frame)
+                    seq = data.get("sequence_number")
+                    if isinstance(seq, int):
+                        st["max_seq"] = max(st["max_seq"], seq)
+                    rid = (data.get("response") or {}).get("id") or data.get("id")
+                    if rid and not st["rid"]:
+                        st["rid"] = rid
+                        log(f"response id: {rid}")
+                    if ev == "response.output_item.done":
+                        st["done"] += 1
+                        log(f"checkpoint #{st['done']} committed (seq={st['max_seq']})")
+                        if st["done"] == CRASH_AFTER and not st["crashed"]:
+                            threading.Thread(target=inject_crash, daemon=True).start()
+    except Exception as exc:
+        log(f"initial stream dropped: {type(exc).__name__} (this is the crash)")
+    finally:
+        f.close()
+
+
+def reconnect_and_verify() -> bool:
+    starting_after = st["max_seq"]
+    log(f"reconnecting: GET /responses/{st['rid']}?stream=true&starting_after={starting_after}")
+    f = open(OUT_DIR / "sse_resumed.txt", "w")
+    buf = ""
+    first_event = None
+    seeded_items = None
+    final_items = None
+    terminal = None
+    deadline = time.time() + 240
+    try:
+        with httpx.stream(
+            "GET",
+            f"{BASE}/responses/{st['rid']}",
+            params={"stream": "true", "starting_after": starting_after},
+            timeout=None,
+        ) as r:
+            log(f"reconnect stream opened (HTTP {r.status_code})")
+            for chunk in r.iter_text():
+                if time.time() > deadline:
+                    log("reconnect deadline reached")
+                    break
+                if not chunk:
+                    continue
+                f.write(chunk)
+                f.flush()
+                buf += chunk
+                while "\n\n" in buf:
+                    frame, buf = buf.split("\n\n", 1)
+                    ev, data = parse_frame(frame)
+                    if first_event is None:
+                        first_event = ev
+                        seeded_items = len((data.get("response") or {}).get("output") or [])
+                        log(f"first resumed event: {ev} (carries {seeded_items} checkpointed item(s))")
+                    if ev in ("response.completed", "response.failed", "response.incomplete"):
+                        terminal = ev
+                        final_items = len((data.get("response") or {}).get("output") or [])
+                        break
+            if terminal:
+                log(f"terminal event: {terminal} with {final_items} total output item(s)")
+    except Exception as exc:
+        log(f"reconnect stream ended: {type(exc).__name__}")
+    finally:
+        f.close()
+
+    expected = NUM_PHASES * 4  # this check assumes 4 output items per phase
+    ok = terminal == "response.completed" and final_items == expected
+    st["_summary"] = {
+        "response_id": st["rid"],
+        "pre_crash_checkpoints": st["done"],
+        "pre_crash_max_seq": st["max_seq"],
+        "first_resumed_event": first_event,
+        "items_seeded_on_resume": seeded_items,
+        "terminal_event": terminal,
+        "final_item_count": final_items,
+        "expected_item_count": expected,
+        "RECOVERED_FULL_PLAN": ok,
+    }
+    return ok
+
+
+def main() -> int:
+    if not MAIN_PY.exists():
+        sys.exit(f"agent entrypoint not found: {MAIN_PY}")
+    DURABLE_ROOT.mkdir(parents=True, exist_ok=True)
+    OUT_DIR.mkdir(parents=True, exist_ok=True)
+    # Fresh state each run.
+    for sub in ("tasks", "responses", "streams"):
+        d = DURABLE_ROOT / sub
+        if d.exists():
+            for p in sorted(d.rglob("*"), reverse=True):
+                if p.is_file():
+                    p.unlink()
+                else:
+                    p.rmdir()
+
+    banner(f"[1/4] Starting local durable agent (file-backed store at {DURABLE_ROOT})")
+    p1 = start_server("1")
+
+    banner(f"[2/4] Streaming a {NUM_PHASES}-phase research response; will crash after {CRASH_AFTER} checkpoints")
+    stream_initial()
+    log(f"pre-crash watermark: {st['done']} checkpoints, max seq {st['max_seq']}, response {st['rid']}")
+    for _ in range(60):
+        if p1.poll() is not None:
+            log(f"server '1' exited (rc={p1.returncode}) — crash confirmed")
+            break
+        time.sleep(0.5)
+    else:
+        log("server '1' still alive; killing it to simulate the crash")
+        os.killpg(os.getpgid(p1.pid), signal.SIGKILL)
+    time.sleep(2)
+
+    banner("[3/4] Restarting the agent — startup recovery scan reclaims the in-progress task")
+    p2 = start_server("2")
+    log("giving recovery a moment to re-invoke the handler ...")
+    time.sleep(8)
+
+    banner("[4/4] Reconnecting to the same response and verifying it completes across the crash")
+    ok = reconnect_and_verify()
+
+    try:
+        os.killpg(os.getpgid(p2.pid), signal.SIGTERM)
+    except Exception:
+        pass
+
+    banner("RESULT")
+    print(json.dumps(st["_summary"], indent=2))
+    print(f"\nSSE transcripts: {OUT_DIR / 'sse_initial.txt'}  +  {OUT_DIR / 'sse_resumed.txt'}")
+    if ok:
+        print("\n\033[32m✓ Durable recovery succeeded — the response completed the full plan across a crash.\033[0m")
+        return 0
+    print("\n\033[31m✗ Recovery did not complete the full plan — inspect out/server_2.log.\033[0m")
+    return 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/run.sh b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/run.sh
new file mode 100755
index 000000000000..7d6ece4f74e5
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/run.sh
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+# ─────────────────────────────────────────────────────────────────────────────
+# Automated end-to-end durable crash-recovery demo:
+#   start agent (local store) -> stream -> crash -> restart -> recover -> verify.
+#
+#   az login
+#   export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+#   export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o
+#   ./run.sh
+#
+# Tunables (env): NUM_PHASES (default 3), CRASH_AFTER (default 5), PORT (8088).
+# ─────────────────────────────────────────────────────────────────────────────
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+VENV="${VENV:-$HERE/.venv}"
+
+if [[ ! -d "$VENV" ]]; then
+    echo "venv not found at $VENV — run ./setup.sh first." >&2
+    exit 1
+fi
+: "${FOUNDRY_PROJECT_ENDPOINT:?set FOUNDRY_PROJECT_ENDPOINT (your Foundry project endpoint) and run 'az login' first}"
+
+exec "$VENV/bin/python" "$HERE/recovery_demo.py"
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/serve.sh b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/serve.sh
new file mode 100755
index 000000000000..a496a235ca1b
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/serve.sh
@@ -0,0 +1,46 @@
+#!/usr/bin/env bash
+# ─────────────────────────────────────────────────────────────────────────────
+# Start the durable agent locally (file-backed durable store, no hosted task
+# API) so you can drive it yourself — stream a response, crash it, reconnect.
+# See README.md "Manual exploration" for the curl recipe.
+#
+#   az login
+#   export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+#   export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o
+#   ./serve.sh
+# ─────────────────────────────────────────────────────────────────────────────
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+VENV="${VENV:-$HERE/.venv}"
+MAIN="$HERE/../src/durable-responses-agent-demo/main.py"
+
+if [[ ! -d "$VENV" ]]; then
+    echo "venv not found at $VENV — run ./setup.sh first." >&2
+    exit 1
+fi
+: "${FOUNDRY_PROJECT_ENDPOINT:?set FOUNDRY_PROJECT_ENDPOINT (your Foundry project endpoint) and run 'az login' first}"
+
+# Local durable backend — this is what removes the hosted /tasks API dependency.
+export AGENTSERVER_TASKS_BACKEND=local
+export AGENTSERVER_DURABLE_ROOT="${AGENTSERVER_DURABLE_ROOT:-$HERE/.durable}"
+# Enables the "crash" input sentinel so you can trigger a crash on demand.
+export DEMO_MODE=1
+export AZURE_AI_MODEL_DEPLOYMENT_NAME="${AZURE_AI_MODEL_DEPLOYMENT_NAME:-gpt-4o}"
+export NUM_PHASES="${NUM_PHASES:-3}"
+export INTRA_PHASE_COOLDOWN_SEC="${INTRA_PHASE_COOLDOWN_SEC:-1}"
+export INTER_PHASE_COOLDOWN_SEC="${INTER_PHASE_COOLDOWN_SEC:-1}"
+export TARGET_OUTPUT_TOKENS="${TARGET_OUTPUT_TOKENS:-80}"
+export PORT="${PORT:-8088}"
+
+# Fail fast with a clear message if the port is already taken.
+if "$VENV/bin/python" -c "import socket,sys; s=socket.socket(); r=s.connect_ex(('127.0.0.1', ${PORT})); s.close(); sys.exit(0 if r==0 else 1)"; then
+    echo "Port ${PORT} is already in use (a server may still be running). Stop it, or pick another port: PORT=8090 ./serve.sh" >&2
+    exit 1
+fi
+
+echo "Starting durable agent on http://localhost:${PORT}"
+echo "  durable root : ${AGENTSERVER_DURABLE_ROOT}  (tasks + responses are file-backed here)"
+echo "  crash input  : POST /responses with input \"crash\"  (DEMO_MODE=1)"
+echo "  stop         : Ctrl-C"
+exec "$VENV/bin/python" "$MAIN"
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/setup.sh b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/setup.sh
new file mode 100755
index 000000000000..6c2a987fe297
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/local/setup.sh
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+# ─────────────────────────────────────────────────────────────────────────────
+# One-time setup: create a local venv and install the preview wheels + the
+# demo's runtime dependencies. Re-run any time to refresh.
+#
+#   ./setup.sh
+#
+# Override the interpreter or venv location:
+#   PYTHON=python3.12 VENV=/tmp/durable-demo-venv ./setup.sh
+# ─────────────────────────────────────────────────────────────────────────────
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+WHEELS="$(cd "$HERE/../../../../wheels" && pwd)"
+VENV="${VENV:-$HERE/.venv}"
+PYTHON="${PYTHON:-python3}"
+
+echo "==> Creating venv: $VENV"
+"$PYTHON" -m venv "$VENV"
+"$VENV/bin/pip" install --quiet --upgrade pip
+
+echo "==> Installing preview wheels from: $WHEELS"
+"$VENV/bin/pip" install --quiet "$WHEELS"/*.whl
+
+echo "==> Installing demo runtime deps (azure-ai-projects, azure-identity, httpx)"
+"$VENV/bin/pip" install --quiet azure-ai-projects==2.0.1 azure-identity==1.25.3 httpx
+
+echo ""
+echo "Done. Next:"
+echo "  az login"
+echo "  export FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>"
+echo "  export AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o   # a model deployment in that project"
+echo "  ./run.sh        # automated crash -> recover demo"
+echo "  ./serve.sh      # or run the agent yourself for manual exploration"
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.agentignore b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.agentignore
new file mode 100644
index 000000000000..4e8de03ee83d
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.agentignore
@@ -0,0 +1,40 @@
+# Files excluded from agent code deployment packaging.
+# Uses .gitignore syntax.
+# Note: only the root .agentignore is read; subdirectory files are not supported.
+#
+# To include a file that is excluded by default, use negation: !filename
+
+# azd tooling files
+agent.yaml
+agent.manifest.yaml
+azure.yaml
+.agentignore
+
+# Security / secrets
+.env
+.env.*
+.azure/
+.git/
+
+# Python
+__pycache__/
+.venv/
+venv/
+*.pyc
+*.pyo
+.mypy_cache/
+.pytest_cache/
+
+# .NET
+bin/
+obj/
+*.user
+*.suo
+.vs/
+
+# Node
+node_modules/
+
+# Docker (not used in code deploy)
+Dockerfile
+.dockerignore
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.azdignore b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.azdignore
new file mode 100644
index 000000000000..4a74eabf4196
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.azdignore
@@ -0,0 +1,3 @@
+agent.manifest.yaml
+agent.yaml
+.env.example
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.dockerignore b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.dockerignore
new file mode 100644
index 000000000000..b709ec79bea7
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.dockerignore
@@ -0,0 +1,26 @@
+**/__pycache__/
+**/*.py[cod]
+**/*.egg-info/
+.eggs/
+
+# Virtual environments
+.venv/
+venv/
+env/
+
+# IDE settings
+.vscode/
+.idea/
+
+# Version control
+.git/
+.gitignore
+
+# Docker files
+.dockerignore
+
+# Docs
+README.md
+
+# Local environment (never bake credentials into the image)
+.env
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.env.example b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.env.example
new file mode 100644
index 000000000000..86eb2456e8ca
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/.env.example
@@ -0,0 +1,10 @@
+# Foundry project endpoint — auto-injected in hosted containers.
+# Only set manually if running without `azd ai agent run`.
+# FOUNDRY_PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project>
+
+# Model deployment name — must match a deployment in your Foundry project.
+AZURE_AI_MODEL_DEPLOYMENT_NAME=
+
+# Application Insights — auto-injected in hosted containers.
+# Set for local telemetry (optional but recommended).
+# APPLICATIONINSIGHTS_CONNECTION_STRING=InstrumentationKey=...
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/Dockerfile b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/Dockerfile
new file mode 100644
index 000000000000..4521aca6cd95
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/Dockerfile
@@ -0,0 +1,24 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+
+# Install local wheel packages first (built by ../../build.sh before docker build).
+# Bundles all three preview packages: core (durable-task primitive),
+# invocations (HTTP host), responses (OpenAI Responses API host).
+COPY wheels/ /tmp/wheels/
+RUN pip install --no-cache-dir /tmp/wheels/*.whl && rm -rf /tmp/wheels
+
+# Install remaining dependencies.
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+COPY main.py ./
+
+EXPOSE 8088
+
+# This is a demo image — enables the "crash" sentinel handling in main.py.
+# A production image would leave this off (default).
+ENV DEMO_MODE=1
+
+# Platform nanny worker handles restart on crash; we just run the agent.
+CMD ["python", "main.py"]
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/README.md b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/README.md
new file mode 100644
index 000000000000..ade9681d7641
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/README.md
@@ -0,0 +1,204 @@
+<!-- Begin standard disclaimer — do not modify -->
+**IMPORTANT!** All samples and other resources made available in this GitHub repository ("samples") are designed to assist in accelerating development of agents, solutions, and agent workflows for various scenarios. Review all provided resources and carefully test output behavior in the context of your use case. AI responses may be inaccurate and AI actions should be monitored with human oversight. Learn more in the transparency note for [Agent Service](https://learn.microsoft.com/en-us/azure/ai-foundry/responsible-ai/agents/transparency-note).
+
+Agents, solutions, or other output you create may be subject to legal and regulatory requirements, may require licenses, or may not be suitable for all industries, scenarios, or use cases. By using any sample, you are acknowledging that any output created using those samples are solely your responsibility, and that you will comply with all applicable laws, regulations, and relevant safety standards, terms of service, and codes of conduct.
+
+Third-party samples contained in this folder are subject to their own designated terms, and they have not been tested or verified by Microsoft or its affiliates.
+
+Microsoft has no responsibility to you or others with respect to any of these samples or any resulting output.
+<!-- End standard disclaimer -->
+
+# What this sample demonstrates
+
+A minimal "hello world" hosted agent using the **Bring Your Own** approach with the **Responses protocol**. It shows how to use the [`azure-ai-agentserver-responses`](https://pypi.org/project/azure-ai-agentserver-responses/) SDK to host a custom agent that calls a Foundry model via the Responses API and returns the reply through the standard Responses protocol contract.
+
+This is the simplest possible BYO integration — the protocol SDK handles the HTTP endpoints, SSE lifecycle, health probes, and OpenTelemetry tracing. You supply the model call using the [Foundry SDK (`azure-ai-projects`)](https://pypi.org/project/azure-ai-projects/).
+
+## How It Works
+
+### Model Integration
+
+The agent uses the Foundry SDK to create an OpenAI-compatible Responses client from the project endpoint. When a request arrives, the handler extracts the user's input text, calls the model via the Responses API, and returns the reply as a `TextResponse` — which the SDK automatically wraps in the correct SSE lifecycle events (`response.created` → `response.in_progress` → content events → `response.completed`).
+
+See [main.py](main.py) for the full implementation.
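+
+To see that lifecycle from the client side, the sketch below splits a captured SSE transcript into frames and lists the event names in order. The transcript is made up for illustration (a real stream carries full JSON payloads, and the content event shown is only one possibility), and `event_names` is a hypothetical helper, not an SDK function.
+
+```python
+def event_names(transcript: str) -> list:
+    """Return the `event:` field of each SSE frame, in arrival order."""
+    names = []
+    for frame in transcript.split("\n\n"):  # frames are separated by a blank line
+        for line in frame.splitlines():
+            if line.startswith("event:"):
+                names.append(line[len("event:"):].strip())
+    return names
+
+
+# Illustrative transcript only; payloads are elided to empty JSON objects.
+sample = (
+    "event: response.created\ndata: {}\n\n"
+    "event: response.in_progress\ndata: {}\n\n"
+    "event: response.output_text.delta\ndata: {}\n\n"
+    "event: response.completed\ndata: {}\n\n"
+)
+print(event_names(sample))
+```
+
+Running the same function over a saved stream is a quick way to confirm that a terminal event such as `response.completed` arrived last.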
+
+### Agent Hosting
+
+The agent is hosted using the [Azure AI AgentServer Responses SDK](https://pypi.org/project/azure-ai-agentserver-responses/), which provisions a REST API endpoint compatible with the OpenAI Responses protocol.
+
+### Agent Deployment
+
+The hosted agent can be developed and deployed to Microsoft Foundry using the [Azure Developer CLI](https://learn.microsoft.com/en-us/azure/foundry/agents/quickstarts/quickstart-hosted-agent?view=foundry&pivots=azd).
+
+## Running the Agent Locally
+
+### Prerequisites
+
+Before running this sample, ensure you have:
+
+1. **Azure Developer CLI (`azd`)** (recommended)
+   - [Install azd](https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/install-azd) and the AI agent extension: `azd ext install azure.ai.agents`
+   - Authenticated: `azd auth login`
+
+2. **Azure CLI**
+   - Installed and authenticated: `az login`
+
+3. **Python 3.10 or higher**
+   - Verify your version: `python --version`
+
+> [!NOTE]
+> You do **not** need an existing [Microsoft Foundry](https://learn.microsoft.com/en-us/azure/ai-foundry/what-is-foundry?view=foundry) project or model deployment to get started — `azd provision` creates them for you. If you already have a project, see the [note below](#using-azd-recommended-for-cli-workflows) on how to target it.
+
+### Environment Variables
+
+See [`.env.example`](.env.example) or `.env` for the full list of environment variables this sample uses.
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `FOUNDRY_PROJECT_ENDPOINT` | Yes | Foundry project endpoint. Auto-injected in hosted containers; set automatically by `azd ai agent run` locally. |
+| `AZURE_AI_MODEL_DEPLOYMENT_NAME` | Yes | Model deployment name — must match your Foundry project deployment. Declared in `agent.manifest.yaml`. |
+| `APPLICATIONINSIGHTS_CONNECTION_STRING` | Recommended | Enables telemetry. Auto-injected in hosted containers; set manually for local dev. |
+
+**Local development (without `azd`):**
+
+```bash
+# Copy and fill in values, then source
+cp .env.example .env  # skip if .env already exists
+# Edit .env with your values
+set -a; source .env; set +a  # export each variable so `python main.py` inherits it
+```
+
+> [!NOTE]
+> When using `azd ai agent run`, environment variables are handled automatically — no manual setup needed.
+
+### Installing Dependencies
+
+> [!NOTE]
+> If using `azd ai agent run`, dependencies are installed automatically — skip to [Running the Sample](#running-the-sample).
+
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+### Running the Sample
+
+The recommended way to run and test hosted agents locally is with the Azure Developer CLI (`azd`) or the Foundry Toolkit VS Code extension.
+
+#### Using the Foundry Toolkit VS Code Extension
+
+The [Foundry Toolkit VS Code extension](https://learn.microsoft.com/en-us/azure/foundry/agents/quickstarts/quickstart-hosted-agent?view=foundry&pivots=vscode) has a built-in sample gallery. You can open this sample directly from the extension without cloning the repository. The extension scaffolds the project into a new workspace, generates `agent.yaml`, `.env`, `.vscode/tasks.json`, and `launch.json` automatically, and configures a one-click **F5** debug experience.
+
+Chat with a running agent using the **Agent Inspector**:
+
+1. Start the agent locally first, as described under **Using `azd`** or **Without `azd`** below. The agent listens on `http://localhost:8088/`.
+2. Open the Command Palette (`Ctrl+Shift+P`) and run **Foundry Toolkit: Open Agent Inspector**.
+3. The Inspector auto-connects to the running agent. Send messages to chat with the agent and watch the streamed responses.
+
+#### Using [`azd`](https://learn.microsoft.com/en-us/azure/foundry/agents/quickstarts/quickstart-hosted-agent?view=foundry&pivots=azd) (recommended for CLI workflows)
+
+No cloning required. Create a new folder, point `azd` at the manifest on GitHub, and it sets up the sample and generates Bicep infrastructure, `agent.yaml`, and env config automatically:
+
+```bash
+# Create a new folder for the agent and navigate into it
+mkdir hello-world-agent && cd hello-world-agent
+
+# Initialize from the manifest — azd reads it, downloads the sample,
+# and generates Bicep infrastructure, agent.yaml, and env config
+azd ai agent init -m https://github.com/microsoft-foundry/foundry-samples/blob/main/samples/python/hosted-agents/bring-your-own/responses/hello-world/agent.manifest.yaml
+
+# Provision Azure resources (Foundry project, model deployment, App Insights)
+azd provision
+
+# Run the agent locally (handles env vars, Docker build, and startup)
+azd ai agent run
+```
+
+> [!NOTE]
+> If you've already cloned this repository, pass a local path to the manifest instead:
+> `azd ai agent init -m <path-to-repo>/samples/python/hosted-agents/bring-your-own/responses/hello-world/agent.manifest.yaml`
+
+> [!NOTE]
+> If you already have a Foundry project and model deployment, add `-p <project-id> -d <deployment-name>` to `azd ai agent init` to target existing resources. You can also skip provisioning entirely and configure env vars manually — see [Without `azd`](#without-azd).
+
+The agent starts on `http://localhost:8088/`. To invoke it:
+
+```bash
+azd ai agent invoke --local "What is Microsoft Foundry?"
+```
+
+Or use curl directly:
+
+```bash
+curl -sS -X POST http://localhost:8088/responses \
+  -H "Content-Type: application/json" \
+  -d '{"input": "What is Microsoft Foundry?", "stream": false}' | jq .
+```
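+
+For scripted checks, the same call can be made from Python. This is a sketch using only the standard library; `build_request` is an illustrative helper name, and the URL and JSON body simply mirror the curl command above.
+
+```python
+import json
+import urllib.request
+
+
+def build_request(prompt: str, base_url: str = "http://localhost:8088") -> urllib.request.Request:
+    """Build the same non-streaming POST /responses call as the curl command."""
+    body = json.dumps({"input": prompt, "stream": False}).encode("utf-8")
+    return urllib.request.Request(
+        f"{base_url}/responses",
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+req = build_request("What is Microsoft Foundry?")
+print(req.get_method(), req.full_url)
+
+# With the agent running locally, send it and pretty-print the JSON reply:
+# with urllib.request.urlopen(req, timeout=60) as resp:
+#     print(json.dumps(json.load(resp), indent=2))
+```
+
+Building the request separately from sending it keeps the payload easy to inspect before the agent is running.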
+
+#### Without `azd`
+
+If running without `azd`, set environment variables manually (see [Environment Variables](#environment-variables)), then:
+
+```bash
+python main.py
+```
+
+### Deploying the Agent to Microsoft Foundry
+
+Once you've tested locally, deploy to Microsoft Foundry:
+
+```bash
+# Provision Azure resources (skip if already done during local setup)
+azd provision
+
+# Build, push, and deploy the agent to Foundry
+azd deploy
+```
+
+After deploying, invoke the agent running in Foundry:
+
+```bash
+azd ai agent invoke "What is Microsoft Foundry?"
+```
+
+To stream logs from the running agent:
+
+```bash
+azd ai agent monitor
+```
+
+For the full deployment guide, see [Azure AI Foundry hosted agents](https://aka.ms/azdaiagent/docs).
+
+#### Deploying with the Foundry Toolkit VS Code Extension
+
+1. Open the Command Palette (`Ctrl+Shift+P`) and run **Foundry Toolkit: Deploy Hosted Agent**. The extension opens a tab-based **Deploy Hosted Agent** wizard and reads `agent.yaml` to auto-populate what it can.
+2. If prompted, complete **Foundry Project Setup** to pick the subscription and Foundry project (or create a new one) to deploy to.
+3. On the **Basics** tab, configure the core deployment settings:
+   - **Deployment Method**: **Code** (upload as a ZIP) or **Container** (Docker image via ACR).
+   - For **Code**, pick a packaging option: **Remote** or **Local**.
+   - For **Container**, pick a registry option: default ACR, your own ACR, or a prebuilt ACR image.
+   - **Hosted Agent Name**: confirm the name to register with the hosting service.
+4. On the **Review + Deploy** tab, finalize the runtime and resources:
+   - Confirm the auto-detected runtime details (language, entry point, or Dockerfile).
+   - Pick a **CPU and Memory** size.
+   - Click **Deploy**. Fields are validated inline, and the extension handles the build/upload, agent version creation, and RBAC role assignment.
+5. After deployment, invoke the agent in the Agent Playground and stream live logs from the **Logs** tab.
+
+## Troubleshooting
+
+### Images built on Apple Silicon or other ARM64 machines do not work on our service
+
+We **recommend deploying with `azd deploy`**, which uses ACR remote build and always produces images with the correct architecture.
+
+If you choose to **build locally**, and your machine is **not `linux/amd64`** (for example, an Apple Silicon Mac), the image will **not be compatible with our service**, causing runtime failures.
+
+**Fix for local builds**
+
+Use this command to build the image locally:
+
+```shell
+docker build --platform=linux/amd64 -t image .
+```
+
+This forces the image to be built for the required `amd64` architecture.
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/agent.manifest.yaml b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/agent.manifest.yaml
new file mode 100644
index 000000000000..17e0dc713b99
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/agent.manifest.yaml
@@ -0,0 +1,31 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/microsoft/AgentSchema/refs/heads/main/schemas/v1.0/AgentManifest.yaml
+name: hello-world-python-responses
+displayName: "Hello World (Python, Responses)"
+description: >
+  Minimal Hello World agent using the Responses protocol with a bring-your-own
+  approach. Calls a Foundry model via the Responses API and returns the
+  response.
+metadata:
+  tags:
+    - AI Agent Hosting
+    - Responses Protocol
+    - Bring Your Own
+    - Python
+template:
+  name: hello-world-python-responses
+  kind: hosted
+  protocols:
+    - protocol: responses
+      version: 1.0.0
+  environment_variables:
+    # FOUNDRY_PROJECT_ENDPOINT and APPLICATIONINSIGHTS_CONNECTION_STRING
+    # are injected by the platform (hosted) and translated by azd (local)
+    # — do NOT declare them here.
+    #
+    # Model deployment name — resolved from the resources section below.
+    - name: AZURE_AI_MODEL_DEPLOYMENT_NAME
+      value: "{{AZURE_AI_MODEL_DEPLOYMENT_NAME}}"
+resources:
+  - kind: model
+    id: gpt-4.1-mini
+    name: AZURE_AI_MODEL_DEPLOYMENT_NAME
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/agent.yaml b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/agent.yaml
new file mode 100644
index 000000000000..165df8cd042c
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/agent.yaml
@@ -0,0 +1,30 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/microsoft/AgentSchema/refs/heads/main/schemas/v1.0/ContainerAgent.yaml
+
+kind: hosted
+name: durable-responses-agent-demo
+description: |
+    Durable, steerable research agent using the Responses protocol with a bring-your-own approach. Calls a Foundry model via the Responses API for each research subcall and checkpoints the output after every subcall.
+metadata:
+    tags:
+        - AI Agent Hosting
+        - Responses Protocol
+        - Bring Your Own
+        - Python
+protocols:
+    - protocol: responses
+      version: 1.0.0
+resources:
+    cpu: "1"
+    memory: 2Gi
+environment_variables:
+    - name: AZURE_AI_MODEL_DEPLOYMENT_NAME
+      value: gpt-4.1-mini
+    # Long-running demo: per-phase ≈ 12s LLM + 3×30s intra + 30s inter ≈ 132s,
+    # × 15 phases ≈ 33 min total, about 2x the platform's 15-min
+    # sandbox-eviction window, so each run exercises the durable-task lease
+    # keep-alive path. Local main.py defaults (10/20s, ~15 min) apply when
+    # running outside the hosted container for fast iteration.
+    - name: INTRA_PHASE_COOLDOWN_SEC
+      value: "30"
+    - name: INTER_PHASE_COOLDOWN_SEC
+      value: "30"
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/main.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/main.py
new file mode 100644
index 000000000000..ac3d91d6d28c
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/main.py
@@ -0,0 +1,341 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+"""Durable Responses Research Agent — Demo.
+
+A durable + steerable Responses-API agent that demonstrates four
+platform capabilities of the Azure AI Hosted Agent + the responses
+package. It is a faithful port of the invocations ``durable-agent-demo``
+(same 15-phase × 4-subcall research plan, same cooldown cadence, same
+~33-min runtime) onto the responses package's spec-025 durability
+primitives — so the behaviour matches while the mechanism is the
+one-OutputItem-per-subcall ``stream.checkpoint()`` pattern.
+
+1. **Long-running responses run uninterrupted past the platform's
+   sandbox-eviction window.** 15 research phases × 4 LLM subcalls each,
+   with intra-phase and inter-phase cooldowns (~132s/phase ≈ 33 min
+   total) — ~2x the 15-min eviction window, so every run exercises the
+   durable-task lease keep-alive path.
+
+2. **Recovery from container crashes.** When the container dies, the
+   platform's nanny worker brings it back within ~1 min and the
+   framework re-invokes this handler with ``context.is_recovery is True``.
+   Recovery uses the **one-OutputItem-per-subcall** pattern: the persisted
+   response *is* the watermark. The handler seeds its stream from
+   ``context.persisted_response`` and resumes at
+   ``len(stream.response.output)`` — completed (checkpointed) subcalls
+   survive and are replayed to reconnecting clients via the
+   ``response.in_progress`` reset; the interrupted subcall re-runs.
+
+3. **Steering.** POSTing a follow-up turn (with ``previous_response_id``
+   pointing at the still-running one) queues the input as a steering
+   input. The agent observes
+   ``cancellation_signal.is_set() and context.pending_input_count > 0``,
+   winds down at the next phase boundary, and re-enters with
+   ``context.is_steered_turn is True`` carrying the new input.
+
+4. **Operator cancel.** ``POST /responses/{id}/cancel`` fires
+   ``cancellation_signal`` + stamps ``context.client_cancelled``; the
+   framework forces the response to ``status="cancelled"`` regardless of
+   what the handler emits.
+
+Special behaviour: ``POST /responses`` with input "crash", "kill", or "💥"
+(when the container has ``DEMO_MODE=1``) forces ``os._exit(137)`` about
+300 ms later, so the platform's nanny worker can demonstrate the recovery
+path.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+from typing import Any
+
+from azure.ai.projects.aio import AIProjectClient
+from azure.identity.aio import DefaultAzureCredential
+
+from azure.ai.agentserver.responses import (
+    CreateResponse,
+    ResponseContext,
+    ResponseEventStream,
+    ResponsesAgentServerHost,
+    ResponsesServerOptions,
+)
+
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+logger = logging.getLogger(__name__)
+
+
+# ── Config (same knobs as the invocations durable-agent-demo) ────────────
+
+_endpoint = os.environ["FOUNDRY_PROJECT_ENDPOINT"]
+_model = os.environ.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "gpt-4.1-mini")
+
+# 15 research phases × 4 subcalls each, plus cooldowns, run ~33 min when
+# hosted, well past the sandbox-eviction window. Hosted cooldowns are set
+# to 30s in agent.yaml; the defaults here (10/20s, ~15 min) apply for fast
+# local iteration.
+PHASE_TITLES = [
+    "Decomposing topic into focused research questions",
+    "Surveying foundational literature and key concepts",
+    "Identifying leading researchers and institutions",
+    "Mapping the historical trajectory of the field",
+    "Analyzing recent breakthroughs and publications",
+    "Examining competing theories and methodological debates",
+    "Evaluating experimental evidence and data quality",
+    "Mapping connections to adjacent fields",
+    "Identifying open problems and knowledge gaps",
+    "Assessing real-world applications and current adoption",
+    "Analyzing funding landscape and research trends",
+    "Surveying ethical considerations and societal implications",
+    "Projecting near-term and long-term outlook",
+    "Synthesizing findings into a coherent narrative",
+    "Generating key insights and concrete recommendations",
+]
+
+_SUB_CALL_ROLES = [
+    (
+        "research",
+        "Conduct an in-depth investigation of the assigned aspect. Include "
+        "specific findings, examples, and references where you can. Aim for "
+        "substantive, multi-paragraph content.",
+    ),
+    (
+        "critique",
+        "Critically evaluate the research above. Identify weak claims, gaps, "
+        "competing interpretations, and quality concerns. Be specific.",
+    ),
+    (
+        "refine",
+        "Revise the original research, incorporating the critique. Strengthen "
+        "weak claims, address gaps, and clarify uncertainty. Produce a "
+        "tightened, more rigorous version.",
+    ),
+    (
+        "synthesize",
+        "Distill the refined material into 2-3 paragraphs of key takeaways "
+        "suitable for someone briefing a decision-maker on this phase.",
+    ),
+]
+
+NUM_PHASES = max(1, int(os.environ.get("NUM_PHASES", str(len(PHASE_TITLES)))))
+CALLS_PER_PHASE = max(1, min(len(_SUB_CALL_ROLES), int(os.environ.get("CALLS_PER_PHASE", "4"))))
+TARGET_OUTPUT_TOKENS = int(os.environ.get("TARGET_OUTPUT_TOKENS", "1500"))
+INTRA_PHASE_COOLDOWN_SEC = float(os.environ.get("INTRA_PHASE_COOLDOWN_SEC", "10"))
+INTER_PHASE_COOLDOWN_SEC = float(os.environ.get("INTER_PHASE_COOLDOWN_SEC", "20"))
+DEMO_MODE = os.environ.get("DEMO_MODE") == "1"
+
+
+def _phase_title(i: int) -> str:
+    return PHASE_TITLES[i] if i < len(PHASE_TITLES) else f"Continued research (phase {i + 1})"
+
+
+def _item_text(item: object) -> str:
+    """Extract the ``output_text`` of a (seeded or just-emitted) output item.
+
+    ``context.persisted_response`` / ``stream.response.output`` expose typed
+    ``OutputItem`` models (MutableMappings, not plain ``dict``s), so access via
+    duck-typed ``.get()``. Used to chain each subcall onto the previous one's
+    text — including across a crash, where the previous subcall is read back
+    from the seeded persisted snapshot.
+    """
+    get = getattr(item, "get", None)
+    if not callable(get):
+        return ""
+    for part in get("content") or []:
+        part_get = getattr(part, "get", None)
+        if callable(part_get) and part_get("type") == "output_text":
+            return part_get("text", "") or ""
+    return ""
+
+
+# ── Upstream client (lazy — survives recovery re-invocation cleanly) ──
+
+_openai_client: Any = None
+_project_client: Any = None
+_credential: Any = None
+
+
+def _client() -> Any:
+    global _openai_client, _project_client, _credential
+    if _openai_client is None:
+        _credential = DefaultAzureCredential()
+        _project_client = AIProjectClient(endpoint=_endpoint, credential=_credential)
+        _openai_client = _project_client.get_openai_client()
+    return _openai_client
+
+
+# ── Durability config + host registration ────────────────────────────
+
+app = ResponsesAgentServerHost(
+    options=ResponsesServerOptions(
+        durable_background=True,
+        steerable_conversations=True,
+    ),
+)
+
+
+# ── Helpers ─────────────────────────────────────────────────────────────
+
+
+async def _stream_subcall(instructions: str, user_input: str, signals: tuple[asyncio.Event, ...]) -> Any:
+    """Stream one LLM subcall's token deltas. Stops early if a signal fires."""
+    stream_obj = await _client().responses.create(
+        model=_model,
+        instructions=instructions,
+        input=user_input,
+        store=False,
+        stream=True,
+        max_output_tokens=TARGET_OUTPUT_TOKENS,
+    )
+    async for event in stream_obj:
+        if any(sig.is_set() for sig in signals):
+            return
+        if event.type == "response.output_text.delta":
+            yield event.delta
+
+
+async def _cooldown(context: ResponseContext, cancellation_signal: asyncio.Event, duration_sec: float) -> None:
+    """Cooldown wait. Wakes on cancel; defers to recovery on shutdown."""
+    slept = 0.0
+    while slept < duration_sec:
+        if context.shutdown.is_set():
+            await context.exit_for_recovery()
+        if cancellation_signal.is_set():
+            return
+        await asyncio.sleep(0.5)
+        slept += 0.5
+
+
+# ── Handler ──────────────────────────────────────────────────────────────
+
+
+@app.response_handler
+async def handler(
+    request: CreateResponse,
+    context: ResponseContext,
+    cancellation_signal: asyncio.Event,
+):
+    """15-phase × 4-subcall durable + steerable research handler.
+
+    **One OutputItem per subcall** (research → critique → refine →
+    synthesize), and ``yield stream.checkpoint()`` after each — so a crash
+    loses at most the one subcall that was actively streaming (matching the
+    invocations demo's per-subcall recovery granularity). The persisted
+    response IS the watermark: ``len(stream.response.output)`` is the number
+    of durably-completed subcalls, so on recovery the handler seeds its
+    stream from ``context.persisted_response`` and resumes at the first
+    un-checkpointed subcall. Subcalls chain (each takes the previous one's
+    text as input); on recovery the previous subcall's text is read back
+    from the seeded snapshot.
+    """
+    topic = (await context.get_input_text()) or ""
+
+    # Demo-only crash trigger.
+    if DEMO_MODE and topic.strip().lower() in ("crash", "kill", "💥"):
+        logger.critical("CRASH triggered via input=%r — exiting in 300ms", topic)
+
+        async def _crash() -> None:
+            await asyncio.sleep(0.3)
+            os._exit(137)
+
+        asyncio.create_task(_crash())
+        stream = ResponseEventStream(response_id=context.response_id, request=request)
+        yield stream.emit_created()
+        yield stream.emit_failed(
+            code="server_error",
+            message="Demo-mode crash trigger fired; process exiting in 300ms.",
+        )
+        return
+
+    # ── Recovery branch: seed from the persisted snapshot ────────────
+    # Each completed subcall is one persisted output item, so the item
+    # count is the subcall watermark.
+    if context.is_recovery and context.persisted_response is not None:
+        stream = ResponseEventStream(response_id=context.response_id, response=context.persisted_response)
+        done_subcalls = len(stream.response.output)
+    else:
+        stream = ResponseEventStream(response_id=context.response_id, request=request)
+        done_subcalls = 0
+
+    yield stream.emit_created()  # framework dedups the duplicate on recovery
+
+    # ── Pre-entry: shutdown and cancellation are DISTINCT surfaces ───
+    if context.shutdown.is_set():
+        await context.exit_for_recovery()
+    if cancellation_signal.is_set():
+        if context.pending_input_count > 0:
+            yield stream.emit_completed()  # steering pre-entry — finish cleanly
+        return  # client cancel — framework forces "cancelled"
+
+    yield stream.emit_in_progress()  # client-visible reset point on recovery
+
+    # ── Drive the subcalls — one OutputItem + checkpoint per subcall ──
+    # Flatten (phase, subcall) into a single step index so the persisted
+    # output-item count is the resume cursor.
+    total_subcalls = NUM_PHASES * CALLS_PER_PHASE
+    for step in range(done_subcalls, total_subcalls):
+        phase_idx, sub_idx = divmod(step, CALLS_PER_PHASE)
+        title = _phase_title(phase_idx)
+        role_name, role_prompt = _SUB_CALL_ROLES[sub_idx]
+
+        # Chain onto the previous subcall in this phase (reset at sub_idx 0).
+        # On recovery the previous subcall is read back from the seeded item.
+        prev_text = "" if sub_idx == 0 else _item_text(stream.response.output[step - 1])
+
+        instructions = (
+            f"You are a research analyst working on the topic: '{topic}'.\n"
+            f"Current phase: '{title}'.\nYour role in this sub-step: {role_name}.\n\n{role_prompt}"
+        )
+        user_input = (
+            f"Topic: {topic}\nPhase: {title}\n\nPrevious sub-step output:\n{prev_text}"
+            if prev_text
+            else f"Topic: {topic}\nPhase: {title}"
+        )
+
+        message = stream.add_output_item_message()
+        message.internal_metadata["phase"] = phase_idx  # observability; stripped on egress
+        message.internal_metadata["subcall"] = role_name
+        yield message.emit_added()
+        text = message.add_text_content()
+        yield text.emit_added()
+        yield text.emit_delta(f"=== Phase {phase_idx + 1}/{NUM_PHASES} — {title} · {role_name} ===\n\n")
+
+        async for delta in _stream_subcall(instructions, user_input, (cancellation_signal, context.shutdown)):
+            yield text.emit_delta(delta)
+
+        # Mid-subcall shutdown: defer BEFORE closing the item, so the item
+        # never enters the snapshot and this subcall re-runs on recovery.
+        if context.shutdown.is_set():
+            await context.exit_for_recovery()
+
+        yield text.emit_text_done()
+        yield text.emit_done()
+        yield message.emit_done()  # item now in stream.response.output
+
+        # Steering / client cancel mid-subcall: wind down without advancing
+        # the watermark (don't checkpoint this subcall).
+        if cancellation_signal.is_set():
+            break
+
+        yield stream.checkpoint()  # subcall durable; on to the next
+
+        # Cooldown: intra-phase between subcalls, inter-phase after the
+        # last subcall of a phase. Skipped after the final subcall.
+        if step + 1 < total_subcalls:
+            last_sub_of_phase = sub_idx + 1 == CALLS_PER_PHASE
+            cooldown = INTER_PHASE_COOLDOWN_SEC if last_sub_of_phase else INTRA_PHASE_COOLDOWN_SEC
+            if cooldown > 0:
+                await _cooldown(context, cancellation_signal, cooldown)
+                if cancellation_signal.is_set():
+                    break
+
+    yield stream.emit_completed()
+
+
+def main() -> None:
+    app.run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/requirements.txt b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/requirements.txt
new file mode 100644
index 000000000000..306e38caffec
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/durable-responses-agent-demo/src/durable-responses-agent-demo/requirements.txt
@@ -0,0 +1,6 @@
+# azure-ai-agentserver-{core,invocations,responses} wheels are installed
+# from /tmp/wheels/ at docker-build time (see Dockerfile). The wheels are
+# staged into ./wheels/ by ../../build.sh from the central
+# sdk/agentserver/wheels/ directory.
+azure-ai-projects==2.0.1
+azure-identity==1.25.3
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_01_getting_started.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_01_getting_started.py
deleted file mode 100644
index f8973e28858e..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_01_getting_started.py
+++ /dev/null
@@ -1,63 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 01 — Getting Started (echo handler).
-
-Simplest possible handler: reads the user's input text and echoes it back
-as a single non-streaming message using ``TextResponse``.
-
-``TextResponse`` handles the full SSE lifecycle automatically:
-``response.created`` → ``response.in_progress`` → message/content events
-→ ``response.completed``.
-
-Usage::
-
-    # Start the server
-    python sample_01_getting_started.py
-
-    # Send a request
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "echo", "input": "Hello, world!"}'
-    # -> {"id": "...", "status": "completed", "output": [{"type": "message",
-    #     "content": [{"type": "output_text", "text": "Echo: Hello, world!"}]}]}
-
-    # Stream the response
-    curl -N -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "echo", "input": "Hello, world!", "stream": true}'
-    # -> event: response.created            data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.in_progress        data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.output_item.added  data: {"item": {"type": "message", ...}}
-    # -> event: response.content_part.added data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_text.delta  data: {"delta": "Echo: Hello, world!"}
-    # -> event: response.output_text.done   data: {"text": "Echo: Hello, world!"}
-    # -> event: response.content_part.done  data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_item.done   data: {"item": {"type": "message", ...}}
-    # -> event: response.completed          data: {"response": {"status": "completed", ...}}
-"""
-
-import asyncio
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponsesAgentServerHost,
-    TextResponse,
-)
-
-app = ResponsesAgentServerHost()
-
-
-@app.response_handler
-async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    """Echo the user's input back as a single message."""
-    input_text = await context.get_input_text()
-    return TextResponse(context, request, text=f"Echo: {input_text}")
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_02_streaming_text_deltas.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_02_streaming_text_deltas.py
deleted file mode 100644
index 4bfff9c214e0..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_02_streaming_text_deltas.py
+++ /dev/null
@@ -1,75 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 02 — Token-by-Token Streaming.
-
-Demonstrates token-by-token streaming using ``TextResponse`` with
-``text``.  Each chunk yielded by the async generator is
-emitted as a separate ``output_text.delta`` SSE event, enabling
-real-time token-by-token streaming to the client.
-
-The ``configure`` callback sets ``Response.temperature`` on the response
-envelope before ``response.created`` is emitted.
-
-Usage::
-
-    # Start the server
-    python sample_02_streaming_text_deltas.py
-
-    # Stream token-by-token deltas
-    curl -N -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "streaming", "input": "world", "stream": true}'
-    # -> event: response.created            data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.in_progress        data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.output_item.added  data: {"item": {"type": "message", ...}}
-    # -> event: response.content_part.added data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_text.delta  data: {"delta": "Hello"}
-    # -> event: response.output_text.delta  data: {"delta": ", "}
-    # -> event: response.output_text.delta  data: {"delta": "world"}
-    # -> event: response.output_text.delta  data: {"delta": "! "}
-    # -> event: response.output_text.delta  data: {"delta": "How "}
-    # -> event: response.output_text.delta  data: {"delta": "are "}
-    # -> event: response.output_text.delta  data: {"delta": "you?"}
-    # -> event: response.output_text.done   data: {"text": "Hello, world! How are you?"}
-    # -> event: response.content_part.done  data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_item.done   data: {"item": {"type": "message", ...}}
-    # -> event: response.completed          data: {"response": {"status": "completed", ...}}
-"""
-
-import asyncio
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponsesAgentServerHost,
-    TextResponse,
-)
-
-app = ResponsesAgentServerHost()
-
-
-@app.response_handler
-async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    """Stream tokens one at a time using TextResponse."""
-    user_text = await context.get_input_text() or "world"
-
-    async def generate_tokens():
-        tokens = ["Hello", ", ", user_text, "! ", "How ", "are ", "you?"]
-        for token in tokens:
-            await asyncio.sleep(0.1)
-            yield token
-
-    return TextResponse(
-        context,
-        request,
-        configure=lambda response: setattr(response, "temperature", 0.7),
-        text=generate_tokens(),
-    )
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_03_full_control.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_03_full_control.py
deleted file mode 100644
index 53b759418747..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_03_full_control.py
+++ /dev/null
@@ -1,167 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 03 — ResponseEventStream — Beyond TextResponse.
-
-When your handler needs to emit function calls, reasoning items, multiple
-outputs, or set custom Response properties, step up from ``TextResponse``
-to ``ResponseEventStream``.  Start with **convenience generators** — they
-handle the event lifecycle for you.  Drop down to **builders** only when
-you need fine-grained control over individual events.
-
-This sample shows three ways to emit the same greeting — all produce the
-identical SSE event sequence:
-
-  1. **Convenience** — ``output_item_message(text)``
-  2. **Streaming**  — ``aoutput_item_message(async_iterable)``
-  3. **Builder**    — ``add_output_item_message()`` → ``add_text_content()``
-     → ``emit_delta()`` / ``emit_done()``
-
-Usage::
-
-    # Start the server
-    python sample_03_full_control.py
-
-    # Send a request
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "greeting", "input": "Hi there!"}'
-    # -> {"output": [{"type": "message", "content": [{"type": "output_text",
-    #     "text": "Hello! You said: \"Hi there!\""}]}]}
-
-    # Stream the response
-    curl -N -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "greeting", "input": "Hi there!", "stream": true}'
-    # -> event: response.created            data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.in_progress        data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.output_item.added  data: {"item": {"type": "message", ...}}
-    # -> event: response.content_part.added data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_text.delta  data: {"delta": "Hello! You said: ..."}
-    # -> event: response.output_text.done   data: {"text": "Hello! You said: ..."}
-    # -> event: response.content_part.done  data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_item.done   data: {"item": {"type": "message", ...}}
-    # -> event: response.completed          data: {"response": {"status": "completed", ...}}
-"""
-
-import asyncio
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponseEventStream,
-    ResponsesAgentServerHost,
-)
-
-app = ResponsesAgentServerHost()
-
-
-# ── Variant 1: Convenience ──────────────────────────────────────────────
-# Use ``output_item_message(text)`` to emit a complete text message in one
-# call.  The convenience generator handles all inner events for you.
-
-
-@app.response_handler
-async def handler(
-    request: CreateResponse,
-    context: ResponseContext,
-    cancellation_signal: asyncio.Event,
-):
-    """Emit a greeting using the convenience generator."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-
-    # Configure Response properties BEFORE emit_created().
-    stream.response.temperature = 0.7
-    stream.response.max_output_tokens = 1024
-
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    # Emit a complete text message in one call.
-    input_text = await context.get_input_text()
-    for evt in stream.output_item_message(f'Hello! You said: "{input_text}"'):
-        yield evt
-
-    yield stream.emit_completed()
-
-
-# ── Variant 2: Streaming ────────────────────────────────────────────────
-# When your handler calls an LLM that produces tokens incrementally, pass
-# an ``AsyncIterable[str]`` to ``aoutput_item_message()``.  Each chunk
-# becomes a separate ``response.output_text.delta`` SSE event.
-
-
-async def handler_streaming(
-    request: CreateResponse,
-    context: ResponseContext,
-    cancellation_signal: asyncio.Event,
-):
-    """Stream tokens using the async convenience generator."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    # Stream tokens as they arrive — each chunk becomes a delta event.
-    async for evt in stream.aoutput_item_message(
-        _generate_tokens(await context.get_input_text()),
-    ):
-        yield evt
-
-    yield stream.emit_completed()
-
-
-async def _generate_tokens(input_text: str):
-    """Simulate an LLM producing tokens one at a time."""
-    tokens = ["Hello! ", "You ", "said: ", f'"{input_text}"']
-    for token in tokens:
-        await asyncio.sleep(0.1)
-        yield token
-
-
-# ── Variant 3: Builder (full event control) ─────────────────────────────
-# When you need to interleave non-event work between individual delta/done
-# calls within a content part, or set custom properties on the output item
-# before ``emit_added()``, drop down to the builder API.
-
-
-async def handler_builder(
-    request: CreateResponse,
-    context: ResponseContext,
-    cancellation_signal: asyncio.Event,
-):
-    """Demonstrate all builder events step by step."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-
-    # Configure Response properties BEFORE emit_created().
-    stream.response.temperature = 0.7
-    stream.response.max_output_tokens = 1024
-
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    # Add a message output item.
-    message = stream.add_output_item_message()
-    yield message.emit_added()
-
-    # Add text content to the message.
-    text_part = message.add_text_content()
-    yield text_part.emit_added()
-
-    # Emit the text body — delta first, then the final "done" with full text.
-    input_text = await context.get_input_text() or "Hello"
-    reply = f'Hello! You said: "{input_text}"'
-    yield text_part.emit_delta(reply)
-
-    yield text_part.emit_text_done()
-    yield text_part.emit_done()
-    yield message.emit_done()
-
-    yield stream.emit_completed()
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_04_function_calling.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_04_function_calling.py
deleted file mode 100644
index 62a6ee7dd3b4..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_04_function_calling.py
+++ /dev/null
@@ -1,143 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 04 — Function Calling (two-turn pattern).
-
-Demonstrates a two-turn function-calling flow:
-
-  **Turn 1** — The handler emits a ``function_call`` output item asking the
-  client to call ``get_weather`` with specific arguments.
-
-  **Turn 2** — The client re-invokes the handler with a
-  ``function_call_output`` item in the input.  The handler reads that output
-  and responds with a text message.
-
-The handler is shown first using convenience generators, then with full
-builder control.
-
-Usage::
-
-    # Start the server
-    python sample_04_function_calling.py
-
-    # Turn 1 — triggers a function call
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "test", "input": "What is the weather in Seattle?"}'
-    # -> {"output": [{"type": "function_call", "name": "get_weather",
-    #     "call_id": "call_weather_1", "arguments": "{\"location\": \"Seattle\", ...}"}]}
-
-    # Turn 2 — submit function output, receive text
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "test", "input": [{"type": "function_call_output",
-             "call_id": "call_weather_1", "output": "72F and sunny"}]}'
-    # -> {"output": [{"type": "message", "content": [{"type": "output_text",
-    #     "text": "The weather is: 72F and sunny"}]}]}
-"""
-
-from __future__ import annotations
-
-import asyncio
-import json
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponseEventStream,
-    ResponsesAgentServerHost,
-)
-from azure.ai.agentserver.responses.models import FunctionCallOutputItemParam
-
-app = ResponsesAgentServerHost()
-
-
-async def _find_function_call_output(context: ResponseContext) -> str | None:
-    """Return the output string from the first function_call_output item, or None."""
-    for item in await context.get_input_items():
-        if isinstance(item, FunctionCallOutputItemParam):
-            output = item.output
-            if isinstance(output, str):
-                return output
-    return None
-
-
-# ── Variant 1: Convenience ──────────────────────────────────────────────
-# Use ``aoutput_item_function_call()`` and ``aoutput_item_message()`` to emit
-# complete output items in one call each.
-
-
-@app.response_handler
-async def handler(
-    request: CreateResponse,
-    context: ResponseContext,
-    cancellation_signal: asyncio.Event,
-):
-    """Two-turn function-calling handler using convenience generators."""
-    tool_output = await _find_function_call_output(context)
-
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    if tool_output is not None:
-        # Turn 2: we have the tool result — produce a final text message.
-        async for event in stream.aoutput_item_message(f"The weather is: {tool_output}"):
-            yield event
-    else:
-        # Turn 1: ask the client to call get_weather.
-        arguments = json.dumps({"location": "Seattle", "unit": "fahrenheit"})
-        async for event in stream.aoutput_item_function_call("get_weather", "call_weather_1", arguments):
-            yield event
-
-    yield stream.emit_completed()
-
-
-# ── Variant 2: Builder (full event control) ─────────────────────────────
-# When you need to set custom properties on the function call item before
-# ``emit_added()``, or interleave non-event work between builder calls,
-# use the builder API.
-
-
-async def handler_builder(
-    request: CreateResponse,
-    context: ResponseContext,
-    cancellation_signal: asyncio.Event,
-):
-    """Two-turn function-calling handler using the builder API."""
-    tool_output = await _find_function_call_output(context)
-
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    if tool_output is not None:
-        # Turn 2: function output received — return the weather as text.
-        message = stream.add_output_item_message()
-        yield message.emit_added()
-
-        text_part = message.add_text_content()
-        yield text_part.emit_added()
-
-        reply = f"The weather is: {tool_output}"
-        yield text_part.emit_delta(reply)
-        yield text_part.emit_text_done()
-        yield text_part.emit_done()
-        yield message.emit_done()
-    else:
-        # Turn 1: emit a function call for "get_weather".
-        arguments = json.dumps({"location": "Seattle", "unit": "fahrenheit"})
-        fc = stream.add_output_item_function_call(name="get_weather", call_id="call_weather_1")
-        yield fc.emit_added()
-        yield fc.emit_arguments_delta(arguments)
-        yield fc.emit_arguments_done(arguments)
-        yield fc.emit_done()
-
-    yield stream.emit_completed()
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_05_conversation_history.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_05_conversation_history.py
deleted file mode 100644
index 4efd2652effc..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_05_conversation_history.py
+++ /dev/null
@@ -1,86 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 05 — Conversation History — Study Tutor.
-
-Demonstrates reading conversation history via ``context.get_history()``
-and replying with ``TextResponse``.  The study tutor references previous
-turns to give contextual follow-up answers, showing a multi-turn
-conversational flow chained through
-``previous_response_id``.
-
-The server is configured with
-``ResponsesServerOptions(default_fetch_history_count=20)`` to limit the
-number of history items fetched per request.
-
-Usage::
-
-    # Start the server
-    python sample_05_conversation_history.py
-
-    # Turn 1 — initial message (no history)
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "tutor", "input": "Explain photosynthesis."}'
-    # -> {"id": "resp_...", "output": [{"type": "message", "content":
-    #     [{"type": "output_text", "text": "Welcome! I'm your study tutor. You asked: ..."}]}]}
-
-    # Turn 2 — chain via previous_response_id (use the id from Turn 1)
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "tutor", "input": "What role does chlorophyll play?", "previous_response_id": "<id-from-turn-1>"}'
-    # -> {"output": [{"type": "message", "content": [{"type": "output_text",
-    #     "text": "[Turn 2] Building on our previous discussion ..."}]}]}
-"""
-
-import asyncio
-from collections.abc import Sequence
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponsesAgentServerHost,
-    ResponsesServerOptions,
-    TextResponse,
-)
-from azure.ai.agentserver.responses.models import OutputItem
-
-app = ResponsesAgentServerHost(
-    options=ResponsesServerOptions(default_fetch_history_count=20),
-)
-
-
-def _build_reply(current_input: str, history: Sequence[OutputItem]) -> str:
-    """Compose a study-tutor reply that references the conversation history."""
-    history_messages = [item for item in history if getattr(item, "type", None) == "message"]
-    turn_number = len(history_messages) + 1
-
-    if not history_messages:
-        return f'Welcome! I\'m your study tutor. You asked: "{current_input}". Let me help you understand that topic.'
-
-    last = history_messages[-1]
-    last_text = "(none)"
-    if last.get("content"):
-        raw = last["content"][0].get("text", "(none)")
-        last_text = raw[:50] + "..." if len(raw) > 50 else raw
-
-    return (
-        f"[Turn {turn_number}] Building on our previous discussion "
-        f'(last answer: "{last_text}"), '
-        f'you asked: "{current_input}".'
-    )
-
-
-@app.response_handler
-async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    """Study tutor that reads and references conversation history."""
-    history = await context.get_history()
-    current_input = await context.get_input_text()
-    return TextResponse(context, request, text=_build_reply(current_input, history))
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_06_multi_output.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_06_multi_output.py
deleted file mode 100644
index 6b02bdf84b77..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_06_multi_output.py
+++ /dev/null
@@ -1,145 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 06 — Multi-Output — Math Problem Solver with Reasoning.
-
-Builds a math problem solver that shows its work.  The agent emits a
-**reasoning** item (the thought process) followed by a **message** item
-(the final answer).  This demonstrates streaming multiple output types in
-a single response — first using convenience generators, then with full
-builder control.
-
-Usage::
-
-    # Start the server
-    python sample_06_multi_output.py
-
-    # Send a math question
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "math", "input": "What is 6 times 7?"}'
-    # -> {"output": [{"type": "reasoning", ...},
-    #     {"type": "message", "content": [{"type": "output_text",
-    #      "text": "The answer is 42. Here's how: 6 × 7 = 42. ..."}]}]}
-
-    # Stream to see reasoning + answer arrive in sequence
-    curl -N -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "math", "input": "What is 6 times 7?", "stream": true}'
-    # -> event: response.created            data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.in_progress        data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.output_item.added  data: {"item": {"type": "reasoning", ...}}
-    # -> event: response.output_item.done   data: {"item": {"type": "reasoning", ...}}
-    # -> event: response.output_item.added  data: {"item": {"type": "message", ...}}
-    # -> event: response.content_part.added data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_text.delta  data: {"delta": "The answer is 42. ..."}
-    # -> event: response.output_text.done   data: {"text": "The answer is 42. ..."}
-    # -> event: response.content_part.done  data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_item.done   data: {"item": {"type": "message", ...}}
-    # -> event: response.completed          data: {"response": {"status": "completed", ...}}
-"""
-
-import asyncio
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponseEventStream,
-    ResponsesAgentServerHost,
-)
-
-app = ResponsesAgentServerHost()
-
-
-# ── Variant 1: Convenience ──────────────────────────────────────────────
-# Use ``output_item_reasoning_item()`` and ``output_item_message()`` to
-# emit complete output items with one call each.
-
-
-@app.response_handler
-async def handler(
-    request: CreateResponse,
-    context: ResponseContext,
-    cancellation_signal: asyncio.Event,
-):
-    """Emit reasoning and answer using convenience generators."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    question = await context.get_input_text() or "What is 6 times 7?"
-
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    # Output item 0: Reasoning — show the thought process.
-    thought = (
-        f'The user asked: "{question}". '
-        "I need to identify the mathematical operation, "
-        "compute the result, and explain the steps."
-    )
-    for evt in stream.output_item_reasoning_item(thought):
-        yield evt
-
-    # Output item 1: Message — the final answer.
-    answer = "The answer is 42. Here's how: 6 × 7 = 42. The multiplication of 6 and 7 gives 42."
-    for evt in stream.output_item_message(answer):
-        yield evt
-
-    yield stream.emit_completed()
-
-
-# ── Variant 2: Builder (full event control) ─────────────────────────────
-# When you need multiple summary parts in a single reasoning item, set
-# custom properties on output items before ``emit_added()``, or interleave
-# non-event work between builder calls, use the builder API.
-
-
-async def handler_builder(
-    request: CreateResponse,
-    context: ResponseContext,
-    cancellation_signal: asyncio.Event,
-):
-    """Emit reasoning and answer using the builder API."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    question = await context.get_input_text() or "What is 6 times 7?"
-
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    # Output item 0: Reasoning — show the thought process.
-    reasoning = stream.add_output_item_reasoning_item()
-    yield reasoning.emit_added()
-
-    summary = reasoning.add_summary_part()
-    yield summary.emit_added()
-
-    thought = (
-        f'The user asked: "{question}". '
-        "I need to identify the mathematical operation, "
-        "compute the result, and explain the steps."
-    )
-    yield summary.emit_text_delta(thought)
-    yield summary.emit_text_done(thought)
-    yield summary.emit_done()
-
-    yield reasoning.emit_done()
-
-    # Output item 1: Message — the final answer.
-    message = stream.add_output_item_message()
-    yield message.emit_added()
-
-    text_part = message.add_text_content()
-    yield text_part.emit_added()
-
-    answer = "The answer is 42. Here's how: 6 × 7 = 42. The multiplication of 6 and 7 gives 42."
-    yield text_part.emit_delta(answer)
-    yield text_part.emit_text_done()
-    yield text_part.emit_done()
-    yield message.emit_done()
-
-    yield stream.emit_completed()
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_07_customization.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_07_customization.py
deleted file mode 100644
index b01485ea29de..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_07_customization.py
+++ /dev/null
@@ -1,64 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 07 — Customization Options.
-
-Shows how to configure the server with custom runtime options:
-
-  - ``ResponsesServerOptions`` for default model, SSE keep-alive, and
-    shutdown grace period.
-  - ``log_level`` on the host for verbose logging.
-  - A handler that relies on ``request.model``, which is automatically
-    filled from ``default_model`` when the client omits it.
-
-Usage::
-
-    # Start the server (with DEBUG logging)
-    python sample_07_customization.py
-
-    # Send a request (model defaults to gpt-4o via default_model)
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"input": "Hello!"}'
-    # -> {"output": [{"type": "message", "content":
-    #     [{"type": "output_text", "text": "[model=gpt-4o] Echo: Hello!"}]}]}
-
-    # Override the model explicitly
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "custom", "input": "Hello!"}'
-    # -> {"output": [{"type": "message", "content":
-    #     [{"type": "output_text", "text": "[model=custom] Echo: Hello!"}]}]}
-"""
-
-import asyncio
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponsesAgentServerHost,
-    ResponsesServerOptions,
-    TextResponse,
-)
-
-options = ResponsesServerOptions(
-    default_model="gpt-4o",
-    sse_keep_alive_interval_seconds=5,
-    shutdown_grace_period_seconds=15,
-)
-
-app = ResponsesAgentServerHost(options=options, log_level="DEBUG")
-
-
-@app.response_handler
-async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    """Echo handler that reports which model is being used."""
-    input_text = await context.get_input_text()
-    return TextResponse(context, request, text=f"[model={request.model}] Echo: {input_text}")
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_08_mixin_composition.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_08_mixin_composition.py
deleted file mode 100644
index 666774772b28..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_08_mixin_composition.py
+++ /dev/null
@@ -1,81 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 08 — Mixin Composition (multi-protocol).
-
-Demonstrates running both the **Invocations** and **Responses** protocols
-on a single server using Python's cooperative (mixin) inheritance.
-
-Endpoints exposed:
-    POST  /invocations               — Invocation protocol
-    POST  /responses                  — Responses protocol
-    GET   /readiness                  — Health probe (from core)
-
-Usage::
-
-    # Start the dual-protocol server
-    python sample_08_mixin_composition.py
-
-    # Hit the Invocation endpoint
-    curl -X POST http://localhost:8088/invocations \
-        -H "Content-Type: application/json" \
-        -d '{"message": "Hello!"}'
-    # -> {"invocation_id": "...", "status": "completed",
-    #     "output": "[Invocation] Echo: Hello!"}
-
-    # Hit the Responses endpoint
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "test", "input": "Hello!"}'
-    # -> {"output": [{"type": "message", "content":
-    #     [{"type": "output_text", "text": "[Response] Echo: Hello!"}]}]}
-"""
-
-import asyncio
-
-from azure.ai.agentserver.invocations import InvocationAgentServerHost
-from starlette.requests import Request
-from starlette.responses import JSONResponse, Response
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponsesAgentServerHost,
-    TextResponse,
-)
-
-
-class MyHost(InvocationAgentServerHost, ResponsesAgentServerHost):
-    pass
-
-
-app = MyHost()
-
-
-@app.invoke_handler
-async def handle_invoke(request: Request) -> Response:
-    """Echo invocation: returns the message from the JSON body."""
-    data = await request.json()
-    invocation_id = request.state.invocation_id
-    message = data.get("message", "")
-    return JSONResponse(
-        {
-            "invocation_id": invocation_id,
-            "status": "completed",
-            "output": f"[Invocation] Echo: {message}",
-        }
-    )
-
-
-@app.response_handler
-async def handle_response(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    """Echo response: returns the user's input text."""
-    input_text = await context.get_input_text()
-    return TextResponse(context, request, text=f"[Response] Echo: {input_text}")
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_09_self_hosting.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_09_self_hosting.py
deleted file mode 100644
index aa212ab654af..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_09_self_hosting.py
+++ /dev/null
@@ -1,64 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 09 — Self-Hosting (mounting into an existing Starlette app).
-
-Shows how to mount the ``ResponsesAgentServerHost`` into a parent
-Starlette application so responses endpoints live under a custom
-URL prefix (e.g. ``/api/responses``).
-
-Because ``ResponsesAgentServerHost`` **is** a Starlette application,
-it can be used as a sub-application via ``starlette.routing.Mount``.
-
-Usage::
-
-    # Start the server
-    python sample_09_self_hosting.py
-
-    # Responses are mounted under /api
-    curl -X POST http://localhost:8000/api/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "test", "input": "Hello!"}'
-    # -> {"output": [{"type": "message", "content":
-    #     [{"type": "output_text", "text": "Self-hosted echo: Hello!"}]}]}
-"""
-
-import asyncio
-
-from starlette.applications import Starlette
-from starlette.routing import Mount
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponsesAgentServerHost,
-    TextResponse,
-)
-
-# Create the responses host (it IS a Starlette app)
-responses_app = ResponsesAgentServerHost()
-
-
-@responses_app.response_handler
-async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
-    """Echo handler mounted under /api."""
-    input_text = await context.get_input_text()
-    return TextResponse(context, request, text=f"Self-hosted echo: {input_text}")
-
-
-# Mount into a parent Starlette app
-app = Starlette(
-    routes=[
-        Mount("/api", app=responses_app),
-    ]
-)
-# Now responses are at /api/responses
-
-
-def main() -> None:
-    import uvicorn
-
-    uvicorn.run(app)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_10_streaming_upstream.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_10_streaming_upstream.py
deleted file mode 100644
index 060480873a2a..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_10_streaming_upstream.py
+++ /dev/null
@@ -1,177 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 10 — Streaming Upstream (forward to OpenAI-compatible server).
-
-Demonstrates how to forward a request to an upstream OpenAI-compatible API
-that returns streaming Server-Sent Events, translating each upstream
-chunk into local response events using the ``openai`` Python SDK.
-
-The handler **owns the response lifecycle** — it constructs its own
-``response.created``, ``response.in_progress``, and terminal events — while
-translating upstream **content events** (output items, text deltas,
-function-call arguments, reasoning, tool calls) and yielding them directly.
-Both model stacks share the same JSON wire contract, so content events
-round-trip with full fidelity.
-
-This is **not** a transparent proxy.  The sample showcases type
-compatibility between the two model stacks.  In practice you would add
-orchestration logic — filtering outputs, injecting items, calling multiple
-upstreams, or transforming content — between the upstream call and the
-``yield``.
-
-Usage::
-
-    # Start the server (set upstream endpoint and API key)
-    UPSTREAM_ENDPOINT=http://localhost:5211 OPENAI_API_KEY=your-key \
-        python sample_10_streaming_upstream.py
-
-    # Send a streaming request
-    curl -N -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "gpt-4o-mini", "input": "Say hello!", "stream": true}'
-    # -> event: response.created            data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.in_progress        data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.output_item.added  data: {"item": {"type": "message", ...}}
-    # -> event: response.content_part.added data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_text.delta  data: {"delta": "..."}
-    # -> ...                                     (more deltas)
-    # -> event: response.output_text.done   data: {"text": "..."}
-    # -> event: response.output_item.done   data: {"item": {"type": "message", ...}}
-    # -> event: response.completed          data: {"response": {"status": "completed", ...}}
-"""
-
-import asyncio
-import os
-from typing import Any, cast
-
-import openai
-import openai.types.responses.response_stream_event
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponsesAgentServerHost,
-)
-
-app = ResponsesAgentServerHost()
-
-upstream = openai.AsyncOpenAI(
-    base_url=os.environ.get("UPSTREAM_ENDPOINT", "https://api.openai.com/v1"),
-    api_key=os.environ.get("OPENAI_API_KEY", "your-api-key"),
-)
-
-
-def _build_response_snapshot(request: CreateResponse, context: ResponseContext) -> dict[str, Any]:
-    """Construct a response snapshot dict from request + context."""
-    snapshot: dict[str, Any] = {
-        "id": context.response_id,
-        "object": "response",
-        "status": "in_progress",
-        "model": request.model or "",
-        "output": [],
-    }
-    if request.metadata is not None:
-        snapshot["metadata"] = request.metadata
-    if request.background is not None:
-        snapshot["background"] = request.background
-    if request.previous_response_id is not None:
-        snapshot["previous_response_id"] = request.previous_response_id
-    # Normalize conversation to ConversationReference form.
-    conv = request.conversation
-    if isinstance(conv, str):
-        snapshot["conversation"] = {"id": conv}
-    elif isinstance(conv, dict) and conv.get("id"):
-        snapshot["conversation"] = {"id": conv["id"]}
-    return snapshot
-
-
-def my_function_tool(x: int) -> int:
-    return x * 2
-
-
-@app.response_handler
-async def handler(
-    request: CreateResponse,
-    context: ResponseContext,
-    cancellation_signal: asyncio.Event,
-):
-    """Forward to upstream with streaming, translate content events back."""
-
-    # Build the upstream request — translate every input item.
-    # Both model stacks share the same JSON wire contract, so
-    # serializing our Item to dict round-trips to the OpenAI SDK.
-    input_items = [item.as_dict() for item in await context.get_input_items()]
-
-    # This handler owns the response lifecycle — construct the
-    # response snapshot directly instead of forwarding the upstream's.
-    # Seeding from the request preserves metadata, conversation, model.
-    snapshot = _build_response_snapshot(request, context)
-
-    # Lifecycle events nest the response snapshot under "response"
-    # — matching the SSE wire format.
-    yield {"type": "response.created", "response": snapshot}
-    yield {"type": "response.in_progress", "response": snapshot}
-
-    # Stream from the upstream.  Translate content events (output
-    # items, deltas, etc.) and yield them directly.  Skip upstream
-    # lifecycle events — we own the response envelope.
-    output_items: list[dict[str, Any]] = []
-    upstream_failed = False
-
-    async with await upstream.responses.create(
-        model=request.model or "gpt-4o-mini",
-        input=input_items,  # type: ignore[arg-type]
-        stream=True,
-    ) as upstream_stream:
-        upstream_stream = cast(
-            openai.AsyncStream[openai.types.responses.response_stream_event.ResponseStreamEvent], upstream_stream
-        )
-        async for event in upstream_stream:
-            # Skip lifecycle events — we own the response envelope.
-            if event.type in ("response.created", "response.in_progress"):
-                continue
-
-            if event.type == "response.completed":
-                break
-
-            if event.type == "response.failed":
-                upstream_failed = True
-                break
-
-            # Do any custom orchestration or manipulation of the event stream here.
-            # In this example, we filter out reasoning text events as being too
-            # noisy.
-            if event.type.startswith("response.reasoning_text"):
-                continue
-
-            # Translate the upstream event to a dict via the openai SDK.
-            evt = event.model_dump()
-
-            # Clear upstream response_id on output items so the
-            # orchestrator's auto-stamp fills in this server's ID.
-            if event.type == "response.output_item.added":
-                evt.get("item", {}).pop("response_id", None)
-            elif event.type == "response.output_item.done":
-                item = evt.get("item", {})
-                item.pop("response_id", None)
-                output_items.append(item)
-
-            yield evt
-
-    # Emit terminal event — the handler decides the outcome.
-    if upstream_failed:
-        snapshot["status"] = "failed"
-        snapshot["error"] = {"code": "server_error", "message": "Upstream request failed"}
-        yield {"type": "response.failed", "response": snapshot}
-    else:
-        snapshot["status"] = "completed"
-        snapshot["output"] = output_items
-        yield {"type": "response.completed", "response": snapshot}
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_11_non_streaming_upstream.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_11_non_streaming_upstream.py
deleted file mode 100644
index 63239e29c716..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_11_non_streaming_upstream.py
+++ /dev/null
@@ -1,119 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 11 — Non-Streaming Upstream (call upstream, build event stream).
-
-Demonstrates forwarding a request to an upstream OpenAI-compatible API
-that returns a complete (non-streaming) response, then using the builder
-API to construct output items for the client.
-
-The handler calls the upstream without streaming, waits for the complete
-response, and uses ``output_item_message`` and ``output_item_reasoning_item``
-to emit ``output_item.added`` / ``output_item.done`` pairs for each item.
-
-This pattern is useful when your handler needs to inspect or transform the
-full response before streaming it to the client — for example, filtering
-output items, injecting additional context, or calling multiple upstreams.
-
-Usage::
-
-    # Start the server (set upstream endpoint and API key)
-    UPSTREAM_ENDPOINT=http://localhost:5211 OPENAI_API_KEY=your-key \
-        python sample_11_non_streaming_upstream.py
-
-    # Send a request
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "gpt-4o-mini", "input": "Say hello!"}'
-    # -> {"output": [{"type": "message", "content":
-    #     [{"type": "output_text", "text": "Hello! ..."}]}]}
-
-    # Stream the response
-    curl -N -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "gpt-4o-mini", "input": "Say hello!", "stream": true}'
-    # -> event: response.created            data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.in_progress        data: {"response": {"status": "in_progress", ...}}
-    # -> event: response.output_item.added  data: {"item": {"type": "message", ...}}
-    # -> event: response.content_part.added data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_text.delta  data: {"delta": "..."}
-    # -> event: response.output_text.done   data: {"text": "..."}
-    # -> event: response.content_part.done  data: {"part": {"type": "output_text", ...}}
-    # -> event: response.output_item.done   data: {"item": {"type": "message", ...}}
-    # -> event: response.completed          data: {"response": {"status": "completed", ...}}
-"""
-
-import asyncio
-import os
-
-import openai
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponseEventStream,
-    ResponsesAgentServerHost,
-)
-
-app = ResponsesAgentServerHost()
-
-
-@app.response_handler
-async def handler(
-    request: CreateResponse,
-    context: ResponseContext,
-    cancellation_signal: asyncio.Event,
-):
-    """Call upstream (non-streaming), emit every output item."""
-    upstream = openai.AsyncOpenAI(
-        base_url=os.environ.get("UPSTREAM_ENDPOINT", "https://api.openai.com/v1"),
-        api_key=os.environ.get("OPENAI_API_KEY", "your-api-key"),
-    )
-
-    # Build the upstream request — translate every input item.
-    # Both model stacks share the same JSON wire contract, so
-    # serializing our Item to dict round-trips to the OpenAI SDK.
-    input_items = [item.as_dict() for item in await context.get_input_items()]
-
-    # Call upstream without streaming and get the complete response.
-    result = await upstream.responses.create(
-        model=request.model or "gpt-4o-mini",
-        input=input_items,  # type: ignore[arg-type]
-    )
-
-    # Build a standard SSE event stream.  Seed from the request to
-    # preserve metadata, conversation, and agent reference.
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    # Translate every upstream output item back into local events.
-    # Use the convenience generators to emit the full lifecycle for
-    # each output-item type.
-    for upstream_item in result.output:
-        if upstream_item.type == "message":
-            # Extract text content from the message.
-            output_text = ""
-            for part in upstream_item.content:
-                if part.type == "output_text":
-                    output_text += part.text
-            for event in stream.output_item_message(output_text):
-                yield event
-        elif upstream_item.type == "reasoning":
-            # Extract reasoning summary text.
-            summary = ""
-            for part in upstream_item.summary:
-                if part.type == "summary_text":
-                    summary += part.text
-            for event in stream.output_item_reasoning_item(summary):
-                yield event
-        # Add additional item types as needed (function_call, etc.)
-
-    yield stream.emit_completed()
-
-
-def main() -> None:
-    app.run()
-
-
-if __name__ == "__main__":
-    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_12_image_generation.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_12_image_generation.py
deleted file mode 100644
index bbaa43fd2750..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_12_image_generation.py
+++ /dev/null
@@ -1,101 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 12 — Image Generation — Returning images from a handler.
-
-This sample demonstrates three ways to return base64-encoded image data
-as an ``image_generation_call`` output item:
-
-  1. **Convenience** — ``output_item_image_gen_call(result_b64)`` one-liner.
-  2. **Streaming partials** — Use the builder to emit partial images
-     between the ``generating`` and ``completed`` states.
-  3. **Full control** — Manual builder lifecycle with ``emit_added()``,
-     state transitions, and ``emit_done(result)``.
-
-Usage::
-
-    python sample_12_image_generation.py
-
-    # Convenience handler
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "image", "input": "Draw a red square"}'
-
-    # Streaming partials
-    curl -N -X POST http://localhost:8088/responses?handler=streaming \
-        -H "Content-Type: application/json" \
-        -d '{"model": "image", "input": "Draw a blue circle", "stream": true}'
-"""
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponseEventStream,
-    ResponsesAgentServerHost,
-)
-
-app = ResponsesAgentServerHost()
-
-# A tiny 1x1 red PNG pixel (base64-encoded) used as a synthetic image.
-TINY_IMAGE_B64 = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR4nGP4z8BQDwAEgAF/pooBPQAAAABJRU5ErkJggg=="
-
-
-# ── Variant 1: Convenience ──────────────────────────────────────────────
-@app.create("image.convenience")
-async def convenience_handler(request: CreateResponse, context: ResponseContext):
-    """Return an image using the convenience one-liner."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    # One call emits: added → in_progress → generating → completed → done(result)
-    async for event in stream.aoutput_item_image_gen_call(TINY_IMAGE_B64):
-        yield event
-
-    yield stream.emit_completed()
-
-
-# ── Variant 2: Streaming partial images ─────────────────────────────────
-@app.create("image.streaming")
-async def streaming_handler(request: CreateResponse, context: ResponseContext):
-    """Stream partial image renders before the final result."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    ig = stream.add_output_item_image_gen_call()
-    yield ig.emit_added()
-    yield ig.emit_in_progress()
-    yield ig.emit_generating()
-
-    # Simulate streaming partial renders
-    for i in range(3):
-        yield ig.emit_partial_image(f"partial_{i}_{TINY_IMAGE_B64[:20]}")
-
-    yield ig.emit_completed()
-    yield ig.emit_done(TINY_IMAGE_B64)
-
-    yield stream.emit_completed()
-
-
-# ── Variant 3: Full control ─────────────────────────────────────────────
-@app.create("image.full_control")
-async def full_control_handler(request: CreateResponse, context: ResponseContext):
-    """Full manual control over the image generation lifecycle."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    ig = stream.add_output_item_image_gen_call()
-    yield ig.emit_added()
-    yield ig.emit_in_progress()
-    yield ig.emit_generating()
-    yield ig.emit_completed()
-    yield ig.emit_done(TINY_IMAGE_B64)
-
-    yield stream.emit_completed()
-
-
-if __name__ == "__main__":
-    import uvicorn
-
-    uvicorn.run(app.build(), host="0.0.0.0", port=8088)
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_13_image_input.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_13_image_input.py
deleted file mode 100644
index 0f85d2caec61..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_13_image_input.py
+++ /dev/null
@@ -1,116 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 13 — Image Input — Receiving images from the caller.
-
-Callers can send images in three ways: via URL, as a base64 ``data:`` URL
-embedded in the ``image_url`` field, or via ``file_id``.  This sample
-registers a handler for each input method and echoes back what was received.
-
-The ``data_url`` utility module provides helpers for decoding inline
-base64 image data.
-
-Usage::
-
-    python sample_13_image_input.py
-
-    # URL input
-    curl -X POST http://localhost:8088/responses?handler=url \
-        -H "Content-Type: application/json" \
-        -d '{
-          "model": "img", "input": [
-            {"role": "user", "content": [
-              {"type": "input_image", "image_url": "https://example.com/photo.png"}
-            ]}
-          ]
-        }'
-
-    # Base64 data URL input
-    curl -X POST http://localhost:8088/responses?handler=base64 \
-        -H "Content-Type: application/json" \
-        -d '{
-          "model": "img", "input": [
-            {"role": "user", "content": [
-              {"type": "input_image", "image_url": "data:image/png;base64,iVBORw0KGgo..."}
-            ]}
-          ]
-        }'
-
-    # File ID input
-    curl -X POST http://localhost:8088/responses?handler=fileid \
-        -H "Content-Type: application/json" \
-        -d '{
-          "model": "img", "input": [
-            {"role": "user", "content": [
-              {"type": "input_image", "file_id": "/images/photo.png"}
-            ]}
-          ]
-        }'
-"""
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponsesAgentServerHost,
-    TextResponse,
-)
-from azure.ai.agentserver.responses._data_url import get_media_type, is_data_url, try_decode_bytes
-from azure.ai.agentserver.responses.models import ItemMessage, MessageContentInputImageContent
-
-app = ResponsesAgentServerHost()
-
-
-def _extract_images(items):
-    """Extract ``MessageContentInputImageContent`` from expanded input items."""
-    images = []
-    for item in items:
-        if not isinstance(item, ItemMessage):
-            continue
-        for content in item.content or []:
-            if isinstance(content, MessageContentInputImageContent):
-                images.append(content)
-    return images
-
-
-# ── Handler 1: Image URL ────────────────────────────────────────────────
-@app.create("image_input.url")
-async def url_handler(request: CreateResponse, context: ResponseContext):
-    """Echo back the image URL received from the caller."""
-    items = await context.get_input_items()
-    images = _extract_images(items)
-
-    urls = [img.image_url for img in images if img.image_url and not is_data_url(img.image_url)]
-    return TextResponse(context, request, text=f"Received {len(urls)} image URL(s): {', '.join(urls)}")
-
-
-# ── Handler 2: Base64 data URL ──────────────────────────────────────────
-@app.create("image_input.base64")
-async def base64_handler(request: CreateResponse, context: ResponseContext):
-    """Decode inline base64 image data and report media type + size."""
-    items = await context.get_input_items()
-    images = _extract_images(items)
-
-    results = []
-    for img in images:
-        if img.image_url and is_data_url(img.image_url):
-            raw = try_decode_bytes(img.image_url)
-            media = get_media_type(img.image_url)
-            size = len(raw) if raw else 0
-            results.append(f"{media or 'unknown'} ({size} bytes)")
-    return TextResponse(context, request, text=f"Decoded {len(results)} image(s): {'; '.join(results)}")
-
-
-# ── Handler 3: File ID ──────────────────────────────────────────────────
-@app.create("image_input.file_id")
-async def file_id_handler(request: CreateResponse, context: ResponseContext):
-    """Echo back the file_id received from the caller."""
-    items = await context.get_input_items()
-    images = _extract_images(items)
-
-    file_ids = [img.file_id for img in images if img.file_id]
-    return TextResponse(context, request, text=f"Received {len(file_ids)} file ID(s): {', '.join(file_ids)}")
-
-
-if __name__ == "__main__":
-    import uvicorn
-
-    uvicorn.run(app.build(), host="0.0.0.0", port=8088)
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_14_file_inputs.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_14_file_inputs.py
deleted file mode 100644
index 6636d3a3f829..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_14_file_inputs.py
+++ /dev/null
@@ -1,113 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 14 — File Inputs — Receiving files from the caller.
-
-Callers can send files in three ways: as a base64 ``data:`` URL in
-``file_data``, via ``file_url``, or via ``file_id``.  This sample
-registers a handler for each input method and echoes back what was received.
-
-Usage::
-
-    python sample_14_file_inputs.py
-
-    # Base64 data URL input
-    curl -X POST http://localhost:8088/responses?handler=base64 \
-        -H "Content-Type: application/json" \
-        -d '{
-          "model": "files", "input": [
-            {"role": "user", "content": [
-              {"type": "input_file", "file_data": "data:application/pdf;base64,JVBERi0..."}
-            ]}
-          ]
-        }'
-
-    # URL input
-    curl -X POST http://localhost:8088/responses?handler=url \
-        -H "Content-Type: application/json" \
-        -d '{
-          "model": "files", "input": [
-            {"role": "user", "content": [
-              {"type": "input_file", "file_url": "https://example.com/report.pdf"}
-            ]}
-          ]
-        }'
-
-    # File ID input
-    curl -X POST http://localhost:8088/responses?handler=fileid \
-        -H "Content-Type: application/json" \
-        -d '{
-          "model": "files", "input": [
-            {"role": "user", "content": [
-              {"type": "input_file", "file_id": "/reports/summary.pdf"}
-            ]}
-          ]
-        }'
-"""
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponsesAgentServerHost,
-    TextResponse,
-)
-from azure.ai.agentserver.responses._data_url import get_media_type, is_data_url, try_decode_bytes
-from azure.ai.agentserver.responses.models import ItemMessage, MessageContentInputFileContent
-
-app = ResponsesAgentServerHost()
-
-
-def _extract_files(items):
-    """Extract ``MessageContentInputFileContent`` from expanded input items."""
-    files = []
-    for item in items:
-        if not isinstance(item, ItemMessage):
-            continue
-        for content in item.content or []:
-            if isinstance(content, MessageContentInputFileContent):
-                files.append(content)
-    return files
-
-
-# ── Handler 1: Base64 data URL ──────────────────────────────────────────
-@app.create("file_input.base64")
-async def base64_handler(request: CreateResponse, context: ResponseContext):
-    """Decode inline base64 file data and report media type + size."""
-    items = await context.get_input_items()
-    files = _extract_files(items)
-
-    results = []
-    for f in files:
-        if f.file_data and is_data_url(f.file_data):
-            raw = try_decode_bytes(f.file_data)
-            media = get_media_type(f.file_data)
-            size = len(raw) if raw else 0
-            results.append(f"{media or 'unknown'} ({size} bytes)")
-    return TextResponse(context, request, text=f"Decoded {len(results)} file(s): {'; '.join(results)}")
-
-
-# ── Handler 2: File URL ─────────────────────────────────────────────────
-@app.create("file_input.url")
-async def url_handler(request: CreateResponse, context: ResponseContext):
-    """Echo back the file URL received from the caller."""
-    items = await context.get_input_items()
-    files = _extract_files(items)
-
-    urls = [f.file_url for f in files if f.file_url]
-    return TextResponse(context, request, text=f"Received {len(urls)} file URL(s): {', '.join(urls)}")
-
-
-# ── Handler 3: File ID ──────────────────────────────────────────────────
-@app.create("file_input.file_id")
-async def file_id_handler(request: CreateResponse, context: ResponseContext):
-    """Echo back the file_id received from the caller."""
-    items = await context.get_input_items()
-    files = _extract_files(items)
-
-    file_ids = [f.file_id for f in files if f.file_id]
-    return TextResponse(context, request, text=f"Received {len(file_ids)} file ID(s): {', '.join(file_ids)}")
-
-
-if __name__ == "__main__":
-    import uvicorn
-
-    uvicorn.run(app.build(), host="0.0.0.0", port=8088)
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_15_annotations.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_15_annotations.py
deleted file mode 100644
index 71685cde9c58..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_15_annotations.py
+++ /dev/null
@@ -1,65 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 15 — Annotations — Attaching file references, citations, and URLs.
-
-Messages can carry annotations that reference files, cite sources, or link
-to URLs.  This sample shows how to emit ``file_path``, ``file_citation``,
-and ``url_citation`` annotations using the convenience
-``output_item_message(text, annotations=[...])`` API.
-
-Usage::
-
-    python sample_15_annotations.py
-
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "annotated", "input": "Show me the sources"}'
-"""
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponseEventStream,
-    ResponsesAgentServerHost,
-)
-from azure.ai.agentserver.responses.models import (
-    FileCitationBody,
-    FilePath,
-    UrlCitationBody,
-)
-
-app = ResponsesAgentServerHost()
-
-
-@app.create("annotations")
-async def annotations_handler(request: CreateResponse, context: ResponseContext):
-    """Return a message with file_path, file_citation, and url_citation annotations."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    annotations = [
-        FilePath(file_id="/reports/monthly-summary.pdf", index=0),
-        FilePath(file_id="/exports/data.csv", index=1),
-        FileCitationBody(file_id="/sources/research-paper.pdf", index=2, filename="research-paper.pdf"),
-        UrlCitationBody(
-            url="https://example.com/docs/guide",
-            start_index=0,
-            end_index=29,
-            title="Developer Guide",
-        ),
-    ]
-
-    async for event in stream.aoutput_item_message(
-        "Here are your files and sources.",
-        annotations=annotations,
-    ):
-        yield event
-
-    yield stream.emit_completed()
-
-
-if __name__ == "__main__":
-    import uvicorn
-
-    uvicorn.run(app.build(), host="0.0.0.0", port=8088)
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_16_structured_outputs.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_16_structured_outputs.py
deleted file mode 100644
index d39b2dde18c5..000000000000
--- a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_16_structured_outputs.py
+++ /dev/null
@@ -1,77 +0,0 @@
-# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.
-"""Sample 16 — Structured Outputs — Returning arbitrary structured JSON.
-
-Return structured JSON data as a ``structured_outputs`` output item.
-This sample demonstrates two approaches:
-
-  1. **Convenience** — ``output_item_structured_outputs(data)``
-  2. **Full control** — ``add_output_item_structured_outputs()`` builder
-
-Usage::
-
-    python sample_16_structured_outputs.py
-
-    curl -X POST http://localhost:8088/responses \
-        -H "Content-Type: application/json" \
-        -d '{"model": "analysis", "input": "Analyze the product reviews"}'
-"""
-
-from azure.ai.agentserver.responses import (
-    CreateResponse,
-    ResponseContext,
-    ResponseEventStream,
-    ResponsesAgentServerHost,
-)
-from azure.ai.agentserver.responses.models._generated import StructuredOutputsOutputItem
-
-app = ResponsesAgentServerHost()
-
-
-# ── Variant 1: Convenience ──────────────────────────────────────────────
-@app.create("structured.convenience")
-async def convenience_handler(request: CreateResponse, context: ResponseContext):
-    """Return structured analysis results using the convenience method."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    result = {
-        "sentiment": "positive",
-        "confidence": 0.95,
-        "topics": ["product-quality", "customer-service"],
-        "files": [
-            {
-                "name": "report.pdf",
-                "url": "https://storage.example.com/files/report.pdf",
-                "media_type": "application/pdf",
-            },
-        ],
-    }
-
-    async for event in stream.aoutput_item_structured_outputs(result):
-        yield event
-
-    yield stream.emit_completed()
-
-
-# ── Variant 2: Full control ─────────────────────────────────────────────
-@app.create("structured.full_control")
-async def full_control_handler(request: CreateResponse, context: ResponseContext):
-    """Return structured data using the builder for manual lifecycle control."""
-    stream = ResponseEventStream(response_id=context.response_id, request=request)
-    yield stream.emit_created()
-    yield stream.emit_in_progress()
-
-    builder = stream.add_output_item_structured_outputs()
-    item = StructuredOutputsOutputItem(id=builder.item_id, output={"status": "ok", "count": 42})
-    yield builder.emit_added(item)
-    yield builder.emit_done(item)
-
-    yield stream.emit_completed()
-
-
-if __name__ == "__main__":
-    import uvicorn
-
-    uvicorn.run(app.build(), host="0.0.0.0", port=8088)
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_18_durable_copilot.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_18_durable_copilot.py
new file mode 100644
index 000000000000..0d5e9a9a1390
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_18_durable_copilot.py
@@ -0,0 +1,460 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+"""Sample 18 — Durable Copilot (stateful conversation via GitHub Copilot SDK).
+
+Wraps the **GitHub Copilot Python SDK** (``github-copilot-sdk``) in a
+steerable durable response handler.  The Copilot SDK is the upstream
+framework that owns conversational durability — this handler is the
+bridge.
+
+Recovery model:
+
+- The Copilot session id is the framework-computed
+  ``context.conversation_chain_id`` — a deterministic, crash-stable
+  identifier shared by every turn in the same conversation. No
+  per-handler allocation, no metadata round-trip on first use.
+  The fresh-entry path uses ``client.create_session(session_id=…)``;
+  the recovery and follow-up steerable-turn path uses
+  ``client.resume_session(session_id, …)`` — the SDK's documented
+  reattach API.
+- Before sending the user's input, the handler reads the session's
+  persisted event history via ``session.get_messages()``, scans for
+  ``UserMessageData`` events, and skips ``session.send`` if the most
+  recent user message's content equals this turn's input. The
+  **upstream session event log is the source of truth** for "did I
+  already send this turn". No handler-managed metadata watermark, no
+  metadata flush ordering, no race between persistence and side effect.
+- On a steered cancellation that fires pre-entry, we still send the
+  user input to Copilot so the message is preserved in the
+  conversation history — otherwise the newer turn that supersedes us
+  would lose context.
+- On crash recovery, we never start a fresh session. Recovery always
+  reattaches via ``resume_session``.
+
+Streaming model (live deltas + recovery replay):
+
+- The Copilot SDK emits incremental tokens via
+  ``AssistantMessageDeltaData`` events as the model generates the
+  response. The handler forwards each event's ``delta_content`` as an
+  ``output_text.delta`` SSE event the moment it arrives, so clients see
+  characters appear live rather than in one batched dump at the end of
+  the turn. ``AssistantMessageData`` (the assembled-final-message event
+  delivered once generation completes) is used only as a fallback for
+  the rare case the SDK emits the final message without any prior
+  deltas.
+- On crash recovery, when the handler re-enters with
+  ``context.is_recovery == True``, it first reads the upstream session's
+  persisted assistant content for the current user turn via
+  ``session.get_messages()`` and emits the accumulated text as a single
+  ``output_text.delta`` event. The recovered client therefore sees:
+  ``response.in_progress`` (with zero output items) → one delta with the
+  accumulated text → live deltas continuing from where the upstream
+  Copilot session is. This is a deliberate simplification — the
+  original per-token delta sequence isn't preserved; we collapse the
+  pre-crash deltas into a single replay chunk and then resume live
+  streaming.
+
+Limitations:
+
+- The Copilot SDK does not checkpoint within an assistant response. If
+  Copilot had produced a partial reply before the crash, we replay that
+  partial text on recovery; whether the upstream session continues to
+  emit more deltas after we re-attach depends on the Copilot SDK's
+  resume semantics. For workflows where strict per-token continuity
+  matters, decompose into smaller queries (see ``sample_19``) or use a
+  framework with native node-level checkpointing (see ``sample_21``).
+- If a prior turn's user input was identical to this turn's input AND
+  that prior turn completed normally, the "last user matches input"
+  heuristic will incorrectly skip the send. Rare in normal use; for
+  workflows where this matters, decompose or disambiguate at the
+  application level.
+
+Requirements::
+
+    pip install github-copilot-sdk
+    # GitHub Copilot CLI installed and authenticated.
+
+Usage::
+
+    python sample_18_durable_copilot.py
+
+    curl -N -X POST http://localhost:8088/responses \\
+        -H "Content-Type: application/json" \\
+        -d '{"model": "copilot", "input": "Write a Python fibonacci function",
+             "stream": true, "store": true, "background": true}'
+
+    # Steer with a follow-up
+    curl -N -X POST http://localhost:8088/responses \\
+        -H "Content-Type: application/json" \\
+        -d '{"model": "copilot", "input": "Make it iterative instead",
+             "stream": true, "store": true, "background": true,
+             "previous_response_id": "<id>"}'
+
+    # Simulate mid-stream shutdown
+    SIMULATE_SHUTDOWN_MS=1500 python sample_18_durable_copilot.py
+"""
+
+import asyncio
+import os
+from typing import Any
+
+from copilot import CopilotClient  # type: ignore[import-untyped]
+from copilot._jsonrpc import JsonRpcError  # type: ignore[import-untyped]
+from copilot.generated.session_events import (  # type: ignore[import-untyped]
+    AssistantMessageData,
+    AssistantMessageDeltaData,
+    SessionIdleData,
+    UserMessageData,
+)
+from copilot.session import PermissionHandler  # type: ignore[import-untyped]
+
+from azure.ai.agentserver.responses import (
+    CreateResponse,
+    ResponseContext,
+    ResponseEventStream,
+    ResponsesAgentServerHost,
+    ResponsesServerOptions,
+)
+from azure.ai.agentserver.responses.models._generated import ResponseObject
+
+options = ResponsesServerOptions(
+    durable_background=True,
+    steerable_conversations=True,
+)
+app = ResponsesAgentServerHost(options=options)
+
+_SIMULATE_SHUTDOWN_MS = int(os.environ.get("SIMULATE_SHUTDOWN_MS", "0"))
+
+# Allow operators / tests to pick the Copilot model via env var. Default is
+# a small, low-cost model that is generally available; operators with access
+# to a specific model can override at deploy time.
+_COPILOT_MODEL = os.environ.get("COPILOT_MODEL", "gpt-5-mini")
+
+
+async def _open_session(
+    client: Any,
+    session_id: str,
+    context: ResponseContext,
+) -> Any:
+    """Open the Copilot session — ``resume_session`` if it pre-existed.
+
+    On a fresh turn we use ``create_session``; on crash recovery and on every
+    subsequent steerable turn we use ``resume_session``, the SDK's explicit
+    reattach API. ``context.is_recovery`` is True only when we are being
+    re-entered after a crash; ``context.is_steered_turn`` is True for
+    steerable follow-up turns. Both routes attempt to reattach.
+
+    If ``resume_session`` raises "Session not found" (the upstream Copilot
+    CLI was not given enough time to persist the session before the
+    previous process exited — most common after SIGTERM with a short
+    grace, or SIGKILL), we fall back to ``create_session``. We lose the
+    pre-crash conversation context for this turn, but the handler makes
+    forward progress instead of failing outright. This honours the
+    invariant that recovery and upstream-dependency hiccups should
+    NOT propagate up as task failures (which would orphan the response
+    and fail any queued steers).
+
+    Both paths pass ``streaming=True`` so the SDK emits
+    ``AssistantMessageDeltaData`` events with incremental ``delta_content``
+    as the model generates the response — without this the SDK only delivers
+    the final ``AssistantMessageData`` event once generation completes, and
+    the SSE client sees the whole answer in a single delta dump instead of
+    live characters.
+    """
+    if context.is_recovery or context.is_steered_turn:
+        try:
+            return await client.resume_session(
+                session_id,
+                on_permission_request=PermissionHandler.approve_all,
+                model=_COPILOT_MODEL,
+                streaming=True,
+            )
+        except JsonRpcError as exc:
+            # Copilot CLI couldn't find the prior session (didn't persist
+            # before the previous process exited, or aged out of the SDK's
+            # cache). Fall back to a fresh session so the turn doesn't
+            # fail outright.
+            msg = str(exc)
+            if "not found" not in msg.lower():
+                raise
+            import logging  # pylint: disable=import-outside-toplevel
+
+            logging.getLogger(__name__).warning(
+                "Copilot session %s not found on resume (%s); creating fresh "
+                "session — pre-crash conversation context for this turn is lost.",
+                session_id,
+                msg,
+            )
+            # Fall through to create_session below.
+    return await client.create_session(
+        session_id=session_id,
+        on_permission_request=PermissionHandler.approve_all,
+        model=_COPILOT_MODEL,
+        streaming=True,
+    )
+
+
+async def _send_input_if_not_in_session(
+    session: Any,
+    context: ResponseContext,
+) -> bool:
+    """Send this turn's input to Copilot unless it is already in the session.
+
+    Returns True if a send happened on this call; False otherwise.
+
+    Detection rule: list the session's persisted event history via
+    ``session.get_messages()``, scan for ``UserMessageData`` payloads,
+    and skip the send if the most recent user message's content equals
+    this turn's input. The upstream session is the source of truth —
+    no handler-managed watermark, no metadata flush ordering.
+
+    See ``sample_17``'s ``_send_input_if_not_in_session`` docstring for
+    the full discussion of why this is deterministic for the realistic
+    crash window and what the (rare) "user repeats themselves" edge
+    case looks like.
+    """
+    input_text = await context.get_input_text()
+
+    try:
+        events = await session.get_messages()
+    except Exception:  # pylint: disable=broad-exception-caught
+        events = []
+
+    # Find the most recent user-message event.
+    last_user_text: str | None = None
+    for ev in reversed(events):
+        data = getattr(ev, "data", None)
+        if isinstance(data, UserMessageData):
+            content = getattr(data, "content", None)
+            if isinstance(content, str):
+                last_user_text = content
+            break
+
+    if last_user_text == input_text:
+        return False  # already in the session — skip
+
+    await session.send(input_text)
+    return True
+
+
+async def _gather_accumulated_assistant_text(session: Any, user_input_text: str) -> str:
+    """Return the upstream assistant content already emitted for this turn.
+
+    Used on crash recovery to surface whatever Copilot had already sent
+    before the crash as a single replay delta. Looks for the last
+    ``UserMessageData`` event whose content matches ``user_input_text``
+    and concatenates every ``AssistantMessageData`` event that follows
+    it in the session's persisted event log.
+
+    :param session: An open Copilot session (post-``resume_session``).
+    :type session: Any
+    :param user_input_text: The current turn's user input text.
+    :type user_input_text: str
+    :returns: Concatenated assistant content, or an empty string if the
+        upstream session has not produced any assistant content for
+        this turn yet.
+    :rtype: str
+    """
+    try:
+        events = await session.get_messages()
+    except Exception:  # pylint: disable=broad-exception-caught
+        return ""
+
+    # Find the index of the last UserMessageData event whose content
+    # matches the current turn's input.
+    last_user_index: int | None = None
+    for i, ev in enumerate(events):
+        data = getattr(ev, "data", None)
+        if isinstance(data, UserMessageData):
+            content = getattr(data, "content", None)
+            if isinstance(content, str) and content == user_input_text:
+                last_user_index = i
+
+    if last_user_index is None:
+        return ""
+
+    # Concatenate all AssistantMessageData content emitted after that
+    # user message.
+    parts: list[str] = []
+    for ev in events[last_user_index + 1 :]:
+        data = getattr(ev, "data", None)
+        if isinstance(data, AssistantMessageData):
+            content = getattr(data, "content", None)
+            if isinstance(content, str):
+                parts.append(content)
+    return "".join(parts)
+
+
+def _build_resumption_response(context: ResponseContext, request: CreateResponse) -> ResponseObject:
+    """Empty resumption response — see ``sample_17`` for full rationale."""
+    return ResponseObject(
+        {
+            "id": context.response_id,
+            "object": "response",
+            "status": "in_progress",
+            "output": [],
+            "model": request.model,
+        }
+    )
+
+
+@app.response_handler
+async def handler(
+    request: CreateResponse,
+    context: ResponseContext,
+    cancellation_signal: asyncio.Event,
+):
+    """Steerable Copilot SDK conversation."""
+    # ── Recovery branch ─────────────────────────────────────────────
+    if context.is_recovery:
+        stream = ResponseEventStream(
+            response_id=context.response_id,
+            response=_build_resumption_response(context, request),
+        )
+    else:
+        stream = ResponseEventStream(response_id=context.response_id, request=request)
+
+    yield stream.emit_created()
+
+    # ── Pre-entry cancellation / shutdown check ────────────────────
+    # On a STEERED pre-entry we still send the user's input to Copilot so
+    # it is preserved in conversation history. For other cancellation
+    # reasons (client-cancel) or shutdown we just return without touching
+    # the SDK — the framework forces ``cancelled`` for client-cancel and
+    # re-invokes the handler on the next restart for shutdown.
+    if cancellation_signal.is_set() or context.shutdown.is_set():
+        if cancellation_signal.is_set() and context.pending_input_count > 0:
+            session_id = context.conversation_chain_id
+            async with CopilotClient() as client:
+                async with await _open_session(client, session_id, context) as session:
+                    await _send_input_if_not_in_session(session, context)
+            yield stream.emit_completed()
+        return
+
+    yield stream.emit_in_progress()
+
+    shutdown_timer: asyncio.Task | None = None
+    if _SIMULATE_SHUTDOWN_MS > 0:
+        shutdown_timer = asyncio.create_task(_simulate_shutdown(context))
+
+    message = stream.add_output_item_message()
+    yield message.emit_added()
+    text = message.add_text_content()
+    yield text.emit_added()
+
+    session_id = context.conversation_chain_id
+
+    # ── Live delta streaming via asyncio.Queue ──────────────────────
+    # Copilot's SDK emits incremental tokens via ``AssistantMessageDeltaData``
+    # events as the model generates the response. We push each delta's
+    # ``delta_content`` into a queue and forward it as an
+    # ``output_text.delta`` SSE event the moment it arrives, so clients
+    # see characters appear live rather than in a single batched dump.
+    # ``AssistantMessageData`` is the FINAL assembled message (delivered
+    # once the response is complete); we ignore it on the delta path —
+    # the deltas have already accumulated to the same content — but use
+    # it as a fallback if the SDK emits the assembled message WITHOUT
+    # prior deltas (older versions / certain Copilot models).
+    _IDLE = object()
+    delta_queue: asyncio.Queue[Any] = asyncio.Queue()
+    _saw_delta = False
+
+    def on_event(event: Any) -> None:
+        nonlocal _saw_delta
+        data = getattr(event, "data", None)
+        if isinstance(data, AssistantMessageDeltaData):
+            chunk = getattr(data, "delta_content", None) or ""
+            if chunk:
+                _saw_delta = True
+                delta_queue.put_nowait(chunk)
+        elif isinstance(data, AssistantMessageData):
+            # Fallback: if the SDK delivered the full message without
+            # any prior deltas, forward it as a single delta so the
+            # client still receives the content.
+            if not _saw_delta:
+                content = getattr(data, "content", None) or ""
+                if content:
+                    delta_queue.put_nowait(content)
+        elif isinstance(data, SessionIdleData):
+            delta_queue.put_nowait(_IDLE)
+
+    accumulated = ""
+
+    async with CopilotClient() as client:
+        # Reattach on recovery (resume_session), create on fresh (create_session).
+        async with await _open_session(client, session_id, context) as session:
+            session.on(on_event)
+
+            # ── Recovery replay ─────────────────────────────────────
+            # On crash recovery / steerable reattach, the upstream
+            # session may already hold some accumulated assistant text
+            # for the current user turn (a partial or complete prior
+            # response). Emit it as a single delta so the recovered
+            # client sees the work that was already done before the
+            # crash. Live deltas continue from here.
+            if context.is_recovery or context.is_steered_turn:
+                user_input_text = await context.get_input_text()
+                replay = await _gather_accumulated_assistant_text(session, user_input_text)
+                if replay:
+                    accumulated += replay
+                    yield text.emit_delta(replay)
+
+            # Upstream-history-gated send: skipped when Copilot's
+            # persisted event log already has our user message as its
+            # most recent user event.
+            sent_this_attempt = await _send_input_if_not_in_session(session, context)
+
+            # Drain live events. If we sent input this attempt, wait
+            # for idle indefinitely (Copilot is generating). If we
+            # didn't send (recovery + already-in-session), the upstream
+            # session may still emit a few residual events on attach —
+            # poll with a short bounded timeout, then exit cleanly.
+            wait_timeout = None if sent_this_attempt else 2.0
+            while True:
+                if cancellation_signal.is_set() or context.shutdown.is_set():
+                    await session.abort()
+                    break
+                try:
+                    chunk = await asyncio.wait_for(
+                        delta_queue.get(),
+                        timeout=wait_timeout,
+                    )
+                except asyncio.TimeoutError:
+                    # No new events within the recovery polling window;
+                    # presume the upstream is idle and exit.
+                    break
+                if chunk is _IDLE:
+                    break
+                accumulated += chunk
+                yield text.emit_delta(chunk)
+
+    yield text.emit_text_done(accumulated.strip())
+    yield text.emit_done()
+    yield message.emit_done()
+
+    if shutdown_timer and not shutdown_timer.done():
+        shutdown_timer.cancel()
+
+    # Mid-stream shutdown: return without terminal so the framework
+    # re-invokes us; the recovery branch reattaches the same session via
+    # resume_session and the upstream-history check prevents re-sending.
+    if context.shutdown.is_set():
+        return
+
+    yield stream.emit_completed()
+
+
+async def _simulate_shutdown(context: ResponseContext) -> None:
+    """Fire SHUTTING_DOWN after a delay (local testing only)."""
+    await asyncio.sleep(_SIMULATE_SHUTDOWN_MS / 1000.0)
+    context.shutdown.set()
+
+
+def main() -> None:
+    app.run()
+
+
+if __name__ == "__main__":
+    main()
+
+import asyncio
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_19_durable_streaming.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_19_durable_streaming.py
new file mode 100644
index 000000000000..a888437fac69
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_19_durable_streaming.py
@@ -0,0 +1,236 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+"""Sample 19 — Durable streaming with handler-managed phase checkpoints.
+
+A durable response handler with NO upstream framework — checkpoints are
+managed entirely via ``context.conversation_chain_metadata``. This is the teaching shape
+of the recovery contract; samples that wrap real upstream frameworks
+(Claude, Copilot, LangGraph) layer additional reconciliation on top of
+the same pattern.
+
+The handler runs three phases (``analyze`` → ``generate`` → ``refine``)
+and emits one output item per phase. After each phase finishes it stamps
+``context.conversation_chain_metadata["phase_complete"]``. On a recovered entry, the
+handler reads the watermark, builds a resumption response containing the
+items for the completed phases, emits ``response.in_progress`` carrying
+the resumption response (the client-visible reset point), and resumes at
+the first incomplete phase.
+
+Demonstrates:
+
+- The recovery-aware default pattern from the handler guide.
+- Resumption response construction from handler-managed metadata only
+  (no upstream SDK).
+- ``ResponseEventStream(response=resumption)`` seeding.
+- Pre-entry / mid-stream / post-stream cancellation handling.
+- ``SIMULATE_SHUTDOWN_MS`` for local mid-stream-shutdown testing.
+
+What this sample does NOT demonstrate (covered by other samples):
+
+- Wrapping a stateful upstream SDK (see ``sample_17`` for Claude, ``18``
+  for Copilot, ``21`` for LangGraph).
+- Steerable multi-turn conversations (see ``sample_20``).
+
+Usage::
+
+    python sample_19_durable_streaming.py
+
+    curl -N -X POST http://localhost:8088/responses \\
+        -H "Content-Type: application/json" \\
+        -d '{"model": "streamer", "input": "Tell me a joke",
+             "stream": true, "store": true, "background": true}'
+
+    # Simulate mid-stream shutdown — handler checkpoints, returns without
+    # terminal, framework re-invokes on restart from the last completed phase.
+    SIMULATE_SHUTDOWN_MS=120 python sample_19_durable_streaming.py
+"""
+
+import asyncio
+import os
+from typing import Any
+
+from azure.ai.agentserver.responses import (
+    CreateResponse,
+    ResponseContext,
+    ResponseEventStream,
+    ResponsesAgentServerHost,
+    ResponsesServerOptions,
+)
+from azure.ai.agentserver.responses.models._generated import ResponseObject
+
+options = ResponsesServerOptions(durable_background=True)
+app = ResponsesAgentServerHost(options=options)
+
+_SIMULATE_SHUTDOWN_MS = int(os.environ.get("SIMULATE_SHUTDOWN_MS", "0"))
+
+# Phases run in order. Each emits one message output item and stamps
+# `phase_complete` in metadata after the item's `output_item.done`.
+_PHASE_ORDER: tuple[str, ...] = ("analyze", "generate", "refine")
+
+
+async def _phase_tokens(phase: str, prompt: str):
+    """Simulated upstream — produce a few tokens for the given phase.
+
+    Replace with your real LLM call, document analysis, etc.
+    """
+    text = {
+        "analyze": f"[analyze] Examining input: '{prompt}'.",
+        "generate": f"[generate] Drafting response for: '{prompt}'.",
+        "refine": f"[refine] Polished result for: '{prompt}'.",
+    }[phase]
+    for token in text.split():
+        await asyncio.sleep(0.03)
+        yield token + " "
+
+
+def _phase_message_payload(phase: str, text: str) -> dict[str, Any]:
+    """Serialize a fully-completed phase output item for the resumption response."""
+    return {
+        "type": "message",
+        "id": f"phase_{phase}_msg",
+        "role": "assistant",
+        "status": "completed",
+        "content": [{"type": "output_text", "text": text, "annotations": []}],
+    }
+
+
+def _completed_phase_index(context: ResponseContext) -> int:
+    """Return the index of the next phase to run; 0 if nothing is done yet."""
+    done = context.conversation_chain_metadata.get("phase_complete")
+    if not done or done not in _PHASE_ORDER:
+        return 0
+    return _PHASE_ORDER.index(done) + 1
+
+
+def _build_resumption_response(context: ResponseContext, request: CreateResponse) -> ResponseObject:
+    """Build the resumption response from completed phases recorded in metadata.
+
+    Only includes items for phases whose `output_item.done` was emitted in
+    a prior attempt. In-flight items from a crashed phase are excluded —
+    that phase will be re-run from scratch on this attempt.
+    """
+    next_phase = _completed_phase_index(context)
+    completed_texts = context.conversation_chain_metadata.get("phase_texts", {}) or {}
+    output: list[dict[str, Any]] = []
+    for phase in _PHASE_ORDER[:next_phase]:
+        text = completed_texts.get(phase, "")
+        output.append(_phase_message_payload(phase, text))
+    return ResponseObject(
+        {
+            "id": context.response_id,
+            "object": "response",
+            "status": "in_progress",
+            "output": output,
+            "model": request.model,
+        }
+    )
+
+
+@app.response_handler
+async def handler(
+    request: CreateResponse,
+    context: ResponseContext,
+    cancellation_signal: asyncio.Event,
+):
+    """Three-phase durable streaming handler with crash recovery."""
+    # ── Recovery branch ─────────────────────────────────────────────
+    # On recovery, seed the stream with a resumption response derived from
+    # metadata watermarks. The library treats this run's ``response.in_progress``
+    # as the client-visible snapshot reset (see the handler guide's
+    # Durability section).
+    if context.is_recovery:
+        stream = ResponseEventStream(
+            response_id=context.response_id,
+            response=_build_resumption_response(context, request),
+        )
+    else:
+        stream = ResponseEventStream(response_id=context.response_id, request=request)
+
+    yield stream.emit_created()  # library tolerates duplicate on recovery
+
+    # ── Pre-entry cancellation/shutdown check ──────────────────────
+    # This sample does NOT enable steerable_conversations, so STEERED
+    # cannot occur. Shutdown and client-cancel are independent, mutually
+    # exclusive surfaces — check shutdown FIRST.
+    if context.shutdown.is_set():
+        # Graceful shutdown before we started: defer to next-lifetime
+        # recovery. The unified primitive raises internally and works in
+        # this streaming async-generator shape.
+        await context.exit_for_recovery()
+    if cancellation_signal.is_set():
+        # Client-cancelled: return without a terminal (framework forces
+        # ``cancelled``).
+        return
+
+    yield stream.emit_in_progress()
+
+    # Optional local shutdown simulation.
+    shutdown_timer: asyncio.Task | None = None
+    if _SIMULATE_SHUTDOWN_MS > 0:
+        shutdown_timer = asyncio.create_task(_simulate_shutdown(context))
+
+    input_text = await context.get_input_text()
+    phase_texts: dict[str, str] = dict(context.conversation_chain_metadata.get("phase_texts", {}) or {})
+
+    # Run phases starting at the first one not yet completed.
+    start = _completed_phase_index(context)
+    for phase in _PHASE_ORDER[start:]:
+        message = stream.add_output_item_message()
+        yield message.emit_added()
+        text = message.add_text_content()
+        yield text.emit_added()
+
+        accumulated = ""
+        async for token in _phase_tokens(phase, input_text):
+            if cancellation_signal.is_set() or context.shutdown.is_set():
+                break
+            accumulated += token
+            yield text.emit_delta(token)
+
+        # Always close builders for the current phase so the persisted
+        # event stream is well-formed even if the phase was cancelled.
+        # Whether this phase counts as "complete" for recovery purposes
+        # is decided below by the watermark.
+        yield text.emit_text_done(accumulated.strip())
+        yield text.emit_done()
+        yield message.emit_done()
+
+        # ── Mid-stream cancellation/shutdown check ─────────────────
+        # If cancelled or shutdown mid-phase, do NOT advance the watermark —
+        # the phase output is not durably committed from a recovery
+        # standpoint, and a recovered attempt should re-run this phase.
+        if cancellation_signal.is_set() or context.shutdown.is_set():
+            break
+
+        # Phase finished cleanly — advance the watermark so a recovery
+        # attempt skips this phase. Stamp BEFORE moving on so a crash
+        # before the next phase's add still finds this phase complete.
+        phase_texts[phase] = accumulated.strip()
+        context.conversation_chain_metadata["phase_texts"] = phase_texts
+        context.conversation_chain_metadata["phase_complete"] = phase
+
+    if shutdown_timer and not shutdown_timer.done():
+        shutdown_timer.cancel()
+
+    # ── Post-stream shutdown check ──────────────────────────────────
+    # Shutdown mid-stream: defer to next-lifetime recovery so the
+    # framework re-invokes us; the recovery branch above picks up from
+    # the last completed phase.
+    if context.shutdown.is_set():
+        await context.exit_for_recovery()
+
+    yield stream.emit_completed()
+
+
+async def _simulate_shutdown(context: ResponseContext) -> None:
+    """Fire SHUTTING_DOWN after a delay (local testing only)."""
+    await asyncio.sleep(_SIMULATE_SHUTDOWN_MS / 1000.0)
+    context.shutdown.set()
+
+
+def main() -> None:
+    app.run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_20_durable_steering.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_20_durable_steering.py
new file mode 100644
index 000000000000..ecb7b29b7a53
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_20_durable_steering.py
@@ -0,0 +1,206 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+"""Sample 20 — Durable steering with cancellation × recovery composition.
+
+A steerable durable handler with NO upstream framework. Demonstrates how
+the cancellation policy and the crash recovery contract compose when
+steering, client cancel, and shutdown interleave with crash recovery.
+
+Differences from ``sample_19``:
+
+- ``steerable_conversations=True``: each new turn supersedes the prior
+  one; the prior turn's handler observes ``cancellation_signal.is_set()``
+  with no cause flag (steering pressure: neither ``client_cancelled``
+  nor ``shutdown.is_set()`` is set).
+- A single message item per turn (no phases). Recovery within a turn
+  doesn't try to checkpoint partial token output — the resumption
+  response is empty and the recovered attempt re-streams from scratch.
+  This is the realistic case for handlers wrapping non-deterministic
+  upstreams (LLMs): you can't pick up exactly where you left off, so
+  you start the turn over and let the client redraw on the reset.
+- A ``turn_count`` watermark survives across turns; useful for
+  conversation-level scaffolding.
+
+What this sample demonstrates:
+
+- Steerable handler that ends a turn cleanly on STEERED (close builders +
+  ``emit_completed`` with partial content).
+- Mid-stream shutdown exits via ``context.exit_for_recovery()`` without
+  a terminal event; recovery re-runs the turn from scratch.
+- ``context.is_recovery`` branch produces an empty resumption response
+  that signals the client to reset.
+- Cross-turn state via ``turn_count`` survives crashes.
+
+What this sample does NOT demonstrate:
+
+- Per-token checkpointing (impractical for non-deterministic upstreams).
+- Wrapping a stateful upstream SDK (see ``sample_17``, ``18``, ``21``).
+
+Usage::
+
+    python sample_20_durable_steering.py
+
+    # Turn 1
+    curl -N -X POST http://localhost:8088/responses \\
+        -H "Content-Type: application/json" \\
+        -d '{"model": "agent", "input": "Explain quantum computing",
+             "store": true, "background": true}'
+
+    # Steer (supersede turn 1)
+    curl -X POST http://localhost:8088/responses \\
+        -H "Content-Type: application/json" \\
+        -d '{"model": "agent", "input": "Actually explain relativity",
+             "store": true, "background": true, "previous_response_id": "<id>"}'
+
+    # Simulate mid-stream shutdown
+    SIMULATE_SHUTDOWN_MS=200 python sample_20_durable_steering.py
+"""
+
+import asyncio
+import os
+
+from azure.ai.agentserver.responses import (
+    CreateResponse,
+    ResponseContext,
+    ResponseEventStream,
+    ResponsesAgentServerHost,
+    ResponsesServerOptions,
+)
+from azure.ai.agentserver.responses.models._generated import ResponseObject
+
+options = ResponsesServerOptions(
+    durable_background=True,
+    steerable_conversations=True,
+)
+app = ResponsesAgentServerHost(options=options)
+
+_SIMULATE_SHUTDOWN_MS = int(os.environ.get("SIMULATE_SHUTDOWN_MS", "0"))
+
+
+async def _simulate_llm_stream(prompt: str):
+    """Simulate an LLM producing tokens. Replace with your real LLM call."""
+    words = f"Let me explain {prompt} in detail. Comprehensive answer here.".split()
+    for word in words:
+        await asyncio.sleep(0.05)
+        yield word + " "
+
+
+def _build_resumption_response(context: ResponseContext, request: CreateResponse) -> ResponseObject:
+    """Build an empty resumption response.
+
+    For a single-turn handler with a non-deterministic upstream there is
+    nothing to safely carry forward from a crashed mid-stream attempt —
+    the partial token stream cannot be byte-matched to a re-attempted
+    stream, so we discard it and let the recovered attempt produce
+    everything fresh. The empty payload tells the client to reset its
+    view.
+    """
+    return ResponseObject(
+        {
+            "id": context.response_id,
+            "object": "response",
+            "status": "in_progress",
+            "output": [],
+            "model": request.model,
+        }
+    )
+
+
+@app.response_handler
+async def handler(
+    request: CreateResponse,
+    context: ResponseContext,
+    cancellation_signal: asyncio.Event,
+):
+    """Steerable durable handler with cancellation × recovery composition."""
+    # ── Recovery branch ─────────────────────────────────────────────
+    if context.is_recovery:
+        stream = ResponseEventStream(
+            response_id=context.response_id,
+            response=_build_resumption_response(context, request),
+        )
+    else:
+        stream = ResponseEventStream(response_id=context.response_id, request=request)
+
+    yield stream.emit_created()
+
+    # ── Pre-entry cancellation/shutdown check ────────
+    # Shutdown and cancellation are independent, mutually exclusive
+    # surfaces — check shutdown FIRST. (Shutdown does NOT fire
+    # cancellation_signal.)
+    if context.shutdown.is_set():
+        # Graceful shutdown before we started: defer to next-lifetime
+        # recovery (the framework re-invokes us on restart).
+        await context.exit_for_recovery()
+    if cancellation_signal.is_set():
+        if context.pending_input_count > 0:
+            # Steering pre-entry: emit completed so the partial output
+            # (none in this case) becomes valid context for the drain
+            # turn that follows.
+            yield stream.emit_completed()
+        # Otherwise: client-cancelled (framework forces ``cancelled``) —
+        # return silently without a terminal.
+        return
+
+    yield stream.emit_in_progress()
+
+    # Cross-turn state: bump the turn counter. This survives crashes
+    # and turn boundaries since it lives in `context.conversation_chain_metadata`.
+    turn_count = int(context.conversation_chain_metadata.get("turn_count", 0)) + 1
+    context.conversation_chain_metadata["turn_count"] = turn_count
+
+    # Optional local shutdown simulation.
+    shutdown_timer: asyncio.Task | None = None
+    if _SIMULATE_SHUTDOWN_MS > 0:
+        shutdown_timer = asyncio.create_task(_simulate_shutdown(context))
+
+    message = stream.add_output_item_message()
+    yield message.emit_added()
+    text = message.add_text_content()
+    yield text.emit_added()
+
+    input_text = await context.get_input_text()
+    accumulated = ""
+
+    # ── Mid-stream cancellation/shutdown check ──────
+    async for token in _simulate_llm_stream(input_text):
+        if cancellation_signal.is_set() or context.shutdown.is_set():
+            break
+        accumulated += token
+        yield text.emit_delta(token)
+
+    # Always close builders so the persisted event stream is well-formed
+    # — even on a cancelled / steered turn. The partial content is valid
+    # context for steerable conversations.
+    yield text.emit_text_done(accumulated.strip())
+    yield text.emit_done()
+    yield message.emit_done()
+
+    if shutdown_timer and not shutdown_timer.done():
+        shutdown_timer.cancel()
+
+    # ── Post-stream shutdown check ────────────────
+    # Shutdown mid-stream: defer to next-lifetime recovery so the
+    # framework re-invokes us; the recovery branch above re-streams from
+    # scratch.
+    if context.shutdown.is_set():
+        await context.exit_for_recovery()
+
+    # All other cases (steered, client-cancelled, normal completion):
+    # emit the terminal event. The framework overrides status for
+    # client-cancel; for steered, partial output is valid context.
+    yield stream.emit_completed()
+
+
+async def _simulate_shutdown(context: ResponseContext) -> None:
+    """Fire SHUTTING_DOWN after a delay (local testing only)."""
+    await asyncio.sleep(_SIMULATE_SHUTDOWN_MS / 1000.0)
+    context.shutdown.set()
+
+
+def main() -> None:
+    app.run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_21_durable_langgraph.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_21_durable_langgraph.py
new file mode 100644
index 000000000000..84f1aa7a15cf
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_21_durable_langgraph.py
@@ -0,0 +1,416 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+"""Sample 21 — Durable LangGraph with SqliteSaver checkpointing.
+
+Wraps a LangGraph ``StateGraph`` in a steerable durable response handler.
+LangGraph's ``SqliteSaver`` checkpointer is the canonical example of an
+**upstream framework that owns durability** — the SDK does the heavy
+lifting; the response handler is just the bridge.
+
+This sample implements the recovery contract:
+
+- ``context.conversation_chain_metadata`` only stores a small ``stable_checkpoint_id``
+  watermark — the last graph checkpoint where the handler successfully
+  emitted an AI reply.
+- On recovered entry, the handler queries the graph's current state,
+  builds a resumption response from the AI messages already in the
+  graph history, and emits ``response.in_progress`` carrying it (the
+  client-visible reset point).
+- The recovered attempt then resumes ``graph.stream(None, ...)`` from
+  the current graph state. SqliteSaver guarantees node-boundary
+  recovery, so no completed node is re-executed.
+- Steering between turns is handled by ``fork_session``-style
+  ``graph.update_state(...)`` from the stable checkpoint.
+
+Demonstrates:
+
+- LangGraph native checkpointing (``SqliteSaver`` is the source of truth).
+- ``graph.stream()`` for inter-node cancellation.
+- Recovery contract: resumption response + reset ``in_progress``.
+- Cancellation policy applied at pre-entry / mid-stream / post-stream.
+- Fork-on-steer for new turns that supersede a prior one.
+
+Requirements::
+
+    pip install langgraph langgraph-checkpoint-sqlite langchain-core
+
+Usage::
+
+    python sample_21_durable_langgraph.py
+
+    # Turn 1
+    curl -N -X POST http://localhost:8088/responses \\
+        -H "Content-Type: application/json" \\
+        -d '{"model": "langgraph", "input": "Research quantum computing",
+             "stream": true, "store": true, "background": true}'
+
+    # Steer (fork from stable checkpoint with new message)
+    curl -N -X POST http://localhost:8088/responses \\
+        -H "Content-Type: application/json" \\
+        -d '{"model": "langgraph", "input": "Focus on error correction",
+             "stream": true, "store": true, "background": true,
+             "previous_response_id": "<id>"}'
+
+    # Simulate mid-node shutdown
+    SIMULATE_SHUTDOWN_MS=2500 python sample_21_durable_langgraph.py
+"""
+
+import asyncio
+import os
+import sqlite3
+import typing
+from pathlib import Path
+from typing import Any
+
+from langchain_core.messages import AIMessage, HumanMessage
+from langgraph.checkpoint.sqlite import SqliteSaver
+from langgraph.graph import END, START, StateGraph, add_messages
+from langgraph.types import Command, interrupt
+
+from azure.ai.agentserver.responses import (
+    CreateResponse,
+    ResponseContext,
+    ResponseEventStream,
+    ResponsesAgentServerHost,
+    ResponsesServerOptions,
+)
+from azure.ai.agentserver.responses.models._generated import ResponseObject
+
+
+# ─── Graph State ────────────────────────────────────────────────────────────
+
+
+class ConversationState(typing.TypedDict):
+    """Multi-turn conversation state with LangGraph's add_messages reducer."""
+
+    messages: typing.Annotated[list, add_messages]
+    is_complete: bool
+
+
+# ─── Graph Nodes ────────────────────────────────────────────────────────────
+
+_STEP_DELAY = 1.0  # Seconds per node — makes inter-node cancel observable
+
+
+async def analyze_input(state: ConversationState) -> dict[str, Any]:
+    """Simulate intent detection / input analysis."""
+    await asyncio.sleep(_STEP_DELAY)
+    return {}
+
+
+async def generate_response(state: ConversationState) -> dict[str, Any]:
+    """Generate AI response (replace with real LLM call)."""
+    await asyncio.sleep(_STEP_DELAY)
+    messages = state["messages"]
+    user_msgs = [m for m in messages if isinstance(m, HumanMessage)]
+    turn = len(user_msgs)
+    last = user_msgs[-1].content if user_msgs else ""
+    reply = f"Turn {turn}: Processing '{last}' with full context from {turn} turns."
+    return {"messages": [AIMessage(content=reply)]}
+
+
+async def refine_response(state: ConversationState) -> dict[str, Any]:
+    """Post-processing (safety checks, formatting)."""
+    await asyncio.sleep(_STEP_DELAY * 0.5)
+    return {}
+
+
+def wait_for_user(state: ConversationState) -> dict[str, Any]:
+    """Pause graph — wait for next human message via interrupt."""
+    user_input: str = interrupt({"prompt": "Next message (or 'done'):"})
+    if user_input.strip().lower() == "done":
+        return {"is_complete": True}
+    return {"messages": [HumanMessage(content=user_input)], "is_complete": False}
+
+
+def _should_continue(state: ConversationState) -> str:
+    if state.get("is_complete", False):
+        return "end"
+    return "continue"
+
+
+# ─── Persistent Checkpointer ───────────────────────────────────────────────
+
+_DATA_DIR = Path.home() / ".durable-sessions" / "langgraph-responses"
+_DATA_DIR.mkdir(parents=True, exist_ok=True)
+_DB_PATH = _DATA_DIR / "checkpoints.db"
+
+_conn = sqlite3.connect(str(_DB_PATH), check_same_thread=False)
+_checkpointer = SqliteSaver(_conn)
+_checkpointer.setup()
+
+
+# ─── Build Graph ────────────────────────────────────────────────────────────
+
+
+def _build_graph() -> Any:
+    """Multi-node graph: analyze → generate → refine → wait_for_user (loop)."""
+    builder = StateGraph(ConversationState)
+    builder.add_node("analyze_input", analyze_input)
+    builder.add_node("generate_response", generate_response)
+    builder.add_node("refine_response", refine_response)
+    builder.add_node("wait_for_user", wait_for_user)
+
+    builder.add_edge(START, "analyze_input")
+    builder.add_edge("analyze_input", "generate_response")
+    builder.add_edge("generate_response", "refine_response")
+    builder.add_edge("refine_response", "wait_for_user")
+    builder.add_conditional_edges("wait_for_user", _should_continue, {"continue": "analyze_input", "end": END})
+    return builder.compile(checkpointer=_checkpointer)
+
+
+_graph = _build_graph()
+
+
+# ─── Server ─────────────────────────────────────────────────────────────────
+
+options = ResponsesServerOptions(
+    durable_background=True,
+    steerable_conversations=True,
+)
+app = ResponsesAgentServerHost(options=options)
+
+_SIMULATE_SHUTDOWN_MS = int(os.environ.get("SIMULATE_SHUTDOWN_MS", "0"))
+
+
+def _invoke_cancellable(
+    graph: Any,
+    graph_input: Any,
+    config: dict[str, Any],
+    cancel_event: asyncio.Event,
+) -> tuple[bool, list[str]]:
+    """Stream graph node-by-node with inter-node cancellation.
+
+    Returns (completed, node_names_executed).
+    """
+    nodes_executed: list[str] = []
+    for chunk in graph.stream(graph_input, config, stream_mode="updates"):
+        for node_name in chunk:
+            if node_name != "__end__":
+                nodes_executed.append(node_name)
+        if cancel_event.is_set():
+            return False, nodes_executed
+    return True, nodes_executed
+
+
+def _fork_from_checkpoint(
+    graph: Any,
+    config: dict[str, Any],
+    target_checkpoint_id: str,
+    new_message: str,
+) -> bool:
+    """Fork graph state from a stable checkpoint with a new message."""
+    target_config = {"configurable": {**config["configurable"], "checkpoint_id": target_checkpoint_id}}
+    target = graph.get_state(target_config)
+    if not target or not target.config:
+        return False
+    graph.update_state(
+        target.config,
+        values={"messages": [HumanMessage(content=new_message)]},
+        as_node="wait_for_user",
+    )
+    return True
+
+
+def _build_resumption_response(
+    context: ResponseContext,
+    request: CreateResponse,
+    thread_config: dict[str, Any],
+) -> ResponseObject:
+    """Build the recovery resumption response from current graph state.
+
+    LangGraph is the source of truth for "what's safely committed" — each
+    AI message in graph state was emitted at a node boundary checkpointed
+    by SqliteSaver. We materialize one ``message`` output item per AI
+    message currently in graph state. The recovered attempt then resumes
+    ``graph.stream(None, ...)`` from the live checkpoint and any new AI
+    messages get appended as fresh output items.
+    """
+    try:
+        state = _graph.get_state(thread_config)
+    except Exception:  # pylint: disable=broad-except
+        state = None
+
+    output: list[dict[str, Any]] = []
+    if state is not None:
+        messages = state.values.get("messages", []) if state.values else []
+        for idx, msg in enumerate(m for m in messages if isinstance(m, AIMessage)):
+            output.append(
+                {
+                    "type": "message",
+                    "id": f"recovered_ai_{idx}",
+                    "role": "assistant",
+                    "status": "completed",
+                    "content": [
+                        {
+                            "type": "output_text",
+                            "text": str(msg.content),
+                            "annotations": [],
+                        }
+                    ],
+                }
+            )
+
+    return ResponseObject(
+        {
+            "id": context.response_id,
+            "object": "response",
+            "status": "in_progress",
+            "output": output,
+            "model": request.model,
+        }
+    )
+
+
+@app.response_handler
+async def handler(
+    request: CreateResponse,
+    context: ResponseContext,
+    cancellation_signal: asyncio.Event,
+):
+    """LangGraph with SqliteSaver checkpoints + recovery contract."""
+    input_text = await context.get_input_text()
+
+    thread_id = context.conversation_id or context.response_id
+    thread_config: dict[str, Any] = {"configurable": {"thread_id": thread_id}}
+
+    # ── Recovery branch ─────────────────────────────────────────────
+    # On recovered entry, seed the stream with a resumption response
+    # built from the graph's current state (the upstream framework's
+    # source of truth). The recovery `response.in_progress` emitted
+    # below is the client-visible reset point.
+    if context.is_recovery:
+        resp_stream = ResponseEventStream(
+            response_id=context.response_id,
+            response=_build_resumption_response(context, request, thread_config),
+        )
+    else:
+        resp_stream = ResponseEventStream(response_id=context.response_id, request=request)
+
+    yield resp_stream.emit_created()
+
+    # ── Phase 1: Pre-entry cancel / shutdown ───────────────────────
+    # Still inject the message into graph state so the next turn has context.
+    # Emit `completed` only when steering caused the cancel; on client cancel
+    # or shutdown, return without a terminal event.
+    if cancellation_signal.is_set() or context.shutdown.is_set():
+        stable_cp = context.conversation_chain_metadata.get("stable_checkpoint_id")
+        if stable_cp:
+            await asyncio.to_thread(_fork_from_checkpoint, _graph, thread_config, stable_cp, input_text)
+        if cancellation_signal.is_set() and context.pending_input_count > 0:
+            yield resp_stream.emit_completed()
+        return
+
+    yield resp_stream.emit_in_progress()
+
+    # Shutdown simulation
+    shutdown_timer: asyncio.Task | None = None
+    if _SIMULATE_SHUTDOWN_MS > 0:
+        shutdown_timer = asyncio.create_task(_simulate_shutdown(context))
+
+    # ── Fork-on-steer (fresh-entry only) ────────────────────────────
+    # If this turn is the *successor* of a steered turn AND there is a
+    # stable checkpoint to fork from, branch the graph to that point
+    # with the new message. Skip on a recovered entry — we never want to
+    # re-fork on recovery; the SqliteSaver state IS the source of truth.
+    stable_cp = context.conversation_chain_metadata.get("stable_checkpoint_id")
+    if not context.is_recovery and stable_cp and context.is_steered_turn:
+        forked = await asyncio.to_thread(_fork_from_checkpoint, _graph, thread_config, stable_cp, input_text)
+        if forked:
+            completed, nodes = await asyncio.to_thread(
+                _invoke_cancellable, _graph, None, thread_config, cancellation_signal
+            )
+            # Emit node progress as function call outputs
+            for node in nodes:
+                fn_call = resp_stream.add_output_item_function_call(name=node, call_id=f"node_{node}", arguments="{}")
+                yield fn_call.emit_added()
+                yield fn_call.emit_done()
+
+            if not completed or cancellation_signal.is_set():
+                if shutdown_timer and not shutdown_timer.done():
+                    shutdown_timer.cancel()
+                # Shutdown: return without terminal → re-entered on restart.
+                if context.shutdown.is_set():
+                    return
+                yield resp_stream.emit_completed()
+                return
+
+            # Save new stable checkpoint
+            state = await asyncio.to_thread(_graph.get_state, thread_config)
+            context.conversation_chain_metadata["stable_checkpoint_id"] = state.config["configurable"]["checkpoint_id"]
+            # Emit the AI reply
+            for event in _build_reply_events(resp_stream, state):
+                yield event
+            if shutdown_timer and not shutdown_timer.done():
+                shutdown_timer.cancel()
+            yield resp_stream.emit_completed()
+            return
+
+    # ── Phase 2: Normal invocation (graph.stream with inter-node cancel) ─
+    state = await asyncio.to_thread(_graph.get_state, thread_config)
+
+    if state.next:
+        graph_input = Command(resume=input_text)
+    else:
+        graph_input = {"messages": [HumanMessage(content=input_text)], "is_complete": False}
+
+    completed, nodes = await asyncio.to_thread(
+        _invoke_cancellable, _graph, graph_input, thread_config, cancellation_signal
+    )
+
+    for node in nodes:
+        fn_call = resp_stream.add_output_item_function_call(name=node, call_id=f"node_{node}", arguments="{}")
+        yield fn_call.emit_added()
+        yield fn_call.emit_done()
+
+    if shutdown_timer and not shutdown_timer.done():
+        shutdown_timer.cancel()
+
+    # ── Phase 3: Post-completion handling ───────────────────────────
+    if not completed or cancellation_signal.is_set():
+        # Shutdown: return without terminal → re-entered on restart.
+        if context.shutdown.is_set():
+            return
+        yield resp_stream.emit_completed()
+        return
+
+    # Save stable checkpoint reference
+    state = await asyncio.to_thread(_graph.get_state, thread_config)
+    context.conversation_chain_metadata["stable_checkpoint_id"] = state.config["configurable"]["checkpoint_id"]
+
+    for event in _build_reply_events(resp_stream, state):
+        yield event
+    yield resp_stream.emit_completed()
+
+
+def _build_reply_events(resp_stream: ResponseEventStream, state: Any) -> list[Any]:
+    """Build response events for the latest AI message from graph state."""
+    messages = state.values.get("messages", [])
+    ai_messages = [m for m in messages if isinstance(m, AIMessage)]
+    if not ai_messages:
+        return []
+    reply = ai_messages[-1].content
+    message = resp_stream.add_output_item_message()
+    text = message.add_text_content()
+    return [
+        message.emit_added(),
+        text.emit_added(),
+        text.emit_delta(reply),
+        text.emit_text_done(),
+        text.emit_done(),
+        message.emit_done(),
+    ]
+
+
+async def _simulate_shutdown(context: ResponseContext) -> None:
+    """Fire SHUTTING_DOWN after a delay (local testing only)."""
+    await asyncio.sleep(_SIMULATE_SHUTDOWN_MS / 1000.0)
+    context.shutdown.set()
+
+
+def main() -> None:
+    app.run()
+
+
+if __name__ == "__main__":
+    main()
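Note on sample 21: the inter-node cancellation loop in `_invoke_cancellable` can be exercised without LangGraph installed. The following is a minimal sketch, assuming a hypothetical `fake_stream` generator that stands in for `graph.stream(..., stream_mode="updates")` and a `threading.Event` in place of the handler's cancellation signal; `run_until_cancelled` is an illustrative name, not part of the sample or the SDK.

```python
import threading
from typing import Any, Iterable, Iterator


def run_until_cancelled(
    chunks: Iterable[dict[str, Any]],
    cancel_event: threading.Event,
) -> tuple[bool, list[str]]:
    """Consume per-node update chunks; stop at the first node boundary after cancel."""
    executed: list[str] = []
    for chunk in chunks:
        # Each chunk maps node name -> state update for the node that just ran.
        executed.extend(name for name in chunk if name != "__end__")
        if cancel_event.is_set():
            # The node that just finished is already recorded; later nodes never start.
            return False, executed
    return True, executed


def fake_stream(cancel_event: threading.Event) -> Iterator[dict[str, Any]]:
    """Hypothetical stand-in for a graph stream (assumption, not LangGraph itself)."""
    yield {"analyze_input": {}}
    cancel_event.set()  # simulate a steer arriving while the next node is running
    yield {"generate_response": {}}
    yield {"refine_response": {}}


cancel = threading.Event()
completed, nodes = run_until_cancelled(fake_stream(cancel), cancel)
print(completed, nodes)
```

Because the flag is checked only after a chunk arrives, a node that is already running always finishes before the loop exits. That is why the sample treats node boundaries as the safe points to fork from.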
diff --git a/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_22_durable_multiturn.py b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_22_durable_multiturn.py
new file mode 100644
index 000000000000..d4b765cf03fa
--- /dev/null
+++ b/sdk/agentserver/azure-ai-agentserver-responses/samples/sample_22_durable_multiturn.py
@@ -0,0 +1,87 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+r"""Sample 22 — Durable Multi-turn (serial conversation, no steering).
+
+A self-contained multi-turn handler with no external LLM dependency.
+Demonstrates the perpetual task lifecycle: each turn completes, the task
+suspends, and the next turn resumes it.
+
+Without steering, the framework serializes turns via a conversation lock.
+If turn A is executing when turn B arrives, turn B waits; it does not cancel turn A.
+
+Key concepts:
+- ``durable_background=True``, ``steerable_conversations=False``
+- Conversation history via ``context.get_history()`` (framework-managed)
+- Metadata for bounded execution state only (turn counter)
+- Crash recovery: handler re-invoked, same input + history → same output
+
+Usage::
+
+    python sample_22_durable_multiturn.py
+
+    # Turn 1
+    curl -X POST http://localhost:8088/responses \
+        -H "Content-Type: application/json" \
+        -d '{"model": "chat", "input": "My name is Alice", "store": true, "background": true}'
+
+    # Turn 2 (reference previous for conversation context)
+    curl -X POST http://localhost:8088/responses \
+        -H "Content-Type: application/json" \
+        -d '{"model": "chat", "input": "What is my name?", "store": true, "background": true, "previous_response_id": "<id>"}'
+
+    # End conversation
+    curl -X POST http://localhost:8088/responses \
+        -H "Content-Type: application/json" \
+        -d '{"model": "chat", "input": "done", "store": true, "background": true, "previous_response_id": "<id>"}'
+"""
+
+import asyncio
+
+from azure.ai.agentserver.responses import (
+    CreateResponse,
+    ResponseContext,
+    ResponsesAgentServerHost,
+    ResponsesServerOptions,
+    TextResponse,
+)
+
+options = ResponsesServerOptions(
+    durable_background=True,
+    steerable_conversations=False,
+)
+app = ResponsesAgentServerHost(options=options)
+
+
+@app.response_handler
+async def handler(
+    request: CreateResponse,
+    context: ResponseContext,
+    cancellation_signal: asyncio.Event,
+):
+    """Multi-turn handler with perpetual task lifecycle."""
+    input_text = await context.get_input_text()
+    turn_count = context.conversation_chain_metadata.get("turn_count", 0) + 1
+
+    # Explicit session termination
+    if input_text.strip().lower() == "done":
+        context.conversation_chain_metadata.clear()
+        return TextResponse(context, request, text=f"Done! Session complete after {turn_count - 1} turns. Goodbye!")
+
+    # Get conversation history from framework store
+    history_items = await context.get_history()
+
+    # Generate reply (replace with your LLM of choice)
+    reply = (
+        f"Turn {turn_count}: You said '{input_text}'. I have {len(history_items)} items of conversation context."
+    )
+
+    context.conversation_chain_metadata["turn_count"] = turn_count
+    return TextResponse(context, request, text=reply)
+
+
+def main() -> None:
+    app.run()
+
+
+if __name__ == "__main__":
+    main()
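Note on sample 22: the "same input + history → same output" recovery claim depends on the turn counter being written only after the reply is built. The sketch below illustrates that with a plain `dict` standing in for `context.conversation_chain_metadata`; `handle_turn` and the `history_len` values are hypothetical and only mirror the handler's logic.

```python
from typing import Any


def handle_turn(metadata: dict[str, Any], input_text: str, history_len: int) -> str:
    """Mirror of the sample's turn logic over a plain dict (illustrative only)."""
    turn_count = metadata.get("turn_count", 0) + 1
    if input_text.strip().lower() == "done":
        metadata.clear()
        return f"Done! Session complete after {turn_count - 1} turns. Goodbye!"
    reply = f"Turn {turn_count}: You said '{input_text}'. I have {history_len} items of conversation context."
    # The counter is persisted last, so a crash before this line replays the same turn number.
    metadata["turn_count"] = turn_count
    return reply


metadata: dict[str, Any] = {}
first = handle_turn(metadata, "My name is Alice", history_len=0)

# Simulated recovery: run turn 2 twice from the same persisted snapshot.
snapshot = dict(metadata)
attempt_a = handle_turn(dict(snapshot), "What is my name?", history_len=2)
attempt_b = handle_turn(dict(snapshot), "What is my name?", history_len=2)

farewell = handle_turn(metadata, "done", history_len=2)
print(first, attempt_a, farewell, sep="\n")
```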
diff --git a/sdk/agentserver/skills/README.md b/sdk/agentserver/skills/README.md
new file mode 100644
index 000000000000..6ad4a4835e4c
--- /dev/null
+++ b/sdk/agentserver/skills/README.md
@@ -0,0 +1,46 @@
+# Agentserver standalone skills
+
+Four "AI-coding-agent skill" files — standalone, portable, copy-into-
+your-project artifacts that give a coding agent (GitHub Copilot, etc.)
+enough context to use each primitive correctly.
+
+| Skill | What it teaches | Companion package |
+|-------|-----------------|-------------------|
+| [`durable-task-skill.md`](durable-task-skill.md) | The `@task` durable primitive — crash-resilient long-running handlers, lease + recovery, steering, multi-turn | `azure-ai-agentserver-core` |
+| [`streaming-skill.md`](streaming-skill.md) | The `streams` registry — producer/subscriber fan-out, replay backings, durable streaming, `Last-Event-ID` reconnect | `azure-ai-agentserver-core` |
+| [`invocations-skill.md`](invocations-skill.md) | The `InvocationAgentServerHost` — free-form invocations protocol HTTP + WebSocket host, long-running + polling, multi-turn via `agent_session_id` | `azure-ai-agentserver-invocations` |
+| [`responses-skill.md`](responses-skill.md) | The `ResponsesAgentServerHost` — OpenAI Responses API host, builder events, durable + steerable conversations | `azure-ai-agentserver-responses` |
+
+The two **host** skills (invocations, responses) are alternatives —
+pick the one whose wire protocol matches your client. The two **core**
+skills (durable-task, streaming) are usually composed *with* whichever
+host you pick, to add crash recovery and producer/subscriber streaming
+respectively.
+
+## Why standalone
+
+Each skill has a YAML frontmatter block (`name:` + `description:`)
+that Copilot's skill system recognises, and a body shaped for an
+LLM: explicit WHEN / WHEN NOT, a minimal runnable pattern, decision
+shortcuts. They're meant to be **copied** into a project — drop the
+markdown file next to your code and Copilot picks it up.
+
+## Why on the demo branch
+
+Skills + the [preview wheels](../wheels/) form a single distribution
+unit: the skill teaches the API, the wheels provide the implementation.
+Both live on this branch (`feature/agentserver-durable-preview-share`)
+so a downstream project can clone one branch and get everything it
+needs to build durable / streaming / Responses-API agents.
+
+The long-form developer guides each skill references
+(`durable-task-guide.md`, `streaming-guide.md`,
+`handler-implementation-guide.md`, etc.) live in the corresponding
+package's `docs/` folder. They are source-of-truth reference documentation
+tied to the package, while the skills are portable.
+
+## Refreshing
+
+After substantive API or contract changes to a package, the matching
+skill should be updated by hand. Skills are not auto-generated — they
+distil tribal knowledge that no single doc captures.
diff --git a/sdk/agentserver/skills/durable-task-skill.md b/sdk/agentserver/skills/durable-task-skill.md
new file mode 100644
index 000000000000..59179318aa67
--- /dev/null
+++ b/sdk/agentserver/skills/durable-task-skill.md
@@ -0,0 +1,201 @@
+---
+name: agentserver-durable-tasks
+description: 'Build crash-resilient long-running agent handlers using the `@task` primitive from `azure-ai-agentserver-core`. WHEN: "make my agent crash-resilient", "resume after restart", "long-running agent (>15 min)", "steer / interrupt a running agent turn", "multi-turn conversation that survives container restarts", "hosted agent that needs lease + checkpoint recovery", "agent with cancel / cooperative shutdown", "pass large inputs up to 2 MB to a task (function or steering inputs)". DO NOT USE FOR: persisting conversation history (use LangGraph / your DB), storing large checkpoints (`ctx.metadata` is intentionally small — watermarks only), workflow orchestration (use Temporal / Durable Functions), or competing-consumer queues. PRIVATE PREVIEW: the `@task` primitive ships only via pre-release wheels checked into this branch (see references); the surrounding `azure-ai-agentserver-*` packages are on PyPI at stable versions.'
+---
+
+# Agentserver Durable Tasks (`@task`) — Standalone Skill
+
+> **Standalone document.** Copy this file into your project to give your
+> AI coding agent (GitHub Copilot, etc.) the context it needs to use the
+> `@task` primitive correctly. Pair it with the checked-in pre-release
+> wheels (see *Packaging* below) — that's all your project needs to start
+> building durable agents.
+
+The `@task` decorator in `azure.ai.agentserver.core.durable` turns a single
+agent function into a **crash-resilient, steerable, long-running** primitive
+backed by a hosted task store. The framework handles lease acquisition,
+recovery from container restarts, checkpoint metadata persistence, and
+cooperative cancel — the handler stays simple.
+
+## When to use
+
+Use `@task` when **any** of these apply:
+
+- The agent run lasts long enough that container reclaim / crash is a real
+  risk and a "restart from scratch" recovery is too expensive.
+- You need **steering** — a new user input arriving mid-turn should
+  cooperatively wind down the current turn and re-enter with the new
+  input (instead of stacking turns).
+- You need **multi-turn conversations that survive restarts** — turn N+1
+  must resume the persisted state of turn N even if the container died
+  between them.
+- The agent is **hosted** (e.g., Foundry Hosted Agent) and you want the
+  platform's lease-renewal keep-alive path to extend the sandbox idle
+  timer past the eviction window.
+
+## When NOT to use
+
+`@task` is intentionally narrow. Do **not** use it for:
+
+- **Conversation history persistence.** `ctx.metadata` is *not* a chat
+  log store — it's for small watermarks and dedup tokens (max ~tens of
+  KB). Persist messages, tool outputs, and large state through your
+  agent framework's native store (LangGraph `SqliteSaver`, your own DB,
+  etc.). The two are complementary: `@task` provides the *durable
+  outer boundary*; your framework provides the *content store*.
+- **Large checkpoint state.** Same reason. If you want to snapshot
+  20 MB of intermediate computation between checkpoints, write it to
+  your own storage and put only a pointer (object ID, URL) in
+  `ctx.metadata`.
+- **Workflow orchestration.** Fan-out/fan-in, child workflows, signals,
+  timers as first-class primitives → use Temporal or Durable Functions.
+  `@task` is the thin durable boundary around a *single* agent function;
+  it can live *inside* such an engine but doesn't replace it.
+- **Competing-consumer queues.** A `task_id` identifies one logical
+  unit of work owned by one current lifetime. If you want N workers
+  pulling jobs off a shared queue, use a queue.
+- **Deterministic replay.** `@task` is not Temporal-style replay. After
+  a crash the handler re-runs from the top with whatever state survived;
+  determinism inside the handler is the developer's responsibility. The
+  "at-most-once side effect" pattern in the developer guide covers the standard case.
+
+## Minimal pattern
+
+```python
+from azure.ai.agentserver.core.durable import task, TaskContext, Suspended
+
+@task(name="my_agent", steerable=True)
+async def my_agent(ctx: TaskContext[dict]) -> dict:
+    topic = ctx.input["topic"]
+
+    # ctx.metadata is small, durable, survives crashes
+    completed = ctx.metadata.get("completed_phases", 0)
+    results: list = ctx.metadata.get("results", [])
+
+    if ctx.entry_mode == "recovered":
+        # we crashed mid-run; resume from last checkpoint
+        await emit_recovered_marker(completed)
+
+    for phase_idx in range(completed, TOTAL_PHASES):
+        if ctx.cancel.is_set():
+            # steering arrived (or operator cancelled) — wind down
+            return await _wind_down(ctx, phase_idx, results)
+
+        result = await do_one_phase(topic, phase_idx)
+        results.append(result)
+
+        # === CHECKPOINT ===
+        ctx.metadata["completed_phases"] = phase_idx + 1
+        ctx.metadata["results"] = results  # keep small!
+        await ctx.metadata.flush()
+
+    return {"phases_completed": TOTAL_PHASES, "results": results}
+```
+
+**Dispatching** from your HTTP handler:
+
+```python
+from azure.ai.agentserver.core.durable import TaskConflictError
+
+# One durable task per session — steering finds the active run.
+try:
+    await my_agent.start(task_id=session_id, input={"topic": topic})
+    status = "started"
+except TaskConflictError:
+    # Already active + steerable → framework queued our input as a
+    # steering signal; current turn winds down at next checkpoint.
+    status = "steered"
+```
+
+**Streaming** progress back to a (re)connecting client:
+
+```python
+from azure.ai.agentserver.core.streaming import streams
+
+# Producer (inside the handler) emits to a process-level stream id
+# (typically the per-turn invocation id from the handler's input):
+#     stream = await streams.get_or_create(invocation_id)
+#     await stream.emit({"event": "progress", "step": "fetch"})
+#     ...
+#     await stream.close()
+
+# Consumer (HTTP layer) attaches BEFORE starting the task:
+stream = await streams.get_or_create(invocation_id)
+run = await my_agent.start(task_id=session_id,
+                            input={"invocation_id": invocation_id, ...})
+async for ev in stream.subscribe(after=0):
+    yield f"data: {ev}\n\n"
+result = await run.result()  # TaskRun is also awaitable: `await run` awaits result()
+```
+
+## Pick the right metadata
+
+Rule of thumb: **store the smallest watermark that lets you resume
+correctly**. If you can derive everything else by re-running the
+non-side-effectful part of the handler, do that.
+
+| Good in `ctx.metadata` | Bad in `ctx.metadata` |
+|---|---|
+| `"completed_phases": 7` | Full chat transcript |
+| `"last_input_id": "msg_..."` | Generated artifacts (KBs+) |
+| `"output_store_key": "s3://..."` (a pointer) | The thing the pointer points to |
+| `"dedup_token": "uuid-abc"` | Vector embeddings |
+
+Always call `await ctx.metadata.flush()` at the end of a checkpoint
+boundary. That's the durable persistence point — a crash before flush
+re-runs the phase; a crash after flush skips it.
+
+## Hosted vs local
+
+In hosted environments (`FOUNDRY_HOSTING_ENVIRONMENT` set by the platform)
+`@task` uses the HTTP-backed `HostedTaskProvider` against the Foundry
+task-storage API automatically — no opt-in env var required.
+
+In local development (no `FOUNDRY_HOSTING_ENVIRONMENT`) `@task` uses
+`LocalFileTaskProvider` rooted at `~/.durable-tasks/` (override with
+`AGENTSERVER_DURABLE_TASKS_PATH` for tests). No service dependency for
+local iteration.
+
+## Packaging — private preview wheels
+
+The surrounding `azure-ai-agentserver-core` and
+`azure-ai-agentserver-invocations` packages are published on PyPI at
+stable versions. **The `@task` durable primitive is in private preview**
+and ships *only* via the pre-release wheels checked into this branch.
+There is no PyPI release for the `@task` API until it goes GA — installing
+the regular PyPI version of `azure-ai-agentserver-core` will not give you
+`azure.ai.agentserver.core.durable`.
+
+Consume the checked-in wheels per:
+
+- Wheel directory + README: [`sdk/agentserver/wheels/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/wheels)
+
+## Authoritative references
+
+| Topic | Link |
+|---|---|
+| **Full developer guide** (mental model, lifecycle, API reference, patterns) | [`docs/durable-task-guide.md`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-core/docs/durable-task-guide.md) |
+| **Streaming developer guide** (registry API, backings, per-turn id convention, exception/wire mapping) | [`docs/streaming-guide.md`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-core/docs/streaming-guide.md) |
+| Minimal retry sample | [`samples/durable_retry/durable_retry.py`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-core/samples/durable_retry/durable_retry.py) |
+| Streaming via the `streams` registry | [`samples/durable_streaming/durable_streaming.py`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/durable_streaming.py) |
+| End-to-end **long-running + crash + steer** demo (Foundry hosted) | [`samples/durable-agent-demo/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo) |
+| Multi-turn (suspend / resume) | [`samples/durable_multiturn/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn) |
+| LangGraph integration | [`samples/durable_langgraph/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_langgraph) |
+
+Read the developer guide first — it covers `EntryMode`, retry semantics,
+`Suspended`, steering queue backpressure, cancel-cause booleans
+(`timeout_exceeded`, `cancel_requested`, `pending_input_count`), shutdown
+via `ctx.exit_for_recovery()`, and the patterns referenced above. The
+samples ground the API in working code.
+
+## Decision shortcuts
+
+| Need | Use `@task`? | Why |
+|---|---|---|
+| Multi-turn chat that survives container restart | ✅ | Lease + recovery + checkpoint metadata |
+| Steerable long generation (user can change topic mid-run) | ✅ | `steerable=True` + `ctx.cancel.is_set()` |
+| Single short-lived (<30s) request/response | ❌ | Overkill — just write a normal handler |
+| Persist 100 MB of intermediate artifacts | ❌ | Use your own object store; put the pointer in metadata |
+| Pull jobs off a shared queue across N workers | ❌ | Wrong primitive — use a queue |
+| Fan-out 10 child workflows and join | ❌ | Use Temporal / Durable Functions |
+| Want exactly-once side effects | ⚠️ | Use the at-most-once pattern in the guide; framework provides at-most-once via dedup token, not exactly-once |
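Note on the durable-task skill: the "smallest watermark" rule and the flush-boundary behaviour (a crash before the checkpoint re-runs the phase, a crash after it skips the phase) can be sketched without the preview wheels. This is a minimal illustration, assuming a plain `dict` in place of `ctx.metadata` and a hypothetical `SimulatedCrash` exception in place of a container restart; none of these names come from the `@task` API.

```python
from typing import Any, Optional

TOTAL_PHASES = 4


class SimulatedCrash(RuntimeError):
    """Raised by this sketch to imitate a container dying mid-run."""


def run_phases(
    metadata: dict[str, Any],
    log: list[int],
    crash_before: Optional[int] = None,
) -> int:
    """Run phases starting at the persisted watermark; return the final watermark."""
    start = metadata.get("completed_phases", 0)
    for phase_idx in range(start, TOTAL_PHASES):
        if crash_before == phase_idx:
            raise SimulatedCrash(f"crashed before phase {phase_idx}")
        log.append(phase_idx)  # stand-in for the phase's real work
        # Checkpoint: in the real handler this is where the metadata flush would go.
        metadata["completed_phases"] = phase_idx + 1
    return metadata.get("completed_phases", 0)


store: dict[str, Any] = {}
work_log: list[int] = []

try:
    run_phases(store, work_log, crash_before=2)
except SimulatedCrash:
    pass  # the "container" died; only `store` survives

watermark_after_crash = store["completed_phases"]
final_watermark = run_phases(store, work_log)  # recovered entry resumes at phase 2
print(watermark_after_crash, final_watermark, work_log)
```

Only the integer watermark is persisted here. The work log lives outside the store, which matches the skill's guidance to keep content elsewhere and put just the resume point in metadata.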
diff --git a/sdk/agentserver/skills/invocations-skill.md b/sdk/agentserver/skills/invocations-skill.md
new file mode 100644
index 000000000000..e3b11038b1d3
--- /dev/null
+++ b/sdk/agentserver/skills/invocations-skill.md
@@ -0,0 +1,288 @@
+---
+name: agentserver-invocations
+description: 'Build agent HTTP / WebSocket endpoints that speak the Azure AI Hosted Agents invocations protocol, using `InvocationAgentServerHost` from `azure-ai-agentserver-invocations`. WHEN: "expose my agent as a Foundry Hosted Agent invocations endpoint", "free-form POST /invocations + GET /invocations/{id} status + POST /invocations/{id}/cancel", "stream agent output as SSE", "bidirectional WebSocket streaming via /invocations_ws", "long-running invocations with polling", "publish an OpenAPI spec at /invocations/docs/openapi.json", "multi-turn conversations via agent_session_id grouping", "Foundry-hosted agent that needs platform-injected invocation/session IDs". DO NOT USE FOR: building OpenAI Responses API agents (use the `agentserver-responses` skill — the responses host adds the OpenAI wire protocol on top); OpenAI Chat Completions (different protocol, different host); raw `@task` durable computation with no HTTP surface (use the `@task` primitive from `agentserver-durable-tasks` skill directly); pure RPC microservices without per-invocation lifecycle (just use Starlette / FastAPI directly).'
+---
+
+# Agentserver Invocations (`InvocationAgentServerHost`) — Standalone Skill
+
+> **Standalone document.** Copy this file into your project to give your
+> AI coding agent (GitHub Copilot, etc.) the context it needs to build
+> agents on the Azure AI Hosted Agents *invocations* protocol using
+> `azure-ai-agentserver-invocations`. Pair it with the checked-in
+> pre-release wheels (see *Packaging* below) — that's all your project
+> needs to start.
+
+The `InvocationAgentServerHost` class in
+`azure.ai.agentserver.invocations` exposes the **invocations protocol**:
+a free-form HTTP API
+(`POST /invocations`, `GET /invocations/{id}`,
+`POST /invocations/{id}/cancel`,
+`GET /invocations/docs/openapi.json`) plus an optional
+full-duplex WebSocket transport (`/invocations_ws`). You bring the
+request / response wire format; the host owns the per-invocation
+identity, the session-id resolution, the response headers, distributed
+tracing, and the WebSocket lifecycle.
+
+## When to use
+
+Use `InvocationAgentServerHost` when **any** of these apply:
+
+- You're shipping an agent as a **Foundry Hosted Agent container** that
+  speaks the invocations protocol (the platform routes to
+  `/invocations*` and expects the platform-injected
+  `x-agent-invocation-id` header echoed back, the resolved
+  `x-agent-session-id` returned, etc.).
+- Your request / response shape is **free-form** (your own JSON, your
+  own SSE event taxonomy) — not bound by the OpenAI Responses API
+  contract.
+- You need **WebSocket bidirectional streaming** for tool calling,
+  back-pressure-sensitive flows, or full-duplex chat — registering
+  `@app.ws_handler` adds `/invocations_ws` on the same host.
+- You need **long-running invocations with polling**: POST returns
+  immediately with an `invocation_id`, GET retrieves status or result,
+  cancel terminates.
+- You want to **publish an OpenAPI spec** at
+  `/invocations/docs/openapi.json` for client discovery.
+- You need **multi-turn conversations** grouped by
+  `agent_session_id` (query param on POST, env var fallback, UUID
+  default) — the resolved session ID is on
+  `request.state.session_id` for handler-side state lookups.
+
+## When NOT to use
+
+`InvocationAgentServerHost` is intentionally narrow. Do **not** use it for:
+
+- **OpenAI Responses API agents.** Use the `agentserver-responses`
+  skill — `ResponsesAgentServerHost` adds the Responses-API wire
+  protocol (`POST /responses`, builder events,
+  `response.output_text.delta`, etc.) on top of the same core
+  framework. Don't try to hand-roll Responses API on top of
+  `InvocationAgentServerHost` — let the responses package own that
+  contract.
+- **OpenAI Chat Completions API agents.** Different protocol
+  (`/chat/completions`); neither host implements it.
+- **Raw `@task` durable computation** with no HTTP surface. Use the
+  `@task` decorator from `azure.ai.agentserver.core.durable` directly.
+  See the `agentserver-durable-tasks` skill.
+- **Pure RPC microservices** without per-invocation lifecycle
+  (no invocation_id, no session_id, no platform header echoing).
+  Use Starlette or FastAPI directly — the host's value is the
+  per-invocation lifecycle it owns for you.
+
+## Minimal pattern
+
+```python
+from azure.ai.agentserver.invocations import InvocationAgentServerHost
+from starlette.requests import Request
+from starlette.responses import JSONResponse, Response
+
+app = InvocationAgentServerHost()
+
+
+@app.invoke_handler  # POST /invocations  (required)
+async def handle(request: Request) -> Response:
+    data = await request.json()
+    return JSONResponse({"greeting": f"Hello, {data['name']}!"})
+
+
+if __name__ == "__main__":
+    app.run()  # binds :8088 by default
+```
+
+## Long-running + polling
+
+```python
+import asyncio
+from azure.ai.agentserver.invocations import InvocationAgentServerHost
+from starlette.requests import Request
+from starlette.responses import JSONResponse, Response
+
+_tasks: dict[str, asyncio.Task] = {}
+_results: dict[str, dict] = {}
+
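+app = InvocationAgentServerHost()
+
+
+async def do_work(invocation_id: str, payload: dict) -> None:
+    # Placeholder for the real long-running work. Store the outcome so the
+    # GET handler can return it, then drop the finished task handle.
+    _results[invocation_id] = {"invocation_id": invocation_id, "status": "completed"}
+    _tasks.pop(invocation_id, None)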
+
+
+@app.invoke_handler
+async def start(request: Request) -> Response:
+    invocation_id = request.state.invocation_id   # framework-stamped
+    payload = await request.json()
+    _tasks[invocation_id] = asyncio.create_task(do_work(invocation_id, payload))
+    return JSONResponse({"invocation_id": invocation_id, "status": "running"}, status_code=202)
+
+
+@app.get_invocation_handler                       # GET /invocations/{id}
+async def get_status(request: Request) -> Response:
+    invocation_id = request.state.invocation_id
+    if invocation_id in _results:
+        return JSONResponse(_results[invocation_id])
+    return JSONResponse({"invocation_id": invocation_id, "status": "running"})
+
+
+@app.cancel_invocation_handler                    # POST /invocations/{id}/cancel
+async def cancel(request: Request) -> Response:
+    invocation_id = request.state.invocation_id
+    task = _tasks.pop(invocation_id, None)
+    if task is None:
+        return JSONResponse({"error": "not_found"}, status_code=404)
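+    task.cancel()
+    # Record the terminal state so a later GET does not report "running".
+    _results[invocation_id] = {"invocation_id": invocation_id, "status": "cancelled"}
+    return JSONResponse(_results[invocation_id])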
+```
+
+## SSE streaming
+
+```python
+import json
+from azure.ai.agentserver.invocations import InvocationAgentServerHost
+from starlette.requests import Request
+from starlette.responses import Response, StreamingResponse
+
+app = InvocationAgentServerHost()
+
+
+@app.invoke_handler
+async def stream(request: Request) -> Response:
+    async def generate():
+        for word in ("Hello", " ", "world", "!"):
+            yield f"data: {json.dumps({'delta': word})}\n\n"
+
+    return StreamingResponse(generate(), media_type="text/event-stream")
+```
+
+For durable-streaming patterns where the producer (inside `@task`)
+fans out to N HTTP subscribers via SSE, with replay + reconnect, see
+the [`agentserver-streaming` skill](streaming-skill.md).
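+
+The frames above carry only `data:`. For a client to reconnect with
+`Last-Event-ID`, each frame also needs an `id:` field. A minimal sketch
+of such a frame formatter follows; `format_sse` is a hypothetical helper
+name, not part of the package:
+
+```python
+import json
+from typing import Any, Optional
+
+
+def format_sse(data: Any, event_id: Optional[int] = None) -> str:
+    """Render one Server-Sent Events frame (hypothetical helper)."""
+    lines = []
+    if event_id is not None:
+        # Clients echo the last seen id back as Last-Event-ID on reconnect.
+        lines.append(f"id: {event_id}")
+    # json.dumps emits a single line by default, so one data: field suffices.
+    lines.append(f"data: {json.dumps(data)}")
+    return "\n".join(lines) + "\n\n"
+```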
+
+## WebSocket bidirectional
+
+```python
+from starlette.websockets import WebSocket
+
+@app.ws_handler  # /invocations_ws (full-duplex; protocol = invocations_ws)
+async def ws(websocket: WebSocket) -> None:
+    async for message in websocket.iter_text():
+        await websocket.send_text(f"echo: {message}")
+```
+
+## Handler state (set by the framework)
+
+The host populates these on `request.state` before dispatching:
+
+| Attribute | Endpoints | Source |
+|---|---|---|
+| `request.state.invocation_id` | invoke / get / cancel | `x-agent-invocation-id` header (platform-injected when hosted) → generated UUID |
+| `request.state.session_id` | invoke / get / cancel | `agent_session_id` query param (invoke only, per spec) → `FOUNDRY_AGENT_SESSION_ID` env var → generated UUID (invoke only) |
+| `request.state.user_isolation_key` | invoke | `x-agent-user-isolation-key` header |
+| `request.state.chat_isolation_key` | invoke | `x-agent-chat-isolation-key` header |
+
+Per the invocation protocol contract,
+GET and cancel have **no platform-defined query parameters** — the
+session is implicit (env-var sourced). The framework resolves it from
+`FOUNDRY_AGENT_SESSION_ID` and stamps it on
+`request.state.session_id` for your handler regardless.
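+
+The session-id precedence in the table can be sketched in plain Python.
+This is an illustration of the documented order only, not the
+framework's code; `resolve_session_id` is a hypothetical name, and
+returning `None` for GET / cancel when the env var is unset is an
+assumption:
+
+```python
+import os
+import uuid
+from typing import Mapping, Optional
+
+
+def resolve_session_id(
+    query_params: Mapping[str, str],
+    environ: Mapping[str, str] = os.environ,
+    *,
+    is_invoke: bool = True,
+) -> Optional[str]:
+    """Illustrative only: mirrors the precedence in the table above."""
+    if is_invoke:
+        # The query param is honoured on POST /invocations only.
+        from_query = query_params.get("agent_session_id")
+        if from_query:
+            return from_query
+    from_env = environ.get("FOUNDRY_AGENT_SESSION_ID")
+    if from_env:
+        return from_env
+    # Only invoke falls through to a generated UUID (assumption: None otherwise).
+    return str(uuid.uuid4()) if is_invoke else None
+```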
+
+## Composing with the durable primitive
+
+Pairing `InvocationAgentServerHost` with the `@task` primitive (see
+the [`agentserver-durable-tasks`](durable-task-skill.md) skill) is the
+canonical pattern for crash-resilient hosted agents:
+
+```python
+from azure.ai.agentserver.invocations import InvocationAgentServerHost
+from azure.ai.agentserver.core.durable import multi_turn_task, TaskContext
+from starlette.requests import Request
+from starlette.responses import JSONResponse, Response
+
+app = InvocationAgentServerHost()
+
+
+@multi_turn_task(steerable=True)  # crash-resilient + steerable durable primitive
+async def research(ctx: TaskContext[dict]) -> dict:
+    # ctx.input is one turn's payload; ctx.entry_mode tells you whether
+    # this is a fresh turn, a resumed turn, or a crash-recovered re-entry.
+    result: dict = {}  # placeholder: build the real turn output here
+    return result
+
+
+@app.invoke_handler
+async def handle(request: Request) -> Response:
+    payload = await request.json()
+    task_id = request.state.session_id   # one durable task per session
+    input_id = request.state.invocation_id  # per-turn id
+    await research.start(task_id=task_id, input=payload, input_id=input_id)
+    return JSONResponse({"status": "started", "invocation_id": input_id}, status_code=202)
+
+
+@app.cancel_invocation_handler
+async def cancel(request: Request) -> Response:
+    task_id = request.state.session_id
+    input_id = request.state.invocation_id
+    run = await research.get_active_run(task_id, input_id)
+    if run is None:
+        return JSONResponse({"status": "not_found"}, status_code=404)
+    await run.cancel()
+    return JSONResponse({"status": "cancelled"})
+```
+
+The [`durable-agent-demo`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo)
+sample wires this end-to-end with SSE streaming, file-backed task
+storage, and crash recovery.
+
+## Hosted vs local
+
+Auto-detected via `FOUNDRY_HOSTING_ENVIRONMENT`:
+
+- **Hosted** (Foundry Hosted Agent platform): platform injects
+  `x-agent-invocation-id`, `x-agent-user-isolation-key`,
+  `x-agent-chat-isolation-key`, `FOUNDRY_AGENT_SESSION_ID` env var,
+  routes `/invocations*` to your container, terminates `/invocations_ws`
+  WebSockets, exposes the OpenAPI spec from
+  `/invocations/docs/openapi.json` for client discovery.
+- **Local dev**: framework generates IDs as UUIDs when not supplied;
+  isolation keys are empty; session id falls through to UUID; the host
+  binds `:8088` by default.
+
+## Packaging — private preview wheels
+
+The current invocations host with the cancel/get session-id propagation
+fix (per the invocation protocol spec) and the durable-task integration
+ships only via the pre-release wheels checked into this branch. The
+regular PyPI version of `azure-ai-agentserver-invocations` predates
+these.
+
+Consume the checked-in wheels per:
+
+- Wheel directory + README: [`sdk/agentserver/wheels/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/wheels)
+
+The wheels bundle all three preview packages (`core`,
+`invocations`, `responses`) so a single
+`pip install /path/to/wheels/*.whl` gives you the full surface.
+
+## Authoritative references
+
+| Topic | Link |
+|---|---|
+| **Package README** (decorator catalog, request/response headers, distributed tracing, WebSocket lifecycle) | [`README.md`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/README.md) |
+| Multi-turn (suspend / resume on top of `@multi_turn_task`) | [`samples/durable_multiturn/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_multiturn) |
+| End-to-end **long-running + crash + steer** demo (Foundry hosted) | [`samples/durable-agent-demo/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo) |
+| Companion: durable-task primitive skill (the `@task` underneath) | [`durable-task-skill.md`](durable-task-skill.md) |
+| Companion: streaming registry skill (producer/subscriber fan-out + replay) | [`streaming-skill.md`](streaming-skill.md) |
+| Companion: responses-API host skill (when you need OpenAI Responses API wire format instead) | [`responses-skill.md`](responses-skill.md) |
+
+Read the package README first — it covers every decorator, the full
+request/response header table, distributed tracing semantics, and the
+WebSocket lifecycle (subprotocol negotiation, ping/pong, close codes).
+The protocol spec is the canonical wire contract. The samples ground
+the API in working code.
+
+## Decision shortcuts
+
+| Need | Use `InvocationAgentServerHost`? | Why |
+|---|---|---|
+| Free-form HTTP agent on Foundry Hosted Agents | ✅ | This is the host. |
+| WebSocket bidirectional streaming | ✅ | Register `@app.ws_handler`; no second package needed. |
+| Long-running invocation with polling + cancel | ✅ | Register `@app.invoke_handler` + `@app.get_invocation_handler` + `@app.cancel_invocation_handler`. |
+| Publish OpenAPI spec for client discovery | ✅ | Pass `openapi_spec={...}` to the constructor. |
+| OpenAI Responses API endpoint | ❌ | Use the `agentserver-responses` skill. |
+| OpenAI Chat Completions endpoint | ❌ | Different protocol — different host. |
+| Server-to-server background job, no HTTP | ❌ | Use the `@task` primitive directly. |
+| Pure RPC, no invocation_id / session_id / platform headers | ❌ | Use Starlette / FastAPI directly. |
+| Crash-resilient long-running agent | ⚠️ | Compose with `@task` / `@multi_turn_task` — see the "Composing with the durable primitive" section. |
diff --git a/sdk/agentserver/skills/responses-skill.md b/sdk/agentserver/skills/responses-skill.md
new file mode 100644
index 000000000000..51c00e437b52
--- /dev/null
+++ b/sdk/agentserver/skills/responses-skill.md
@@ -0,0 +1,232 @@
+---
+name: agentserver-responses
+description: 'Build OpenAI Responses API-compatible agents using the `ResponsesAgentServerHost` from `azure-ai-agentserver-responses`. WHEN: "expose my agent as an OpenAI Responses API endpoint", "implement /responses POST + GET + cancel + delete", "stream agent output as Responses SSE events (response.created, response.in_progress, response.output_text.delta, response.completed, etc.)", "emit Responses output items (messages, function calls, reasoning, structured outputs)", "background responses with SSE replay", "durable + crash-recoverable Responses API agent", "steerable Responses API multi-turn conversations (queue a new turn while one is running)", "Foundry-hosted Responses-API agent". DO NOT USE FOR: building agents on a different wire protocol (the invocations protocol — use `agentserver-durable-tasks` + `agentserver-streaming` skills; the OpenAI Chat Completions API — different host; raw `@task` durable computation that does NOT need a Responses-API HTTP surface). PRIVATE PREVIEW: ships only via pre-release wheels checked into this branch (see references); the regular PyPI version of `azure-ai-agentserver-responses` predates this surface and does not include `durable_background` / `steerable_conversations` / the per-request primitive dispatch.'
+---
+
+# Agentserver Responses (`ResponsesAgentServerHost`) — Standalone Skill
+
+> **Standalone document.** Copy this file into your project to give your
+> AI coding agent (GitHub Copilot, etc.) the context it needs to build
+> OpenAI Responses API-compatible agents on top of
+> `azure-ai-agentserver-responses`. Pair it with the checked-in
+> pre-release wheels (see *Packaging* below) — that's all your project
+> needs to start.
+
+The `ResponsesAgentServerHost` class in
+`azure.ai.agentserver.responses` exposes the OpenAI Responses API as an
+HTTP host. You register a single `@app.response_handler`-decorated
+coroutine; the framework owns the wire protocol (SSE event ordering,
+terminal-status invariants, background-mode lifecycle, GET-after-completion
+snapshots, /cancel semantics, /delete cleanup, the durable +
+steerable lifecycle when opted in).
+
+## When to use
+
+Use `ResponsesAgentServerHost` when **any** of these apply:
+
+- You need to expose an agent over the OpenAI Responses API wire format
+  so existing Responses-API clients (the `openai` Python SDK's
+  `responses.create`, raw HTTP callers reading `text/event-stream`,
+  etc.) work without modification.
+- You want background responses with SSE replay — POST returns
+  immediately, the handler runs in the background, and a subsequent
+  GET with `?stream=true` replays / live-streams the per-response
+  event log including a `?starting_after=N` reconnect cursor (the
+  Responses API's cursor convention — `N` is the
+  `sequence_number` of the last event the client received).
+- You need crash-recoverable agents (opt-in via
+  `ResponsesServerOptions(durable_background=True)`) — backed by the
+  `@task` primitive under the covers, but you write a normal
+  handler instead of touching the durable primitive directly.
+- You need steerable multi-turn conversations (opt-in via
+  `ResponsesServerOptions(steerable_conversations=True)`) — a new turn
+  posted on an in-flight conversation cooperatively winds down the
+  current turn at its next checkpoint and re-enters with the new input
+  on a fresh handler invocation, linked in a stable
+  `conversation_chain_id`.
+
+## When NOT to use
+
+`ResponsesAgentServerHost` is intentionally narrow. Do **not** use it for:
+
+- **Agents that speak a different wire protocol.** If you're building
+  for the invocations protocol (free-form request/response shape, no
+  OpenAI compatibility), use the `agentserver-invocations` skill, paired
+  with the `agentserver-durable-tasks` + `agentserver-streaming` skills
+  when you need the durable primitive or stream fan-out.
+- **OpenAI Chat Completions API agents.** Different protocol; this
+  host implements Responses (`/responses`), not Chat Completions
+  (`/chat/completions`).
+- **Raw `@task` durable computation** with no HTTP surface. Use the
+  `@task` decorator from `azure.ai.agentserver.core.durable` directly.
+- **Custom HTTP paths.** `ResponsesAgentServerHost` owns
+  `/responses*`. If you need additional endpoints, compose via
+  Starlette mounting or co-host another `AgentServerHost` subclass
+  via cooperative inheritance.
+
+## Minimal pattern
+
+```python
+import asyncio
+from azure.ai.agentserver.responses import (
+    ResponsesAgentServerHost,
+    ResponseContext,
+    TextResponse,
+)
+from azure.ai.agentserver.responses.models._generated import CreateResponse
+
+app = ResponsesAgentServerHost()
+
+
+# Handlers are async with exactly 3 positional parameters.
+# The 3rd arg `cancellation_signal` is an asyncio.Event the framework
+# fires on /cancel, non-bg POST disconnect, or steering pressure.
+# `context.shutdown` is a separate Event for server shutdown — observe
+# each independently.
+@app.response_handler
+async def my_handler(
+    request: CreateResponse,
+    context: ResponseContext,
+    cancellation_signal: asyncio.Event,
+):
+    # Simplest case — let the framework own the full SSE lifecycle:
+    return TextResponse(context, request, text="Hello, world!")
+
+
+if __name__ == "__main__":
+    app.run()  # binds :8088 by default
+```
+
+For full event control (function calls, multiple output items,
+streaming partials, structured outputs), use `ResponseEventStream` and
+yield events directly:
+
+```python
+from azure.ai.agentserver.responses.streaming._event_stream import ResponseEventStream
+
+@app.response_handler
+async def my_handler(request, context, cancellation_signal):
+    stream = ResponseEventStream(response_id=context.response_id, request=request)
+    yield stream.emit_created()
+    yield stream.emit_in_progress()
+    msg = stream.add_output_item_message()
+    yield msg.emit_added()
+    text = msg.add_text_content()
+    yield text.emit_added()
+    accumulated_text = ""
+    async for tok in upstream_llm():  # upstream_llm: your own async token source
+        if cancellation_signal.is_set():
+            break
+        accumulated_text += tok
+        yield text.emit_delta(tok)
+    yield text.emit_text_done(accumulated_text)
+    yield text.emit_done()
+    yield msg.emit_done()
+    yield stream.emit_completed()
+```
+
+## Cancel + shutdown observation
+
+Two **distinct** surfaces, two **distinct** handler responses:
+
+| Surface | Fires on | Handler should |
+|---|---|---|
+| `cancellation_signal` (3rd handler arg, `asyncio.Event`) | `/cancel` API call, non-bg POST disconnect, steering pressure | break work loop → close builders → emit `response.completed` (the framework overrides to `response.cancelled` if `context.client_cancelled is True`) |
+| `context.shutdown` (`asyncio.Event`) | server shutdown (SIGTERM, graceful drain) | `return await context.exit_for_recovery()` (durable + bg) or emit a quick terminal (others) |
+
+Shutdown does NOT fire the cancellation signal. Handlers that care
+about both must observe each independently.
+
+To distinguish steering from a client cancel inside the cancel branch:
+
+```python
+if cancellation_signal.is_set() and context.pending_input_count > 0:
+    # Steering pressure: a new turn is queued. Emit completed with
+    # whatever output is durably committed; the framework re-enters the
+    # handler with the new input on a fresh invocation.
+    yield stream.emit_completed()
+    return
+```
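+
+Because the two signals are independent, a work loop has to check both
+at each checkpoint. The sketch below uses plain `asyncio.Event` objects
+as stand-ins for `cancellation_signal` and `context.shutdown`;
+`work_loop` and its string return values are hypothetical and only mark
+which branch a real handler would take:
+
+```python
+import asyncio
+
+
+async def work_loop(cancellation_signal: asyncio.Event, shutdown: asyncio.Event) -> str:
+    """Stand-in for a handler body that checks both events per checkpoint."""
+    for _ in range(100):
+        if shutdown.is_set():
+            # A real durable + bg handler would: return await context.exit_for_recovery()
+            return "exit_for_recovery"
+        if cancellation_signal.is_set():
+            # A real handler would close builders and emit response.completed.
+            return "wind_down"
+        await asyncio.sleep(0)  # one unit of work, then the next checkpoint
+    return "finished"
+
+
+async def demo() -> tuple:
+    cancel_a, shutdown_a = asyncio.Event(), asyncio.Event()
+    shutdown_a.set()  # simulate server shutdown
+    cancel_b, shutdown_b = asyncio.Event(), asyncio.Event()
+    cancel_b.set()    # simulate /cancel or steering pressure
+    return (
+        await work_loop(cancel_a, shutdown_a),
+        await work_loop(cancel_b, shutdown_b),
+    )
+```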
+
+## Durable + steerable (opt-in)
+
+```python
+from azure.ai.agentserver.responses import ResponsesAgentServerHost, ResponsesServerOptions
+
+app = ResponsesAgentServerHost(options=ResponsesServerOptions(
+    durable_background=True,        # background responses survive process crashes
+    steerable_conversations=True,   # accept new turns on in-flight conversations
+))
+```
+
+When opted in, the handler also sees:
+
+| Field | Meaning |
+|---|---|
+| `context.is_recovery: bool` | `True` on a crash-recovered re-entry |
+| `context.is_steered_turn: bool` | `True` on the drain re-entry that follows a steering input |
+| `context.pending_input_count: int` | Live count of queued steering inputs |
+| `context.durable_metadata: DurableMetadataNamespace` | `MutableMapping` for handler-managed checkpoint state (small — watermarks, dedup tokens, NOT full conversation history). `await context.durable_metadata.flush()` for at-most-once side-effect fencing before an upstream call with observable side effects |
+| `await context.exit_for_recovery()` | Recovery primitive — `return await context.exit_for_recovery()` to leave the response `in_progress` so the next-lifetime recovery scanner picks it up |
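+
+The at-most-once fencing mentioned for `durable_metadata` amounts to
+persisting a token before the side effect runs. A minimal sketch,
+assuming only that the namespace behaves like a mapping with an async
+`flush()`; `FakeDurableMetadata`, `send_once`, and the `sent_token` key
+are hypothetical stand-ins, not package APIs:
+
+```python
+import asyncio
+
+
+class FakeDurableMetadata(dict):
+    """Hypothetical stand-in: a dict whose flush() snapshots its contents."""
+
+    def __init__(self) -> None:
+        super().__init__()
+        self.persisted: dict = {}
+
+    async def flush(self) -> None:
+        self.persisted = dict(self)
+
+
+async def send_once(metadata, token: str, side_effect) -> bool:
+    """Run side_effect at most once per token."""
+    if metadata.get("sent_token") == token:
+        return False        # already attempted, possibly in a previous lifetime
+    metadata["sent_token"] = token
+    await metadata.flush()  # persist the fence BEFORE the observable side effect
+    await side_effect()
+    return True
+
+
+async def demo() -> tuple:
+    calls = []
+
+    async def effect() -> None:
+        calls.append("sent")
+
+    md = FakeDurableMetadata()
+    first = await send_once(md, "t1", effect)
+    second = await send_once(md, "t1", effect)  # simulated re-entry
+    return first, second, calls, md.persisted
+```
+
+A crash between `flush()` and the side effect means the effect is never
+sent, which is the at-most-once trade-off.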
+
+## Hosted vs local
+
+Both modes are auto-detected via `FOUNDRY_HOSTING_ENVIRONMENT`:
+
+- **Hosted** (Foundry Hosted Agent platform): response store auto-binds
+  to the Foundry hosted responses storage API; stream replay uses
+  file-backed storage under `${AGENTSERVER_DURABLE_ROOT}/streams/`;
+  durable task store uses the Foundry hosted task storage API; lease
+  renewal extends the sandbox idle-reclaim timer past the eviction
+  window.
+- **Local dev**: response store defaults to file-backed under
+  `${AGENTSERVER_DURABLE_ROOT:-~/.durable}/responses/`; stream replay
+  uses in-memory (durable_background=False) or file-backed
+  (durable_background=True) under
+  `${AGENTSERVER_DURABLE_ROOT}/streams/`; durable task store is
+  file-backed under `${AGENTSERVER_DURABLE_ROOT}/tasks/`.
+
+Operator override: `AGENTSERVER_TASKS_BACKEND=local|hosted` forces
+the task provider regardless of hosting detection. Useful for debugging
+hosted-only scenarios on a local workstation.
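+
+To inspect the local-dev files, the directory layout above can be
+resolved as follows. This is a sketch, not the framework's resolver:
+`durable_root` and `local_layout` are hypothetical names, and applying
+the `~/.durable` default to the `streams/` and `tasks/` directories is
+an assumption (the text states it only for `responses/`):
+
+```python
+import os
+from pathlib import Path
+from typing import Dict, Mapping
+
+
+def durable_root(environ: Mapping[str, str] = os.environ) -> Path:
+    # Mirrors ${AGENTSERVER_DURABLE_ROOT:-~/.durable}: unset or empty uses the default.
+    raw = environ.get("AGENTSERVER_DURABLE_ROOT") or "~/.durable"
+    return Path(raw).expanduser()
+
+
+def local_layout(environ: Mapping[str, str] = os.environ) -> Dict[str, Path]:
+    root = durable_root(environ)
+    return {name: root / name for name in ("responses", "streams", "tasks")}
+```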
+
+## Packaging — private preview wheels
+
+The PyPI version of `azure-ai-agentserver-responses` predates the
+durable + steerable surface. **The current Responses API host with
+crash recovery, steering, and the per-request primitive dispatch
+ships only via the pre-release wheels checked into this branch.**
+
+Consume the checked-in wheels per:
+
+- Wheel directory + README: [`sdk/agentserver/wheels/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/wheels)
+
+The wheels bundle all three preview packages (`core`,
+`invocations`, `responses`) so a single
+`pip install /path/to/wheels/*.whl` gives you the full surface.
+
+## Authoritative references
+
+| Topic | Link |
+|---|---|
+| **Handler implementation guide** (full patterns, builder API, terminal-status rules, cancellation matrix) | [`docs/handler-implementation-guide.md`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/docs/handler-implementation-guide.md) |
+| **Durable responses developer guide** (recovery contract, watermark patterns, upstream-framework integration, the `is_recovery` / `is_steered_turn` / `pending_input_count` surface) | [`docs/durable-responses-developer-guide.md`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/docs/durable-responses-developer-guide.md) |
+| Durable + steerable patterns (Copilot SDK, three-phase streaming with watermarks, steering drain, LangGraph integration, multi-turn) | [`samples/sample_18_durable_copilot.py`..`sample_22_durable_multiturn.py`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-responses/samples) |
+| Companion: durable-task primitive skill (the `@task` underneath) | [`durable-task-skill.md`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/skills/durable-task-skill.md) |
+| Companion: streaming registry skill (the `streams` registry underneath) | [`streaming-skill.md`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/skills/streaming-skill.md) |
+
+Read the handler implementation guide first — it covers the full
+event taxonomy (every SSE event type the host accepts, the builder
+methods for each, terminal-status invariants), the cancellation cause
+matrix, and the recovery primitive shape. The samples ground the API
+in working code.
+
+## Decision shortcuts
+
+| Need | Use `ResponsesAgentServerHost`? | Why |
+|---|---|---|
+| Expose agent as OpenAI Responses API endpoint | ✅ | This is the host. |
+| Background response with SSE replay (POST + GET ?stream=true) | ✅ | Framework owns the per-response stream registry + cursor. |
+| Multi-turn chat that survives container restart | ✅ | Opt into `durable_background=True` + `steerable_conversations=True`. |
+| Steerable long generation (user can change topic mid-run) | ✅ | Opt into `steerable_conversations=True`; observe `cancellation_signal` + `pending_input_count`. |
+| OpenAI Chat Completions API endpoint | ❌ | Different protocol — use a different host. |
+| Free-form invocations-protocol agent | ❌ | Use the `agentserver-invocations` skill (plus `agentserver-durable-tasks` + `agentserver-streaming` as needed). |
+| Server-to-server background job with no HTTP surface | ❌ | Use the `@task` primitive directly. |
+| Persist conversation history in `context.durable_metadata` | ❌ | Wrong — `durable_metadata` is for small watermarks. Use your own DB or framework store (LangGraph SqliteSaver, etc.) for content. |
diff --git a/sdk/agentserver/skills/streaming-skill.md b/sdk/agentserver/skills/streaming-skill.md
new file mode 100644
index 000000000000..8e1861c561b7
--- /dev/null
+++ b/sdk/agentserver/skills/streaming-skill.md
@@ -0,0 +1,247 @@
+---
+name: agentserver-streaming
+description: 'Emit events from one coroutine and fan them out to one or more subscribers (typically: your `@task` handler produces, your HTTP layer fans out as SSE / WebSocket / long-poll) using the `streams` registry from `azure-ai-agentserver-core`. WHEN: "stream tokens / progress events from my agent", "SSE endpoint", "fan an agent stream out to N subscribers", "let a late subscriber catch up via replay", "reconnect from `Last-Event-ID` cursor", "stream survives container crash + recovery", "bridge a single-consumer LLM SDK stream to N HTTP subscribers", "subscribe before invoke pattern", "durable streaming + checkpointed sequence numbers". DO NOT USE FOR: persisting business state (use `ctx.metadata` for tiny watermarks, your own store for content), cross-process pub/sub (one registry per process — use a real message bus), competing-consumer fan-out (every subscriber sees every event — not work-stealing), arbitrary back-pressure between producer and consumer (subscribers buffer per-subscriber; slow consumers grow their queue, not back-pressure the producer). PRIVATE PREVIEW: the `streaming` subpackage ships only via pre-release wheels checked into this branch (see references); the surrounding `azure-ai-agentserver-*` packages are on PyPI at stable versions.'
+---
+
+# Agentserver Streaming (`streams`) — Standalone Skill
+
+> **Standalone document.** Copy this file into your project to give your
+> AI coding agent (GitHub Copilot, etc.) the context it needs to use the
+> `streams` registry correctly. Pair it with the checked-in pre-release
+> wheels (see *Packaging* below) — that's all your project needs to start
+> building streaming endpoints on top of `@task`.
+
+The `streams` registry in `azure.ai.agentserver.core.streaming` is a
+process-level rendezvous between one **producer** coroutine (your
+agent handler) and one or more **subscriber** coroutines (typically your
+HTTP layer rendering SSE). You pick a *backing* once at app startup —
+live, in-memory replay, or file-backed replay — and from then on the
+producer and every subscriber just look the stream up by id.
+
+## When to use
+
+Use `streams` when **any** of these apply:
+
+- Your agent handler produces a stream of events (token deltas, phase
+  progress markers, intermediate tool calls) and one or more clients
+  want to see them as they happen.
+- The HTTP layer needs to fan a single producer out to **N
+  subscribers** (e.g., the originating client plus a tail / debug
+  client) without the producer knowing about subscribers.
+- Subscribers may **attach after the producer started** and need to
+  catch up via replay (use one of the replay backings).
+- Subscribers may **disconnect and reconnect** with a cursor
+  (`Last-Event-ID` for SSE) and resume without losing events.
+- The handler runs under `@task`, can **crash mid-stream**, and you
+  want a fresh subscriber after the recovery boundary to see the full
+  pre-crash + post-crash history (use the file-backed replay backing
+  plus the [Durable streaming primer](#durable-streaming-primer-task--streams)
+  below).
+
+## When NOT to use
+
+`streams` is intentionally narrow. Do **not** use it for:
+
+- **Cross-process or cross-machine pub/sub.** The registry lives in
+  one Python process. Multi-worker deployments need a real message
+  bus (Redis, NATS, Kafka). Each worker's registry is independent.
+- **Work-stealing / competing-consumer queues.** Every subscriber
+  receives every event. There is no acknowledge / nack protocol.
+  If you want N workers to share work, use a queue.
+- **Producer-side back-pressure.** Each subscriber has its own
+  per-subscriber buffer. A slow subscriber grows its own queue; the
+  producer never blocks on `emit()`. A subscriber that falls far
+  behind only grows its own buffer, so design for short-lived
+  HTTP streams.
+- **Business-state persistence.** The replay backings hold events for
+  *reconnect catch-up*, not as your system of record. Persist
+  business state through your agent framework's store and through
+  `@task` metadata watermarks.
+- **Stream durability across the registry's TTL / process boundary.**
+  `use_in_memory_replay(ttl_seconds=...)` evicts on TTL.
+  `use_file_backed_replay(...)` survives a process restart only for
+  the same stream id within the same on-disk directory.
+
+## Minimal pattern
+
+```python
+from azure.ai.agentserver.core.streaming import streams
+
+# 1. At app startup — pick a backing ONCE.
+streams.use_in_memory_replay(cursor_fn=lambda ev: ev["n"], ttl_seconds=600)
+
+# 2. The producer — typically your @task handler:
+async def produce(stream_id: str) -> None:
+    stream = await streams.get_or_create(stream_id)
+    try:
+        for n in range(5):
+            await stream.emit({"n": n, "msg": f"hello {n}"})
+    finally:
+        await stream.close()
+
+# 3. The subscriber — typically your HTTP handler.
+# Attach BEFORE the producer starts whenever you can.
+async def consume(stream_id: str) -> None:
+    stream = await streams.get_or_create(stream_id)
+    async for event in stream.subscribe():
+        print(event)
+    # Loop terminates cleanly when the producer calls close().
+```
+
+`streams.get_or_create(id)` is idempotent: the producer and the
+subscriber both call it with the same id and get the **same**
+`EventStream` instance back.
+
+## Pick the right backing
+
+| Backing | Use when | Reconnect / replay? | Survives process restart? |
+|---|---|---|---|
+| `use_in_memory_live()` *(default)* | Single subscriber attaches before the producer; lowest memory. | No — late subscribers miss earlier events. | No. |
+| `use_in_memory_replay(cursor_fn=..., ttl_seconds=...)` | Late subscribers / disconnect+reconnect within the same process; cursor-based catch-up. | Yes, up to TTL. | No. |
+| `use_file_backed_replay(root=..., cursor_fn=...)` | `@task` handler that can crash and recover; subscribers need monotonic event continuity across the crash boundary. | Yes, across process restarts for the same stream id + on-disk dir. | Yes. |
+
+Call exactly **one** configurator at app startup. Don't switch
+backings mid-process.
+
+## Pick the right stream id
+
+The stream id is the **per-turn natural identifier** — never the
+durable task id, because a `task_id` outlives a single turn:
+
+| Framework | Stream id | Source |
+|---|---|---|
+| `azure-ai-agentserver-invocations` (hosted agents) | `invocation_id` | HTTP layer's `request.state.invocation_id`; propagated to the `@task` handler via `ctx.input["invocation_id"]`. |
+| `azure-ai-agentserver-responses` (OpenAI-shaped) | `response_id` | The orchestrator knows it at every call site. |
+| Bare Python (no framework) | caller's choice (`str`) | Pick a natural per-turn id from your domain. |
+
+## Subscribe BEFORE you start the producer
+
+```python
+# 1. Resolve / create the stream first.
+stream = await streams.get_or_create(invocation_id)
+# 2. Start subscribing.
+async def pump():
+    async for ev in stream.subscribe():
+        yield render_sse(ev)
+# 3. NOW kick off the producer (e.g., the @task run).
+asyncio.create_task(run_task(invocation_id))
+```
+
+This guarantees the subscriber's queue exists by the time the very
+first `emit()` lands, so the live backing doesn't drop the early
+events. The replay backings can also catch you up after the fact via
+the cursor, but subscribe-before-start is the cheaper, simpler
+pattern when the HTTP layer owns both sides.
+
+## HTTP / SSE bridging shape
+
+```python
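+import json
+
+# Assumes Starlette-style request/response types (FastAPI re-exports them).
+from starlette.responses import Response, StreamingResponse
+
+from azure.ai.agentserver.core.streaming import (
+    streams,
+    EventStreamNotFoundError,
+)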
+
+async def sse_endpoint(request):
+    invocation_id = request.path_params["id"]
+    # streams.get(id) raises NotFound for any id that isn't currently
+    # a live stream (never registered, deleted, or close-clock elapsed).
+    # streams.get_or_create cannot raise NotFound — it clears any
+    # tombstone and synthesises a fresh stream.
+    try:
+        stream = await streams.get(invocation_id)
+    except EventStreamNotFoundError:
+        return Response("stream not found", status_code=404)
+
+    last_event_id = request.headers.get("last-event-id")
+    after = int(last_event_id) if last_event_id else None
+
+    async def body():
+        try:
+            async for ev in stream.subscribe(after=after):
+                yield f"id: {ev['n']}\ndata: {json.dumps(ev)}\n\n"
+        except EventStreamNotFoundError:
+            return  # stream was tombstoned mid-iteration; cleanly close
+    return StreamingResponse(body(), media_type="text/event-stream")
+```
+
+The streaming contract collapsed the prior `EventStreamGoneError`
+(`410 Gone`) and `EventStreamNotFoundError` (`404 Not Found`) into
+a single error type wire-mapped to `404`. Every "this id is not
+currently a live stream" condition raises `EventStreamNotFoundError`.
+
+The replay backings honor `after=<cursor>` for catch-up. The cursor
+is whatever the `cursor_fn` you pass to the configurator returns for
+an event: typically a monotonically increasing `sequence_number`
+(or `n`, as in the examples above) stamped on every event.
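+
+A minimal sketch of the catch-up rule, written in plain Python so it
+runs without the preview wheels. `parse_last_event_id` and
+`replay_after` are hypothetical helpers, and the "strictly greater
+than `after`" comparison is an assumption about how a replay backing
+filters events, not a statement of the SDK's behavior.
+
+```python
+from typing import Any, Callable, Dict, Iterable, Iterator, Optional
+
+Event = Dict[str, Any]
+
+
+def parse_last_event_id(header_value: Optional[str]) -> Optional[int]:
+    """Return the cursor from a Last-Event-ID header, or None if absent or malformed."""
+    if not header_value:
+        return None
+    try:
+        return int(header_value)
+    except ValueError:
+        return None  # treat a malformed header as "no cursor"
+
+
+def replay_after(
+    events: Iterable[Event],
+    cursor_fn: Callable[[Event], int],
+    after: Optional[int],
+) -> Iterator[Event]:
+    """Yield buffered events whose cursor is strictly greater than `after`."""
+    for ev in events:
+        if after is None or cursor_fn(ev) > after:
+            yield ev
+
+
+buffered = [{"sequence_number": n} for n in range(5)]
+cursor_fn = lambda ev: ev["sequence_number"]
+caught_up = list(replay_after(buffered, cursor_fn, parse_last_event_id("2")))
+```
+
+Parsing the header defensively keeps a malformed `Last-Event-ID` from
+turning into an unhandled `ValueError` in the endpoint.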
+
+## Durable streaming primer (`@task` + `streams`)
+
+The streaming registry is the natural pair for `@task`. The recipe:
+
+1. At app startup, call `streams.use_file_backed_replay(root=...,
+   cursor_fn=lambda ev: ev["sequence_number"])`.
+2. Producer (inside your `@task` handler) stamps every event with a
+   monotonically increasing `sequence_number`. After a crash, the
+   recovery boundary reads `stream.last_cursor()` to know where to
+   resume from.
+3. Subscribers reconnect with `Last-Event-ID: <sequence_number>` and
+   the file-backed replay backing replays the gap, then live-tails
+   from there.
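+
+Step 2 of the recipe can be sketched as follows. `stamp_events` is a
+hypothetical helper, and treating a `None` cursor as "nothing emitted
+yet" is an assumption; in a real handler the `last_cursor` argument
+would come from `stream.last_cursor()` at the recovery boundary.
+
+```python
+from typing import Any, Dict, Iterable, Iterator, Optional
+
+
+def stamp_events(
+    payloads: Iterable[Dict[str, Any]],
+    last_cursor: Optional[int],
+) -> Iterator[Dict[str, Any]]:
+    """Attach a monotonically increasing sequence_number to each payload.
+
+    `last_cursor` stands in for the value read at the recovery boundary;
+    None is assumed to mean that nothing has been emitted yet.
+    """
+    next_seq = 0 if last_cursor is None else last_cursor + 1
+    for payload in payloads:
+        yield {"sequence_number": next_seq, **payload}
+        next_seq += 1
+
+
+first_run = list(stamp_events([{"msg": "a"}, {"msg": "b"}], last_cursor=None))
+# Simulated crash: resume numbering from the last cursor that was written.
+resumed = list(
+    stamp_events([{"msg": "c"}], last_cursor=first_run[-1]["sequence_number"])
+)
+```
+
+Resuming from the last written cursor keeps the sequence gap-free, so
+a subscriber reconnecting with `Last-Event-ID` sees no duplicates or
+holes across the crash.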
+
+The `samples/durable-agent-demo/` end-to-end run demonstrates the full
+flow: subscriber connects mid-run, witnesses pre-crash events, sees
+the recovery boundary (`type=recovered`), then continues monotonically
+through the post-crash events with no gaps.
+
+## Bring your own `EventStream` implementation
+
+The bundled registry is the SDK-provided one. If you need different
+semantics (Redis-backed, pubsub-replicated, etc.) you implement the
+`EventStream` Protocol on your own class and ship your own peer
+registry — the SDK explicitly does not let third-party concrete
+classes plug into the bundled registry. The Protocol is small:
+`emit`, `close`, `subscribe`, `last_cursor`, plus three exception types.
+
+## Packaging — private preview wheels
+
+The surrounding `azure-ai-agentserver-core` and
+`azure-ai-agentserver-invocations` packages are published on PyPI at
+stable versions. **The `streaming` subpackage is in private preview**
+and ships *only* via the pre-release wheels checked into this branch.
+There is no PyPI release for `azure.ai.agentserver.core.streaming`
+until it goes GA — installing the regular PyPI version of
+`azure-ai-agentserver-core` will not give you the `streams` registry.
+
+Install the checked-in wheels as described in:
+
+- Wheel directory + README: [`sdk/agentserver/wheels/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/wheels)
+
+## Authoritative references
+
+| Topic | Link |
+|---|---|
+| **Full streaming developer guide** (configurators, EventStream Protocol, lifecycle, registry API, exception/wire mapping, recovery patterns, BYO impl) | [`docs/streaming-guide.md`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-core/docs/streaming-guide.md) |
+| **Durable task developer guide** (the natural pair: `@task` produces, `streams` fans out) | [`docs/durable-task-guide.md`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-core/docs/durable-task-guide.md) |
+| Bare-Python streaming sample | [`samples/durable_streaming/durable_streaming.py`](https://github.com/Azure/azure-sdk-for-python/blob/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-core/samples/durable_streaming/durable_streaming.py) |
+| End-to-end **long-running + crash + steer + SSE** demo (Foundry hosted) | [`samples/durable-agent-demo/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable-agent-demo) |
+| Invocations streaming sample (research agent — SSE on POST + GET + `?last_event_id` reconnect) | [`samples/durable_research/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_research) |
+| Invocations streaming sample (copilot agent — optional SSE on POST, polling fallback) | [`samples/durable_copilot/`](https://github.com/Azure/azure-sdk-for-python/tree/refs/heads/feature/agentserver-durable-preview-share/sdk/agentserver/azure-ai-agentserver-invocations/samples/durable_copilot) |
+
+Read the streaming developer guide first — it covers the full
+`EventStream` Protocol, the ACTIVE → CLOSED → GONE lifecycle, the
+`last_cursor()` rule-25 exemption for file-backed recovery,
+subscribe-before-start mechanics, the HTTP/SSE bridging shape, and
+the BYO peer-registry pattern.
+
+## Decision shortcuts
+
+| Need | Use `streams`? | Why |
+|---|---|---|
+| SSE endpoint that tails a `@task` handler's output | ✅ | The natural producer/subscriber rendezvous |
+| Late subscriber needs to catch up via cursor | ✅ | `use_in_memory_replay` or `use_file_backed_replay` |
+| Subscriber reconnects after a container crash | ✅ | `use_file_backed_replay` + monotonic `sequence_number` |
+| Fan one producer to N HTTP subscribers in the same process | ✅ | Every subscriber sees every event |
+| Multi-worker / cross-process pubsub | ❌ | Each worker has its own registry — use a real bus |
+| Work-stealing across N consumers | ❌ | Wrong primitive — use a queue |
+| Persist business state across turns | ❌ | Use `ctx.metadata` (small) + your own store (big) |
+| Producer needs back-pressure when subscriber is slow | ⚠️ | Per-subscriber queues grow; producer never blocks — design for short-lived HTTP streams |
diff --git a/sdk/agentserver/wheels/README.md b/sdk/agentserver/wheels/README.md
new file mode 100644
index 000000000000..e8ff01f1bca9
--- /dev/null
+++ b/sdk/agentserver/wheels/README.md
@@ -0,0 +1,36 @@
+# Checked-in preview wheels
+
+This directory ships the three `azure-ai-agentserver-*` packages as
+locally-built wheels so the `durable-agent-demo` docker image can
+`pip install /tmp/wheels/*.whl` without needing to publish each
+preview to PyPI.
+
+| Wheel | Source |
+|-------|--------|
+| `azure_ai_agentserver_core-*.whl` | `sdk/agentserver/azure-ai-agentserver-core` |
+| `azure_ai_agentserver_invocations-*.whl` | `sdk/agentserver/azure-ai-agentserver-invocations` |
+| `azure_ai_agentserver_responses-*.whl` | `sdk/agentserver/azure-ai-agentserver-responses` |
+
+## Consumption
+
+The `durable-agent-demo/build.sh` copies these wheels into the docker
+build context (`samples/durable-agent-demo/src/.../wheels/`). The
+sample's `Dockerfile` then runs `pip install --no-cache-dir /tmp/wheels/*.whl`
+to pull them in.
+
+Devs do NOT need to rebuild these — they're checked in.
+
+## Refreshing (maintainer-only)
+
+After source changes to any of the three packages, run:
+
+```bash
+sdk/agentserver/wheels/build-wheels.sh
+git add sdk/agentserver/wheels/*.whl
+git commit
+```
+
+The script removes stale `*.whl` files and rebuilds each wheel at the
+version in that package's `_version.py`. No version bump is needed for
+unreleased `bN` previews: the same filename is overwritten with the
+new content.
diff --git a/sdk/agentserver/wheels/azure_ai_agentserver_core-2.0.0b7-py3-none-any.whl b/sdk/agentserver/wheels/azure_ai_agentserver_core-2.0.0b7-py3-none-any.whl
new file mode 100644
index 000000000000..8aa80e81ac92
Binary files /dev/null and b/sdk/agentserver/wheels/azure_ai_agentserver_core-2.0.0b7-py3-none-any.whl differ
diff --git a/sdk/agentserver/wheels/azure_ai_agentserver_invocations-1.0.0b6-py3-none-any.whl b/sdk/agentserver/wheels/azure_ai_agentserver_invocations-1.0.0b6-py3-none-any.whl
new file mode 100644
index 000000000000..65023295a340
Binary files /dev/null and b/sdk/agentserver/wheels/azure_ai_agentserver_invocations-1.0.0b6-py3-none-any.whl differ
diff --git a/sdk/agentserver/wheels/azure_ai_agentserver_responses-1.0.0b8-py3-none-any.whl b/sdk/agentserver/wheels/azure_ai_agentserver_responses-1.0.0b8-py3-none-any.whl
new file mode 100644
index 000000000000..7b0b7b22950a
Binary files /dev/null and b/sdk/agentserver/wheels/azure_ai_agentserver_responses-1.0.0b8-py3-none-any.whl differ
diff --git a/sdk/agentserver/wheels/build-wheels.sh b/sdk/agentserver/wheels/build-wheels.sh
new file mode 100755
index 000000000000..e900a40d6951
--- /dev/null
+++ b/sdk/agentserver/wheels/build-wheels.sh
@@ -0,0 +1,53 @@
+#!/usr/bin/env bash
+# ─────────────────────────────────────────────────────────────────────────────
+# Maintainer-only: rebuild the checked-in preview wheels.
+#
+# Output: refreshes *.whl files in this directory (sdk/agentserver/wheels/)
+#         alongside this script and README.md. Devs do NOT need to run this —
+#         the wheels are checked in. See README.md for consumption.
+#
+# Wheels included (azure-ai-agentserver-{core, invocations, responses}):
+#   - core         — durable-task primitives + storage_paths
+#   - invocations  — invocations protocol HTTP host
+#   - responses    — responses protocol HTTP host
+#
+# When to run:
+#   - After making source changes to any of the three packages that
+#     need to ship in the demo's docker image.
+#   - Before committing those source changes, so the wheels stay in sync.
+#
+# Usage (from anywhere):
+#   sdk/agentserver/wheels/build-wheels.sh
+# ─────────────────────────────────────────────────────────────────────────────
+
+set -euo pipefail
+
+WHEELS_DIR="$(cd "$(dirname "$0")" && pwd)"
+AGENTSERVER_ROOT="$(cd "$WHEELS_DIR/.." && pwd)"
+
+PACKAGES=(
+    "azure-ai-agentserver-core"
+    "azure-ai-agentserver-invocations"
+    "azure-ai-agentserver-responses"
+)
+
+echo "==> Rebuilding preview wheels into: $WHEELS_DIR"
+# Remove any stale wheel files but preserve README.md and the script itself.
+rm -f "$WHEELS_DIR"/*.whl
+
+for pkg in "${PACKAGES[@]}"; do
+    pkg_dir="$AGENTSERVER_ROOT/$pkg"
+    if [[ ! -d "$pkg_dir" ]]; then
+        echo "  !! Skipping $pkg — directory not found at $pkg_dir" >&2
+        continue
+    fi
+    echo "  - $pkg"
+    pip wheel --no-deps --quiet --wheel-dir "$WHEELS_DIR" "$pkg_dir"
+done
+
+echo ""
+echo "==> Refreshed wheels:"
+ls -la "$WHEELS_DIR"/*.whl
+
+echo ""
+echo "Next: git add sdk/agentserver/wheels/*.whl && git commit"