MCP server process death is never detected: no reconnect, `initialized` stays True, tool calls raise raw `anyio.ClosedResourceError`

### Bug Description

On my own MCP server project I had an issue of always hitting max tool calls before I simplified the tool list, so I was curious if there were any similar tool list issues here. That's when I noticed tools can go dead mid-session: if the MCP server process dies, nothing notices, and every tool call after that fails for the rest of the session. On a long voice call that's going to happen eventually, servers restart.

What actually happens: the connection task never finds out the transport is gone, `initialized` keeps returning True, and every tool call for the rest of the session raises a raw `anyio.ClosedResourceError` with no message.

Root cause (traced on main, `livekit-agents/livekit/agents/llm/mcp.py`):

1. `_run_client` (~L115) holds the `ClientSession` open and parks on `await self._closing_ev.wait()`. When the server process dies the transport streams close underneath it, but nothing links stream death to that wait: the connection task stays parked forever (step 4 in the repro output, `task done=False`).
2. Because the task never unwinds, the `finally:` cleanup (~L138-141, `self._client = None`) never runs in this death mode, so the guarded `ToolError("Tool invocation failed: internal service is unavailable...")` (~L171, gated on `_client is None`) is never reached either.
3. Tool calls therefore fall through to `await self._client.call_tool(...)` (~L175) and raise a raw `anyio.ClosedResourceError` (empty message) into the agent layer, i.e. into the LLM's tool-call turn.
4. `initialized` (~L96) is just `self._client is not None`, so it keeps reporting True after death.
5. Nothing observes `_client_task` except `aclose()` (~L203-205), and nothing ever re-runs `initialize()`: no reconnect, no backoff, no health event. Tools registered on an `AgentSession` at startup remain permanently dead handles for the rest of the session.

### Expected Behavior

I expected some kind of reconnect attempt, or at least an error. Found "internal service is unavailable" in the ToolErrors sitting in mcp.py, but it never triggers when the server dies. Worse, it's a silent error, because `initialized` keeps saying True, and tool calls just throw anyio.ClosedResourceError with no message. Nothing worse than a silent error. At least it should fail loudly, or better yet it should still work.

### Reproduction Steps

1. Save the two files below and `pip install "livekit-agents[mcp]"`
2. `python repro.py`
3. Watch steps 4-6 of the output: the server process is dead, `initialized` still says True, and tool calls raise raw anyio errors

`dying_server.py` (a `ping` tool and a `crash` tool that exits the process, simulating any server death: crash, redeploy, OOM kill):

```python
import os
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("dying-server")

@mcp.tool()
def ping() -> str:
    """Returns pong."""
    return "pong"

@mcp.tool()
def crash() -> str:
    """Simulates the server process dying (crash, restart, OOM kill...)."""
    os._exit(1)

if __name__ == "__main__":
    mcp.run(transport="stdio")
```

`repro.py`:

```python
import asyncio, sys
from pathlib import Path

from livekit.agents.llm.mcp import MCPServerStdio

SERVER = str(Path(__file__).parent / "dying_server.py")

async def main() -> None:
    server = MCPServerStdio(command=sys.executable, args=[SERVER])
    await server.initialize()
    tools = await server.list_tools()
    print(f"1) initialized={server.initialized}, tools={[t.info.name for t in tools]}")

    ping = next(t for t in tools if t.info.name == "ping")
    crash = next(t for t in tools if t.info.name == "crash")

    print("2) ping ->", await ping(raw_arguments={}))

    try:
        await crash(raw_arguments={})   # the server process exits here
    except Exception as e:
        print(f"3) crash tool raised {type(e).__name__}: {str(e)[:60]}")

    await asyncio.sleep(1.0)
    print(f"4) one second after server death: initialized={server.initialized} "
          f"(connection task done={server._client_task.done()})")

    for n in (5, 6):
        try:
            await ping(raw_arguments={})
        except Exception as e:
            print(f"{n}) ping -> {type(e).__module__}.{type(e).__name__}: {str(e)[:70]!r}")
        await asyncio.sleep(0.5)

    print("7) no reconnect was attempted; initialized still", server.initialized)

asyncio.run(main())
```

Output:

```
1) initialized=True, tools=['ping', 'crash']
2) ping -> {"type":"text","text":"pong","annotations":null,"meta":null}
3) crash tool raised McpError: Connection closed
4) one second after server death: initialized=True (connection task done=False)
5) ping -> anyio.ClosedResourceError: ''
6) ping -> anyio.ClosedResourceError: ''
7) no reconnect was attempted; initialized still True
```

Step 4 is the heart of it: a full second after the server process is gone, the connection task has not observed the death.

### Operating System

Linux (repro is OS-independent)

### Models Used

None involved (pure MCP client lifecycle)

### Package Versions

```bash
livekit-agents 1.6.4 (also reproduced identically against main @ e2b2d09, 2026-07-02)
mcp: stdio transport (streamable-HTTP shares the same lifecycle path)
Python 3.10.12
```

### Session/Room/Call IDs

_No response_

### Proposed Solution

The fix seemed simple and partly already written, it just never runs. Something needs to watch the connection, then it can at least call the proper error, failing loudly. Once the error is tracked the obvious next step is reconnect.

### Additional Context

_No response_

### Screenshots and Recordings

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MCP server process death is never detected: no reconnect, `initialized` stays True, tool calls raise raw `anyio.ClosedResourceError` #6291

Bug Description

Expected Behavior

Reproduction Steps

Operating System

Models Used

Package Versions

Session/Room/Call IDs

Proposed Solution

Additional Context

Screenshots and Recordings

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

MCP server process death is never detected: no reconnect, initialized stays True, tool calls raise raw anyio.ClosedResourceError #6291

Description

Bug Description

Expected Behavior

Reproduction Steps

Operating System

Models Used

Package Versions

Session/Room/Call IDs

Proposed Solution

Additional Context

Screenshots and Recordings

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

MCP server process death is never detected: no reconnect, `initialized` stays True, tool calls raise raw `anyio.ClosedResourceError` #6291