Commit 0157444

docs: update interaction suite README for transports, auth, and decorator stacking
1 parent cec4a2d commit 0157444

1 file changed

Lines changed: 28 additions & 15 deletions

tests/interaction/README.md

@@ -10,7 +10,8 @@ running the suite before and after.
1010
uv run --frozen pytest tests/interaction/
1111
```
1212

13-
The whole suite is in-memory and event-driven; it runs in about a second.
13+
The whole suite is in-process and event-driven — including the streamable HTTP, SSE, and OAuth
14+
flows — with a single subprocess test for stdio.
1415

1516
## Ground rules
1617

@@ -26,10 +27,10 @@ The whole suite is in-memory and event-driven; it runs in about a second.
2627
the constants in `mcp.types`; error *message strings* are pinned only where they are the
2728
SDK's own deliberate output.
2829
- **No sleeps, no real I/O.** Concurrency is coordinated with `anyio.Event`; every wait that
29-
could hang is bounded by `anyio.fail_after(5)`. The streamable HTTP tests drive the Starlette
30+
could hang is bounded by `anyio.fail_after(5)`. The HTTP and OAuth tests drive the Starlette
3031
app in-process through the suite's streaming ASGI bridge (`transports/_bridge.py`), which
3132
delivers each response chunk as the server produces it — full duplex, but still no sockets,
32-
threads, or subprocesses anywhere.
33+
threads, or subprocesses anywhere outside the one stdio test.
3334

3435
## Layout
3536

@@ -42,7 +43,8 @@ tests/interaction/
4243
test_coverage.py enforces the manifest ↔ test contract
4344
lowlevel/ one file per feature area, against the low-level Server
4445
mcpserver/ the same feature areas in MCPServer's natural idiom
45-
transports/ behaviour specific to one transport (modes, streams, framing)
46+
transports/ behaviour specific to one transport (sessions, resumability, framing)
47+
auth/ OAuth flows against an in-process authorization server
4648
```
4749

4850
The two server APIs produce genuinely different wire output for the same conceptual feature
@@ -53,14 +55,15 @@ test body — each directory pins its flavour's true output exactly.
5355
### The transport matrix
5456

5557
Transport-agnostic tests take the `connect` fixture instead of constructing `Client(server)`
56-
directly, and therefore run once per transport: over the in-memory transport and over the
57-
server's real streamable HTTP app driven in process through the streaming bridge. A test connects
58-
the same way in either case — `async with connect(server, ...) as client:` — and asserts the same
59-
output, because the transport is not supposed to change observable behaviour. Tests that are tied
60-
to one transport do not use the fixture: the wire-recording tests (their seam is the in-memory
61-
stream pair), the bare-`ClientSession` lifecycle tests, the real-clock timeout tests (the timeout
62-
machinery is transport-independent and must not race transport latency), and everything under
63-
`transports/`, which pins behaviour only observable on that transport.
58+
directly, and therefore run once per transport: over the in-memory transport, over the server's
59+
real streamable HTTP app driven in-process through the streaming bridge, and over the legacy SSE
60+
transport the same way. A test connects with `async with connect(server, ...) as client:` and
61+
asserts the same output on every leg, because the transport is not supposed to change observable
62+
behaviour. Tests that are tied to one transport do not use the fixture: the wire-recording tests
63+
(their seam is the in-memory stream pair), the bare-`ClientSession` lifecycle tests, the
64+
real-clock timeout tests (the timeout machinery is transport-independent and must not race
65+
transport latency), and everything under `transports/`, which pins behaviour only observable on
66+
that transport.
6467

6568
A transport conformance test in `transports/` speaks raw `httpx` against the mounted ASGI app
6669
**only** when its assertion is about HTTP semantics that `Client` cannot observe — status codes,
@@ -86,9 +89,10 @@ clients can share one session manager.
8689
contract) says should happen. Tests always pin the SDK's current behaviour; where that falls
8790
short of `behavior`, the gap is recorded as data rather than hidden in the test.
8891
- **`divergence`** records that gap for entries whose tests pin the divergent current behaviour.
89-
- **`deferred`** marks a behaviour that is tracked but not yet covered by a test in this suite.
90-
The reason names the covering tests elsewhere in the repo, starts with "Not implemented in the
91-
SDK" for genuine feature gaps, or starts with "Not yet covered here" for tests that are planned.
92+
- **`deferred`** marks a behaviour that is tracked but has no test in this suite, with a precise
93+
reason: the SDK does not implement it, the negative cannot be observed, the assertion is
94+
schema-level rather than interaction-level, the feature is experimental (tasks), or the test
95+
would require real-time waits the suite refuses.
9296
- **`transports`** names the transports a behaviour applies to; omitted means transport-independent.
9397
- **`issue`** carries the tracking link for a recorded gap once one is filed.
9498

@@ -168,6 +172,15 @@ async def test_call_tool_returns_text_content() -> None:
168172
act → assert. The test reads in the order the conversation happens.
169173
- A registered handler or tool that a test never invokes gets a `raise NotImplementedError` body
170174
so it cannot silently become load-bearing.
175+
- A test that needs a peer no real `Server` or `Client` can play (a server that answers initialize
176+
with an unsupported version, a client that sends malformed params) plays that side of the wire by
177+
hand over `create_client_server_memory_streams()`. This scripted-peer pattern is the suite's only
178+
way to drive behaviour the typed API cannot produce, and the docstring of every such test says so.
179+
180+
Stack a second `@requirement` decorator only when a test's natural assertions incidentally prove
181+
another behaviour — one capabilities snapshot proving four `*:capability:declared` entries, one
182+
input-schema identity check proving each preserved keyword. Do not build a test around covering
183+
many requirements at once; if the assertions would be separate, write separate tests.
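
As a rough illustration of the stacking guidance above, the sketch below uses a hypothetical stand-in for `@requirement`; the suite's real decorator, its signature, and its registry are not shown in this README, and the two requirement IDs are invented examples that only follow the `*:capability:declared` shape. It shows one natural assertion (a capabilities snapshot) incidentally covering two entries.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def requirement(requirement_id: str) -> Callable[[F], F]:
    """Hypothetical stand-in: record requirement IDs on the test function."""

    def mark(func: F) -> F:
        existing = getattr(func, "requirement_ids", ())
        # Prepend so the recorded IDs read in top-to-bottom decorator order.
        func.requirement_ids = (requirement_id, *existing)  # type: ignore[attr-defined]
        return func

    return mark


@requirement("tools:capability:declared")
@requirement("prompts:capability:declared")
def test_capabilities_snapshot() -> None:
    # One snapshot assertion proves both entries at once; the dict is
    # placeholder data, not real SDK output.
    declared = {"tools": {}, "prompts": {}}
    assert declared == {"tools": {}, "prompts": {}}
```

If the two behaviours needed separate assertions, the guidance above would call for two tests with one decorator each.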
171184

172185
### Choosing an assertion
173186
