@@ -10,7 +10,8 @@ running the suite before and after.
 uv run --frozen pytest tests/interaction/
 ```
 
-The whole suite is in-memory and event-driven; it runs in about a second.
+The whole suite is in-process and event-driven — including the streamable HTTP, SSE, and OAuth
+flows — with a single subprocess test for stdio.
 
 ## Ground rules
 
@@ -26,10 +27,10 @@ The whole suite is in-memory and event-driven; it runs in about a second.
   the constants in `mcp.types`; error *message strings* are pinned only where they are the
   SDK's own deliberate output.
 - **No sleeps, no real I/O.** Concurrency is coordinated with `anyio.Event`; every wait that
-  could hang is bounded by `anyio.fail_after(5)`. The streamable HTTP tests drive the Starlette
+  could hang is bounded by `anyio.fail_after(5)`. The HTTP and OAuth tests drive the Starlette
   app in-process through the suite's streaming ASGI bridge (`transports/_bridge.py`), which
   delivers each response chunk as the server produces it — full duplex, but still no sockets,
-  threads, or subprocesses anywhere.
+  threads, or subprocesses anywhere outside the one stdio test.
 
 ## Layout
 
@@ -42,7 +43,8 @@ tests/interaction/
   test_coverage.py     enforces the manifest ↔ test contract
   lowlevel/            one file per feature area, against the low-level Server
   mcpserver/           the same feature areas in MCPServer's natural idiom
-  transports/          behaviour specific to one transport (modes, streams, framing)
+  transports/          behaviour specific to one transport (sessions, resumability, framing)
+  auth/                OAuth flows against an in-process authorization server
 ```
 
 The two server APIs produce genuinely different wire output for the same conceptual feature
@@ -53,14 +55,15 @@ test body — each directory pins its flavour's true output exactly.
 ### The transport matrix
 
 Transport-agnostic tests take the `connect` fixture instead of constructing `Client(server)`
-directly, and therefore run once per transport: over the in-memory transport and over the
-server's real streamable HTTP app driven in process through the streaming bridge. A test connects
-the same way in either case — `async with connect(server, ...) as client:` — and asserts the same
-output, because the transport is not supposed to change observable behaviour. Tests that are tied
-to one transport do not use the fixture: the wire-recording tests (their seam is the in-memory
-stream pair), the bare-`ClientSession` lifecycle tests, the real-clock timeout tests (the timeout
-machinery is transport-independent and must not race transport latency), and everything under
-`transports/`, which pins behaviour only observable on that transport.
+directly, and therefore run once per transport: over the in-memory transport, over the server's
+real streamable HTTP app driven in-process through the streaming bridge, and over the legacy SSE
+transport the same way. A test connects with `async with connect(server, ...) as client:` and
+asserts the same output on every leg, because the transport is not supposed to change observable
+behaviour. Tests that are tied to one transport do not use the fixture: the wire-recording tests
+(their seam is the in-memory stream pair), the bare-`ClientSession` lifecycle tests, the
+real-clock timeout tests (the timeout machinery is transport-independent and must not race
+transport latency), and everything under `transports/`, which pins behaviour only observable on
+that transport.
 
 A transport conformance test in `transports/` speaks raw `httpx` against the mounted ASGI app
 **only** when its assertion is about HTTP semantics that `Client` cannot observe — status codes,
@@ -86,9 +89,10 @@ clients can share one session manager.
   contract) says should happen. Tests always pin the SDK's current behaviour; where that falls
   short of `behavior`, the gap is recorded as data rather than hidden in the test.
 - **`divergence`** records that gap for entries whose tests pin the divergent current behaviour.
-- **`deferred`** marks a behaviour that is tracked but not yet covered by a test in this suite.
-  The reason names the covering tests elsewhere in the repo, starts with "Not implemented in the
-  SDK" for genuine feature gaps, or starts with "Not yet covered here" for tests that are planned.
+- **`deferred`** marks a behaviour that is tracked but has no test in this suite, with a precise
+  reason: the SDK does not implement it, the negative cannot be observed, the assertion is
+  schema-level rather than interaction-level, the feature is experimental (tasks), or the test
+  would require real-time waits the suite refuses.
 - **`transports`** names the transports a behaviour applies to; omitted means transport-independent.
 - **`issue`** carries the tracking link for a recorded gap once one is filed.
 
@@ -168,6 +172,15 @@ async def test_call_tool_returns_text_content() -> None:
   act → assert. The test reads in the order the conversation happens.
 - A registered handler or tool that a test never invokes gets a `raise NotImplementedError` body
   so it cannot silently become load-bearing.
+- A test that needs a peer no real `Server` or `Client` can play (a server that answers initialize
+  with an unsupported version, a client that sends malformed params) plays that side of the wire by
+  hand over `create_client_server_memory_streams()`. This scripted-peer pattern is the suite's only
+  way to drive behaviour the typed API cannot produce, and the docstring of every such test says so.
+
+Stack a second `@requirement` decorator only when a test's natural assertions incidentally prove
+another behaviour — one capabilities snapshot proving four `*:capability:declared` entries, one
+input-schema identity check proving each preserved keyword. Do not build a test around covering
+many requirements at once; if the assertions would be separate, write separate tests.
 
 ### Choosing an assertion
 