Python: Fail closed on remote MCP schema drift #4724
davidahmann wants to merge 1 commit into microsoft:main
@davidahmann please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
Contribution License Agreement
This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
Operator impact: remote MCP failures caused by missing tools or schema drift now fail with stable classifications instead of opaque transport-specific text, and Python resets no longer keep stale tool definitions around.
Minimal change: Python now clears cached MCP tools/prompts on reset and classifies missing-tool/schema-mismatch failures; .NET now applies the same classification helper and adds unit coverage.
Validation:
Risk/blocker: the .NET unit test project is blocked locally because
0e40e12 to 8ff86bc
def _classify_mcp_tool_failure(message: str) -> str | None:
    lowered = message.lower()

    if "tool not found" in lowered or "unknown tool" in lowered or "no tool named" in lowered:
This feels very brittle and is likely to get out of date as the library evolves. Is there no better way to understand what type of McpError is raised (subclass or code)?
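One possible shape for a code-based check, as a rough sketch rather than the PR's implementation: it assumes the raised error exposes a numeric JSON-RPC code, which this thread does not confirm for McpError. `RemoteToolError`, `classify_by_code`, the classification labels, and the choice of which code maps to which label are all hypothetical; only the two JSON-RPC 2.0 code values are standard.

```python
from __future__ import annotations

# Standard JSON-RPC 2.0 error codes.
METHOD_NOT_FOUND = -32601
INVALID_PARAMS = -32602

# Hypothetical mapping from error code to a stable classification label.
# Which code a given server returns for a missing tool is an assumption.
_CODE_TO_CLASSIFICATION = {
    METHOD_NOT_FOUND: "tool_not_found",
    INVALID_PARAMS: "schema_mismatch",
}


class RemoteToolError(Exception):
    """Hypothetical stand-in for an MCP error that exposes a numeric code."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def classify_by_code(error: Exception) -> str | None:
    """Return a classification based on the error code, or None if unknown."""
    # Errors without a `code` attribute fall through to None (unclassified).
    code = getattr(error, "code", None)
    return _CODE_TO_CLASSIFICATION.get(code)


if __name__ == "__main__":
    print(classify_by_code(RemoteToolError(METHOD_NOT_FOUND, "Unknown tool: search")))
    print(classify_by_code(ValueError("unrelated failure")))
```

Keying on the code means a change in the server's message wording would not affect classification, which is the brittleness concern raised above. A substring check could still serve as a fallback when no code is available.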
Motivation and Context
Remote MCP tools can disappear or change schema after a workflow has already loaded them. Today that collapses into opaque MCP failures rather than a stable workflow-visible classification, and the Python reconnect path also keeps stale tool definitions around after reset. Refs #4723.
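To illustrate the stale-definition problem, here is a minimal sketch under stated assumptions. `CachedToolClient` and its methods are hypothetical names for illustration, not the classes in agent_framework/_mcp.py; the sketch only shows why a reset that skips the cache leaves outdated tools visible after a reconnect.

```python
from __future__ import annotations


class CachedToolClient:
    """Hypothetical client that caches remote tool and prompt definitions."""

    def __init__(self) -> None:
        self.tools: dict[str, dict] = {}
        self.prompts: dict[str, dict] = {}
        self.connected = False

    def load(self, tools: dict[str, dict], prompts: dict[str, dict]) -> None:
        """Record the definitions advertised by the remote server."""
        self.tools.update(tools)
        self.prompts.update(prompts)
        self.connected = True

    def reset(self) -> None:
        """Drop the connection state and every cached definition.

        Clearing the caches here means a later load() reflects only what
        the server advertises now, not what it advertised before the reset.
        """
        self.tools.clear()
        self.prompts.clear()
        self.connected = False


if __name__ == "__main__":
    client = CachedToolClient()
    client.load({"search": {"type": "object"}}, {"greet": {}})
    client.reset()
    # After the reset, the server no longer offers "search".
    client.load({"lookup": {"type": "object"}}, {})
    print(sorted(client.tools))
```

Without the two `clear()` calls in `reset()`, the second `load()` would leave "search" in the cache even though the server had dropped it.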
Description
This tightens the fail-closed behavior in both runtime surfaces touched by the issue:
Validated with:
uv run --directory packages/core ruff format agent_framework/_mcp.py tests/core/test_mcp.py
uv run --directory packages/core ruff check agent_framework/_mcp.py tests/core/test_mcp.py
uv run --directory packages/core pytest tests/core/test_mcp.py -k 'classify_mcp_tool_failure or schema_drift or local_mcp_server_function_execution_error'
I could not run the .NET unit test project locally because the repo is pinned to SDK 10.0.200 in dotnet/global.json, while this environment only has 9.0.109 installed.
Contribution Checklist