Python: [BREAKING] Refactor middleware layering and split Anthropic raw client #4746
eavanvalkenburg wants to merge 9 commits into microsoft:main
Conversation
Reorder chat client layers so function invocation wraps chat middleware, and chat middleware stays outside telemetry while still running for each inner model call. Add middleware pipeline caching, refresh docs and samples, and split Anthropic into raw and public clients to match the standard layering model. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add targeted typing ignores in workflow visualization and lab modules so pyright stays clean alongside the middleware refactor work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Refactors the Python chat-client layer stack so chat middleware executes around each inner model call in the tool loop (without inflating telemetry spans), standardizes per-call middleware routing via client_kwargs["middleware"], and aligns Anthropic with the Raw...Client/public client split used by other providers.
Changes:
- Reorders the standard public client MRO to `FunctionInvocationLayer -> ChatMiddlewareLayer -> ChatTelemetryLayer -> Raw/Base client` across providers, tests, and docs.
- Moves chat/function/agent middleware pipeline construction and reuse into the relevant layers (pipeline caching).
- Splits Anthropic into `RawAnthropicClient` + `AnthropicClient`, and updates docs/samples (including a per-call usage tracking middleware sample).
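The new layer order can be sketched with cooperative mixins and `super()` delegation. This is an illustrative stand-in, not the framework's real API: the class bodies are simplified, and only the layer names match the PR.

```python
# Hypothetical sketch of the standard layering described above.
# Each layer wraps the next via cooperative super() calls; the public
# client's base-class order determines the MRO and thus the wrapping order.
class RawClient:
    def get_response(self, messages, **kwargs):
        return f"model({messages})"

class ChatTelemetryLayer:
    def get_response(self, messages, **kwargs):
        # innermost convenience layer: spans measure only the raw model call
        return "telemetry(" + super().get_response(messages, **kwargs) + ")"

class ChatMiddlewareLayer:
    def get_response(self, messages, **kwargs):
        # runs outside telemetry, so middleware latency is not attributed to the model
        return "chat_mw(" + super().get_response(messages, **kwargs) + ")"

class FunctionInvocationLayer:
    def get_response(self, messages, **kwargs):
        # outermost: in the real framework this runs the tool loop and calls
        # super().get_response once per inner model call
        return "fn_loop(" + super().get_response(messages, **kwargs) + ")"

class PublicClient(FunctionInvocationLayer, ChatMiddlewareLayer, ChatTelemetryLayer, RawClient):
    pass

print(PublicClient().get_response("hi"))
# fn_loop(chat_mw(telemetry(model(hi))))
```

Because `FunctionInvocationLayer` is leftmost, chat middleware and telemetry both fire for each inner model call of the tool loop rather than once around the whole loop.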
Reviewed changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| python/samples/02-agents/providers/custom/README.md | Updates recommended layer ordering and composition example. |
| python/samples/02-agents/observability/advanced_zero_code.py | Clarifies telemetry vs middleware placement and span behavior. |
| python/samples/02-agents/observability/advanced_manual_setup_console_output.py | Updates observability guidance to the new default layer order. |
| python/samples/02-agents/middleware/usage_tracking_middleware.py | Adds a sample showing per-call usage tracking for streaming and non-streaming tool loops. |
| python/samples/02-agents/middleware/agent_and_run_level_middleware.py | Updates middleware ordering explanation to reflect per-call chat middleware behavior. |
| python/samples/02-agents/middleware/README.md | Adds index + instructions for running the new usage tracking sample. |
| python/samples/02-agents/chat_client/custom_chat_client.py | Updates sample to recommend and demonstrate the new layer order. |
| python/samples/02-agents/auto_retry.py | Updates retry middleware docs to reflect “per model call” semantics in tool loops. |
| python/packages/orchestrations/tests/test_handoff.py | Updates mock client MRO ordering to match the new layering. |
| python/packages/ollama/agent_framework_ollama/_chat_client.py | Reorders Ollama client inheritance to match the standard layer stack. |
| python/packages/lab/lightning/agent_framework_lab_lightning/__init__.py | Adjusts typing ignores around optional agentlightning dependency. |
| python/packages/lab/gaia/agent_framework_lab_gaia/gaia.py | Adds typing ignore for optional pyarrow import. |
| python/packages/foundry_local/agent_framework_foundry_local/_foundry_local_client.py | Reorders FoundryLocal client inheritance to match the standard layer stack. |
| python/packages/core/tests/core/test_observability.py | Updates span-ordering test expectations + mock client MRO. |
| python/packages/core/tests/core/test_middleware_with_chat.py | Updates per-call middleware passing to client_kwargs["middleware"] and adds pipeline cache tests. |
| python/packages/core/tests/core/test_middleware_with_agent.py | Adds agent middleware pipeline cache tests and tool-loop ordering coverage. |
| python/packages/core/tests/core/test_kwargs_propagation_to_ai_function.py | Reorders mock client inheritance to match the new standard stack. |
| python/packages/core/tests/core/test_function_invocation_logic.py | Switches per-call middleware usage to client_kwargs["middleware"] and updates mock init args. |
| python/packages/core/tests/core/test_clients.py | Updates docstring/signature expectations (removal of function_middleware references). |
| python/packages/core/tests/core/conftest.py | Updates mock client initialization to use middleware=[] and new MRO order. |
| python/packages/core/agent_framework/openai/_responses_client.py | Updates OpenAI Responses client MRO + raw-client layering docstring. |
| python/packages/core/agent_framework/openai/_chat_client.py | Updates OpenAI Chat client MRO + removes function_middleware param and routes middleware via client_kwargs. |
| python/packages/core/agent_framework/openai/_assistants_client.py | Updates assistants client MRO ordering to the new standard stack. |
| python/packages/core/agent_framework/observability.py | Adds typing ignores around optional OpenTelemetry exporter imports. |
| python/packages/core/agent_framework/azure/_responses_client.py | Updates Azure responses client MRO ordering. |
| python/packages/core/agent_framework/azure/_chat_client.py | Updates Azure chat client MRO ordering. |
| python/packages/core/agent_framework/_workflows/_viz.py | Adds typing ignores around optional graphviz usage. |
| python/packages/core/agent_framework/_tools.py | Refactors FunctionInvocationLayer to route middleware via client_kwargs["middleware"] and cache function middleware pipelines. |
| python/packages/core/agent_framework/_middleware.py | Adds pipeline .matches() and caching for chat/function/agent middleware pipelines; adjusts ChatMiddlewareLayer init/signature. |
| python/packages/core/agent_framework/_clients.py | Updates layered docstring composition to reflect new layer responsibilities and kwarg docs. |
| python/packages/bedrock/agent_framework_bedrock/_chat_client.py | Reorders Bedrock client inheritance to match the standard layer stack. |
| python/packages/azure-ai/tests/test_azure_ai_agent_client.py | Updates test fixtures for new middleware fields/caches. |
| python/packages/azure-ai/agent_framework_azure_ai/_client.py | Updates raw-client layering docstring + public client MRO. |
| python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py | Updates Azure AI agent client MRO ordering. |
| python/packages/anthropic/tests/test_anthropic_client.py | Updates tests for RawAnthropicClient split and standard layer stack. |
| python/packages/anthropic/agent_framework_anthropic/_chat_client.py | Introduces RawAnthropicClient and composes AnthropicClient with standard layering. |
| python/packages/anthropic/agent_framework_anthropic/__init__.py | Exports RawAnthropicClient. |
| python/packages/ag-ui/tests/ag_ui/conftest.py | Updates streaming stub MRO and init arg (middleware=[]). |
| python/packages/ag-ui/agent_framework_ag_ui/_client.py | Updates AG-UI chat client MRO ordering. |
eavanvalkenburg
left a comment
Automated Code Review
Reviewers: 3 | Confidence: 88%
✓ Correctness
This is a well-structured refactoring that reorders the MRO so FunctionInvocationLayer sits above ChatMiddlewareLayer, moves middleware categorization responsibility into FunctionInvocationLayer, adds single-slot pipeline caching, splits the Anthropic client into Raw and full-featured classes, and removes the public `function_middleware` parameter in favor of a unified `middleware`. The code paths are internally consistent: `FunctionInvocationLayer.__init__` and `get_response` correctly categorize mixed middleware, keep function middleware for themselves, and forward chat middleware via kwargs; `ChatMiddlewareLayer` correctly receives only chat middleware; and the pipeline caching correctly uses identity-based tuple comparison via `matches()`. Tests thoroughly cover the new layer ordering, caching, and runtime middleware splitting. No correctness bugs found.
✓ Security Reliability
This is a large structural refactoring that reorders the MRO for all chat client classes, splits middleware routing so FunctionInvocationLayer owns the combined middleware parameter, adds single-slot pipeline caching, and introduces RawAnthropicClient. From a security/reliability perspective, the changes are sound: no injection risks, no secrets exposure, no unsafe deserialization. The caching logic is safe for single-threaded async (no await between check and set). The one robustness concern is that `categorize_middleware` only checks `isinstance(source, list)` when unpacking sequences, but the new `FunctionInvocationLayer.__init__` and `get_response` now pass whole sequences (not unpacked) — if a caller passes a `tuple` (a valid `Sequence`) instead of a `list`, the middleware items would not be categorized correctly and would silently fall through to the wrong bucket or fail classification.
✓ Design Approach
The diff correctly inverts the layer order from `ChatMiddlewareLayer → FunctionInvocationLayer → ChatTelemetryLayer` to `FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer` so that chat middleware runs once per model call (inside the tool loop) rather than once per entire tool-call chain. The change is well-motivated and consistently applied across all clients. The `RawAnthropicClient` split follows the existing `RawOpenAI*Client` pattern. The per-pipeline caching with `matches()` is a reasonable optimization. One structural concern in the new test is that MRO index assertions are fragile: they encode exact positional information about the class hierarchy rather than the behavioral ordering property being tested. A second, minor concern is that the single-slot `matches()` cache uses `==` (value equality) — if any middleware object ever defines a custom `__eq__`, two structurally different pipelines could incorrectly share a cached instance. Neither issue is blocking, but the MRO test pattern is worth addressing proactively given that similar clients (e.g., `AzureOpenAIChatClient`) already have extra mixins that would shift those indices.
Suggestions
- New call sites in `FunctionInvocationLayer.__init__` and `get_response` pass the middleware sequence directly to `categorize_middleware(middleware)` instead of unpacking it (`*(middleware or [])`). Since `categorize_middleware` only checks `isinstance(source, list)`, a `tuple` (a valid `Sequence`) would be appended as a single item and silently misclassified. Either unpack at call sites (`categorize_middleware(*(middleware or []))`) or broaden the check in `categorize_middleware` to handle all `Sequence` types (e.g., `isinstance(source, (list, tuple))` or `collections.abc.Sequence` with a `str` exclusion).
- The single-slot pipeline caches (`_cached_chat_middleware_pipeline`, etc.) are adequate for the common case but will thrash if concurrent async tasks use different per-call middleware, and would race under multi-threaded usage. Since the worst case is pipeline recreation (not incorrectness), consider adding a brief code comment documenting the single-threaded async assumption for future readers.
- In `test_anthropic_client_wraps_raw_client_with_standard_layer_order`, prefer relative ordering checks via `mro.index()` over hardcoded MRO indices. Clients like `AzureOpenAIChatClient` already have extra mixins that would shift indices, so asserting `mro.index(FunctionInvocationLayer) < mro.index(ChatMiddlewareLayer) < mro.index(ChatTelemetryLayer) < mro.index(RawAnthropicClient)` expresses the intent without breaking when a new mixin is inserted.
- The `matches()` method uses `self._source_middleware == tuple(middleware)`, which relies on the default identity-based `__eq__` of middleware objects. This is correct for all current middleware, but a brief comment explaining the assumption would prevent incorrect cache reuse if a future middleware class defines value-based `__eq__`.
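The list-vs-`Sequence` pitfall raised in the first suggestion can be reproduced with a simplified stand-in. The real `categorize_middleware` sorts items into agent/chat/function buckets; the helpers below are hypothetical one-bucket reductions that isolate just the unpacking behavior.

```python
from collections.abc import Sequence

def categorize_middleware_list_only(*sources):
    # Mirrors the narrow check flagged above: only lists are unpacked.
    items = []
    for source in sources:
        if isinstance(source, list):
            items.extend(source)
        elif source is not None:
            items.append(source)  # a tuple lands here as a single item
    return items

def categorize_middleware_sequence(*sources):
    # Broadened check: any Sequence is unpacked, with a str/bytes guard so a
    # string is not decomposed character by character.
    items = []
    for source in sources:
        if isinstance(source, Sequence) and not isinstance(source, (str, bytes)):
            items.extend(source)
        elif source is not None:
            items.append(source)
    return items

mw = ("chat_mw", "fn_mw")
print(categorize_middleware_list_only(mw))  # [('chat_mw', 'fn_mw')] — misclassified
print(categorize_middleware_sequence(mw))   # ['chat_mw', 'fn_mw']
```

The `str`/`bytes` exclusion matters because both satisfy `isinstance(x, Sequence)`, which is exactly the follow-on hazard flagged in the next review below.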
Automated review by eavanvalkenburg's agents
…RO assertions
- Broaden isinstance check in categorize_middleware from list to Sequence so tuples and other Sequence types are properly unpacked instead of being appended as a single item.
- Replace fragile hardcoded MRO index assertions in anthropic test with relative ordering via mro.index().
- Add regression tests for categorize_middleware with tuple, list, and None inputs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
Automated Code Review
Reviewers: 3 | Confidence: 87%
✗ Security Reliability
This PR refactors the MRO ordering of middleware layers (swapping FunctionInvocationLayer above ChatMiddlewareLayer), extracts a RawAnthropicClient, adds pipeline caching for middleware, and moves middleware routing responsibilities between layers. The changes are largely structural and well-tested. One reliability concern: the `categorize_middleware` change from `isinstance(source, list)` to `isinstance(source, Sequence)` will also match `str` and `bytes`, which are sequences of characters/ints — if a string were ever passed, it would be silently decomposed character by character into the middleware list. While type annotations make this unlikely at runtime, a defensive guard is warranted. The pipeline caching is single-slot (one cached value per client instance), which is correct for the common case but worth noting. No security issues (no secrets, no injection risks, no unsafe deserialization) were found.
✓ Test Coverage
This is a large refactoring that reorders the MRO so FunctionInvocationLayer comes before ChatMiddlewareLayer, extracts RawAnthropicClient, adds middleware pipeline caching, and moves function-middleware routing into FunctionInvocationLayer. The core behavioral changes are well-tested: new integration tests verify the tool-loop middleware ordering, pipeline caching is covered for all three pipeline types, and categorize_middleware's Sequence support is tested. However, the MRO reordering was applied to ~12 client classes but only the Anthropic client has an explicit MRO ordering test; the other reordered clients (OpenAI, Azure, Bedrock, Ollama, FoundryLocal, etc.) lack similar guards. RawAnthropicClient is newly exported but has no test exercising it independently.
✗ Design Approach
The layer reordering (FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer) is correct and a genuine semantic improvement: chat middleware now executes per model call within the tool loop rather than wrapping the entire loop. However, the PR has one blocking design issue: removing `middleware=` as a first-class named parameter from `FunctionInvocationLayer.get_response` creates a provider-inconsistent API. `OpenAIChatClient` re-adds `middleware=` via its own `get_response` override and converts it to `client_kwargs["middleware"]`, but every other provider (Anthropic, Bedrock, Ollama, Azure AI Agent, Foundry Local, AG-UI) has no such override. Callers who pass `middleware=[...]` directly to those clients will have their middleware silently ignored — it flows through `**kwargs` past `FunctionInvocationLayer` and `ChatMiddlewareLayer` (both of which only inspect `client_kwargs["middleware"]`), and is eventually stripped by the raw client. This also introduces a leaky abstraction: `client_kwargs` is documented as provider-specific HTTP-client options, yet it is now the inter-layer messaging channel for middleware threading, with both `FunctionInvocationLayer` and `ChatMiddlewareLayer` silently popping a magic `"middleware"` key from it.
Flagged Issues
- `categorize_middleware`: changing `isinstance(source, list)` to `isinstance(source, Sequence)` also matches `str` and `bytes`, which would silently decompose a string into individual characters. Add `and not isinstance(source, (str, bytes))` to guard against accidental misuse.
- `middleware=` is removed as a first-class named parameter from `FunctionInvocationLayer.get_response`. Only `OpenAIChatClient` re-adds it via its own override; `AnthropicClient`, `BedrockChatClient`, `OllamaChatClient`, `AzureAIAgentClient`, `FoundryLocalClient`, and `AGUIChatClient` do not. Calling `get_response(messages, middleware=[...])` on those clients silently does nothing — the kwarg flows through `**kwargs` past `FunctionInvocationLayer` and `ChatMiddlewareLayer` (both only read from `client_kwargs["middleware"]`) and is stripped by the raw provider call. Fix by adding `middleware` as an explicit named parameter on `FunctionInvocationLayer.get_response` so the conversion to `client_kwargs["middleware"]` happens uniformly for all providers.
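The silent-drop path in the second flagged issue can be illustrated with a minimal stand-in: a middleware layer that only reads `client_kwargs["middleware"]`, sitting above a raw client that discards unrecognized kwargs. The classes and behavior are simplified placeholders, not the framework's real implementation.

```python
class RawClient:
    def get_response(self, messages, **kwargs):
        # raw clients typically ignore kwargs they don't recognize, so an
        # unhandled middleware=[...] kwarg vanishes here without error
        return "model"

class ChatMiddlewareLayer(RawClient):
    def get_response(self, messages, *, client_kwargs=None, **kwargs):
        # only the magic client_kwargs["middleware"] key is inspected
        mw = (client_kwargs or {}).pop("middleware", None)
        prefix = "mw:" if mw else ""
        return prefix + super().get_response(messages, **kwargs)

client = ChatMiddlewareLayer()
# routed through client_kwargs: middleware is applied
print(client.get_response("hi", client_kwargs={"middleware": [object()]}))  # mw:model
# passed as a bare kwarg: flows through **kwargs and is silently ignored
print(client.get_response("hi", middleware=[object()]))  # model
```

This is why the review asks for `middleware` as an explicit named parameter on `FunctionInvocationLayer.get_response`: the conversion to `client_kwargs["middleware"]` then happens in one place for every provider.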
Suggestions
- The single-slot pipeline cache stores only the most recent pipeline per client instance. Callers that alternate between different per-call middleware sets will thrash the cache with no benefit. Consider a small bounded LRU (e.g. up to 4 entries keyed by tuple identity) or at minimum document the single-slot limitation.
- The `matches()` methods on pipeline classes compare `tuple(middleware)` using `==`, relying on middleware object identity/equality. This is correct but worth documenting so users aren't surprised that semantically equivalent but distinct middleware objects don't hit the cache.
- Add MRO ordering tests for other reordered clients (OpenAIChatClient, OpenAIResponsesClient, AzureOpenAIChatClient, AzureOpenAIResponsesClient, AzureAIClient, AzureAIAgentClient, BedrockChatClient, OllamaChatClient, FoundryLocalClient, OpenAIAssistantsClient). A single parameterized test asserting `FunctionInvocationLayer < ChatMiddlewareLayer < ChatTelemetryLayer` ordering across all public clients would be a lightweight guard against MRO regressions.
- Add a test verifying RawAnthropicClient does NOT inherit FunctionInvocationLayer, ChatMiddlewareLayer, or ChatTelemetryLayer, matching the pattern already established by RawOpenAIChatClient.
- The chat middleware pipeline cache test only exercises the path where
chat_client_base.chat_middlewareis empty. Add a case with non-empty base chat middleware (analogous to the function middleware cache test) to verify the cache key correctly includes base middleware. - client_kwargs is documented as provider-specific HTTP-client options but is now also used as an inter-layer messaging channel for middleware routing. This conflates two concerns and means a provider option legitimately named "middleware" would silently interfere. Consider a dedicated internal dict (e.g. _layer_kwargs) for cross-layer communication.
Automated review by moonbox3's agents
…InvocationLayer, and add tests (microsoft#4710)
- Guard categorize_middleware Sequence check against str/bytes to prevent character-by-character decomposition of accidentally passed strings
- Add explicit middleware parameter to FunctionInvocationLayer.get_response and merge it into client_kwargs before categorization, fixing the inconsistency where only OpenAIChatClient supported this parameter
- Add assertions that RawAnthropicClient does not inherit convenience layers
- Add chat middleware cache test with non-empty base middleware
- Add tests for single unwrapped middleware item and string input
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TaoChenOSU
left a comment
Automated Code Review
Reviewers: 4 | Confidence: 75%
✓ Correctness
✓ Security Reliability
This is a large structural refactoring that reorders the MRO of middleware layers (FunctionInvocationLayer now outermost, ChatMiddlewareLayer in the middle, ChatTelemetryLayer innermost), extracts RawAnthropicClient, and adds single-slot pipeline caching with a `matches()` method. The security posture is largely unchanged. The main reliability concern is an inconsistent type check in `FunctionInvocationLayer.get_response`, where the middleware-merging logic checks `isinstance(existing, list)` but the `categorize_middleware` function was simultaneously fixed to handle all `Sequence` types. If a caller passes middleware as a tuple via `client_kwargs` and also provides the `middleware` parameter, the tuple would be wrapped as a single item instead of being unpacked. The practical impact is low because the dual-source scenario is unlikely in normal framework usage, but it is a latent defect that contradicts the intent of the companion `categorize_middleware` fix.
✓ Test Coverage
This PR refactors the MRO ordering to place FunctionInvocationLayer before ChatMiddlewareLayer across all clients, splits RawAnthropicClient from AnthropicClient, adds middleware pipeline caching, and routes combined middleware through FunctionInvocationLayer. Test coverage is generally good: new tests verify MRO ordering for Anthropic, pipeline caching for chat/function/agent middleware, combined middleware with tool loops, and the categorize_middleware tuple/string fix. However, there are a few gaps: the string-in-categorize_middleware test has a weak assertion, there's no streaming variant of the new combined-middleware-with-tool-loop tests, and several providers (Bedrock, Ollama, FoundryLocal, OpenAI Assistants, Azure OpenAI) that received MRO changes have no corresponding MRO-ordering tests analogous to the Anthropic one.
✗ Design Approach
This PR correctly reverses the layer ordering to `FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer` so chat middleware wraps each inner model call in a tool loop rather than the entire loop. The core architectural change is sound. However, the new middleware-merging code added inside `FunctionInvocationLayer.get_response` contains the same `isinstance(existing, list)` narrowness that `categorize_middleware` was simultaneously fixed to avoid, creating a gap where non-list sequences (e.g., tuples) passed in `client_kwargs["middleware"]` would be silently wrapped as a single item rather than expanded. Additionally, several tests were changed from the newly typed `middleware=` parameter to the untyped `client_kwargs={"middleware": ...}` bypass, which is inconsistent with the typed overloads added on `FunctionInvocationLayer.get_response` and makes the API intent harder to follow.
Flagged Issues
- In `FunctionInvocationLayer.get_response`, the merging of the per-call `middleware` param into `effective_client_kwargs` uses `isinstance(existing, list)` to decide whether to spread or wrap the pre-existing value. This is the exact narrowness that `categorize_middleware` was fixed to avoid in the same PR (changed to `isinstance(source, Sequence) and not isinstance(source, (str, bytes))`). A tuple or other non-list sequence in `client_kwargs["middleware"]` would be silently wrapped as a single element instead of being spread, corrupting the combined middleware list.
Suggestions
- The single-slot pipeline caches (`_cached_chat_middleware_pipeline`, `_cached_function_middleware_pipeline`, `_cached_agent_middleware_pipeline`) only cache the most recent configuration. Consider documenting that passing different runtime middleware each call will always reconstruct the pipeline, so callers are aware of the caching limitations.
- Add a streaming variant of `test_run_level_chat_and_function_middleware_split_per_function_loop_round` to verify chat middleware runs per model call in the streaming tool-loop path, since `FunctionInvocationLayer` has separate streaming logic that also calls `super_get_response` with `**filtered_kwargs`.
- Consider adding MRO-ordering tests similar to `test_anthropic_client_wraps_raw_client_with_standard_layer_order` for other providers (OpenAIChatClient, OpenAIResponsesClient, AzureOpenAIChatClient, BedrockChatClient, etc.) to guard against future MRO regressions.
- The `test_categorize_middleware_with_string_does_not_decompose` assertion (`total_items <= 1`) is too loose — it passes whether the string is silently dropped or bucketed into 'agent'. Assert the exact expected behavior (e.g., `assert total_items == 1` and `assert result['agent'] == ['not_a_middleware']`) to make the test precise.
- The pipeline caching tests only cover the `_get_*_pipeline` helper methods directly but don't test that a full `get_response` call actually reuses the cache on repeated calls with identical middleware, which would confirm end-to-end caching integration.
- Several tests were changed from `get_response(..., middleware=[mw])` to `get_response(..., client_kwargs={"middleware": [mw]})` even though `FunctionInvocationLayer.get_response` now exposes a typed `middleware` parameter. Using the untyped `client_kwargs` bypass undermines discoverability of the typed API. The tests in `test_function_invocation_logic.py` (lines 3226, 3292, 3345) and `test_middleware_with_chat.py` should use the typed parameter unless they are deliberately testing the `client_kwargs` code path, in which case a clarifying comment is warranted.
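The single-slot cache and identity-based `matches()` behavior discussed across these reviews can be sketched as follows. Class and attribute names beyond `matches()` and `_cached_chat_middleware_pipeline` are illustrative stand-ins for the real framework code.

```python
class MiddlewarePipeline:
    def __init__(self, middleware):
        # Store the source as a tuple; matches() then relies on the default
        # identity-based __eq__ of middleware objects for comparison.
        self._source_middleware = tuple(middleware)

    def matches(self, middleware):
        return self._source_middleware == tuple(middleware)

class Client:
    def __init__(self):
        # Single slot: only the most recent pipeline is retained. Safe for
        # single-threaded async (no await between the check and the set),
        # but alternating middleware sets will thrash it.
        self._cached_chat_middleware_pipeline = None

    def _get_chat_pipeline(self, middleware):
        cached = self._cached_chat_middleware_pipeline
        if cached is not None and cached.matches(middleware):
            return cached  # reuse: same middleware objects as the last call
        pipeline = MiddlewarePipeline(middleware)
        self._cached_chat_middleware_pipeline = pipeline
        return pipeline

client = Client()
mw = [object()]
p1 = client._get_chat_pipeline(mw)
p2 = client._get_chat_pipeline(mw)
assert p1 is p2           # cache hit on identical middleware objects
p3 = client._get_chat_pipeline([object()])
assert p3 is not p1       # different middleware replaces the single slot
```

Note that a middleware class defining a value-based `__eq__` could make two structurally different pipelines compare equal here, which is exactly the cache-reuse hazard the reviews suggest documenting.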
Automated review by TaoChenOSU's agents
Motivation and Context
Fixes #4710.
This change clarifies the separation of concerns between agent middleware, chat middleware, function invocation, and telemetry. Chat middleware now runs for each inner model call in the function loop while remaining outside telemetry so middleware latency does not skew per-call timings. It also covers the related per-call usage-tracking scenario discussed in #4671 and aligns Anthropic with the raw/public client pattern used elsewhere.
Description
- Reorder the standard public client stack to `FunctionInvocationLayer -> ChatMiddlewareLayer -> ChatTelemetryLayer -> Raw/Base client`
- Cache middleware pipelines in `ChatMiddlewareLayer` and add matching cache reuse for agent and function middleware pipelines
- Route per-call middleware via `client_kwargs["middleware"]` and remove stale `function_middleware` docstring/runtime references
- Split Anthropic into `RawAnthropicClient` plus public `AnthropicClient`, and include the small ancillary typing cleanups already present on the branch

Contribution Checklist