Python: [BREAKING] Refactor middleware layering and split Anthropic raw client #4746
eavanvalkenburg wants to merge 9 commits into microsoft:main
Conversation
Reorder chat client layers so function invocation wraps chat middleware, and chat middleware stays outside telemetry while still running for each inner model call. Add middleware pipeline caching, refresh docs and samples, and split Anthropic into raw and public clients to match the standard layering model. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add targeted typing ignores in workflow visualization and lab modules so pyright stays clean alongside the middleware refactor work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Refactors the Python chat-client layer stack so chat middleware executes around each inner model call in the tool loop (without inflating telemetry spans), standardizes per-call middleware routing via client_kwargs["middleware"], and aligns Anthropic with the Raw...Client/public client split used by other providers.
Changes:
- Reorders the standard public client MRO to `FunctionInvocationLayer -> ChatMiddlewareLayer -> ChatTelemetryLayer -> Raw/Base client` across providers, tests, and docs.
- Moves chat/function/agent middleware pipeline construction and reuse into the relevant layers (pipeline caching).
- Splits Anthropic into `RawAnthropicClient` + `AnthropicClient`, and updates docs/samples (including a per-call usage tracking middleware sample).
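The new layer order can be sketched with cooperative mixins and `super()` delegation. This is an illustrative stand-in, not the framework's real API: the class bodies are simplified, and only the layer names match the PR.

```python
# Hypothetical sketch of the standard layering described above.
# Each layer wraps the next via cooperative super() calls; the public
# client's base-class order determines the MRO and thus the wrapping order.
class RawClient:
    def get_response(self, messages, **kwargs):
        return f"model({messages})"

class ChatTelemetryLayer:
    def get_response(self, messages, **kwargs):
        # innermost convenience layer: spans measure only the raw model call
        return "telemetry(" + super().get_response(messages, **kwargs) + ")"

class ChatMiddlewareLayer:
    def get_response(self, messages, **kwargs):
        # runs outside telemetry, so middleware latency is not attributed to the model
        return "chat_mw(" + super().get_response(messages, **kwargs) + ")"

class FunctionInvocationLayer:
    def get_response(self, messages, **kwargs):
        # outermost: in the real framework this runs the tool loop and calls
        # super().get_response once per inner model call
        return "fn_loop(" + super().get_response(messages, **kwargs) + ")"

class PublicClient(FunctionInvocationLayer, ChatMiddlewareLayer, ChatTelemetryLayer, RawClient):
    pass

print(PublicClient().get_response("hi"))
# fn_loop(chat_mw(telemetry(model(hi))))
```

Because `FunctionInvocationLayer` is leftmost, chat middleware and telemetry both fire for each inner model call of the tool loop rather than once around the whole loop.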
Reviewed changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| python/samples/02-agents/providers/custom/README.md | Updates recommended layer ordering and composition example. |
| python/samples/02-agents/observability/advanced_zero_code.py | Clarifies telemetry vs middleware placement and span behavior. |
| python/samples/02-agents/observability/advanced_manual_setup_console_output.py | Updates observability guidance to the new default layer order. |
| python/samples/02-agents/middleware/usage_tracking_middleware.py | Adds a sample showing per-call usage tracking for streaming and non-streaming tool loops. |
| python/samples/02-agents/middleware/agent_and_run_level_middleware.py | Updates middleware ordering explanation to reflect per-call chat middleware behavior. |
| python/samples/02-agents/middleware/README.md | Adds index + instructions for running the new usage tracking sample. |
| python/samples/02-agents/chat_client/custom_chat_client.py | Updates sample to recommend and demonstrate the new layer order. |
| python/samples/02-agents/auto_retry.py | Updates retry middleware docs to reflect “per model call” semantics in tool loops. |
| python/packages/orchestrations/tests/test_handoff.py | Updates mock client MRO ordering to match the new layering. |
| python/packages/ollama/agent_framework_ollama/_chat_client.py | Reorders Ollama client inheritance to match the standard layer stack. |
| python/packages/lab/lightning/agent_framework_lab_lightning/__init__.py | Adjusts typing ignores around optional agentlightning dependency. |
| python/packages/lab/gaia/agent_framework_lab_gaia/gaia.py | Adds typing ignore for optional pyarrow import. |
| python/packages/foundry_local/agent_framework_foundry_local/_foundry_local_client.py | Reorders FoundryLocal client inheritance to match the standard layer stack. |
| python/packages/core/tests/core/test_observability.py | Updates span-ordering test expectations + mock client MRO. |
| python/packages/core/tests/core/test_middleware_with_chat.py | Updates per-call middleware passing to client_kwargs["middleware"] and adds pipeline cache tests. |
| python/packages/core/tests/core/test_middleware_with_agent.py | Adds agent middleware pipeline cache tests and tool-loop ordering coverage. |
| python/packages/core/tests/core/test_kwargs_propagation_to_ai_function.py | Reorders mock client inheritance to match the new standard stack. |
| python/packages/core/tests/core/test_function_invocation_logic.py | Switches per-call middleware usage to client_kwargs["middleware"] and updates mock init args. |
| python/packages/core/tests/core/test_clients.py | Updates docstring/signature expectations (removal of function_middleware references). |
| python/packages/core/tests/core/conftest.py | Updates mock client initialization to use middleware=[] and new MRO order. |
| python/packages/core/agent_framework/openai/_responses_client.py | Updates OpenAI Responses client MRO + raw-client layering docstring. |
| python/packages/core/agent_framework/openai/_chat_client.py | Updates OpenAI Chat client MRO + removes function_middleware param and routes middleware via client_kwargs. |
| python/packages/core/agent_framework/openai/_assistants_client.py | Updates assistants client MRO ordering to the new standard stack. |
| python/packages/core/agent_framework/observability.py | Adds typing ignores around optional OpenTelemetry exporter imports. |
| python/packages/core/agent_framework/azure/_responses_client.py | Updates Azure responses client MRO ordering. |
| python/packages/core/agent_framework/azure/_chat_client.py | Updates Azure chat client MRO ordering. |
| python/packages/core/agent_framework/_workflows/_viz.py | Adds typing ignores around optional graphviz usage. |
| python/packages/core/agent_framework/_tools.py | Refactors FunctionInvocationLayer to route middleware via client_kwargs["middleware"] and cache function middleware pipelines. |
| python/packages/core/agent_framework/_middleware.py | Adds pipeline .matches() and caching for chat/function/agent middleware pipelines; adjusts ChatMiddlewareLayer init/signature. |
| python/packages/core/agent_framework/_clients.py | Updates layered docstring composition to reflect new layer responsibilities and kwarg docs. |
| python/packages/bedrock/agent_framework_bedrock/_chat_client.py | Reorders Bedrock client inheritance to match the standard layer stack. |
| python/packages/azure-ai/tests/test_azure_ai_agent_client.py | Updates test fixtures for new middleware fields/caches. |
| python/packages/azure-ai/agent_framework_azure_ai/_client.py | Updates raw-client layering docstring + public client MRO. |
| python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py | Updates Azure AI agent client MRO ordering. |
| python/packages/anthropic/tests/test_anthropic_client.py | Updates tests for RawAnthropicClient split and standard layer stack. |
| python/packages/anthropic/agent_framework_anthropic/_chat_client.py | Introduces RawAnthropicClient and composes AnthropicClient with standard layering. |
| python/packages/anthropic/agent_framework_anthropic/__init__.py | Exports RawAnthropicClient. |
| python/packages/ag-ui/tests/ag_ui/conftest.py | Updates streaming stub MRO and init arg (middleware=[]). |
| python/packages/ag-ui/agent_framework_ag_ui/_client.py | Updates AG-UI chat client MRO ordering. |
eavanvalkenburg
left a comment
Automated Code Review
Reviewers: 3 | Confidence: 88%
✓ Correctness
This is a well-structured refactoring that reorders the MRO so FunctionInvocationLayer sits above ChatMiddlewareLayer, moves middleware categorization responsibility into FunctionInvocationLayer, adds single-slot pipeline caching, splits the Anthropic client into Raw and full-featured classes, and removes the public `function_middleware` parameter in favor of a unified `middleware`. The code paths are internally consistent: `FunctionInvocationLayer.__init__` and `get_response` correctly categorize mixed middleware, keep function middleware for themselves, and forward chat middleware via kwargs; `ChatMiddlewareLayer` correctly receives only chat middleware; and the pipeline caching correctly uses identity-based tuple comparison via `matches()`. Tests thoroughly cover the new layer ordering, caching, and runtime middleware splitting. No correctness bugs found.
✓ Security Reliability
This is a large structural refactoring that reorders the MRO for all chat client classes, splits middleware routing so FunctionInvocationLayer owns the combined middleware parameter, adds single-slot pipeline caching, and introduces RawAnthropicClient. From a security/reliability perspective, the changes are sound: no injection risks, no secrets exposure, no unsafe deserialization. The caching logic is safe for single-threaded async (no await between check and set). The one robustness concern is that `categorize_middleware` only checks `isinstance(source, list)` when unpacking sequences, but the new `FunctionInvocationLayer.__init__` and `get_response` now pass whole sequences (not unpacked) — if a caller passes a `tuple` (a valid `Sequence`) instead of a `list`, the middleware items would not be categorized correctly and would silently fall through to the wrong bucket or fail classification.
✓ Design Approach
The diff correctly inverts the layer order from `ChatMiddlewareLayer → FunctionInvocationLayer → ChatTelemetryLayer` to `FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer` so that chat middleware runs once per model call (inside the tool loop) rather than once per entire tool-call chain. The change is well-motivated and consistently applied across all clients. The `RawAnthropicClient` split follows the existing `RawOpenAI*Client` pattern. The per-pipeline caching with `matches()` is a reasonable optimization. One structural concern in the new test is that MRO index assertions are fragile: they encode exact positional information about the class hierarchy rather than the behavioral ordering property being tested. A second, minor concern is that the single-slot `matches()` cache uses `==` (value equality) — if any middleware object ever defines a custom `__eq__`, two structurally different pipelines could incorrectly share a cached instance. Neither issue is blocking, but the MRO test pattern is worth addressing proactively given that similar clients (e.g., `AzureOpenAIChatClient`) already have extra mixins that would shift those indices.
Suggestions
- New call sites in `FunctionInvocationLayer.__init__` and `get_response` pass the middleware sequence directly to `categorize_middleware(middleware)` instead of unpacking it (`*(middleware or [])`). Since `categorize_middleware` only checks `isinstance(source, list)`, a `tuple` (a valid `Sequence`) would be appended as a single item and silently misclassified. Either unpack at call sites (`categorize_middleware(*(middleware or []))`) or broaden the check in `categorize_middleware` to handle all `Sequence` types (e.g., `isinstance(source, (list, tuple))` or `collections.abc.Sequence` with a `str` exclusion).
- The single-slot pipeline caches (`_cached_chat_middleware_pipeline`, etc.) are adequate for the common case but will thrash if concurrent async tasks use different per-call middleware, and would race under multi-threaded usage. Since the worst case is pipeline recreation (not incorrectness), consider adding a brief code comment documenting the single-threaded async assumption for future readers.
- In `test_anthropic_client_wraps_raw_client_with_standard_layer_order`, prefer relative ordering checks via `mro.index()` over hardcoded MRO indices. Clients like `AzureOpenAIChatClient` already have extra mixins that would shift indices, so asserting `mro.index(FunctionInvocationLayer) < mro.index(ChatMiddlewareLayer) < mro.index(ChatTelemetryLayer) < mro.index(RawAnthropicClient)` expresses the intent without breaking when a new mixin is inserted.
- The `matches()` method uses `self._source_middleware == tuple(middleware)`, which relies on the default identity-based `__eq__` of middleware objects. This is correct for all current middleware, but a brief comment explaining the assumption would prevent incorrect cache reuse if a future middleware class defines value-based `__eq__`.
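The list-vs-`Sequence` pitfall raised in the first suggestion can be reproduced with a simplified stand-in. The real `categorize_middleware` sorts items into agent/chat/function buckets; the helpers below are hypothetical one-bucket reductions that isolate just the unpacking behavior.

```python
from collections.abc import Sequence

def categorize_middleware_list_only(*sources):
    # Mirrors the narrow check flagged above: only lists are unpacked.
    items = []
    for source in sources:
        if isinstance(source, list):
            items.extend(source)
        elif source is not None:
            items.append(source)  # a tuple lands here as a single item
    return items

def categorize_middleware_sequence(*sources):
    # Broadened check: any Sequence is unpacked, with a str/bytes guard so a
    # string is not decomposed character by character.
    items = []
    for source in sources:
        if isinstance(source, Sequence) and not isinstance(source, (str, bytes)):
            items.extend(source)
        elif source is not None:
            items.append(source)
    return items

mw = ("chat_mw", "fn_mw")
print(categorize_middleware_list_only(mw))  # [('chat_mw', 'fn_mw')] — misclassified
print(categorize_middleware_sequence(mw))   # ['chat_mw', 'fn_mw']
```

The `str`/`bytes` exclusion matters because both satisfy `isinstance(x, Sequence)`, which is exactly the follow-on hazard flagged in the next review below.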
Automated review by eavanvalkenburg's agents
…RO assertions
- Broaden isinstance check in categorize_middleware from list to Sequence so tuples and other Sequence types are properly unpacked instead of being appended as a single item.
- Replace fragile hardcoded MRO index assertions in anthropic test with relative ordering via mro.index().
- Add regression tests for categorize_middleware with tuple, list, and None inputs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
Automated Code Review
Reviewers: 3 | Confidence: 87%
✗ Security Reliability
This PR refactors the MRO ordering of middleware layers (swapping FunctionInvocationLayer above ChatMiddlewareLayer), extracts a RawAnthropicClient, adds pipeline caching for middleware, and moves middleware routing responsibilities between layers. The changes are largely structural and well-tested. One reliability concern: the `categorize_middleware` change from `isinstance(source, list)` to `isinstance(source, Sequence)` will also match `str` and `bytes`, which are sequences of characters/ints — if a string were ever passed, it would be silently decomposed character by character into the middleware list. While type annotations make this unlikely at runtime, a defensive guard is warranted. The pipeline caching is single-slot (one cached value per client instance), which is correct for the common case but worth noting. No security issues (no secrets, no injection risks, no unsafe deserialization) were found.
✓ Test Coverage
This is a large refactoring that reorders the MRO so FunctionInvocationLayer comes before ChatMiddlewareLayer, extracts RawAnthropicClient, adds middleware pipeline caching, and moves function-middleware routing into FunctionInvocationLayer. The core behavioral changes are well-tested: new integration tests verify the tool-loop middleware ordering, pipeline caching is covered for all three pipeline types, and categorize_middleware's Sequence support is tested. However, the MRO reordering was applied to ~12 client classes but only the Anthropic client has an explicit MRO ordering test; the other reordered clients (OpenAI, Azure, Bedrock, Ollama, FoundryLocal, etc.) lack similar guards. RawAnthropicClient is newly exported but has no test exercising it independently.
✗ Design Approach
The layer reordering (FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer) is correct and a genuine semantic improvement: chat middleware now executes per model call within the tool loop rather than wrapping the entire loop. However, the PR has one blocking design issue: removing `middleware=` as a first-class named parameter from `FunctionInvocationLayer.get_response` creates a provider-inconsistent API. `OpenAIChatClient` re-adds `middleware=` via its own `get_response` override and converts it to `client_kwargs["middleware"]`, but every other provider (Anthropic, Bedrock, Ollama, Azure AI Agent, Foundry Local, AG-UI) has no such override. Callers who pass `middleware=[...]` directly to those clients will have their middleware silently ignored — it flows through `**kwargs` past `FunctionInvocationLayer` and `ChatMiddlewareLayer` (both of which only inspect `client_kwargs["middleware"]`), and is eventually stripped by the raw client. This also introduces a leaky abstraction: `client_kwargs` is documented as provider-specific HTTP-client options, yet it is now the inter-layer messaging channel for middleware threading, with both `FunctionInvocationLayer` and `ChatMiddlewareLayer` silently popping a magic `"middleware"` key from it.
Flagged Issues
- `categorize_middleware`: changing `isinstance(source, list)` to `isinstance(source, Sequence)` also matches `str` and `bytes`, which would silently decompose a string into individual characters. Add `and not isinstance(source, (str, bytes))` to guard against accidental misuse.
- `middleware=` is removed as a first-class named parameter from `FunctionInvocationLayer.get_response`. Only `OpenAIChatClient` re-adds it via its own override; `AnthropicClient`, `BedrockChatClient`, `OllamaChatClient`, `AzureAIAgentClient`, `FoundryLocalClient`, and `AGUIChatClient` do not. Calling `get_response(messages, middleware=[...])` on those clients silently does nothing — the kwarg flows through `**kwargs` past `FunctionInvocationLayer` and `ChatMiddlewareLayer` (both only read from `client_kwargs["middleware"]`) and is stripped by the raw provider call. Fix by adding `middleware` as an explicit named parameter on `FunctionInvocationLayer.get_response` so the conversion to `client_kwargs["middleware"]` happens uniformly for all providers.
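The silent-drop path in the second flagged issue can be illustrated with a minimal stand-in: a middleware layer that only reads `client_kwargs["middleware"]`, sitting above a raw client that discards unrecognized kwargs. The classes and behavior are simplified placeholders, not the framework's real implementation.

```python
class RawClient:
    def get_response(self, messages, **kwargs):
        # raw clients typically ignore kwargs they don't recognize, so an
        # unhandled middleware=[...] kwarg vanishes here without error
        return "model"

class ChatMiddlewareLayer(RawClient):
    def get_response(self, messages, *, client_kwargs=None, **kwargs):
        # only the magic client_kwargs["middleware"] key is inspected
        mw = (client_kwargs or {}).pop("middleware", None)
        prefix = "mw:" if mw else ""
        return prefix + super().get_response(messages, **kwargs)

client = ChatMiddlewareLayer()
# routed through client_kwargs: middleware is applied
print(client.get_response("hi", client_kwargs={"middleware": [object()]}))  # mw:model
# passed as a bare kwarg: flows through **kwargs and is silently ignored
print(client.get_response("hi", middleware=[object()]))  # model
```

This is why the review asks for `middleware` as an explicit named parameter on `FunctionInvocationLayer.get_response`: the conversion to `client_kwargs["middleware"]` then happens in one place for every provider.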
Suggestions
- The single-slot pipeline cache stores only the most recent pipeline per client instance. Callers that alternate between different per-call middleware sets will thrash the cache with no benefit. Consider a small bounded LRU (e.g. up to 4 entries keyed by tuple identity) or at minimum document the single-slot limitation.
- The `matches()` methods on pipeline classes compare `tuple(middleware)` using `==`, relying on middleware object identity/equality. This is correct but worth documenting so users aren't surprised that semantically equivalent but distinct middleware objects don't hit the cache.
- Add MRO ordering tests for other reordered clients (OpenAIChatClient, OpenAIResponsesClient, AzureOpenAIChatClient, AzureOpenAIResponsesClient, AzureAIClient, AzureAIAgentClient, BedrockChatClient, OllamaChatClient, FoundryLocalClient, OpenAIAssistantsClient). A single parameterized test asserting `FunctionInvocationLayer < ChatMiddlewareLayer < ChatTelemetryLayer` ordering across all public clients would be a lightweight guard against MRO regressions.
- Add a test verifying RawAnthropicClient does NOT inherit FunctionInvocationLayer, ChatMiddlewareLayer, or ChatTelemetryLayer, matching the pattern already established by RawOpenAIChatClient.
- The chat middleware pipeline cache test only exercises the path where
chat_client_base.chat_middlewareis empty. Add a case with non-empty base chat middleware (analogous to the function middleware cache test) to verify the cache key correctly includes base middleware. - client_kwargs is documented as provider-specific HTTP-client options but is now also used as an inter-layer messaging channel for middleware routing. This conflates two concerns and means a provider option legitimately named "middleware" would silently interfere. Consider a dedicated internal dict (e.g. _layer_kwargs) for cross-layer communication.
Automated review by moonbox3's agents
…InvocationLayer, and add tests (microsoft#4710)
- Guard categorize_middleware Sequence check against str/bytes to prevent character-by-character decomposition of accidentally passed strings
- Add explicit middleware parameter to FunctionInvocationLayer.get_response and merge it into client_kwargs before categorization, fixing the inconsistency where only OpenAIChatClient supported this parameter
- Add assertions that RawAnthropicClient does not inherit convenience layers
- Add chat middleware cache test with non-empty base middleware
- Add tests for single unwrapped middleware item and string input
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TaoChenOSU
left a comment
Automated Code Review
Reviewers: 4 | Confidence: 75%
✓ Correctness
✓ Security Reliability
This is a large structural refactoring that reorders the MRO of middleware layers (FunctionInvocationLayer now outermost, ChatMiddlewareLayer in the middle, ChatTelemetryLayer innermost), extracts RawAnthropicClient, and adds single-slot pipeline caching with a `matches()` method. The security posture is largely unchanged. The main reliability concern is an inconsistent type check in `FunctionInvocationLayer.get_response`, where the middleware-merging logic checks `isinstance(existing, list)` but the `categorize_middleware` function was simultaneously fixed to handle all `Sequence` types. If a caller passes middleware as a tuple via `client_kwargs` and also provides the `middleware` parameter, the tuple would be wrapped as a single item instead of being unpacked. The practical impact is low because the dual-source scenario is unlikely in normal framework usage, but it is a latent defect that contradicts the intent of the companion `categorize_middleware` fix.
✓ Test Coverage
This PR refactors the MRO ordering to place FunctionInvocationLayer before ChatMiddlewareLayer across all clients, splits RawAnthropicClient from AnthropicClient, adds middleware pipeline caching, and routes combined middleware through FunctionInvocationLayer. Test coverage is generally good: new tests verify MRO ordering for Anthropic, pipeline caching for chat/function/agent middleware, combined middleware with tool loops, and the categorize_middleware tuple/string fix. However, there are a few gaps: the string-in-categorize_middleware test has a weak assertion, there's no streaming variant of the new combined-middleware-with-tool-loop tests, and several providers (Bedrock, Ollama, FoundryLocal, OpenAI Assistants, Azure OpenAI) that received MRO changes have no corresponding MRO-ordering tests analogous to the Anthropic one.
✗ Design Approach
This PR correctly reverses the layer ordering to `FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer` so chat middleware wraps each inner model call in a tool loop rather than the entire loop. The core architectural change is sound. However, the new middleware-merging code added inside `FunctionInvocationLayer.get_response` contains the same `isinstance(existing, list)` narrowness that `categorize_middleware` was simultaneously fixed to avoid, creating a gap where non-list sequences (e.g., tuples) passed in `client_kwargs["middleware"]` would be silently wrapped as a single item rather than expanded. Additionally, several tests were changed from the newly typed `middleware=` parameter to the untyped `client_kwargs={"middleware": ...}` bypass, which is inconsistent with the typed overloads added on `FunctionInvocationLayer.get_response` and makes the API intent harder to follow.
Flagged Issues
- In `FunctionInvocationLayer.get_response`, the merging of the per-call `middleware` param into `effective_client_kwargs` uses `isinstance(existing, list)` to decide whether to spread or wrap the pre-existing value. This is the exact narrowness that `categorize_middleware` was fixed to avoid in the same PR (changed to `isinstance(source, Sequence) and not isinstance(source, (str, bytes))`). A tuple or other non-list sequence in `client_kwargs["middleware"]` would be silently wrapped as a single element instead of being spread, corrupting the combined middleware list.
Suggestions
- The single-slot pipeline caches (`_cached_chat_middleware_pipeline`, `_cached_function_middleware_pipeline`, `_cached_agent_middleware_pipeline`) only cache the most recent configuration. Consider documenting that passing different runtime middleware each call will always reconstruct the pipeline, so callers are aware of the caching limitations.
- Add a streaming variant of `test_run_level_chat_and_function_middleware_split_per_function_loop_round` to verify chat middleware runs per model call in the streaming tool-loop path, since `FunctionInvocationLayer` has separate streaming logic that also calls `super_get_response` with `**filtered_kwargs`.
- Consider adding MRO-ordering tests similar to `test_anthropic_client_wraps_raw_client_with_standard_layer_order` for other providers (OpenAIChatClient, OpenAIResponsesClient, AzureOpenAIChatClient, BedrockChatClient, etc.) to guard against future MRO regressions.
- The `test_categorize_middleware_with_string_does_not_decompose` assertion (`total_items <= 1`) is too loose — it passes whether the string is silently dropped or bucketed into 'agent'. Assert the exact expected behavior (e.g., `assert total_items == 1` and `assert result['agent'] == ['not_a_middleware']`) to make the test precise.
- The pipeline caching tests only cover the `_get_*_pipeline` helper methods directly but don't test that a full `get_response` call actually reuses the cache on repeated calls with identical middleware, which would confirm end-to-end caching integration.
- Several tests were changed from `get_response(..., middleware=[mw])` to `get_response(..., client_kwargs={"middleware": [mw]})` even though `FunctionInvocationLayer.get_response` now exposes a typed `middleware` parameter. Using the untyped `client_kwargs` bypass undermines discoverability of the typed API. The tests in `test_function_invocation_logic.py` (lines 3226, 3292, 3345) and `test_middleware_with_chat.py` should use the typed parameter unless they are deliberately testing the `client_kwargs` code path, in which case a clarifying comment is warranted.
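The single-slot cache and identity-based `matches()` behavior discussed across these reviews can be sketched as follows. Class and attribute names beyond `matches()` and `_cached_chat_middleware_pipeline` are illustrative stand-ins for the real framework code.

```python
class MiddlewarePipeline:
    def __init__(self, middleware):
        # Store the source as a tuple; matches() then relies on the default
        # identity-based __eq__ of middleware objects for comparison.
        self._source_middleware = tuple(middleware)

    def matches(self, middleware):
        return self._source_middleware == tuple(middleware)

class Client:
    def __init__(self):
        # Single slot: only the most recent pipeline is retained. Safe for
        # single-threaded async (no await between the check and the set),
        # but alternating middleware sets will thrash it.
        self._cached_chat_middleware_pipeline = None

    def _get_chat_pipeline(self, middleware):
        cached = self._cached_chat_middleware_pipeline
        if cached is not None and cached.matches(middleware):
            return cached  # reuse: same middleware objects as the last call
        pipeline = MiddlewarePipeline(middleware)
        self._cached_chat_middleware_pipeline = pipeline
        return pipeline

client = Client()
mw = [object()]
p1 = client._get_chat_pipeline(mw)
p2 = client._get_chat_pipeline(mw)
assert p1 is p2           # cache hit on identical middleware objects
p3 = client._get_chat_pipeline([object()])
assert p3 is not p1       # different middleware replaces the single slot
```

Note that a middleware class defining a value-based `__eq__` could make two structurally different pipelines compare equal here, which is exactly the cache-reuse hazard the reviews suggest documenting.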
Automated review by TaoChenOSU's agents
Motivation and Context
Fixes #4710.
This change clarifies the separation of concerns between agent middleware, chat middleware, function invocation, and telemetry. Chat middleware now runs for each inner model call in the function loop while remaining outside telemetry so middleware latency does not skew per-call timings. It also covers the related per-call usage-tracking scenario discussed in #4671 and aligns Anthropic with the raw/public client pattern used elsewhere.
Description
- Reorder the standard public client stack to `FunctionInvocationLayer -> ChatMiddlewareLayer -> ChatTelemetryLayer -> Raw/Base client`
- Cache middleware pipelines in `ChatMiddlewareLayer` and add matching cache reuse for agent and function middleware pipelines
- Route per-call middleware via `client_kwargs["middleware"]` and remove stale `function_middleware` docstring/runtime references
- Split Anthropic into `RawAnthropicClient` plus public `AnthropicClient`, and include the small ancillary typing cleanups already present on the branch

Contribution Checklist