Python: Foundry Evals integration for Python #4750

Draft

alliscode wants to merge 1 commit into microsoft:main from alliscode:af-foundry-evals-python

Member

alliscode commented Mar 17, 2026

Add evaluation framework with local and Foundry-hosted evaluator support:

- EvalItem/EvalResult core types with conversation splitting strategies
- @evaluator decorator for defining custom evaluation functions
- LocalEvaluator for running evaluations locally
- FoundryEvals provider for Azure AI Foundry hosted evaluations
- evaluate_agent() orchestration with expected values support
- evaluate_workflow() for multi-agent workflow evaluation
- Comprehensive test suite and evaluation samples
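For readers unfamiliar with decorator-based evaluators, here is a minimal sketch of the general idea. It is not the PR's implementation: the `evaluator` decorator below is a hypothetical stand-in defined locally, and its signature and result shape (`name`, `passed`) are assumptions made only for illustration.

```python
import functools
from typing import Callable, Optional


def evaluator(func: Callable[[str, Optional[str]], bool]) -> Callable[..., dict]:
    """Hypothetical stand-in for a decorator that turns a plain function into a named check.

    The real @evaluator in this PR may inject different arguments and return a
    different result type; this only illustrates the pattern.
    """

    @functools.wraps(func)
    def wrapper(response: str, expected: Optional[str] = None) -> dict:
        outcome = func(response, expected)
        # Only a literal True counts as a pass, so truthy non-bool values do not slip through.
        return {"name": func.__name__, "passed": outcome is True}

    return wrapper


@evaluator
def mentions_expected(response: str, expected: Optional[str]) -> bool:
    # Passes when the expected value appears in the response, ignoring case.
    return expected is not None and expected.lower() in response.lower()


result = mentions_expected("The capital of France is Paris.", "paris")
print(result)
```

The point of the pattern is that the evaluation logic stays a plain function, while the decorator supplies the name and a uniform result shape.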

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

markwallace-microsoft added documentation python labels

github-actions bot changed the title ~~Foundry Evals integration for Python~~ Python: Foundry Evals integration for Python

alliscode force-pushed the af-foundry-evals-python branch from a0edd5f to fe9e621

March 17, 2026 21:21

eavanvalkenburg reviewed


python/packages/core/agent_framework/_evaluation.py

Member

eavanvalkenburg Mar 18, 2026

let's call this file _evaluation and include the contents of _local_eval

Member Author

alliscode Mar 19, 2026

Done ✅ — merged _eval.py + _local_eval.py into single _evaluation.py. All imports updated across 12 files.

python/packages/core/agent_framework/__init__.py (resolved)

python/packages/core/agent_framework/_eval.py Outdated (resolved)

python/packages/core/agent_framework/_eval.py Outdated

+                      assistant_texts = [m.text for m in response_msgs if m.role == "assistant" and m.text]
+                      return " ".join(assistant_texts).strip()
+                  def to_dict(

Member

eavanvalkenburg Mar 18, 2026

This should be named something else, because it is not just a dict; it is a highly specific dict.

Member Author

alliscode Mar 19, 2026

Renamed to to_eval_data() to better reflect the specific structure it produces.

python/packages/core/agent_framework/_eval.py

		"""


		@dataclass

Member

eavanvalkenburg Mar 18, 2026

We are putting an awful lot of logic into a dataclass, which is not the intent of dataclasses (at least not how we prefer to use them), so let's either turn it into a regular class, or move the helper functions outside of it and ensure they accept an EvalItem object as input.

Member Author

alliscode Mar 19, 2026

Converted EvalItem from dataclass to regular class with __init__. Helper methods stay on the class since they operate on self.
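A minimal sketch of what "regular class with `__init__` and helper methods on `self`" can look like. The `Message` type, the `conversation` field, and the `response_text` method name are hypothetical, since the real fields of EvalItem are not shown in this thread; the join logic mirrors the assistant-text snippet quoted in the earlier review comment.

```python
from typing import Sequence


class Message:
    """Hypothetical minimal message type, used only for this sketch."""

    def __init__(self, role: str, text: str) -> None:
        self.role = role
        self.text = text


class EvalItem:
    """Regular class (not a dataclass); field names here are assumptions."""

    def __init__(self, conversation: Sequence[Message]) -> None:
        self.conversation = list(conversation)

    def response_text(self) -> str:
        # Join all non-empty assistant messages into a single response string.
        assistant_texts = [m.text for m in self.conversation if m.role == "assistant" and m.text]
        return " ".join(assistant_texts).strip()


item = EvalItem(
    [
        Message("user", "Hi"),
        Message("assistant", "Hello."),
        Message("assistant", "How can I help?"),
    ]
)
text = item.response_text()
print(text)
```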

python/packages/azure-ai/agent_framework_azure_ai/_foundry_evals.py Outdated

+                  result = func(*args, **kwargs)
+                  if inspect.isawaitable(result):
+                      return await result
+                  return await asyncio.to_thread(lambda: result)

Member

eavanvalkenburg Mar 18, 2026

why is this needed?

Member Author

alliscode Mar 19, 2026

Removed — the isawaitable check is sufficient, no need for asyncio.to_thread.
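For context: in the quoted diff, `func(*args, **kwargs)` has already run by the time `asyncio.to_thread(lambda: result)` is reached, so the thread would only hand back a value that was already computed. A minimal sketch of the simplified pattern follows; `run_check` and the two sample checks are hypothetical names, not the PR's code.

```python
import asyncio
import inspect
from typing import Any, Callable


async def run_check(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call a sync or async check and always return its final value."""
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        # Async checks return a coroutine (or other awaitable); await it.
        return await result
    # Sync checks have already produced their value; return it directly.
    return result


def sync_check(text: str) -> bool:
    return len(text) > 0


async def async_check(text: str) -> bool:
    await asyncio.sleep(0)  # stand-in for real async work
    return text.startswith("ok")


async def main() -> tuple:
    return (
        await run_check(sync_check, "ok then"),
        await run_check(async_check, "ok then"),
    )


results = asyncio.run(main())
print(results)
```

Note that a sync check still runs on the event loop thread with this pattern. Offloading a blocking check would require passing the function itself to `asyncio.to_thread`, not its result.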

python/packages/azure-ai/agent_framework_azure_ai/_foundry_evals.py Outdated



		async def _poll_eval_run(
		client: OpenAI | AsyncOpenAI,

Member

eavanvalkenburg Mar 18, 2026

I think we should limit this to AsyncOpenAI. We use async everywhere in AF, so it doesn't make much sense to suddenly introduce sync here.

Member Author

alliscode Mar 19, 2026

Done — limited to AsyncOpenAI only. Removed sync OpenAI support since AF is async-everywhere.
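As an illustration of why async-only keeps the polling code simple, here is a sketch of an async polling loop. It is not the PR's `_poll_eval_run`: the `fetch_status` callable, the set of terminal states, and the timeout handling are all assumptions, and the status source is a fake so the example runs without any client.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed terminal states; the real service may use different names.
TERMINAL_STATES = {"completed", "failed", "canceled"}


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[str]],
    *,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> str:
    """Poll a hypothetical async status source until it reports a terminal state."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await fetch_status()
        if status in TERMINAL_STATES:
            return status
        if loop.time() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout}s")
        # Yield to the event loop between polls instead of blocking a thread.
        await asyncio.sleep(interval)


async def demo() -> str:
    statuses = iter(["queued", "in_progress", "completed"])

    async def fake_fetch() -> str:
        # Stand-in for an awaitable client call that retrieves the run status.
        return next(statuses)

    return await poll_until_done(fake_fetch, interval=0.01, timeout=1.0)


final_status = asyncio.run(demo())
print(final_status)
```

With a single async client type there is no need for a sync/async branch around each client call inside the loop.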

python/packages/azure-ai/agent_framework_azure_ai/_foundry_evals.py Outdated

+                      self,
+                      *,
+                      project_client: Any | None = None,
+                      openai_client: OpenAI | AsyncOpenAI | None = None,

Member

eavanvalkenburg Mar 18, 2026

same here

Member Author

alliscode Mar 19, 2026

Done — same async-only change applied here.

python/packages/azure-ai/agent_framework_azure_ai/_foundry_evals.py Outdated

+                  def __init__(
+                      self,
+                      *,
+                      project_client: Any | None = None,

Member

eavanvalkenburg Mar 18, 2026

project client is a dependency of the core framework, so we can type this

Member Author

alliscode Mar 19, 2026

Done — typed as AIProjectClient from azure.ai.projects.aio under TYPE_CHECKING.
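A sketch of the `TYPE_CHECKING` pattern described in this reply. The class name `FoundryEvalsSketch` is hypothetical and the constructor body is an assumption; only the import location (`azure.ai.projects.aio`) and the two parameter names come from the thread. Because the imports sit under `TYPE_CHECKING` and the annotations are strings, the module loads even when neither package is installed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime.
    from azure.ai.projects.aio import AIProjectClient
    from openai import AsyncOpenAI


class FoundryEvalsSketch:
    """Hypothetical stand-in showing typed, keyword-only client parameters."""

    def __init__(
        self,
        *,
        project_client: "AIProjectClient | None" = None,
        openai_client: "AsyncOpenAI | None" = None,
    ) -> None:
        # String annotations are not evaluated at runtime, so the names above
        # do not need to exist outside of type checking.
        self.project_client = project_client
        self.openai_client = openai_client


evals = FoundryEvalsSketch()
print(FoundryEvalsSketch.__init__.__annotations__["project_client"])
```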

python/packages/azure-ai/agent_framework_azure_ai/_foundry_evals.py Outdated

+                      NotImplementedError: The continuous evaluation rules API shape is not
+                          yet finalized.
+                  """
+                  raise NotImplementedError(

Member

eavanvalkenburg Mar 18, 2026

if this is not ready, let's remove it for now

Member Author

alliscode Mar 19, 2026

Removed the entire setup_continuous_evaluation function.

alliscode force-pushed the af-foundry-evals-python branch 5 times, most recently from 8d4289c to 15d8640

March 19, 2026 16:57


Foundry Evals integration for Python

aad92ac

Merged and refactored eval module per Eduard's PR review:

- Merge _eval.py + _local_eval.py into single _evaluation.py
- Convert EvalItem from dataclass to regular class
- Rename to_dict() to to_eval_data()
- Convert _AgentEvalData to TypedDict
- Simplify check system: unified async pattern with isawaitable
- Parallelize checks and evaluators with asyncio.gather
- Add all/any mode to tool_called_check
- Fix bool(passed) truthy bug in _coerce_result
- Remove deprecated function_evaluator/async_function_evaluator aliases
- Remove _MinimalAgent, tighten evaluate_agent signature
- Set self.name in __init__ (LocalEvaluator, FoundryEvals)
- Limit FoundryEvals to AsyncOpenAI only
- Type project_client as AIProjectClient
- Remove NotImplementedError continuous eval code
- Add evaluation samples in 02-agents/ and 03-workflows/
- Update all imports and tests (167 passing)
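Two items in the list above, the all/any mode on tool_called_check and running checks in parallel with asyncio.gather, can be sketched together. This is an illustration under assumptions, not the PR's code: the signature of `tool_called_check`, the `run_checks` helper, and the representation of called tools as a list of names are all hypothetical.

```python
import asyncio
import inspect
from typing import Any, Callable, List, Sequence


def tool_called_check(expected: Sequence[str], *, mode: str = "all") -> Callable[[Sequence[str]], bool]:
    """Hypothetical check factory: 'all' requires every expected tool, 'any' requires one."""
    if mode not in ("all", "any"):
        raise ValueError(f"mode must be 'all' or 'any', got {mode!r}")
    combine = all if mode == "all" else any

    def check(called_tools: Sequence[str]) -> bool:
        return combine(name in called_tools for name in expected)

    return check


async def _run_one(check: Callable[..., Any], called_tools: Sequence[str]) -> Any:
    # Unified pattern: await the result only if the check was async.
    result = check(called_tools)
    return await result if inspect.isawaitable(result) else result


async def run_checks(checks: Sequence[Callable[..., Any]], called_tools: Sequence[str]) -> List[Any]:
    # gather schedules all checks concurrently and returns results in input order.
    return list(await asyncio.gather(*(_run_one(c, called_tools) for c in checks)))


called = ["search", "summarize"]
checks = [
    tool_called_check(["search", "translate"], mode="all"),
    tool_called_check(["search", "translate"], mode="any"),
]
outcomes = asyncio.run(run_checks(checks, called))
print(outcomes)
```

Concurrency only helps for checks that actually await something (for example, a hosted evaluator call); purely synchronous checks still run one after another on the event loop.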

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

alliscode force-pushed the af-foundry-evals-python branch from 15d8640 to aad92ac

March 19, 2026 20:41

Member

markwallace-microsoft commented Mar 19, 2026

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| **packages/azure-ai/agent_framework_azure_ai** | | | | |
| _foundry_evals.py | 229 | 49 | 78% | 247, 269, 274–275, 292–296, 303, 306–309, 318–327, 586, 593, 605, 612, 727–728, 730–731, 738, 744–745, 747, 751–754, 756, 763, 770, 813–814, 816, 826, 835, 842 |
| **packages/core/agent_framework** | | | | |
| _agents.py | 362 | 47 | 87% | 465, 469, 524, 942, 978, 994, 1091–1095, 1150, 1178, 1311, 1327, 1329, 1342, 1348, 1384, 1386, 1395–1400, 1405, 1407, 1413–1414, 1421, 1423–1424, 1432–1433, 1436–1438, 1448–1453, 1457, 1462, 1464 |
| _evaluation.py | 613 | 96 | 84% | 225, 257, 272, 486, 488, 592–593, 672–674, 679, 719–722, 779–780, 783, 789–791, 793, 824–826, 878, 903–918, 920, 922, 1018, 1124, 1424–1425, 1431–1432, 1459, 1461–1464, 1470, 1474–1476, 1480–1482, 1486–1487, 1507–1510, 1512, 1585, 1600, 1604–1606, 1631, 1637–1641, 1675, 1696–1699, 1701, 1703–1705, 1715, 1721–1722, 1724, 1757–1758, 1763 |
| **packages/core/agent_framework/_workflows** | | | | |
| _agent_executor.py | 208 | 24 | 88% | 97, 113, 168–169, 221–222, 224–225, 255–257, 265–267, 275–277, 279, 283, 287, 394–395, 460, 479 |
| _workflow.py | 270 | 19 | 92% | 88, 269–271, 273–274, 292, 296, 434, 622, 643, 699, 711, 717, 722, 742–744, 757 |
| **TOTAL** | 27975 | 3366 | 87% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 5443 | 20 💤 | 0 ❌ | 0 🔥 | 1m 27s ⏱️ |

