Python: Foundry Evals integration for Python #4750
alliscode wants to merge 1 commit into microsoft:main from
Force-pushed from a0edd5f to fe9e621.
let's call this file _evaluation and include the contents of _local_eval
Done ✅: merged _eval.py and _local_eval.py into a single _evaluation.py. All imports updated across 12 files.
        assistant_texts = [m.text for m in response_msgs if m.role == "assistant" and m.text]
        return " ".join(assistant_texts).strip()


    def to_dict(
This should be named something else, because it is not just a dict; it is a highly specific dict.
Renamed to to_eval_data() to better reflect the specific structure it produces.
        """

    @dataclass
We are putting an awful lot of logic into a dataclass, and that is not the intent of dataclasses (at least not how we prefer to use them). Let's either turn it into a regular class, or move the helper functions outside of it and make sure they accept an EvalItem object as input.
Converted EvalItem from a dataclass to a regular class with an __init__. Helper methods stay on the class since they operate on self.
        result = func(*args, **kwargs)
        if inspect.isawaitable(result):
            return await result
        return await asyncio.to_thread(lambda: result)
Removed: the isawaitable check is sufficient, so there is no need for asyncio.to_thread.
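The reason `asyncio.to_thread` added nothing: `func` has already run on the event-loop thread by the time `result` is inspected, so the thread only executed a lambda returning an existing value. A minimal sketch of the simplified pattern follows; the name `call_maybe_async` is hypothetical.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_maybe_async(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    # Call once: an async function returns an awaitable, a sync one a plain value.
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        return await result
    # The sync result is already computed, so return it directly.
    return result
```

Note that this does not offload blocking sync functions; doing that would require passing `func` itself to `asyncio.to_thread` before calling it.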
    async def _poll_eval_run(
        client: OpenAI | AsyncOpenAI,
I think we should limit this to AsyncOpenAI. We use async everywhere in AF, so it doesn't make much sense to suddenly introduce sync here.
Done: limited to AsyncOpenAI only and removed sync OpenAI support, since AF is async everywhere.
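With only an async client, the polling loop needs no sync/async branching. Below is a minimal sketch, assuming a hypothetical `fetch_status` coroutine that wraps whichever AsyncOpenAI call retrieves the eval run's status; the terminal status strings, interval, and timeout defaults are assumptions, not values taken from the PR or the evals API.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical terminal states; check the evals API for the real status values.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[str]],
    *,
    interval: float = 1.0,
    timeout: float = 300.0,
) -> str:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if loop.time() >= deadline:
            raise TimeoutError(f"eval run still {status!r} after {timeout}s")
        # Yield to the event loop between polls instead of blocking a thread.
        await asyncio.sleep(interval)
```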
        self,
        *,
        project_client: Any | None = None,
        openai_client: OpenAI | AsyncOpenAI | None = None,
Done: the same async-only change is applied here.
    def __init__(
        self,
        *,
        project_client: Any | None = None,
The project client is a dependency of the core framework, so we can type this.
Done: typed as AIProjectClient from azure.ai.projects.aio under TYPE_CHECKING.
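A minimal sketch of the `TYPE_CHECKING` pattern described in the last two replies: the imports are seen only by type checkers, and the annotations are strings, so the module imports cleanly even without `azure-ai-projects` or `openai` installed at runtime. The class name `FoundryEvalsSketch` is hypothetical; only the two constructor parameters come from the diff.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers; no runtime import cost or dependency.
    from azure.ai.projects.aio import AIProjectClient
    from openai import AsyncOpenAI


class FoundryEvalsSketch:
    def __init__(
        self,
        *,
        # String annotations avoid evaluating names that are not imported at runtime.
        project_client: "AIProjectClient | None" = None,
        openai_client: "AsyncOpenAI | None" = None,
    ) -> None:
        self.project_client = project_client
        self.openai_client = openai_client
```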
            NotImplementedError: The continuous evaluation rules API shape is not
                yet finalized.
        """
        raise NotImplementedError(
If this is not ready, let's remove it for now.
Removed the entire setup_continuous_evaluation function.
Force-pushed from 8d4289c to 15d8640.
Merged and refactored eval module per Eduard's PR review:
- Merge _eval.py + _local_eval.py into single _evaluation.py
- Convert EvalItem from dataclass to regular class
- Rename to_dict() to to_eval_data()
- Convert _AgentEvalData to TypedDict
- Simplify check system: unified async pattern with isawaitable
- Parallelize checks and evaluators with asyncio.gather
- Add all/any mode to tool_called_check
- Fix bool(passed) truthy bug in _coerce_result
- Remove deprecated function_evaluator/async_function_evaluator aliases
- Remove _MinimalAgent, tighten evaluate_agent signature
- Set self.name in __init__ (LocalEvaluator, FoundryEvals)
- Limit FoundryEvals to AsyncOpenAI only
- Type project_client as AIProjectClient
- Remove NotImplementedError continuous eval code
- Add evaluation samples in 02-agents/ and 03-workflows/
- Update all imports and tests (167 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
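Three of the commit items (parallel checks with asyncio.gather, the all/any mode for tool_called_check, and the bool(passed) truthy bug) can be illustrated together. This is a sketch under stated assumptions: `coerce_passed`, `tool_called`, and `run_checks` are hypothetical names, and their signatures and the accepted "passing" strings are assumptions, not the PR's actual `_coerce_result` or `tool_called_check`.

```python
import asyncio
import inspect
from typing import Any, Callable, Literal


def coerce_passed(value: Any) -> bool:
    # bool("False") and bool("no") are both True, so strings must be parsed
    # explicitly rather than relying on truthiness (hypothetical accepted set).
    if isinstance(value, str):
        return value.strip().lower() in {"true", "yes", "pass", "passed", "1"}
    return bool(value)


def tool_called(*tool_names: str, mode: Literal["all", "any"] = "all") -> Callable[[list[str]], bool]:
    # Build a check over the names of the tools the agent actually called.
    def check(called: list[str]) -> bool:
        hits = [name in called for name in tool_names]
        return all(hits) if mode == "all" else any(hits)

    return check


async def run_checks(checks: list[Callable[[list[str]], Any]], called: list[str]) -> list[bool]:
    async def run_one(check: Callable[[list[str]], Any]) -> bool:
        # Same unified pattern: call, await only if the result is awaitable.
        result = check(called)
        if inspect.isawaitable(result):
            result = await result
        return coerce_passed(result)

    # gather schedules every check at once so async checks overlap;
    # results come back in the same order as `checks`.
    return list(await asyncio.gather(*(run_one(c) for c in checks)))
```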
Force-pushed from 15d8640 to aad92ac.
Add evaluation framework with local and Foundry-hosted evaluator support:
Contribution Checklist