feat(azure-ai-projects): add EndpointBasedEvaluatorDefinition model #47540
Open
ahmad-nader wants to merge 3 commits into
Add endpoint-based evaluator definition type to support custom HTTP endpoint evaluators. Merge upstream model changes (CodeConfiguration, EntraAuthorizationScheme, RubricBasedEvaluatorDefinition) with the new endpoint evaluator discriminator. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…evaluators - Unit tests: model creation, serialization, discriminator behavior - Integration tests: E2E API Key and Entra ID flows (live only) - Update conftest to allow endpoint evaluator unit tests in PR pipeline Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…arsing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
Thank you for your contribution @ahmad-nader! We will review the pull request and get back to you soon.
Contributor
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds support and guidance for “endpoint-based evaluators” by introducing a new evaluator definition model, along with unit tests, live integration tests, and end-to-end samples (API key + Entra ID).
Changes:
- Introduces `EndpointBasedEvaluatorDefinition` and `EvaluatorDefinitionType.ENDPOINT` for discriminator-based model support.
- Adds unit tests for model construction/serialization and live tests for API key / Entra ID evaluator workflows.
- Adds two runnable samples demonstrating end-to-end setup and execution for endpoint evaluators.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| sdk/ai/azure-ai-projects/tests/evaluations/test_endpoint_evaluators_live.py | New live E2E tests for endpoint evaluators (API key + Entra ID). |
| sdk/ai/azure-ai-projects/tests/evaluations/test_endpoint_evaluator_models.py | New unit tests for EndpointBasedEvaluatorDefinition discriminator/serialization behavior. |
| sdk/ai/azure-ai-projects/tests/conftest.py | Updates test skipping logic to keep new unit tests running in PR pipeline. |
| sdk/ai/azure-ai-projects/samples/evaluations/sample_endpoint_evaluator_with_entra_id.py | New Entra ID (AAD) end-to-end sample for endpoint evaluators. |
| sdk/ai/azure-ai-projects/samples/evaluations/sample_endpoint_evaluator_with_api_key.py | New API key end-to-end sample for endpoint evaluators. |
| sdk/ai/azure-ai-projects/azure/ai/projects/models/_models.py | Adds EndpointBasedEvaluatorDefinition model + updates base model docs. |
| sdk/ai/azure-ai-projects/azure/ai/projects/models/_enums.py | Adds EvaluatorDefinitionType.ENDPOINT. |
| `sdk/ai/azure-ai-projects/azure/ai/projects/models/__init__.py` | Exports EndpointBasedEvaluatorDefinition from the models package. |
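The discriminator-based dispatch mentioned in the overview can be illustrated with a small, SDK-independent sketch. The class names mirror the PR, but everything else is an assumption made for illustration: the `"endpoint"` wire value, the `endpoint_url` field, the registry, and the `to_dict`/`from_dict` helpers are hypothetical and are not the generated azure-ai-projects code.

```python
from dataclasses import asdict, dataclass
from typing import Any, ClassVar, Dict, Type


@dataclass
class EvaluatorDefinition:
    """Base type. Subclasses register themselves under their discriminator value."""

    type: ClassVar[str] = "base"
    _registry: ClassVar[Dict[str, Type["EvaluatorDefinition"]]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Map the subclass's discriminator value to the subclass itself.
        EvaluatorDefinition._registry[cls.type] = cls

    def to_dict(self) -> Dict[str, Any]:
        # The discriminator is a class-level constant, so add it explicitly.
        return {"type": self.type, **asdict(self)}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "EvaluatorDefinition":
        payload = dict(data)
        # Pick the concrete class from the "type" field of the payload.
        target = cls._registry[payload.pop("type")]
        return target(**payload)


@dataclass
class EndpointBasedEvaluatorDefinition(EvaluatorDefinition):
    type: ClassVar[str] = "endpoint"  # assumed wire value
    endpoint_url: str = ""  # hypothetical field, for illustration only
```

Deserializing through the base class then yields the concrete subclass, which is the behavior the new unit tests are described as covering.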
Comment on lines 37 to +40

     if "tests\\evaluation" in path or "tests/evaluation" in path:
    -    # test_human_evaluations.py is a pure unit test with no Microsoft Foundry
    -    # dependency, so it must keep running in the PR pipeline.
    -    if "test_human_evaluations" in os.path.basename(path):
    +    # Pure unit tests with no Microsoft Foundry dependency must keep running in the PR pipeline.
    +    basename = os.path.basename(path)
    +    if "test_human_evaluations" in basename or "test_endpoint_evaluator_models" in basename:
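The path check in this snippet can be read as a small predicate. The sketch below is a hypothetical helper, not the code in `conftest.py`; it assumes that every other file under `tests/evaluation*` is skipped in the PR pipeline, which the visible diff suggests but does not show.

```python
import os

# Module names taken from the diff above; the tuple itself is an illustrative assumption.
UNIT_TEST_MODULES = ("test_human_evaluations", "test_endpoint_evaluator_models")


def should_skip_in_pr_pipeline(path: str) -> bool:
    """Return True when a test file would be skipped in the PR pipeline."""
    # Normalize Windows separators so a single substring check covers both styles.
    normalized = path.replace("\\", "/")
    if "tests/evaluation" not in normalized:
        return False
    basename = os.path.basename(normalized)
    # Pure unit tests keep running; everything else under tests/evaluation* is skipped.
    return not any(name in basename for name in UNIT_TEST_MODULES)
```

Keeping the allow-list in one tuple means a future pure unit test only needs a new entry rather than another `or` clause.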
Comment on lines +11 to +12

    To run these tests:
        pytest tests/evaluations/test_endpoint_evaluators_live.py -s --run-live
Comment on lines +47 to +51

    # Skip all tests in this module unless running live
    pytestmark = pytest.mark.skipif(
        os.environ.get("AZURE_TEST_RUN_LIVE") != "true",
        reason="Live tests only — set AZURE_TEST_RUN_LIVE=true",
    )
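The comparison `!= "true"` in this snippet is case-sensitive, so a value such as `True` or `TRUE` would still leave the module skipped. A more tolerant check might look like the following sketch; the helper name `is_live_run` is hypothetical and this is not what the PR implements.

```python
import os
from typing import Mapping, Optional


def is_live_run(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when AZURE_TEST_RUN_LIVE is set to 'true', ignoring case and whitespace."""
    source = os.environ if env is None else env
    return source.get("AZURE_TEST_RUN_LIVE", "").strip().lower() == "true"
```

Under this assumption the module-level marker would become `pytest.mark.skipif(not is_live_run(), reason=...)`.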
Comment on lines +95 to +104

    endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]
    return {
        "endpoint": endpoint,
        "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
        "resource_group": os.environ["AZURE_RESOURCE_GROUP"],
        "endpoint_url": os.environ["ENDPOINT_URL"],
        "endpoint_api_key": os.environ.get("ENDPOINT_API_KEY", ""),
        "endpoint_app_id": os.environ.get("ENDPOINT_APP_ID", ""),
        "account_name": urlparse(endpoint).hostname.split(".")[0],
    }
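One detail worth noting in this snippet: `urlparse(...).hostname` is `None` when the endpoint has no scheme, so `.split` would raise an `AttributeError` with no hint about the cause. A defensive variant could be sketched as below; the helper name and the example URL in the usage note are hypothetical.

```python
from urllib.parse import urlparse


def account_name_from_endpoint(endpoint: str) -> str:
    """Return the first DNS label of the endpoint's hostname."""
    hostname = urlparse(endpoint).hostname
    if not hostname:
        # Raised for values without a scheme, e.g. "myaccount.example.com/path".
        raise ValueError(f"Cannot derive an account name from endpoint {endpoint!r}")
    return hostname.split(".")[0]
```

For an endpoint of the form `https://myaccount.services.ai.azure.com/api/projects/demo`, this returns `myaccount`.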
Comment on lines +161 to +162

    connection_name = "test-apikey-conn-live"
    evaluator_name = "test-endpoint-eval-apikey-live"
Comment on lines +308 to +309

    connection_name = "test-entra-conn-live"
    evaluator_name = "test-endpoint-eval-entra-live"

    assert len(output_items) == 2
    for item in output_items:
        # Items should be in error state since the endpoint rejected the invalid key
        assert item.status == "error" or (item.results and len(item.results) == 0)
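For ordinary lists, the second operand of the final assertion can never be truthy: `item.results and len(item.results) == 0` requires `results` to be non-empty and empty at the same time. The sketch below demonstrates this with a hypothetical stand-in class (`FakeOutputItem` is not an SDK type) and shows one possible alternative, assuming the intent is to also accept items that carry no results.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeOutputItem:
    """Hypothetical stand-in for an eval output item; not an SDK type."""

    status: str
    results: Optional[List[dict]] = field(default_factory=list)


def original_check(item: FakeOutputItem) -> bool:
    # Mirrors the assertion in the test; the second operand is always falsy for lists.
    return bool(item.status == "error" or (item.results and len(item.results) == 0))


def explicit_check(item: FakeOutputItem) -> bool:
    # Assumed intent: accept an error status, or missing/empty results.
    return item.status == "error" or not item.results
```

With the original expression, an item whose status is not `"error"` fails the assertion even when its results are empty, so the check effectively reduces to `item.status == "error"`.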
Comment on lines +3395 to +3398

    :ivar metrics: List of output metrics produced by this evaluator.
    :vartype metrics: dict[str, ~azure.ai.projects.models.EvaluatorMetric]
    :ivar type: Required. Endpoint-based definition.
    :vartype type: str or ~azure.ai.projects.models.ENDPOINT
Comment on lines +235 to +244

    while True:
        run = client.evals.runs.retrieve(run_id=eval_run.id, eval_id=eval_object.id)
        if run.status in ("completed", "failed"):
            print(f" Run status: {run.status}")
            output_items = list(client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id))
            pprint(output_items)
            print(f" Report URL: {run.report_url}")
            break
        time.sleep(5)
        print(f" Status: {run.status} — polling again...")
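The loop in this snippet polls until the run reaches a terminal status and has no upper bound on how long it waits. A bounded variant might look like the sketch below. The helper name, the default timeout, and the injectable `sleep`/`clock` parameters are assumptions added for illustration and testability; they are not part of the sample.

```python
import time
from typing import Callable

TERMINAL_STATUSES = ("completed", "failed")  # statuses checked by the sample loop


def wait_for_terminal_status(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Run still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

In the sample, `get_status` could wrap the existing call, for example `lambda: client.evals.runs.retrieve(run_id=eval_run.id, eval_id=eval_object.id).status`.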
Comment on lines +108 to +111

    mgmt_client = CognitiveServicesManagementClient(
        credential=credential,
        subscription_id=subscription_id,
    )
Add support for endpoint-based custom evaluators in azure-ai-projects. This includes:
Merge also includes upstream model additions: CodeConfiguration, EntraAuthorizationScheme, RubricBasedEvaluatorDefinition.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines