feat(azure-ai-projects): add EndpointBasedEvaluatorDefinition model #47540

Open
ahmad-nader wants to merge 3 commits into Azure:main from ahmad-nader:ahmadnader/custom-evaluators

@ahmad-nader

Add support for endpoint-based custom evaluators in azure-ai-projects. This includes:

  • EndpointBasedEvaluatorDefinition model with connection_name field and "endpoint" discriminator type
  • ENDPOINT value added to EvaluatorDefinitionType enum
  • Two E2E samples demonstrating the full workflow (connection creation → evaluator registration → evaluation run):
      - sample_endpoint_evaluator_with_api_key.py: API Key authentication
      - sample_endpoint_evaluator_with_entra_id.py: Entra ID (managed identity) authentication
  • Samples use azure-mgmt-cognitiveservices for connection creation
  • Unit tests for model serialization/deserialization (5 tests, passing in PR pipeline)
  • Integration tests for live E2E validation (API Key + Entra ID flows)

Merge also includes upstream model additions: CodeConfiguration, EntraAuthorizationScheme, RubricBasedEvaluatorDefinition.
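
To make the discriminator behavior described above concrete, here is a minimal, self-contained sketch. It does not use the real azure-ai-projects classes: `EvaluatorDefinitionSketch`, `EndpointEvaluatorSketch`, `to_dict`, and `from_dict` are hypothetical stand-ins, and only the `connection_name` field and the "endpoint" type value are taken from this description.

```python
from dataclasses import asdict, dataclass
from typing import Any, ClassVar, Dict, Type


@dataclass
class EvaluatorDefinitionSketch:
    """Hypothetical base class; NOT the SDK's EvaluatorDefinition."""

    # Maps a discriminator value to the subclass that handles it.
    _registry: ClassVar[Dict[str, Type["EvaluatorDefinitionSketch"]]] = {}
    type: ClassVar[str] = "base"

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Register each subclass under its own discriminator value.
        EvaluatorDefinitionSketch._registry[cls.type] = cls

    def to_dict(self) -> Dict[str, Any]:
        # The discriminator is written alongside the instance fields.
        return {"type": self.type, **asdict(self)}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "EvaluatorDefinitionSketch":
        payload = dict(data)
        kind = payload.pop("type")
        # Dispatch to the subclass registered for this discriminator.
        return cls._registry[kind](**payload)


@dataclass
class EndpointEvaluatorSketch(EvaluatorDefinitionSketch):
    """Hypothetical stand-in for EndpointBasedEvaluatorDefinition."""

    type: ClassVar[str] = "endpoint"
    connection_name: str = ""


definition = EndpointEvaluatorSketch(connection_name="my-connection")
round_tripped = EvaluatorDefinitionSketch.from_dict(definition.to_dict())
print(type(round_tripped).__name__, round_tripped.connection_name)
```

The real model is generated code and may serialize differently; the sketch only illustrates why a payload carrying `"type": "endpoint"` can be deserialized through the base class into the endpoint-specific subclass.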

All SDK Contribution checklist:

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message.

Testing Guidelines

  • Pull request includes test coverage for the included changes

Ahmad Nader and others added 3 commits June 17, 2026 16:28
Add endpoint-based evaluator definition type to support custom HTTP
endpoint evaluators. Merge upstream model changes (CodeConfiguration,
EntraAuthorizationScheme, RubricBasedEvaluatorDefinition) with the new
endpoint evaluator discriminator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…evaluators

- Unit tests: model creation, serialization, discriminator behavior
- Integration tests: E2E API Key and Entra ID flows (live only)
- Update conftest to allow endpoint evaluator unit tests in PR pipeline

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…arsing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 13:48
@github-actions github-actions Bot added the AI Projects, Community Contribution (Community members are working on the issue), and customer-reported (Issues that are reported by GitHub users external to the Azure organization) labels Jun 17, 2026
@github-actions

Thank you for your contribution @ahmad-nader! We will review the pull request and get back to you soon.

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds support and guidance for “endpoint-based evaluators” by introducing a new evaluator definition model, along with unit tests, live integration tests, and end-to-end samples (API key + Entra ID).

Changes:

  • Introduces EndpointBasedEvaluatorDefinition and EvaluatorDefinitionType.ENDPOINT for discriminator-based model support.
  • Adds unit tests for model construction/serialization and live tests for API key / Entra ID evaluator workflows.
  • Adds two runnable samples demonstrating end-to-end setup and execution for endpoint evaluators.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| `sdk/ai/azure-ai-projects/tests/evaluations/test_endpoint_evaluators_live.py` | New live E2E tests for endpoint evaluators (API key + Entra ID). |
| `sdk/ai/azure-ai-projects/tests/evaluations/test_endpoint_evaluator_models.py` | New unit tests for EndpointBasedEvaluatorDefinition discriminator/serialization behavior. |
| `sdk/ai/azure-ai-projects/tests/conftest.py` | Updates test skipping logic to keep new unit tests running in PR pipeline. |
| `sdk/ai/azure-ai-projects/samples/evaluations/sample_endpoint_evaluator_with_entra_id.py` | New Entra ID (AAD) end-to-end sample for endpoint evaluators. |
| `sdk/ai/azure-ai-projects/samples/evaluations/sample_endpoint_evaluator_with_api_key.py` | New API key end-to-end sample for endpoint evaluators. |
| `sdk/ai/azure-ai-projects/azure/ai/projects/models/_models.py` | Adds EndpointBasedEvaluatorDefinition model + updates base model docs. |
| `sdk/ai/azure-ai-projects/azure/ai/projects/models/_enums.py` | Adds EvaluatorDefinitionType.ENDPOINT. |
| `sdk/ai/azure-ai-projects/azure/ai/projects/models/__init__.py` | Exports EndpointBasedEvaluatorDefinition from the models package. |

Comment on lines 37 to +40
  if "tests\\evaluation" in path or "tests/evaluation" in path:
-     # test_human_evaluations.py is a pure unit test with no Microsoft Foundry
-     # dependency, so it must keep running in the PR pipeline.
-     if "test_human_evaluations" in os.path.basename(path):
+     # Pure unit tests with no Microsoft Foundry dependency must keep running in the PR pipeline.
+     basename = os.path.basename(path)
+     if "test_human_evaluations" in basename or "test_endpoint_evaluator_models" in basename:
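
The allow-list check in this hunk can be exercised in isolation. The following is a minimal sketch, assuming the surrounding conftest logic skips every other file under `tests/evaluation*` in the PR pipeline (the hunk implies this but does not show it). The function name `is_skipped_in_pr_pipeline` and the tuple `PURE_UNIT_TEST_MARKERS` are hypothetical.

```python
# Basename fragments of pure unit tests, taken from the diff above.
PURE_UNIT_TEST_MARKERS = (
    "test_human_evaluations",
    "test_endpoint_evaluator_models",
)


def is_skipped_in_pr_pipeline(path: str) -> bool:
    """Return True if a test file should be skipped in the PR pipeline.

    Assumption: everything under tests/evaluation* is skipped unless its
    basename matches one of the allow-listed pure unit tests.
    """
    if "tests\\evaluation" not in path and "tests/evaluation" not in path:
        return False
    # Normalize separators so Windows-style paths also work on POSIX hosts.
    basename = path.replace("\\", "/").rsplit("/", 1)[-1]
    return not any(marker in basename for marker in PURE_UNIT_TEST_MARKERS)


print(is_skipped_in_pr_pipeline("tests/evaluations/test_endpoint_evaluators_live.py"))
print(is_skipped_in_pr_pipeline("tests/evaluations/test_endpoint_evaluator_models.py"))
```

Keeping the markers in one tuple means a future pure unit test only needs one new entry instead of another `or` clause.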
Comment on lines +11 to +12
To run these tests:
pytest tests/evaluations/test_endpoint_evaluators_live.py -s --run-live
Comment on lines +47 to +51
# Skip all tests in this module unless running live
pytestmark = pytest.mark.skipif(
    os.environ.get("AZURE_TEST_RUN_LIVE") != "true",
    reason="Live tests only — set AZURE_TEST_RUN_LIVE=true",
)
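
Note that the module docstring quoted earlier runs the tests with `--run-live`, while this marker reads the `AZURE_TEST_RUN_LIVE` environment variable and compares it to the exact string "true". A minimal sketch of a slightly more tolerant gate follows; the helper name `is_live_run` is hypothetical, and accepting mixed-case values is an assumption, not something the PR specifies.

```python
import os
from typing import Mapping, Optional


def is_live_run(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when AZURE_TEST_RUN_LIVE is set to 'true' in any casing.

    `environ` defaults to os.environ; passing a plain dict keeps the helper
    easy to test without mutating the process environment.
    """
    env = os.environ if environ is None else environ
    return env.get("AZURE_TEST_RUN_LIVE", "").strip().lower() == "true"


print(is_live_run({"AZURE_TEST_RUN_LIVE": "True"}))
print(is_live_run({}))
```

Under that assumption, the marker could read `pytest.mark.skipif(not is_live_run(), reason=...)`.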
Comment on lines +95 to +104
    endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]
    return {
        "endpoint": endpoint,
        "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
        "resource_group": os.environ["AZURE_RESOURCE_GROUP"],
        "endpoint_url": os.environ["ENDPOINT_URL"],
        "endpoint_api_key": os.environ.get("ENDPOINT_API_KEY", ""),
        "endpoint_app_id": os.environ.get("ENDPOINT_APP_ID", ""),
        "account_name": urlparse(endpoint).hostname.split(".")[0],
    }
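
In the snippet above, `urlparse(endpoint).hostname` is `None` when the endpoint has no scheme, so `.split` would raise an `AttributeError` with an unhelpful message. A minimal sketch of a guarded version follows; the helper name `account_name_from_endpoint` and the example URL are hypothetical and only illustrate the shape of the input.

```python
from urllib.parse import urlparse


def account_name_from_endpoint(endpoint: str) -> str:
    """Return the first DNS label of the endpoint's hostname.

    Raises ValueError with a readable message when no hostname can be
    parsed (for example, when the URL is missing its scheme).
    """
    hostname = urlparse(endpoint).hostname
    if not hostname:
        raise ValueError(f"Cannot derive account name from endpoint: {endpoint!r}")
    return hostname.split(".")[0]


# Illustrative URL only; the real endpoint format is not shown in this PR.
print(account_name_from_endpoint("https://my-account.example.com/api/projects/demo"))
```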
Comment on lines +161 to +162
connection_name = "test-apikey-conn-live"
evaluator_name = "test-endpoint-eval-apikey-live"
Comment on lines +308 to +309
connection_name = "test-entra-conn-live"
evaluator_name = "test-endpoint-eval-entra-live"
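
Both flagged pairs hardcode resource names, so concurrent or repeated live runs against the same project could collide. One possible mitigation, sketched here with a hypothetical helper `unique_name`, is to append a short random suffix. Whether the service accepts names of this length and shape is an assumption.

```python
import uuid


def unique_name(prefix: str) -> str:
    """Return `prefix` followed by a dash and 8 random hex characters."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


connection_name = unique_name("test-apikey-conn-live")
evaluator_name = unique_name("test-endpoint-eval-apikey-live")
print(connection_name, evaluator_name)
```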
assert len(output_items) == 2
for item in output_items:
    # Items should be in error state since the endpoint rejected the invalid key
    assert item.status == "error" or (item.results and len(item.results) == 0)
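
The reviewer's remark on this assertion is not preserved here, so the following is only a reading of the snippet itself. The second clause, `item.results and len(item.results) == 0`, appears never to be truthy: an empty or `None` `results` short-circuits to a falsy value, and a non-empty one fails the length check. The sketch below checks this with stand-in objects, assuming `results` is a list or `None`; the function name `original_condition` is hypothetical.

```python
from types import SimpleNamespace


def original_condition(item) -> bool:
    """Truthiness of the condition used in the assertion above."""
    return bool(item.status == "error" or (item.results and len(item.results) == 0))


# Stand-in output items; real items come from the evals API.
samples = [
    SimpleNamespace(status="completed", results=[]),
    SimpleNamespace(status="completed", results=None),
    SimpleNamespace(status="completed", results=[{"score": 1}]),
    SimpleNamespace(status="error", results=[]),
]
for sample in samples:
    print(sample.status, sample.results, original_condition(sample))
```

If that reading is right, the assertion is effectively `item.status == "error"`, and the fallback for empty results would need to be written as `not item.results` to have any effect.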
Comment on lines +3395 to +3398
:ivar metrics: List of output metrics produced by this evaluator.
:vartype metrics: dict[str, ~azure.ai.projects.models.EvaluatorMetric]
:ivar type: Required. Endpoint-based definition.
:vartype type: str or ~azure.ai.projects.models.ENDPOINT
Comment on lines +235 to +244
while True:
    run = client.evals.runs.retrieve(run_id=eval_run.id, eval_id=eval_object.id)
    if run.status in ("completed", "failed"):
        print(f" Run status: {run.status}")
        output_items = list(client.evals.runs.output_items.list(run_id=run.id, eval_id=eval_object.id))
        pprint(output_items)
        print(f" Report URL: {run.report_url}")
        break
    time.sleep(5)
    print(f" Status: {run.status} — polling again...")
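
The loop above has no upper bound, so a run that never reaches "completed" or "failed" would poll forever. A minimal sketch of a bounded variant follows. The helper `wait_for_run`, its parameters, and the default timeout are hypothetical; only the two terminal statuses and the 5-second interval come from the sample.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = ("completed", "failed")


def wait_for_run(
    retrieve: Callable[[], Any],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll `retrieve()` until the run is terminal or the deadline passes.

    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        run = retrieve()
        if run.status in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"Run still {run.status!r} after {timeout_s} s")
        sleep(interval_s)
```

In the sample this could be called as `wait_for_run(lambda: client.evals.runs.retrieve(run_id=eval_run.id, eval_id=eval_object.id))`, with the output-item listing and report URL printed after it returns.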
Comment on lines +108 to +111
mgmt_client = CognitiveServicesManagementClient(
    credential=credential,
    subscription_id=subscription_id,
)