fix(eval): handle unevaluated final response v2 results#5728

Open
pragnyanramtha wants to merge 11 commits into google:main from pragnyanramtha:pragnyan/final-response-v2-no-eval-guard

@pragnyanramtha

Summary

Fixes a small aggregation edge case in FinalResponseMatchV2Evaluator: when every per-invocation result is skipped or not evaluated, the evaluator currently divides by zero while computing the overall score.

Root Cause

aggregate_invocation_results() filters out results whose score is None or whose eval_status is NOT_EVALUATED, but it unconditionally computes:

overall_score = num_valid / num_evaluated

If all judge samples fail to produce a usable score, num_evaluated remains 0, so the division raises ZeroDivisionError and evaluation crashes instead of returning a not-evaluated aggregate result. Other ADK evaluators handle this condition by returning overall_score=None and overall_eval_status=NOT_EVALUATED.
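
For reviewers who want to see the failure mode in isolation, here is a minimal sketch of the aggregation with the guard in place. It is not the ADK source: `EvalStatus`, `PerInvocationResult`, `EvaluationResult`, the `aggregate` function, and the `threshold` parameter are simplified stand-ins written for illustration, and only the filtering rule and the `num_valid / num_evaluated` division are taken from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EvalStatus(Enum):
    # Stand-in enum; only NOT_EVALUATED is named in this PR.
    PASSED = "passed"
    FAILED = "failed"
    NOT_EVALUATED = "not_evaluated"


@dataclass
class PerInvocationResult:
    # Simplified stand-in for a per-invocation result.
    score: Optional[float]
    eval_status: EvalStatus


@dataclass
class EvaluationResult:
    # Simplified stand-in for the aggregate result.
    overall_score: Optional[float]
    overall_eval_status: EvalStatus


def aggregate(
    results: List[PerInvocationResult], threshold: float = 0.5
) -> EvaluationResult:
    """Average the usable scores, skipping unevaluated results."""
    num_valid = 0.0
    num_evaluated = 0
    for result in results:
        # Same filter as described above: drop missing scores and
        # results explicitly marked NOT_EVALUATED.
        if result.score is None or result.eval_status == EvalStatus.NOT_EVALUATED:
            continue
        num_evaluated += 1
        num_valid += result.score

    # The guard: without it, the division below raises
    # ZeroDivisionError when every result was filtered out.
    if num_evaluated == 0:
        return EvaluationResult(None, EvalStatus.NOT_EVALUATED)

    overall_score = num_valid / num_evaluated
    # Threshold comparison is an assumption made for this sketch.
    status = EvalStatus.PASSED if overall_score >= threshold else EvalStatus.FAILED
    return EvaluationResult(overall_score, status)
```

Removing the `num_evaluated == 0` branch and passing a list in which every result has `score=None` reproduces the crash in this sketch.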

Change

  • Return an EvaluationResult with overall_score=None and overall_eval_status=NOT_EVALUATED when no FinalResponseMatchV2 invocation results are evaluable.
  • Add a focused regression test for all-skipped/all-not-evaluated invocation results.
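
The shape of such a regression test might look like the sketch below. It does not reproduce the test added in this PR: `aggregate_stub`, `make`, the string statuses, and the listed input cases are hypothetical stand-ins chosen so the example runs on its own without ADK installed.

```python
from types import SimpleNamespace

NOT_EVALUATED = "NOT_EVALUATED"  # stand-in for the real status enum member


def aggregate_stub(results):
    """Hypothetical stand-in for the guarded aggregation under test."""
    scores = [
        r.score
        for r in results
        if r.score is not None and r.eval_status != NOT_EVALUATED
    ]
    if not scores:
        # Nothing evaluable: report "not evaluated" instead of dividing by zero.
        return SimpleNamespace(overall_score=None, overall_eval_status=NOT_EVALUATED)
    # "EVALUATED" is a placeholder status used only in this sketch.
    return SimpleNamespace(
        overall_score=sum(scores) / len(scores), overall_eval_status="EVALUATED"
    )


def make(score, status):
    """Build a minimal per-invocation result for the test."""
    return SimpleNamespace(score=score, eval_status=status)


# Inputs in which no result is evaluable, for different reasons.
ALL_UNEVALUATED_CASES = [
    [],                                                 # no results at all
    [make(None, NOT_EVALUATED)],                        # skipped outright
    [make(None, "PASSED"), make(None, NOT_EVALUATED)],  # score missing
    [make(1.0, NOT_EVALUATED)],                         # status overrides score
]


def test_all_unevaluated_returns_not_evaluated():
    for results in ALL_UNEVALUATED_CASES:
        aggregated = aggregate_stub(results)
        assert aggregated.overall_score is None
        assert aggregated.overall_eval_status == NOT_EVALUATED
```

Covering the empty list and the "score present but status NOT_EVALUATED" case alongside the all-`None` case exercises each way the filter can leave nothing to average.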

Validation

uv sync --extra test
uv run pytest tests/unittests/evaluation/test_final_response_match_v2.py

Result: 18 passed, 20 warnings.

Full unit suite was not run; this patch is limited to FinalResponseMatchV2 aggregation and its targeted unit test file.

@pragnyanramtha pragnyanramtha marked this pull request as ready for review May 17, 2026 00:15
@rohityan rohityan self-assigned this May 18, 2026
@rohityan rohityan added the v2 Affects only 2.0 version label May 19, 2026
@pragnyanramtha
Author

Refreshed this branch with current main in f61da061.

Validation rerun:

  • uv run --extra test pytest tests/unittests/evaluation/test_final_response_match_v2.py -q (18 passed)
  • uv run --extra dev pyink --check src/google/adk/evaluation/final_response_match_v2.py tests/unittests/evaluation/test_final_response_match_v2.py
  • git diff --check

@pragnyanramtha
Author

Refreshed this branch with current main in f7a83e9b.

Validation rerun:

  • uv run --extra test pytest tests/unittests/evaluation/test_final_response_match_v2.py -q (18 passed, 20 experimental warnings)
  • uv run --extra dev pyink --check src/google/adk/evaluation/final_response_match_v2.py tests/unittests/evaluation/test_final_response_match_v2.py
  • uv run --extra dev isort --check-only src/google/adk/evaluation/final_response_match_v2.py tests/unittests/evaluation/test_final_response_match_v2.py
  • git diff --check
