Parse Gemini 2.5 native object-detection format in vlm_as_detector#2400

Open
dkosowski87 wants to merge 2 commits into main from fix-vlm_as_detector-default-gemini-2.0-flash-handling

@dkosowski87
Contributor

What does this PR do?

After migrating the default Gemini model from gemini-2.0-flash to gemini-2.5-flash (#2395), hosted E2E tests showed that Gemini object-detection workflows still completed but returned parsed_prediction: null. Gemini 2.5 often responds with its native bounding-box JSON (box_2d + label, 0–1000 scale, top-level array) instead of the legacy format the vlm_as_detector parser expected ({"detections": [{"x_min", "class_name", ...}]} with 0–1 coordinates).

This PR adds shared Gemini object-detection parsing that auto-detects both output shapes and converts them into standard sv.Detections. The parser accepts a JSON root that is either a dict with a detections key or a top-level list, maps label / class / class_name to class IDs, and converts box_2d coordinates to pixel-space xyxy boxes. Both vlm_as_detector@v1 and @v2 route google-gemini object-detection output through this shared logic; OpenAI and Claude continue to use the existing legacy parser unchanged.
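The normalization described above can be sketched roughly as follows. This is an illustration, not the code in gemini_detection_parsing.py: the function names (`extract_entries`, `entry_to_xyxy`, `parse_detections`) are hypothetical, the result is a plain list of dicts rather than sv.Detections, and the `[y_min, x_min, y_max, x_max]` ordering of box_2d is an assumption based on Gemini's documented bounding-box convention, since the PR text only states the 0–1000 scale.

```python
import json
from typing import Any, Dict, List, Tuple

# Keys that may carry the class label, in lookup order.
LABEL_KEYS = ("label", "class", "class_name")


def extract_entries(root: Any) -> List[Dict[str, Any]]:
    """Accept either {"detections": [...]} or a top-level list."""
    if isinstance(root, dict):
        root = root.get("detections", [])
    if not isinstance(root, list):
        raise ValueError("expected a list of detections")
    return [entry for entry in root if isinstance(entry, dict)]


def entry_to_xyxy(
    entry: Dict[str, Any], width: int, height: int
) -> Tuple[float, float, float, float]:
    """Convert one entry to pixel-space (x_min, y_min, x_max, y_max)."""
    if "box_2d" in entry:
        # Assumption: native order is [y_min, x_min, y_max, x_max], 0-1000 scale.
        y_min, x_min, y_max, x_max = entry["box_2d"]
        scale = 1000.0
    else:
        # Legacy format: named keys with 0-1 coordinates.
        x_min, y_min = entry["x_min"], entry["y_min"]
        x_max, y_max = entry["x_max"], entry["y_max"]
        scale = 1.0
    return (
        x_min / scale * width,
        y_min / scale * height,
        x_max / scale * width,
        y_max / scale * height,
    )


def parse_detections(
    raw: str, width: int, height: int, classes: List[str]
) -> List[Dict[str, Any]]:
    """Parse either output shape into a list of pixel-space detections."""
    results = []
    for entry in extract_entries(json.loads(raw)):
        label = next((entry[k] for k in LABEL_KEYS if k in entry), None)
        # Unknown labels get class ID -1 in this sketch.
        class_id = classes.index(label) if label in classes else -1
        results.append(
            {
                "xyxy": entry_to_xyxy(entry, width, height),
                "class_name": label,
                "class_id": class_id,
            }
        )
    return results
```

Under these assumptions, a native entry with `box_2d` of `[100, 200, 500, 600]` on a 1000×500 image and the legacy entry `x_min=0.2, y_min=0.1, x_max=0.6, y_max=0.5` both map to the same pixel box, which is the property the shared parser relies on.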

Main elements:

  • New gemini_detection_parsing.py shared module for Gemini detection response normalization
  • Updated vlm_as_detector@v1 and @v2 JSON parsing to accept list or dict roots
  • Unit tests for Gemini 2.5 native box_2d output in test_v1.py and test_v2.py

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Other:

Testing

Unit tests

  • Legacy Gemini/Claude detection JSON (x_min / class_name format) — existing coverage unchanged
  • Gemini 2.5 native detection JSON (box_2d / label, top-level array) — new tests in test_v1.py and test_v2.py
  • Invalid/malformed JSON still returns error_status: True and predictions: None
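The malformed-input case in the last bullet can be illustrated with a small stand-in. The helper name `parse_or_flag_error` is hypothetical and the returned dict only mirrors the `error_status` / `predictions` keys named above; the real block returns richer outputs.

```python
import json
from typing import Any, Dict


def parse_or_flag_error(raw: str) -> Dict[str, Any]:
    """Return an error result instead of raising on unusable input."""
    try:
        root = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON at all: report the error, emit no predictions.
        return {"error_status": True, "predictions": None}
    if isinstance(root, dict):
        root = root.get("detections")
    if not isinstance(root, list):
        # Valid JSON, but neither a list root nor a dict with "detections".
        return {"error_status": True, "predictions": None}
    return {"error_status": False, "predictions": root}
```

The point of the sketch is that both accepted root shapes reach the success branch, while anything else produces the same error result as before the change.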

Integration tests

  • None for this PR. Once deployed, the change fixes the hosted-platform scenario exercised by tests/inference/hosted_platform_tests/workflows_examples/test_workflow_with_gemini.py::test_object_detection_workflow.

Other

.venv/bin/python -m pytest tests/workflows/unit_tests/core_steps/formatters/vlm_as_detector/ -q
# 30 passed

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

  • Symptom before fix: test_object_detection_workflow failed with TypeError: 'NoneType' object is not subscriptable because vlm_as_detector returned predictions: None while the Gemini step succeeded.
  • Related change: Complements refactor(gemini): remove deprecated model versions and update default to 2.5-flash #2395 (Gemini 2.0 deprecation / default bump to 2.5-flash). No changes to google_gemini blocks — they still return raw model text; parsing stays in the formatter.
  • Follow-up (optional): Consider adding response_schema to google_gemini object-detection prompts to enforce a single JSON shape at the API level, reducing reliance on format auto-detection.
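For the optional follow-up, a schema along these lines might pin the response to the native shape. This is a sketch only: `DETECTION_RESPONSE_SCHEMA` and `matches_native_shape` are hypothetical names, the schema follows the OpenAPI-style subset that Gemini structured output accepts as far as I understand it, and how it would be wired into the google_gemini block's request is not part of this PR.

```python
from typing import Any

# Hypothetical schema mirroring the native format described in this PR:
# a top-level array of objects with a 4-element box_2d and a string label.
DETECTION_RESPONSE_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "box_2d": {"type": "ARRAY", "items": {"type": "INTEGER"}},
            "label": {"type": "STRING"},
        },
        "required": ["box_2d", "label"],
    },
}


def matches_native_shape(payload: Any) -> bool:
    """Local check that a decoded response has the single expected shape."""
    if not isinstance(payload, list):
        return False
    for item in payload:
        if not isinstance(item, dict):
            return False
        box = item.get("box_2d")
        if not (isinstance(box, list) and len(box) == 4):
            return False
        if not isinstance(item.get("label"), str):
            return False
    return True
```

If the API enforced such a schema, the formatter would only ever see one shape, and a check like `matches_native_shape` could replace format auto-detection.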

… with VLMAsDetectorBlock

This commit introduces a new module for parsing Gemini object detection responses, including functions for extracting detection entries, parsing coordinates, and scaling confidence values. The `parse_gemini_object_detection_response` function is integrated into both VLMAsDetectorBlockV1 and VLMAsDetectorBlockV2, replacing the previous inline implementation. Additionally, unit tests are added to validate the new functionality, ensuring correct handling of Gemini's native box format in detection outputs.

…ction_response in v1 and v2 files

This commit reinstates the import of the `parse_gemini_object_detection_response` function in both v1 and v2 formatter files for VLMAsDetectorBlock. The function was previously removed but is now necessary for proper functionality in the detection parsing workflow.