Parse Gemini 2.5 native object-detection format in vlm_as_detector by dkosowski87 · Pull Request #2400 · roboflow/inference

dkosowski87 · 2026-06-02T16:49:59Z

What does this PR do?

After migrating the default Gemini model from gemini-2.0-flash to gemini-2.5-flash (#2395), hosted E2E tests showed that Gemini object-detection workflows still completed but returned parsed_prediction: null. Gemini 2.5 often responds with its native bounding-box JSON (box_2d + label, 0–1000 scale, top-level array) instead of the legacy format the vlm_as_detector parser expected ({"detections": [{"x_min", "class_name", ...}]} with 0–1 coordinates).
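To make the difference between the two shapes concrete, here is a minimal sketch that classifies a raw response by its JSON root. The sample payloads are illustrative only: the legacy fields other than `x_min` and `class_name` (`y_min`, `x_max`, `y_max`, `confidence`) and the helper name `detect_shape` are assumptions, not taken from the PR.

```python
import json

# Legacy shape: dict root with a "detections" key and 0-1 coordinates.
# Field names other than x_min / class_name are assumed for illustration.
legacy_raw = (
    '{"detections": [{"x_min": 0.1, "y_min": 0.2, "x_max": 0.5, '
    '"y_max": 0.6, "class_name": "dog", "confidence": 0.9}]}'
)

# Gemini 2.5 native shape: top-level list, box_2d on a 0-1000 scale.
native_raw = '[{"box_2d": [200, 100, 600, 500], "label": "dog"}]'


def detect_shape(raw: str) -> str:
    """Hypothetical helper: classify a response by the shape of its JSON root."""
    root = json.loads(raw)
    if isinstance(root, dict) and "detections" in root:
        return "legacy"
    if isinstance(root, list):
        return "native"
    return "unknown"


print(detect_shape(legacy_raw), detect_shape(native_raw))
```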

This PR adds shared Gemini object-detection parsing that auto-detects both output shapes and converts them into standard sv.Detections. The parser accepts a JSON root that is either a dict with a detections key or a top-level list, maps label / class / class_name to class IDs, and converts box_2d coordinates to pixel-space xyxy boxes. Both vlm_as_detector@v1 and @v2 route google-gemini object-detection output through this shared logic; OpenAI and Claude continue to use the existing legacy parser unchanged.
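The parsing steps described above could be sketched roughly as follows. This is not the code from gemini_detection_parsing.py: the function names are hypothetical, the result is returned as plain lists rather than sv.Detections to keep the example dependency-free, unknown labels are mapped to -1 by assumption, and the `box_2d` order `[y_min, x_min, y_max, x_max]` is assumed from Google's documented convention rather than stated in this PR.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]


def extract_entries(root: Any) -> List[Dict[str, Any]]:
    """Accept a dict root with a `detections` key or a top-level list."""
    if isinstance(root, dict):
        entries = root.get("detections", [])
    elif isinstance(root, list):
        entries = root
    else:
        entries = []
    if not isinstance(entries, list):
        return []
    return [entry for entry in entries if isinstance(entry, dict)]


def entry_label(entry: Dict[str, Any]) -> Optional[str]:
    """Return the first string found under label / class / class_name."""
    for key in ("label", "class", "class_name"):
        value = entry.get(key)
        if isinstance(value, str):
            return value
    return None


def entry_to_xyxy(entry: Dict[str, Any], width: int, height: int) -> Optional[Box]:
    """Convert one entry to a pixel-space (x_min, y_min, x_max, y_max) box."""
    box = entry.get("box_2d")
    if isinstance(box, list) and len(box) == 4:
        # ASSUMPTION: box_2d is [y_min, x_min, y_max, x_max] on a 0-1000 scale.
        y_min, x_min, y_max, x_max = box
        return (
            x_min * width / 1000,
            y_min * height / 1000,
            x_max * width / 1000,
            y_max * height / 1000,
        )
    legacy_keys = ("x_min", "y_min", "x_max", "y_max")
    if all(key in entry for key in legacy_keys):
        # Legacy coordinates are on a 0-1 scale.
        return (
            entry["x_min"] * width,
            entry["y_min"] * height,
            entry["x_max"] * width,
            entry["y_max"] * height,
        )
    return None


def parse_detections(
    raw: str, classes: List[str], width: int, height: int
) -> Tuple[List[Box], List[int]]:
    """Parse either response shape into pixel boxes and class IDs."""
    boxes: List[Box] = []
    class_ids: List[int] = []
    for entry in extract_entries(json.loads(raw)):
        xyxy = entry_to_xyxy(entry, width, height)
        if xyxy is None:
            continue  # skip entries without usable coordinates
        label = entry_label(entry)
        # ASSUMPTION: labels missing from `classes` map to -1.
        class_ids.append(classes.index(label) if label in classes else -1)
        boxes.append(xyxy)
    return boxes, class_ids
```

Keeping root detection, label lookup, and coordinate conversion in separate small functions mirrors the PR's description of a shared module that both block versions can call.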

Main elements:

New gemini_detection_parsing.py shared module for Gemini detection response normalization
Updated vlm_as_detector@v1 and @v2 JSON parsing to accept list or dict roots
Unit tests for Gemini 2.5 native box_2d output in test_v1.py and test_v2.py

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Other:

Testing

Unit tests

Legacy Gemini/Claude detection JSON (x_min / class_name format) — existing coverage unchanged
Gemini 2.5 native detection JSON (box_2d / label, top-level array) — new tests in test_v1.py and test_v2.py
Invalid/malformed JSON still returns error_status: True and predictions: None
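The malformed-input case in the last item can be illustrated with a small sketch. Only the `error_status` and `predictions` keys come from the PR text; the function name `parse_or_error` and the choice to reject scalar JSON roots are assumptions made for this example.

```python
import json
from typing import Any, Dict


def parse_or_error(raw: str) -> Dict[str, Any]:
    """Hypothetical wrapper: report failure instead of raising on bad JSON."""
    try:
        root = json.loads(raw)
    except json.JSONDecodeError:
        return {"error_status": True, "predictions": None}
    # ASSUMPTION: only dict or list roots are treated as parseable.
    if not isinstance(root, (dict, list)):
        return {"error_status": True, "predictions": None}
    return {"error_status": False, "predictions": root}


print(parse_or_error("this is not JSON"))
```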

Integration tests

None added in this PR. Once deployed, the change should fix the hosted-platform scenario exercised by tests/inference/hosted_platform_tests/workflows_examples/test_workflow_with_gemini.py::test_object_detection_workflow.

Other

.venv/bin/python -m pytest tests/workflows/unit_tests/core_steps/formatters/vlm_as_detector/ -q
# 30 passed

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code where necessary, particularly in hard-to-understand areas
My changes generate no new warnings or errors
I have updated the documentation accordingly (if applicable)

Additional Context

Symptom before fix: test_object_detection_workflow failed with TypeError: 'NoneType' object is not subscriptable because vlm_as_detector returned predictions: None while the Gemini step succeeded.
Related change: Complements #2395, "refactor(gemini): remove deprecated model versions and update default to 2.5-flash" (Gemini 2.0 deprecation / default bump to 2.5-flash). The google_gemini blocks are unchanged: they still return raw model text, and parsing stays in the formatter.
Follow-up (optional): Consider adding response_schema to google_gemini object-detection prompts to enforce a single JSON shape at the API level, reducing reliance on format auto-detection.

… with VLMAsDetectorBlock

This commit introduces a new module for parsing Gemini object detection responses, including functions for extracting detection entries, parsing coordinates, and scaling confidence values. The `parse_gemini_object_detection_response` function is integrated into both VLMAsDetectorBlockV1 and VLMAsDetectorBlockV2, replacing the previous inline implementation. Additionally, unit tests are added to validate the new functionality, ensuring correct handling of Gemini's native box format in detection outputs.

…ction_response in v1 and v2 files

This commit reinstates the import of the `parse_gemini_object_detection_response` function in both v1 and v2 formatter files for VLMAsDetectorBlock. The function was previously removed but is now necessary for proper functionality in the detection parsing workflow.

dkosowski87 requested review from PawelPeczek-Roboflow, grzegorz-roboflow, hansent, probicheaux, rafel-roboflow and yeldarby as code owners June 2, 2026 16:50

grzegorz-roboflow previously approved these changes Jun 2, 2026

dkosowski87 dismissed grzegorz-roboflow’s stale review via 62e9a3c June 2, 2026 16:54

grzegorz-roboflow approved these changes Jun 2, 2026

PawelPeczek-Roboflow approved these changes Jun 2, 2026

dkosowski87 wants to merge 2 commits into main from fix-vlm_as_detector-default-gemini-2.0-flash-handling
