Parse Gemini 2.5 native object-detection format in vlm_as_detector#2400
Open
dkosowski87 wants to merge 2 commits into
… with VLMAsDetectorBlock This commit introduces a new module for parsing Gemini object detection responses, including functions for extracting detection entries, parsing coordinates, and scaling confidence values. The `parse_gemini_object_detection_response` function is integrated into both VLMAsDetectorBlockV1 and VLMAsDetectorBlockV2, replacing the previous inline implementation. Additionally, unit tests are added to validate the new functionality, ensuring correct handling of Gemini's native box format in detection outputs.
grzegorz-roboflow
previously approved these changes
Jun 2, 2026
…ction_response in v1 and v2 files This commit reinstates the import of the `parse_gemini_object_detection_response` function in both v1 and v2 formatter files for VLMAsDetectorBlock. The function was previously removed but is now necessary for proper functionality in the detection parsing workflow.
grzegorz-roboflow
approved these changes
Jun 2, 2026
PawelPeczek-Roboflow
approved these changes
Jun 2, 2026
What does this PR do?
After migrating the default Gemini model from `gemini-2.0-flash` to `gemini-2.5-flash` (#2395), hosted E2E tests showed that Gemini object-detection workflows still completed but returned `parsed_prediction: null`. Gemini 2.5 often responds with its native bounding-box JSON (`box_2d` + `label`, 0–1000 scale, top-level array) instead of the legacy format the `vlm_as_detector` parser expected (`{"detections": [{"x_min", "class_name", ...}]}` with 0–1 coordinates).

This PR adds shared Gemini object-detection parsing that auto-detects both output shapes and converts them into standard `sv.Detections`. The parser accepts a JSON root that is either a dict with a `detections` key or a top-level list, maps `label`/`class`/`class_name` to class IDs, and converts `box_2d` coordinates to pixel-space `xyxy` boxes. Both `vlm_as_detector@v1` and `@v2` route `google-gemini` object-detection output through this shared logic; OpenAI and Claude continue to use the existing legacy parser unchanged.

Main elements:
- `gemini_detection_parsing.py` shared module for Gemini detection response normalization
- `vlm_as_detector@v1` and `@v2` JSON parsing to accept list or dict roots
- … `box_2d` output in `test_v1.py` and `test_v2.py`

Type of Change
Testing
Unit tests
- … `x_min`/`class_name` format): existing coverage unchanged
- … `box_2d`/`label`, top-level array): new tests in `test_v1.py` and `test_v2.py`
- … `error_status: True` and `predictions: None`

Integration tests
- … `tests/inference/hosted_platform_tests/workflows_examples/test_workflow_with_gemini.py::test_object_detection_workflow` once deployed.

Other
.venv/bin/python -m pytest tests/workflows/unit_tests/core_steps/formatters/vlm_as_detector/ -q # 30 passed

Checklist
Additional Context
- `test_object_detection_workflow` failed with `TypeError: 'NoneType' object is not subscriptable` because `vlm_as_detector` returned `predictions: None` while the Gemini step succeeded.
- … `google_gemini` blocks: they still return raw model text; parsing stays in the formatter.
- … `response_schema` to `google_gemini` object-detection prompts to enforce a single JSON shape at the API level, reducing reliance on format auto-detection.
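For readers who have not opened the diff, the auto-detection described under "What does this PR do?" can be sketched roughly as follows. This is an illustration, not the code in `gemini_detection_parsing.py`: the helper names (`extract_entries`, `entry_to_box`, `parse_detections`) are hypothetical, the result is a list of plain dicts rather than `sv.Detections`, and mapping unknown labels to class ID `-1` is an assumption. It also assumes `box_2d` is ordered `[y_min, x_min, y_max, x_max]`, as in Google's Gemini documentation, and that any markdown fence around the model's JSON has already been stripped.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def extract_entries(root: Any) -> Optional[List[dict]]:
    """Accept either {"detections": [...]} or a top-level list of entries."""
    if isinstance(root, dict):
        root = root.get("detections")
    if not isinstance(root, list):
        return None
    return [entry for entry in root if isinstance(entry, dict)]


def entry_to_box(entry: dict, width: int, height: int) -> Optional[Box]:
    """Convert one entry to a pixel-space xyxy box, or None if malformed."""
    try:
        if "box_2d" in entry:
            # Native Gemini shape: [y_min, x_min, y_max, x_max] on a 0-1000 scale.
            y_min, x_min, y_max, x_max = (float(v) for v in entry["box_2d"])
            scale = 1000.0
        else:
            # Legacy shape: named keys with 0-1 coordinates.
            x_min, y_min, x_max, y_max = (
                float(entry[key]) for key in ("x_min", "y_min", "x_max", "y_max")
            )
            scale = 1.0
    except (KeyError, TypeError, ValueError):
        return None
    return (
        x_min * width / scale,
        y_min * height / scale,
        x_max * width / scale,
        y_max * height / scale,
    )


def parse_detections(
    raw_text: str, width: int, height: int, classes: List[str]
) -> Optional[List[Dict[str, Any]]]:
    """Return parsed detections, or None when the response is unusable."""
    try:
        root = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    entries = extract_entries(root)
    if entries is None:
        return None
    results = []
    for entry in entries:
        box = entry_to_box(entry, width, height)
        if box is None:
            continue  # skip malformed entries instead of failing the whole response
        name = entry.get("label") or entry.get("class") or entry.get("class_name")
        # Assumption: labels outside the configured classes get class_id -1.
        class_id = classes.index(name) if name in classes else -1
        results.append({"xyxy": box, "class_name": name, "class_id": class_id})
    return results


native = '[{"box_2d": [100, 200, 500, 600], "label": "cat"}]'
legacy = '{"detections": [{"x_min": 0.25, "y_min": 0.5, "x_max": 0.75, "y_max": 1.0, "class_name": "dog"}]}'
print(parse_detections(native, 1000, 500, ["dog", "cat"]))
print(parse_detections(legacy, 1000, 500, ["dog", "cat"]))
```

Both inputs go through the same entry point. Only the per-entry coordinate handling differs, which is what would let one parser serve both output shapes.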