
Add Google Gemini 3.5 VLM support#2265

Open
SkalskiP wants to merge 7 commits into develop from add-gemini-3.5-vlm-support

Conversation

@SkalskiP
Collaborator

Summary

  • Adds GOOGLE_GEMINI_3_5 to the VLM enum (and the deprecated LMM enum), reusing
    the existing Gemini 2.5 response parser because the output format is identical.
  • Registers the new model in all lookup dicts (RESULT_TYPES,
    REQUIRED_ARGUMENTS, ALLOWED_ARGUMENTS) and the from_vlm / from_lmm
    dispatch logic.
  • Adds parametrized tests verifying VLM.GOOGLE_GEMINI_3_5 produces the same
    detections as VLM.GOOGLE_GEMINI_2_5 for identical inputs.
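
The reuse pattern in the summary can be illustrated with a minimal, self-contained model. Everything in it is hypothetical: the toy `VLM` enum, the table contents, and the `validate`, `parse_gemini_2_5` and `from_vlm` functions stand in for Supervision's real code, which is not shown here. The sketch only shows the shape of the change: a new member needs one entry in each lookup table plus a dispatch branch that sends it to the existing parser.

```python
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple


class VLM(Enum):
    # Toy subset of members; the string values are illustrative only.
    GOOGLE_GEMINI_2_5 = "gemini_2_5"
    GOOGLE_GEMINI_3_5 = "gemini_3_5"


# Hypothetical lookup tables: names follow the PR summary, contents are assumed.
RESULT_TYPES: Dict[VLM, type] = {
    VLM.GOOGLE_GEMINI_2_5: str,
    VLM.GOOGLE_GEMINI_3_5: str,
}
REQUIRED_ARGUMENTS: Dict[VLM, List[str]] = {
    VLM.GOOGLE_GEMINI_2_5: ["resolution_wh"],
    VLM.GOOGLE_GEMINI_3_5: ["resolution_wh"],
}
ALLOWED_ARGUMENTS: Dict[VLM, List[str]] = {
    VLM.GOOGLE_GEMINI_2_5: ["resolution_wh", "classes"],
    VLM.GOOGLE_GEMINI_3_5: ["resolution_wh", "classes"],
}


def validate(vlm: VLM, result: Any, kwargs: Dict[str, Any]) -> None:
    """Reject results and keyword arguments the tables do not allow for `vlm`."""
    expected = RESULT_TYPES[vlm]
    if not isinstance(result, expected):
        raise ValueError(f"{vlm.name} expects a {expected.__name__} result")
    missing = [name for name in REQUIRED_ARGUMENTS[vlm] if name not in kwargs]
    if missing:
        raise ValueError(f"{vlm.name} is missing arguments: {missing}")
    unexpected = [name for name in kwargs if name not in ALLOWED_ARGUMENTS[vlm]]
    if unexpected:
        raise ValueError(f"{vlm.name} got unexpected arguments: {unexpected}")


def parse_gemini_2_5(
    result: str, resolution_wh: Tuple[int, int], classes: Optional[List[str]] = None
) -> Dict[str, Any]:
    # Stub standing in for the existing Gemini 2.5 parser.
    return {"raw": result, "resolution_wh": resolution_wh, "classes": classes}


def from_vlm(vlm: VLM, result: Any, **kwargs: Any) -> Dict[str, Any]:
    validate(vlm, result, kwargs)
    # Gemini 3.5 reuses the Gemini 2.5 parser because the output format matches.
    if vlm in (VLM.GOOGLE_GEMINI_2_5, VLM.GOOGLE_GEMINI_3_5):
        return parse_gemini_2_5(result, **kwargs)
    raise NotImplementedError(f"No parser registered for {vlm.name}")
```

In this sketch, registering the new member in every table (rather than special-casing it inside the validator) means Gemini 3.5 gets exactly the same argument checks and error messages as Gemini 2.5.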

@codecov

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78%. Comparing base (cb25906) to head (8ad8602).

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #2265   +/-   ##
=======================================
  Coverage       78%     78%           
=======================================
  Files           66      66           
  Lines         8406    8408    +2     
=======================================
+ Hits          6524    6534   +10     
+ Misses        1882    1874    -8     
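
The totals in the coverage diff are internally consistent: in each column, hits plus misses equals the line count, the deltas (+2 lines, +10 hits, -8 misses) agree, and both hit ratios round to the reported 78%. A quick check, assuming coverage is computed as hits divided by tracked lines (no partially covered lines are listed here):

```python
# Figures copied from the coverage diff (develop base vs. PR head).
base = {"lines": 8406, "hits": 6524, "misses": 1882}
head = {"lines": 8408, "hits": 6534, "misses": 1874}

for name, counts in (("base", base), ("head", head)):
    # Every tracked line is either hit or missed (no partials reported).
    assert counts["hits"] + counts["misses"] == counts["lines"]
    pct = 100 * counts["hits"] / counts["lines"]
    print(f"{name}: {pct:.2f}% -> rounds to {round(pct)}%")

# The deltas shown in the diff's right-hand column.
assert head["lines"] - base["lines"] == 2
assert head["hits"] - base["hits"] == 10
assert head["misses"] - base["misses"] == -8
```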

@SkalskiP force-pushed the add-gemini-3.5-vlm-support branch from f450b17 to e99ca54 May 21, 2026 09:39
@Borda requested a review from Copilot May 22, 2026 19:26
Comment thread tests/detection/test_vlm.py
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Comment thread tests/detection/test_vlm.py Outdated
@Borda added the enhancement label May 22, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds support for Google Gemini 3.5 as an additional Vision-Language Model option in Supervision’s Detections.from_vlm / deprecated from_lmm pathways by treating it as format-compatible with the existing Gemini 2.5 parser.

Changes:

  • Extend VLM (and deprecated LMM) enums and VLM validation lookup tables to include GOOGLE_GEMINI_3_5.
  • Update Detections.from_vlm / from_lmm dispatch to route Gemini 3.5 through the existing Gemini 2.5 parsing logic.
  • Add a parametrized regression test asserting Gemini 3.5 matches Gemini 2.5 outputs for identical inputs.

Review notes (scores):

  • Code quality: 4/5
  • Testing: 3/5
  • Documentation: 3/5

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • src/supervision/detection/vlm.py: Adds GOOGLE_GEMINI_3_5 to the enums and registers it in the VLM parameter/result validation tables.
  • src/supervision/detection/core.py: Extends the from_lmm mapping and the from_vlm dispatch so Gemini 3.5 uses the Gemini 2.5 parser path.
  • tests/detection/test_vlm.py: Adds a parity test intended to ensure Gemini 3.5 produces the same detections as Gemini 2.5.
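
The from_lmm change in core.py amounts to adding one entry to a mapping from the deprecated enum to its VLM counterpart. A minimal sketch of that pattern follows. The toy enums, the `LMM_TO_VLM` dict and the `resolve_lmm` helper are hypothetical names, and emitting a `DeprecationWarning` is an assumption about how a deprecated path typically behaves, not a description of Supervision's code.

```python
import warnings
from enum import Enum


class VLM(Enum):
    # Toy stand-in for the current enum.
    GOOGLE_GEMINI_2_5 = "gemini_2_5"
    GOOGLE_GEMINI_3_5 = "gemini_3_5"


class LMM(Enum):
    # Toy stand-in for the deprecated enum kept for backward compatibility.
    GOOGLE_GEMINI_2_5 = "gemini_2_5"
    GOOGLE_GEMINI_3_5 = "gemini_3_5"


# Hypothetical mapping: supporting a new model means adding one entry here.
LMM_TO_VLM = {
    LMM.GOOGLE_GEMINI_2_5: VLM.GOOGLE_GEMINI_2_5,
    LMM.GOOGLE_GEMINI_3_5: VLM.GOOGLE_GEMINI_3_5,
}


def resolve_lmm(lmm: LMM) -> VLM:
    """Translate a deprecated LMM member to VLM, warning the caller."""
    warnings.warn(
        "LMM is deprecated; use VLM instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    try:
        return LMM_TO_VLM[lmm]
    except KeyError:
        raise ValueError(f"Unsupported LMM: {lmm}") from None
```

Routing the old enum through a single lookup keeps the deprecated entry point thin: once resolved, the call can go through the same VLM dispatch as new code.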

Comment on lines 29 to 37
    Attributes:
        PALIGEMMA: Google's PaliGemma vision-language model.
        FLORENCE_2: Microsoft's Florence-2 vision-language model.
        QWEN_2_5_VL: Qwen2.5-VL open vision-language model from Alibaba.
        QWEN_3_VL: Qwen3-VL open vision-language model from Alibaba.
        GOOGLE_GEMINI_2_0: Google Gemini 2.0 vision-language model.
        GOOGLE_GEMINI_2_5: Google Gemini 2.5 vision-language model.
        GOOGLE_GEMINI_3_5: Google Gemini 3.5 vision-language model.
        MOONDREAM: The Moondream vision-language model.
Comment on lines 78 to 87
    Attributes:
        PALIGEMMA: Google's PaliGemma vision-language model.
        FLORENCE_2: Microsoft's Florence-2 vision-language model.
        QWEN_2_5_VL: Qwen2.5-VL open vision-language model from Alibaba.
        QWEN_3_VL: Qwen3-VL open vision-language model from Alibaba.
        GOOGLE_GEMINI_2_0: Google Gemini 2.0 vision-language model.
        GOOGLE_GEMINI_2_5: Google Gemini 2.5 vision-language model.
        GOOGLE_GEMINI_3_5: Google Gemini 3.5 vision-language model.
        MOONDREAM: The Moondream vision-language model.
    """
Comment on lines +1336 to +1343
        resolution_wh=resolution_wh,
        classes=classes,
    )
    detections_3_5 = Detections.from_vlm(
        vlm=VLM.GOOGLE_GEMINI_3_5,
        result=result,
        resolution_wh=resolution_wh,
        classes=classes,

Labels

enhancement New feature or request

3 participants