[quantization] Support Vision Encoder wrapper for Gemma4 #796

Merged
mhs4670go merged 1 commit into Samsung:main from Torrero:gemma4_support_wrapper_visionencoder
Jun 29, 2026

@Torrero Torrero commented Jun 24, 2026

Contributor

Summary

Introduce a PTQ wrapper (QuantGemma4VisionEncoder) for the Hugging Face Gemma4VisionEncoder module, enabling post-training quantization of the Gemma4 vision tower with both dynamic evaluation and static torch.export paths.

Changes

tico/quantization/wrapq/wrappers/gemma4/quant_vision_encoder.py

  • QuantGemma4VisionEncoder registered against Gemma4VisionEncoder via @try_register
  • Dynamic forward path (forward): gathers RoPE position embeddings from precomputed lookup tables (cos_table/sin_table) and builds bidirectional attention masks from pixel_position_ids; used during calibration and accuracy evaluation
  • Static export path (forward_export): reads precomputed position_embeddings and attention_mask from registered buffers, avoiding any dynamic, shape-dependent computation, so it is safe for torch.export tracing
  • Precomputes RoPE sin/cos lookup tables for all position IDs in __init__, replacing the dynamic matmul + cos/sin with a simple gather
  • Observers on input activations, attention mask, position embeddings (cos/sin), and encoder output
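
The lookup-table idea in the bullets above can be illustrated in a few lines. This is a minimal sketch in plain Python, not the wrapper's code: `inv_frequencies`, `build_rope_tables`, `gather_rope`, `dynamic_rope`, `max_positions`, `head_dim`, and `base` are hypothetical names, and the real wrapper operates on tensors with Gemma4's multidimensional RoPE layout. The sketch precomputes cos/sin rows for every integer position ID once and then serves a request by indexing those rows.

```python
import math


def inv_frequencies(head_dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE inverse frequencies: base^(-2i/head_dim) per rotation pair i."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def build_rope_tables(max_positions: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin rows for every integer position ID (done once)."""
    inv_freq = inv_frequencies(head_dim, base)
    cos_table = [[math.cos(p * f) for f in inv_freq] for p in range(max_positions)]
    sin_table = [[math.sin(p * f) for f in inv_freq] for p in range(max_positions)]
    return cos_table, sin_table


def gather_rope(cos_table, sin_table, position_ids):
    """Look up rows by position ID; no trigonometry happens at call time."""
    cos = [cos_table[p] for p in position_ids]
    sin = [sin_table[p] for p in position_ids]
    return cos, sin


def dynamic_rope(position_ids, head_dim: int, base: float = 10000.0):
    """Reference path: recompute angle = position * inv_freq on every call."""
    inv_freq = inv_frequencies(head_dim, base)
    cos = [[math.cos(p * f) for f in inv_freq] for p in position_ids]
    sin = [[math.sin(p * f) for f in inv_freq] for p in position_ids]
    return cos, sin


cos_table, sin_table = build_rope_tables(max_positions=16, head_dim=8)
position_ids = [3, 0, 7, 7]  # arbitrary illustrative IDs, repeats allowed
cos_g, sin_g = gather_rope(cos_table, sin_table, position_ids)
cos_d, sin_d = dynamic_rope(position_ids, head_dim=8)
tables_match = cos_g == cos_d and sin_g == sin_d
```

Because the table rows come from the same arithmetic as the dynamic path, the gathered values are identical to the recomputed ones. The cost is memory proportional to the number of position IDs, and the table size must be fixed up front.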

Smoke Tests

Command:

TICO_LOG=4 python -m tico.quantization.examples.inspect \
  --config tico/quantization/examples/configs/wrapper_smoke.yaml \
  --mode wrapper-smoke \
  --case gemma4_vision_encoder \
  --export circle \
  --output-dir ./out/wrapper_smoke

Output:
[QuantCheck] WARNING: 28 nodes without qparam detected (see logs).
┌───────────── Wrapper Smoke Summary ─────────────
│ Case             : gemma4_vision_encoder
│ Status           : PASS
│ Mean |diff|      : 0.079886
│ Max |diff|       : 0.503346
│ PEIR             : 0.056661
│ Shape match      : True
│ Quant finite     : True
└─────────────────────────────────────────────────
Artifacts:
  - circle: out/wrapper_smoke/gemma4_vision_encoder.q.circle
    ┌────────────────────────────────────────────┐
 5.2┤                                            │
    │                                      •••   │
 3.5┤                                  •••       │
    │                               •••••        │
 1.8┤                          ••••••            │
    │                       ••••••               │
 0.1┤                   ••••••                   │
    │                ••••••                      │
-1.5┤            •••••                           │
    │         •••• •                             │
-3.2┤      ••••                                  │
    │  • ••                                      │
-4.9┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -4.9       -2.4        0.1       2.7       5.2 
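
For readers unfamiliar with the summary fields, the sketch below shows one plausible way to compute them from a floating-point reference output and a quantized output. It assumes PEIR means peak-error-to-interval ratio, i.e. the maximum absolute difference divided by the value range of the reference. The function name `summarize_diff` and the sample values are made up for illustration, and the smoke-test tool may compute these fields differently.

```python
import math


def summarize_diff(reference: list[float], quantized: list[float]) -> dict:
    """Compare two flattened outputs element by element."""
    if len(reference) != len(quantized):
        raise ValueError("outputs must have the same number of elements")
    abs_diff = [abs(r - q) for r, q in zip(reference, quantized)]
    interval = max(reference) - min(reference)  # value range of the reference
    max_diff = max(abs_diff)
    return {
        "mean_abs_diff": sum(abs_diff) / len(abs_diff),
        "max_abs_diff": max_diff,
        # Assumed definition: peak error relative to the reference interval.
        "peir": max_diff / interval if interval > 0 else float("inf"),
        "quant_finite": all(math.isfinite(q) for q in quantized),
    }


reference = [-4.0, -1.0, 0.0, 2.0, 4.0]   # illustrative values only
quantized = [-3.5, -1.0, 0.25, 2.0, 4.0]  # illustrative values only
summary = summarize_diff(reference, quantized)
```

Under that assumed definition, the reported Max |diff| of 0.503346 and PEIR of 0.056661 would correspond to a reference range of roughly 8.9.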

Nodes without qparam (logs)

Vision Encoder wrapper related nodes:

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] quantized nodes : 97
DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] fp nodes        : 28
DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:

 trace  :   
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 250, in _apply_multidimensional_rope
    tensor_parts = torch.split(tensor, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_1
  target : getitem
  users  : 2
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 250, in _apply_multidimensional_rope
    tensor_parts = torch.split(tensor, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_2
  target : getitem
  users  : 1
  trace  :   
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 251, in _apply_multidimensional_rope
    cos_parts = torch.split(cos, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_3
  target : getitem
  users  : 1
  trace  :  
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 251, in _apply_multidimensional_rope
    cos_parts = torch.split(cos, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_4
  target : getitem
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 252, in _apply_multidimensional_rope
    sin_parts = torch.split(sin, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_5
  target : getitem
  users  : 1
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 252, in _apply_multidimensional_rope
    sin_parts = torch.split(sin, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : cat_2
  target : cat.default
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 269, in _apply_multidimensional_rope
    return torch.cat(rotated_parts, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : permute
  target : permute.default
  users  : 2
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 499, in forward
    query_states = query_states.transpose(1, 2)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_10
  target : getitem
  users  : 2
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 250, in _apply_multidimensional_rope
    tensor_parts = torch.split(tensor, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_11
  target : getitem
  users  : 2
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 250, in _apply_multidimensional_rope
    tensor_parts = torch.split(tensor, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_12
  target : getitem
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 251, in _apply_multidimensional_rope
    cos_parts = torch.split(cos, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_13
  target : getitem
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 251, in _apply_multidimensional_rope
    cos_parts = torch.split(cos, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_14
  target : getitem
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 252, in _apply_multidimensional_rope
    sin_parts = torch.split(sin, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_15
  target : getitem
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 252, in _apply_multidimensional_rope
    sin_parts = torch.split(sin, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : cat_5
  target : cat.default
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 269, in _apply_multidimensional_rope
    return torch.cat(rotated_parts, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : permute_1
  target : permute.default
  users  : 2
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 516, in forward
    key_states = key_states.transpose(1, 2)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : slice_1
  target : slice.Tensor
  users  : 1
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 400, in _attention_forward
    key_i = key_states[:, kv_idx : kv_idx + 1, :, :]

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : slice_3
  target : slice.Tensor
  users  : 1
  trace  :  
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 404, in _attention_forward
    query_i = query_states[:, head_start:head_end, :, :]

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : permute_3
  target : permute.default
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 408, in _attention_forward
    logits_i = query_i @ key_i.transpose(-2, -1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : reshape_default_7
  target : reshape.default
  users  : 1
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 408, in _attention_forward
    logits_i = query_i @ key_i.transpose(-2, -1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : expand_1
  target : expand.default
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 408, in _attention_forward
    logits_i = query_i @ key_i.transpose(-2, -1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : reshape_default_8
  target : reshape.default
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 408, in _attention_forward
    logits_i = query_i @ key_i.transpose(-2, -1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : slice_4
  target : slice.Tensor
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 400, in _attention_forward
    key_i = key_states[:, kv_idx : kv_idx + 1, :, :]

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : slice_6
  target : slice.Tensor
  users  : 1
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 404, in _attention_forward
    query_i = query_states[:, head_start:head_end, :, :]

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : permute_4
  target : permute.default
  users  : 1
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 408, in _attention_forward
    logits_i = query_i @ key_i.transpose(-2, -1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : reshape_default_13
  target : reshape.default
  users  : 1
  trace  : 
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 408, in _attention_forward
    logits_i = query_i @ key_i.transpose(-2, -1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : expand_5
  target : expand.default
  users  : 1
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 408, in _attention_forward
    logits_i = query_i @ key_i.transpose(-2, -1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : reshape_default_14
  target : reshape.default
  users  : 1
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 408, in _attention_forward
    logits_i = query_i @ key_i.transpose(-2, -1)
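
A log of this length is easier to read as a tally per target op. The snippet below is a small stdlib-only sketch for that; `tally_missing_qparam` is a hypothetical helper, not part of TICO, and the sample text is an excerpt of the entries above.

```python
import re
from collections import Counter


def tally_missing_qparam(log_text: str) -> Counter:
    """Count 'Missing qparam' entries per target op in a QuantCheck debug log."""
    targets = re.findall(r"^\s*target\s*:\s*(\S+)\s*$", log_text, flags=re.MULTILINE)
    return Counter(targets)


# Excerpt copied from the log above (three entries).
sample_log = """\
DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_1
  target : getitem
  users  : 2
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 250, in _apply_multidimensional_rope
    tensor_parts = torch.split(tensor, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : getitem_2
  target : getitem
  users  : 1
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 251, in _apply_multidimensional_rope
    cos_parts = torch.split(cos, split_sizes, dim=-1)

DEBUG:tico.quantization.wrapq.utils.check_missing_qparam:[QuantCheck] Missing qparam:
  name   : cat_2
  target : cat.default
  users  : 1
  trace  :
  File "tico/quantization/wrapq/wrappers/gemma4/quant_vision_attention.py", line 269, in _apply_multidimensional_rope
    return torch.cat(rotated_parts, dim=-1)
"""

counts = tally_missing_qparam(sample_log)
for target, count in counts.most_common():
    print(f"{target}: {count}")
```

Counting the named entries in the full log above by hand gives 11 getitem, 4 permute.default, 4 slice.Tensor, 4 reshape.default, 2 cat.default, and 2 expand.default. The first entry is missing its name and target fields, which brings the total to the 28 reported by QuantCheck. All of the named entries are shape or data-movement operations in quant_vision_attention.py.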

UnitTests

Command:

python -m pytest test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py -x -v 
27 tests passed. Output:
============================================================================================================= test session starts =============================================================================================================
platform linux -- Python 3.10.12, pytest-9.1.1, pluggy-1.6.0 -- /home/emaltsev/SAMSUNG/llm_quantization/.gemma_venv/bin/python
cachedir: .pytest_cache
rootdir: /home/emaltsev/SAMSUNG/llm_quantization/TICO_my/TICO
configfile: pyproject.toml
plugins: anyio-4.13.0
collected 27 items                                                                                                                                                                                                                            

test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_as_export_module_precomputes_buffers_on_wrapper PASSED                                                                         [  3%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_as_export_module_rejects_mismatched_pixel_position_ids PASSED                                                                  [  7%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_as_export_module_requires_quant_mode PASSED                                                                                    [ 11%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_as_export_module_returns_adapter PASSED                                                                                        [ 14%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_as_export_module_with_padding_produces_valid_output PASSED                                                                     [ 18%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_as_export_module_without_pixel_position_ids_uses_templates PASSED                                                              [ 22%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_attention_mask_is_fake_quantized_in_quant_mode PASSED                                                                          [ 25%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_attention_mask_is_observed_in_calib_mode PASSED                                                                                [ 29%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_dtype_override PASSED                                                                                                          [ 33%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_dynamic_forward_with_padding_produces_valid_output PASSED                                                                      [ 37%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_forward_export_matches_forward PASSED                                                                                          [ 40%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_forward_export_requires_precomputed_buffers PASSED                                                                             [ 44%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_gather_position_embeddings_matches_hf_rotary PASSED                                                                            [ 48%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_gather_position_embeddings_with_padding PASSED                                                                                 [ 51%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_input_is_observed_in_calib_mode PASSED                                                                                         [ 55%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_make_bidirectional_mask_fill_value PASSED                                                                                      [ 59%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_make_bidirectional_mask_no_padding PASSED                                                                                      [ 62%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_make_bidirectional_mask_with_padding PASSED                                                                                    [ 66%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_mode_transitions PASSED                                                                                                        [ 70%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_no_quant_forward_matches_fp PASSED                                                                                             [ 74%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_no_quant_output_shape PASSED                                                                                                   [ 77%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_observers_are_collected PASSED                                                                                                 [ 81%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_output_is_fake_quantized_in_quant_mode PASSED                                                                                  [ 85%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_position_cos_sin_are_observed_in_calib_mode PASSED                                                                             [ 88%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_position_embeddings_are_fake_quantized_in_quant_mode PASSED                                                                    [ 92%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_quant_mode_output_is_finite PASSED                                                                                             [ 96%]
test/quantization/wrapq/wrappers/gemma4/test_quant_vision_encoder.py::TestQuantGemma4VisionEncoder::test_unsupported_export_mode_raises PASSED                                                                                          [100%]

============================================================================================================= 27 passed in 5.53s ==============================================================================================================

TICO-DCO-1.0-Signed-off-by: Evgenii Maltsev <e.maltsev@samsung.com>

@Torrero Torrero force-pushed the gemma4_support_wrapper_visionencoder branch 3 times, most recently from f770fb2 to 5a4ec51 Compare June 24, 2026 17:50
@Torrero Torrero requested review from dvsav and mhs4670go June 24, 2026 17:56
@Torrero Torrero marked this pull request as draft June 25, 2026 15:33
@Torrero Torrero force-pushed the gemma4_support_wrapper_visionencoder branch 2 times, most recently from 12e6510 to c9dcea8 Compare June 25, 2026 17:58
@Torrero Torrero marked this pull request as ready for review June 25, 2026 17:59
@Torrero Torrero force-pushed the gemma4_support_wrapper_visionencoder branch from c9dcea8 to ab4b0f7 Compare June 26, 2026 09:02
@dvsav

dvsav commented Jun 26, 2026

Contributor

Could you please clarify the cause of the "[QuantCheck] WARNING: 45 nodes without qparam detected (see logs)." message in the smoke test?
The exact culprit can be tracked down by adding debug prints to tico/quantization/wrapq/utils/check_missing_qparam.py (print node.stack_trace for nodes lacking qparam).
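
A rough sketch of that kind of debug print follows. It is not the code in check_missing_qparam.py: `report_missing_qparam` and the `"qparam"` metadata key are assumptions, and the nodes here are stand-ins built with `SimpleNamespace` so the snippet runs without torch. The attributes it reads (`name`, `target`, `users`, `meta`, `stack_trace`) are the ones a `torch.fx.Node` exposes, so the same function could be called on `graph_module.graph.nodes`.

```python
from types import SimpleNamespace

QPARAM_KEY = "qparam"  # assumed metadata key; the real checker may use another


def report_missing_qparam(nodes, key: str = QPARAM_KEY) -> list[str]:
    """Build one printable report per node whose meta dict lacks `key`."""
    reports = []
    for node in nodes:
        if key in node.meta:
            continue  # node already carries quantization parameters
        trace = (node.stack_trace or "<no stack trace>").strip()
        reports.append(
            f"name   : {node.name}\n"
            f"target : {node.target}\n"
            f"users  : {len(node.users)}\n"
            f"trace  : {trace}"
        )
    return reports


# Stand-in nodes for illustration only; they mimic torch.fx.Node attributes.
fake_nodes = [
    SimpleNamespace(
        name="linear",
        target="linear.default",
        users={"add": None},
        meta={QPARAM_KEY: object()},
        stack_trace=None,
    ),
    SimpleNamespace(
        name="permute",
        target="permute.default",
        users={"slice_3": None, "slice_6": None},
        meta={},
        stack_trace="query_states = query_states.transpose(1, 2)",
    ),
]

missing = report_missing_qparam(fake_nodes)
for entry in missing:
    print(entry)
```

A real checker would likely also skip node kinds that never carry qparams, such as placeholders and the output node; that filtering is left out here.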

@dvsav

dvsav commented Jun 26, 2026

Contributor

Would you consider adding test/quantization/wrapq/wrappers/gemma4/test_quantize_vision_encoder.py?

This commit supports Vision Encoder wrapper for Gemma4

Co-authored-by: Cline

TICO-DCO-1.0-Signed-off-by:  Evgenii Maltsev <e.maltsev@samsung.com>
@Torrero Torrero force-pushed the gemma4_support_wrapper_visionencoder branch from ab4b0f7 to cc99ea6 Compare June 29, 2026 10:49
@Torrero

Torrero commented Jun 29, 2026

Contributor Author

> Could you please clarify the cause of [QuantCheck] WARNING: 45 nodes without qparam detected (see logs). in the smoke test? The exact culprit can be traced down by adding debug prints to tico/quantization/wrapq/utils/check_missing_qparam.py (print node.stack_trace of nodes lacking qparam).

Thank you. I updated the PR description and added a "Nodes without qparam (logs)" section.

@Torrero

Torrero commented Jun 29, 2026

Contributor Author

> Would you consider adding test/quantization/wrapq/wrappers/gemma4/test_quantize_vision_encoder.py?

Added. Thank you.

@dvsav PTAL

@dvsav dvsav left a comment

Contributor

LGTM

@mhs4670go mhs4670go left a comment

Contributor

LGTM

@mhs4670go mhs4670go merged commit 7156c9a into Samsung:main Jun 29, 2026
7 checks passed
3 participants