fix(plugins-google): thought_signature dropped for version-less Gemini model aliases#6011

Open
ngoanpv wants to merge 5 commits into livekit:main from ngoanpv:fix/google-thought-signature-version-less-aliases

ngoanpv commented Jun 8, 2026

Problem

Multi-turn function calling fails with 400 INVALID_ARGUMENT: "Function call is missing a thought_signature in functionCall parts" when the LLM model is a version-less Gemini alias such as gemini-flash-latest or gemini-flash-lite-latest (these resolve to Gemini 3 server-side). It works on the first turn and breaks on the first follow-up that echoes a function call back.

Root cause

_requires_thought_signatures(model) gates BOTH storing (_parse_part) and resending (_run) of thought_signature, but only matches literal gemini-2.5* / gemini-3* strings. Version-less aliases return False, so the signature is never stored or resent — and Gemini 3 requires it echoed back.
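
To illustrate the failure mode, here is a minimal sketch of a prefix-based gate. The function is a hypothetical stand-in, not the plugin's actual `_requires_thought_signatures`; it assumes only the matching behaviour described above (literal `gemini-2.5*` / `gemini-3*` prefixes).

```python
def requires_thought_signatures_by_name(model: str) -> bool:
    """Hypothetical name-based gate; assumes plain prefix matching."""
    return model.startswith(("gemini-2.5", "gemini-3"))


# Versioned names pass the gate; version-less aliases do not, even though
# the server may resolve them to a model that emits signatures.
for name in (
    "gemini-3.5-flash",
    "gemini-2.5-flash",
    "gemini-flash-latest",
    "gemini-flash-lite-latest",
):
    print(f"{name}: {requires_thought_signatures_by_name(name)}")
```

Because the same gate guards both storing and resending, a False here means the signature is dropped on both paths.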

Fix

Make signature handling response-driven instead of model-name-driven:

  • store a thought_signature whenever the API returns one (_parse_part);
  • resend whatever was stored, whenever present (_run).

This is correct for every signature-emitting model and alias, and a no-op for models that never emit signatures (nothing stored, so nothing resent). Runtime signature handling no longer depends on the _requires_thought_signatures model-name heuristic. The helper itself is retained, with its alias handling corrected so gemini-flash-latest / gemini-flash-lite-latest also return True, because the existing unit tests in tests/test_google_thought_signatures.py import and assert on it.
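
The response-driven approach can be sketched roughly as follows. `Part` and `SignatureStore` are hypothetical names used only for illustration; the real logic lives in `_parse_part` and `_run`, and its exact shape may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    """Hypothetical stand-in for a function-call part in a response or request."""

    call_id: str
    name: str
    thought_signature: Optional[bytes] = None


class SignatureStore:
    """Keeps signatures keyed by call id, with no reference to the model name."""

    def __init__(self) -> None:
        self._thought_signatures: dict[str, bytes] = {}

    def on_response_part(self, part: Part) -> None:
        # Store only when the API actually returned a signature.
        if part.thought_signature is not None:
            self._thought_signatures[part.call_id] = part.thought_signature

    def on_request_part(self, part: Part) -> Part:
        # Reattach whatever was stored for this call; otherwise leave the part as-is.
        signature = self._thought_signatures.get(part.call_id)
        if signature is not None:
            part.thought_signature = signature
        return part


store = SignatureStore()
store.on_response_part(Part("call-1", "save_answer", b"sig"))  # model emitted a signature
store.on_response_part(Part("call-2", "save_answer"))  # model emitted none
echoed_with = store.on_request_part(Part("call-1", "save_answer"))
echoed_without = store.on_request_part(Part("call-2", "save_answer"))
```

In this sketch a model that never emits signatures leaves the store empty, so the echoed parts go out unchanged.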

Model coverage note

The current latest flash models are gemini-3.5-flash and gemini-3.1-flash-lite. Both already match the gemini-3 detection, so they were storing and resending signatures before this change, and the response-driven fix keeps them correct. The dropped-signature 400 was specific to the version-less -latest aliases. Those are now covered automatically, since runtime handling no longer depends on the name heuristic; the retained helper also recognizes them so the unit tests stay accurate.

Test

Verified live against the real Gemini API, driving the plugin's llm.LLM through a 2-turn function-calling exchange with a single save_answer(answer: str) tool (turn 1 forces the tool call; turn 2 echoes the FunctionCall + FunctionCallOutput back, reusing the harvested call_id).

gemini-flash-lite-latest (the version-less alias bug)

Before (unpatched): turn 1 stores nothing (LLM._thought_signatures is empty), turn 2 raises:

APIStatusError status_code=400 INVALID_ARGUMENT
"Function call is missing a thought_signature in functionCall parts. This is required
for tools to work correctly... function call `default_api:save_answer`, position 2."

After (patched): turn 1 stores the signature (LLM._thought_signatures keys = ['<call_id>']), turn 2 succeeds with a normal assistant response — no 400. Same script, same model, only the plugin changed.

gemini-3.5-flash (latest flash)

Also verified live for gemini-3.5-flash. Because this model matches the gemini-3 detection in both the original and patched code, the runtime gate is True either way, so it never hit the dropped-signature 400. With the patched plugin, multi-turn function calling succeeds: turn 1 stores the signature (LLM._thought_signatures keys = ['<call_id>']), turn 2 returns a normal assistant response — no 400. The response-driven fix keeps gemini-3.5-flash (and gemini-3.1-flash-lite) correct.

Unit tests

The existing unit cases in tests/test_google_thought_signatures.py still pass unchanged. Added _requires_thought_signatures → True cases for the current latest models gemini-3.5-flash and gemini-3.1-flash-lite, plus the version-less aliases gemini-flash-latest and gemini-flash-lite-latest (38 passed).


Follow-up fix — model-detection gaps in thinking_config and flash defaults

The same fragile string matching affected two sibling helpers in llm.py, surfaced during review:

  • _is_gemini_3_model / _is_gemini_3_flash_model only matched literal gemini-3* / gemini-3-flash*. They returned False for the version-less aliases gemini-flash-latest / gemini-flash-lite-latest (which resolve to Gemini 3.x flash server-side) and mis-handled gemini-3.5-flash.
  • Consequence 1: the thinking_config block called _is_gemini_3_model(model); for the aliases it took the "Gemini 2.5 and earlier" branch and raised ValueError("does not support thinking_level"), even though those aliases support thinking_level.
  • Consequence 2: _is_gemini_3_flash_model("gemini-3.5-flash") returned False, so 3.5-flash missed the "minimal" flash thinking default and fell back to "low".

Fix

A shared _GEMINI_3_FLASH_ALIASES constant plus alias/3.x-aware helpers:

  • _is_gemini_3_model now matches any gemini-3 substring (covers gemini-3.5-flash, gemini-3.1-flash-lite) and the version-less aliases.
  • _is_gemini_3_flash_model is True for any 3.x flash model and both aliases.
  • The thinking_config block is unchanged — correcting the helpers makes the aliases take the Gemini-3 branch (no ValueError) and gives gemini-3.5-flash the "minimal" flash default.

Tests

Expanded the parametrized cases in tests/test_google_thought_signatures.py for all three helpers to cover the aliases, gemini-3.5-flash, and gemini-3.1-flash-lite (e.g. _is_gemini_3_model("gemini-flash-latest") → True, _is_gemini_3_flash_model("gemini-3.5-flash") → True, plus gemini-3-pro-preview → False to keep pro models out of the flash path).

CLAassistant commented Jun 8, 2026

CLA assistant check
All committers have signed the CLA.

devin-ai-integration[bot]

This comment was marked as resolved.

ngoanpv commented Jun 8, 2026

The latest commit (ea985d9) resolves the flagged model-detection findings in llm.py:

  • _is_gemini_3_model / _is_gemini_3_flash_model now recognize the version-less aliases gemini-flash-latest / gemini-flash-lite-latest (via a shared _GEMINI_3_FLASH_ALIASES constant) and the gemini-3.5-flash / gemini-3.1-flash-lite point releases.
  • As a result, the thinking_config block no longer raises ValueError("does not support thinking_level") for those aliases — they now correctly take the Gemini-3 branch — and gemini-3.5-flash gets the "minimal" flash thinking default instead of falling back to "low".

Expanded the parametrized cases in tests/test_google_thought_signatures.py for all three helpers to cover the aliases, gemini-3.5-flash, and gemini-3.1-flash-lite.

devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 3 additional findings in Devin Review.

🚩 _thought_signatures dict grows unboundedly across chat sessions

The _thought_signatures dict at llm.py:249 lives on the LLM instance and accumulates entries from every _parse_part call across all LLMStream instances. There is no eviction mechanism. For long-running agents with many multi-turn function-calling interactions, this dict could grow unboundedly. This is a pre-existing issue (not introduced by this PR) but worth noting since the PR's removal of the model-name guard technically makes it possible for any model that emits thought_signatures to contribute entries (in practice, only the same set of models as before).

(Refers to line 249)
