fix(plugins-google): thought_signature dropped for version-less Gemini model aliases #6011
ngoanpv wants to merge 5 commits into
…ponse-driven store)
…ught_signature handling
…n model helpers (thinking_config + flash defaults)
🚩 _thought_signatures dict grows unboundedly across chat sessions
The _thought_signatures dict at llm.py:249 lives on the LLM instance and accumulates entries from every _parse_part call across all LLMStream instances. There is no eviction mechanism. For long-running agents with many multi-turn function-calling interactions, this dict could grow unboundedly. This is a pre-existing issue (not introduced by this PR) but worth noting since the PR's removal of the model-name guard technically makes it possible for any model that emits thought_signatures to contribute entries (in practice, only the same set of models as before).
(Refers to line 249)
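One possible way to bound such a cache is a small least-recently-used store. This is only a sketch under assumptions: the class name BoundedSignatureStore, the max_entries limit, and the choice of bytes for the signature value are hypothetical and not part of the plugin, and this PR leaves the existing dict unchanged.

```python
from collections import OrderedDict
from typing import Optional


class BoundedSignatureStore:
    """Hypothetical LRU-bounded mapping of call_id -> thought_signature."""

    def __init__(self, max_entries: int = 1024) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def put(self, call_id: str, signature: bytes) -> None:
        # Writing a key (new or existing) marks it as most recently used.
        self._entries[call_id] = signature
        self._entries.move_to_end(call_id)
        # Evict least recently used entries once the cap is exceeded.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def get(self, call_id: str) -> Optional[bytes]:
        signature = self._entries.get(call_id)
        if signature is not None:
            # A read also refreshes recency.
            self._entries.move_to_end(call_id)
        return signature

    def __len__(self) -> int:
        return len(self._entries)
```

The trade-off with any cap is that a signature evicted before its function call is echoed back would be missing on resend, so the limit would need to be comfortably larger than the number of calls that can be in flight.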
…tup timeout; unrelated to this change)
Problem
Multi-turn function calling fails with 400 INVALID_ARGUMENT: "Function call is missing a thought_signature in functionCall parts" when the LLM model is a version-less Gemini alias such as gemini-flash-latest or gemini-flash-lite-latest (these resolve to Gemini 3 server-side). It works on the first turn and breaks on the first follow-up that echoes a function call back.

Root cause
_requires_thought_signatures(model) gates both storing (_parse_part) and resending (_run) of thought_signature, but it only matches literal gemini-2.5* / gemini-3* strings. Version-less aliases return False, so the signature is never stored or resent, and Gemini 3 requires it to be echoed back.

Fix
Make signature handling response-driven instead of model-name-driven:

- store thought_signature whenever the API returns one (_parse_part);
- resend it whenever one was stored (_run).

This is correct for every signature-emitting model and alias, and a no-op for models that never emit them (nothing stored, so nothing resent). Signature handling at runtime no longer depends on the _requires_thought_signatures model-name heuristic. The helper itself is retained (with its alias handling corrected so gemini-flash-latest / gemini-flash-lite-latest also return True) because the existing unit tests in tests/test_google_thought_signatures.py import and assert on it.

Model coverage note
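To illustrate why response-driven handling covers aliases without any name check, here is a minimal sketch of the store and resend steps. FakePart, SignatureHandler, and their method names are hypothetical stand-ins, not the plugin's actual code or the real Gemini SDK types; only the _thought_signatures attribute name comes from this PR.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakePart:
    """Stand-in for a response part carrying a function call (hypothetical)."""

    call_id: str
    thought_signature: Optional[bytes] = None


class SignatureHandler:
    """Hypothetical reduction of the store (_parse_part) and resend (_run) steps."""

    def __init__(self) -> None:
        self._thought_signatures: Dict[str, bytes] = {}

    def on_response_part(self, part: FakePart) -> None:
        # Store whenever the API returned a signature; the model name is never consulted.
        if part.thought_signature is not None:
            self._thought_signatures[part.call_id] = part.thought_signature

    def build_function_call(self, call_id: str) -> dict:
        # Resend only if a signature was stored for this call.
        payload: dict = {"call_id": call_id}
        signature = self._thought_signatures.get(call_id)
        if signature is not None:
            payload["thought_signature"] = signature
        return payload


handler = SignatureHandler()
handler.on_response_part(FakePart("call_1", b"sig"))  # model that emits signatures
handler.on_response_part(FakePart("call_2"))          # model that never emits them
echoed_with_signature = handler.build_function_call("call_1")
echoed_without_signature = handler.build_function_call("call_2")
```

Because nothing is stored for call_2, nothing is resent for it, which is the no-op behaviour described for models that do not emit signatures.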
The current latest flash models are gemini-3.5-flash and gemini-3.1-flash-lite. Both already match the gemini-3 detection, so, unlike the version-less -latest aliases, they were already storing and resending signatures before this change; the response-driven fix keeps them correct. The fix also covers the -latest aliases automatically (they no longer depend on the name heuristic at runtime), while the retained helper additionally recognizes them so the unit tests stay accurate. The dropped-signature 400 was specific to the version-less -latest aliases.

Test
Verified live against the real Gemini API, driving the plugin's llm.LLM through a 2-turn function-calling exchange with a single save_answer(answer: str) tool (turn 1 forces the tool call; turn 2 echoes the FunctionCall + FunctionCallOutput back, reusing the harvested call_id).

gemini-flash-lite-latest (the version-less alias bug)

Before (unpatched): turn 1 stores nothing (LLM._thought_signatures is empty), turn 2 raises the 400 INVALID_ARGUMENT error described under Problem.

After (patched): turn 1 stores the signature (LLM._thought_signatures keys = ['<call_id>']), turn 2 succeeds with a normal assistant response and no 400. Same script, same model, only the plugin changed.

gemini-3.5-flash (latest flash)

Also verified live for gemini-3.5-flash. Because this model matches the gemini-3 detection in both the original and patched code, the runtime gate is True either way, so it never hit the dropped-signature 400. With the patched plugin, multi-turn function calling succeeds: turn 1 stores the signature (LLM._thought_signatures keys = ['<call_id>']), turn 2 returns a normal assistant response with no 400. The response-driven fix keeps gemini-3.5-flash (and gemini-3.1-flash-lite) correct.

Unit tests
The existing unit cases in tests/test_google_thought_signatures.py still pass unchanged. Added _requires_thought_signatures → True cases for the current latest models gemini-3.5-flash and gemini-3.1-flash-lite, plus the version-less aliases gemini-flash-latest and gemini-flash-lite-latest (38 passed).

Follow-up fix: model-detection gaps in thinking_config and flash defaults

The same fragile string matching affected two sibling helpers in llm.py, surfaced during review:

- _is_gemini_3_model / _is_gemini_3_flash_model only matched literal gemini-3* / gemini-3-flash*. They returned False for the version-less aliases gemini-flash-latest / gemini-flash-lite-latest (which resolve to Gemini 3.x flash server-side) and mis-handled gemini-3.5-flash.
- The thinking_config block called _is_gemini_3_model(model); for the aliases it took the "Gemini 2.5 and earlier" branch and raised ValueError("does not support thinking_level"), even though those aliases support thinking_level.
- _is_gemini_3_flash_model("gemini-3.5-flash") returned False, so 3.5-flash missed the "minimal" flash thinking default and fell back to "low".

Fix
A shared _GEMINI_3_FLASH_ALIASES constant plus alias/3.x-aware helpers:

- _is_gemini_3_model now matches any gemini-3 substring (covers gemini-3.5-flash, gemini-3.1-flash-lite) and the version-less aliases.
- _is_gemini_3_flash_model is True for any 3.x flash model and both aliases.
- The thinking_config block is unchanged: correcting the helpers makes the aliases take the Gemini-3 branch (no ValueError) and gives gemini-3.5-flash the "minimal" flash default.

Tests
Expanded the parametrized cases in tests/test_google_thought_signatures.py for all three helpers to cover the aliases, gemini-3.5-flash, and gemini-3.1-flash-lite (e.g. _is_gemini_3_model("gemini-flash-latest") → True, _is_gemini_3_flash_model("gemini-3.5-flash") → True, plus gemini-3-pro-preview → False to keep pro models out of the flash path).