automatic STT keyterm detection #6039
Show user-defined terms to the detection LLM as applied so it stops re-proposing them every pass; add misrecognition rules to the default prompt (sound-alike variants of tracked terms, user-only garbled phrases, interrupted-line fragments) and route corrections through the normal confirmation gate.
User-defined keyterms were silently dropped unless detection.enabled was set: start() returned before binding the STT, so the initial push and later set_user_keyterms() were no-ops. Bind the STT unconditionally (skipping the push when there are no terms, so sessions without keyterms see no capability warning or reconnect) and gate only the detection setup on enabled.
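The control flow described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `FakeSTT` and `KeytermManager` are hypothetical stand-ins, and only the behavior (bind always, push only when there are terms, gate detection on `enabled`) comes from the description.

```python
from typing import Optional


class FakeSTT:
    """Hypothetical stand-in for an STT instance; records every keyterm push."""

    def __init__(self) -> None:
        self.pushes: list[list[str]] = []

    def update_keyterms(self, terms: list[str]) -> None:
        self.pushes.append(list(terms))


class KeytermManager:
    """Hypothetical manager illustrating the start() fix."""

    def __init__(self, user_terms: list[str], detection_enabled: bool) -> None:
        self._user_terms = list(user_terms)
        self._detection_enabled = detection_enabled
        self._stt: Optional[FakeSTT] = None
        self.detection_running = False

    def start(self, stt: FakeSTT) -> None:
        # Bind unconditionally so later set_user_keyterms() calls reach the STT.
        self._stt = stt
        # Skip the initial push when there is nothing to send.
        if self._user_terms:
            stt.update_keyterms(self._user_terms)
        # Only the detection setup depends on the enabled flag.
        if self._detection_enabled:
            self.detection_running = True

    def set_user_keyterms(self, terms: list[str]) -> None:
        self._user_terms = list(terms)
        if self._stt is not None:
            self._stt.update_keyterms(self._user_terms)
```

With detection disabled and no initial terms, `start()` pushes nothing, yet a later `set_user_keyterms(["LiveKit"])` still reaches the bound STT.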
    keyterm_options={
        "terms": ["LiveKit"],
        "detection": {"enabled": True, "turn_interval": 1},
    },
For conversations where some context or user information is available before the call (e.g., from a patient or customer profile loaded at startup), should we allow extracting keyterms from that context first?
Perhaps keep that on the developer side: they can pass context such as the address or user name via keyterm_options={"terms": [...]} once the profile loads.
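A minimal sketch of that developer-side approach. The helper `keyterms_from_profile`, the profile fields, and the sample values are all hypothetical; only the `keyterm_options` shape (`terms` plus `detection`) is taken from the diff in this PR.

```python
def keyterms_from_profile(
    profile: dict[str, str],
    fields: tuple[str, ...] = ("name", "city", "company"),
) -> list[str]:
    """Collect non-empty profile values, de-duplicated case-insensitively."""
    seen: set[str] = set()
    terms: list[str] = []
    for field_name in fields:
        value = profile.get(field_name, "").strip()
        if value and value.lower() not in seen:
            seen.add(value.lower())
            terms.append(value)
    return terms


# Hypothetical profile loaded before the call starts.
profile = {"name": "Siobhan Nakamura", "city": "Poughkeepsie", "company": "LiveKit"}

# Same dict shape as in the diff; it would be passed as keyterm_options=...
keyterm_options = {
    "terms": keyterms_from_profile(profile),
    "detection": {"enabled": True, "turn_interval": 1},
}
```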
    "enabled": False,
    "llm": None,
    "turn_interval": 1,
    "max_keyterms": None,
Some vendors impose a maximum number of keyterms; maybe we should check this in the extract-for-model function.
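One way such a check could look, as a sketch only: `clamp_keyterms` and the `vendor_limit` parameter are hypothetical, and the assumption that the list is already ordered by priority (so truncation keeps the most important terms) is mine, not the PR's.

```python
from typing import Optional


def clamp_keyterms(
    terms: list[str],
    vendor_limit: Optional[int],
    max_keyterms: Optional[int] = None,
) -> list[str]:
    """Keep the leading terms up to the tighter of the two limits.

    None means "no limit" for either argument. Assumes `terms` is ordered
    by priority, most important first.
    """
    limits = [n for n in (vendor_limit, max_keyterms) if n is not None]
    if not limits:
        return list(terms)
    return terms[: max(0, min(limits))]
```

Applying the tighter of the vendor cap and the user's `max_keyterms` keeps the two settings from conflicting.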
    def update_keyterms(self, keyterms: list[str]) -> None:
        # Google biases recognition via (phrase, boost) pairs; apply a moderate
        # default boost since the common keyterms API carries no per-term weight.
        self.update_options(keywords=[(term, _DEFAULT_KEYTERM_BOOST) for term in keyterms])
📝 Info: Google STT _update_keyterms merges user keywords with auto-detected keyterms using a default boost
At livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py:556-570, the Google STT's _update_keyterms merges the provider-agnostic keyterms (which have no per-term weight) with the user's manually-tuned keywords list (which have explicit boosts). A default boost of 10.0 (_DEFAULT_KEYTERM_BOOST at line 74) is applied to auto-detected terms. This is a reasonable heuristic but may need tuning — Google accepts boosts roughly in the 0–20 range, and 10.0 is moderate. Users who need different boost values should use the Google-specific keywords parameter directly.
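The merge described in this comment might look roughly like the sketch below. It is not the plugin's implementation: `merge_keywords` is a hypothetical name, and the precedence rule (an explicit user boost wins over the default when the same phrase appears in both lists, compared case-insensitively) is an assumption. Only the 10.0 default comes from the comment.

```python
DEFAULT_KEYTERM_BOOST = 10.0  # default boost value named in the review comment


def merge_keywords(
    user_keywords: list[tuple[str, float]],
    keyterms: list[str],
    default_boost: float = DEFAULT_KEYTERM_BOOST,
) -> list[tuple[str, float]]:
    """Combine (phrase, boost) pairs with weightless keyterms.

    Assumption: user-tuned boosts take precedence over the default.
    """
    merged: dict[str, tuple[str, float]] = {}
    for phrase, boost in user_keywords:
        merged[phrase.lower()] = (phrase, boost)
    for term in keyterms:
        # setdefault leaves an existing user entry untouched.
        merged.setdefault(term.lower(), (term, default_boost))
    return list(merged.values())
```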
    def _update_keyterms(self, keyterms: list[str]) -> None:
        # Google biases recognition via (phrase, boost) pairs; apply a moderate
        # default boost since the common keyterms API carries no per-term weight.
        self.update_options(keywords=[(term, _DEFAULT_KEYTERM_BOOST) for term in keyterms])
🚩 Google STT claims keyterms=True even when adaptation would shadow keywords
The Google STT plugin now unconditionally sets keyterms=True in its capabilities (livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py:234). However, _update_keyterms uses update_options(keywords=...) which is shadowed by an existing adaptation config (stt.py:106-109 — build_adaptation returns adaptation first, ignoring keywords). If a user configures both adaptation and enables keyterm detection, detected terms are stored but never reach the recognizer. The existing warning at stt.py:512-515 covers this partially, but the keyterms capability claim might mislead the keyterm detector into running LLM passes whose results are silently discarded.
            endpoint_url=endpoint_url,
        )

    def _update_keyterms(self, keyterms: list[str]) -> None:
Only the Flux models support mid-stream keyterm updates (https://developers.deepgram.com/docs/keyterm#dynamic-keyterm-updates-flux-only). Should we disable this for Nova models?
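If the update were gated by model, the decision could be sketched as below. Both function names are hypothetical, and matching on a `flux` name prefix is an assumption about how models are identified; whether non-Flux models should reconnect or simply skip the update is the open question in this comment.

```python
def supports_live_keyterm_update(model: str) -> bool:
    """Assumption: Flux model names start with 'flux'; everything else does not."""
    return model.lower().startswith("flux")


def plan_keyterm_update(model: str, current: list[str], new: list[str]) -> str:
    """Return 'noop', 'live_update', or 'reconnect' for a keyterm change."""
    if set(current) == set(new):
        return "noop"  # nothing changed, avoid any traffic
    if supports_live_keyterm_update(model):
        return "live_update"  # push the new terms on the open stream
    return "reconnect"  # apply the terms when a new stream is opened
```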
Adds automatic keyterm detection to AgentSession, biasing the STT toward the correct spelling of distinctive words (names, companies, products, jargon) as they come up in the conversation.

Overview
- keyterm_options on AgentSession: user-defined terms plus a detection config (enabled, llm, turn_interval, max_keyterms, instructions).
- KeytermDetector runs a background LLM pass per user turn over the recent transcript and maintains the keyterm set with a confirmation gate: a new term starts as pending and only biases the STT once later transcript evidence confirms it; remove only applies to spellings the user explicitly corrected, and the replacement goes through the same confirmation flow.
- STT.update_keyterms() with a keyterms capability flag, implemented for deepgram (v1/v2), assemblyai, google, and livekit inference STT; the fallback and stream adapters forward it.
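The confirmation gate can be illustrated with a small state-tracking sketch. `KeytermGate` is a hypothetical class, not the PR's `KeytermDetector`, and a case-insensitive substring match stands in for whatever transcript evidence the real detector's LLM pass uses.

```python
from dataclasses import dataclass, field


@dataclass
class KeytermGate:
    """Hypothetical pending/confirmed tracker for proposed keyterms."""

    pending: set[str] = field(default_factory=set)
    confirmed: set[str] = field(default_factory=set)

    def propose(self, term: str) -> None:
        # A new term starts as pending and does not bias the STT yet.
        if term not in self.confirmed:
            self.pending.add(term)

    def observe(self, transcript: str) -> list[str]:
        """Promote pending terms found in later transcript text.

        Substring matching is a stand-in for real evidence checking.
        Returns the newly confirmed terms, sorted.
        """
        text = transcript.lower()
        promoted = sorted(t for t in self.pending if t.lower() in text)
        self.pending.difference_update(promoted)
        self.confirmed.update(promoted)
        return promoted

    def active_terms(self) -> list[str]:
        # Only confirmed terms would be sent to the STT.
        return sorted(self.confirmed)
```

A proposed term stays out of `active_terms()` until a later `observe()` call sees it in the transcript, which mirrors the pending-then-confirmed flow described above.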