amd · alexey-tyurin · Jun 19, 2026 · Jun 19, 2026 · Jun 19, 2026 · Jun 19, 2026
@@ -9,7 +9,7 @@ title: "Dynamic Tool Loader"
 <Note>
 **Component:** Per-turn tool visibility for agents (issue [#688](https://github.com/amd/gaia/issues/688))
 **Module:** `gaia.agents.base.tool_loader`
-**Status:** **Part 0 (#1448) + Part 1 (#1449) landed.** Part 1 ships the selection mechanism behind a default-off toggle on the ChatAgent `doc` profile. Parts 2–3 (explicit escape hatch, skill signal) are still proposed.
+**Status:** **Part 0 (#1448) + Part 1 (#1449) + Part 2 (#1450) landed.** Part 1 ships the selection mechanism behind a default-off toggle on the ChatAgent `doc` profile; Part 2 adds the explicit `load_tools` escape hatch (so native tool-calling models can recover a semantic miss) plus the escape-hatch activation-rate tuning signal. Part 3 (skill signal) is still proposed.
 **Target agent (v1):** `ChatAgent` (`doc` profile), behind a default-off toggle.
 </Note>
 
@@ -297,11 +297,14 @@ backend KV prefix stays warm. When a filter is active the tools block moves
 **after** the response-format template (volatile content last); with no filter
 the legacy order and bytes are preserved exactly.
 
-**Native known gap (Amendment 2).** `_execute_tool` is never tightened, so a
-non-tool-calling model that names an unlisted tool still runs it (free recovery)
-and the loader logs `TOOL_LOADER_ESCAPE_HATCH`. Native tool-calling models have
-no such hatch until Part 2's `load_tools`; on first activation the agent logs the
-miss as a *known gap* rather than padding the loaded set.
+**Native known gap (Amendment 2) — closed by Part 2.** `_execute_tool` is never
+tightened, so a non-tool-calling model that names an unlisted tool still runs it
+(free recovery) and the loader logs `TOOL_LOADER_ESCAPE_HATCH`. In Part 1 native
+tool-calling models had no such hatch — a semantic miss could not self-recover.
+[Part 2](#part-2-explicit-escape-hatch--tuning-1450) closes the recovery gap with
+the always-on `load_tools` meta-tool (the model loads the bundle it needs and
+calls the tool on its next step), and the recall gate's native exemption is
+removed accordingly.
 
 **Approved deviations from this sketch** (flagged in the #1449 PR):
 
@@ -323,7 +326,7 @@ baseline — meaning **CORE-only is the ~60%-reduction best case** and a full
 `test_tool_loader_token_budget.py` pins these filtered costs as a static guard.
 </Warning>
 
-### Part 2 — Explicit escape hatch + tuning
+### Part 2 — Explicit escape hatch + tuning ✅ landed (#1450)
 
 - Add bundle re-surfacing + a discoverability menu of bundle names, and the
   `load_tools` meta-tool that native tool-calling models need (the free recovery
@@ -340,6 +343,50 @@ baseline — meaning **CORE-only is the ~60%-reduction best case** and a full
 - **Escape-hatch activation rate** is logged per session and usable as the
   threshold-tuning signal (rising rate ⇒ τ too strict).
 
+#### How Part 2 shipped (implementation reference)
+
+**`load_tools` is always-on via CORE.** `load_tools` is added to
+[`DOC_CORE_TOOLS`](https://github.com/amd/gaia/blob/main/src/gaia/agents/chat/tool_bundles.py)
+(CORE = 11), so once registered it renders in **both** the text prompt and the
+native `tools=` schema every active turn and is cap-/eviction-exempt. It is
+registered **only when the loader is active** (`self.tool_loader is not None`),
+so the default-off `doc` path stays byte-identical — the unfiltered 37-tool
+baseline is unchanged.
+
+**Recovery lands on the next model *step*, not the next user turn.** The
+`load_tools(bundle)` handler calls `ToolLoader.load_bundle`, then
+`Agent._apply_tool_filter` — the one place the active filter and the cached
+system prompt move together. Because `system_prompt` and `_openai_tools` are
+read live at every LLM call, the expanded set is visible to the very next step
+in the same query, which is what lets `smart_discovery` recover on turn 1.
+
+**`load_bundle` is cap-aware.** It resolves a bundle name (or a bare tool name,
+via the reverse index) and admits members with the same LRU-evict path `select()`
+uses — protecting CORE and the members being loaded now — so `max_tools` holds at
+all times. It emits a same-turn `TOOL_LOADER {…, "event": "load_tools", …}`
+superset line.
+
+**Menu is stable and native-only.** A compact bundle menu (name + one-line
+description, from `ToolBundle.description`) is injected into the **stable** prefix
+of the doc system prompt (before the volatile tools tail → no KV thrash), and
+**only for native tool-calling models** — non-native models already have free
+recovery and are the TTFT-sensitive path.
+
+**Tuning signal is log-derived.** The loader counts escape-hatch (free) and
+`load_tools` (explicit) activations per session and emits a `TOOL_LOADER_SESSION`
+summary on `reset_session()` (`escape_hatch_rate = (escape_hatch + load_tools) /
+turns`). `gaia.eval.tool_recall` aggregates these from the server log and reports
+the per-turn rate alongside recall — no UI-DB migration.
+
+**Recall gate flipped correctly.** `tool_recall.py` unions same-turn
+`load_tools` superset lines into that turn's loaded set and treats `load_tools`
+as always-satisfied; **only then** is the native "known gap" exemption removed,
+so a successful recovery passes the gate and a genuinely unrecovered miss fails
+it on every model.
+
+**Cap unchanged at 14** (→ 3 dynamic slots now that CORE = 11). The eval gates
+recall; bump the default only if recall or the escape-hatch rate regresses.
+
 ### Part 3 — Skill-driven signal (gated on #887)
 
 A third selection signal, added **only after** [#887](https://github.com/amd/gaia/issues/887)
@@ -407,7 +454,11 @@ via the base `_select_tools_for_turn` hook, and both render paths filter from th
 same selection. The old keyword/bundle-policy skeleton was removed; the class name
 `ToolLoader` and `reset_session()` were kept so the existing (guarded) call sites
 in `cli.py` / `chat/app.py` needed no change. Recall recovery for native
-tool-calling models (the `load_tools` meta-tool) is still **Part 2**.
+tool-calling models has shipped (Part 2, #1450): the loader exposes
+`bundle_names` / `format_bundle_menu` / `load_bundle` and per-session escape-hatch
+counters; `ChatAgent` registers the `load_tools` meta-tool and injects the
+native-only bundle menu; and `gaia.eval.tool_recall` unions mid-loop `load_tools`
+lines, drops the native exemption, and reports the escape-hatch activation rate.
 
 ## Dependencies
 

@@ -817,8 +817,19 @@ def _refresh_active_tool_filter(self, user_input: str) -> None:
         # pylint: disable-next=assignment-from-none
         new_filter = self._select_tools_for_turn(user_input)
         if new_filter != self._active_tool_filter:
-            self._active_tool_filter = new_filter
-            self._system_prompt_cache = self._compose_system_prompt()
+            self._apply_tool_filter(new_filter)
+
+    def _apply_tool_filter(self, new_filter: Optional[List[str]]) -> None:
+        """Swap the active tool filter and recompute the cached system prompt.
+
+        The single place the "filter and prompt move together" invariant lives.
+        Called from :meth:`_refresh_active_tool_filter` (per user turn) and from
+        the ``load_tools`` escape-hatch handler (mid-loop), so a mid-query
+        expansion is visible to the very next model step — both render paths
+        (``system_prompt`` and ``_openai_tools``) read these live.
+        """
+        self._active_tool_filter = new_filter
+        self._system_prompt_cache = self._compose_system_prompt()
 
     def rebuild_system_prompt(self) -> None:
         """Rebuild system prompt with current tools from _TOOL_REGISTRY.

@@ -57,8 +57,9 @@
 # tools (index/summarize/RAG) for doc-oriented turns while excluding lower-
 # scoring noise; plain content questions fall back to the CORE set. Overridable.
 DEFAULT_THRESHOLD = 0.20
-# Default cap: 10 CORE + 4 dynamic slots = 14 (≈62% shrink on the 37-tool doc
-# profile, clears the ≥60% Part-0 TTFT-reduction gate). See the plan deviations.
+# Default cap: 11 CORE (doc profile, incl. the load_tools escape hatch) + 3
+# dynamic slots = 14 (≈62% shrink on the 37-tool doc profile, clears the
+# ≥60% Part-0 TTFT-reduction gate). See the plan deviations.
 DEFAULT_MAX_TOOLS = 14
 
 
@@ -156,6 +157,13 @@ def __init__(
         self._loaded: Dict[str, _ToolState] = {}
         self._turn = 0
         self._session_disabled = False
+        # Escape-hatch activation counters (Part 2, #1450). Both recovery paths
+        # feed the τ-tuning signal: the non-tool-calling free recovery
+        # (record_tool_use on an unlisted tool) and the native explicit recovery
+        # (load_bundle). Summarized on reset_session(), aggregated from logs by
+        # the eval. A rising per-turn rate ⇒ τ too strict.
+        self._escape_hatch_count = 0
+        self._load_tools_count = 0
 
     # ── public API ───────────────────────────────────────────────────────
 
@@ -284,37 +292,140 @@ def record_tool_use(self, tool_name: str) -> None:
 
         If the tool is loaded, refresh its ``last_call_ts``. If it is **not**
         loaded, the model reached a tool the prompt didn't list (a free
-        non-tool-calling recovery via the full registry); log it as the
-        escape-hatch signal. This does *not* auto-load the tool — that is
-        Part 2's job.
+        non-tool-calling recovery via the full registry); count and log it as the
+        escape-hatch signal. This does *not* auto-load the tool; a native model
+        re-surfaces a missed tool through the explicit :meth:`load_bundle` path
+        (the ``load_tools`` meta-tool).
         """
         state = self._loaded.get(tool_name)
         if state is not None:
             state.last_call_ts = time.time()
             return
+        self._escape_hatch_count += 1
         logger.info(
             json.dumps(
                 {
                     "event": "TOOL_LOADER_ESCAPE_HATCH",
                     "tool": tool_name,
                     "turn": self._turn,
-                    "note": "executed unlisted tool via full registry (Part-2 gap)",
+                    "note": "executed unlisted tool via full registry (free recovery)",
                 }
             )
         )
 
     def reset_session(self) -> None:
         """Clear per-session state for a new conversation.
 
-        The content-keyed embedding cache survives — embeddings depend only on
-        the tool docs, not on the conversation.
+        Emits the per-session escape-hatch summary (the τ-tuning signal) for the
+        conversation just ending **before** clearing, then zeroes the counters
+        alongside the existing state clears. The content-keyed embedding cache
+        survives — embeddings depend only on the tool docs, not the conversation.
         """
+        if self._turn > 0:
+            self._log_session_summary()
         self._loaded.clear()
         self._turn = 0
         self._session_disabled = False
+        self._escape_hatch_count = 0
+        self._load_tools_count = 0
+
+    def bundle_names(self) -> List[str]:
+        """Return the configured bundle names, sorted (the ``load_tools`` menu)."""
+        return sorted(b.name for b in self._bundles)
+
+    def format_bundle_menu(self) -> str:
+        """Return a compact ``"- {name}: {description}"`` menu over all bundles.
+
+        Used both for the native-model system-prompt menu and for the
+        unknown-bundle error text, so the model always sees the same valid names.
+        """
+        return "\n".join(
+            f"- {b.name}: {b.description}" if b.description else f"- {b.name}"
+            for b in self._bundles
+        )
+
+    def load_bundle(self, bundle: str, registry: Dict[str, dict]) -> List[str]:
+        """Admit a bundle's tools into the loaded set (the explicit escape hatch).
+
+        Resolves *bundle* to a :class:`ToolBundle` — exact bundle-name match
+        first, else (robustness nicety) a bare tool name resolved to its
+        bundle(s) via the reverse index — and admits each member present in
+        *registry* and not already loaded, **cap-aware**: under the cap via
+        :meth:`_admit`; at the cap by LRU-evicting a non-CORE tool that is not
+        being loaded right now (or skipping + logging if nothing is evictable),
+        mirroring :meth:`select`'s admission loop. So ``max_tools`` holds at all
+        times. Emits a same-turn ``TOOL_LOADER`` *loaded superset* line so the
+        recall parser sees the mid-loop expansion.
+
+        Args:
+            bundle: A bundle name from the menu, or a bare tool name to resolve
+                to its owning bundle(s).
+            registry: The live tool registry (same object passed to
+                :meth:`select`); members absent from it are not admitted.
+
+        Returns:
+            The sorted loaded set after admission.
+
+        Raises:
+            KeyError: *bundle* is neither a known bundle name nor a known tool
+                name — the caller turns this into an actionable error listing the
+                valid bundle names.
+        """
+        members, resolved_name = self._resolve_bundle_members(bundle)
+
+        protected = set(self._core) | set(members)
+        sel = _Selection()
+        for member in sorted(members):
+            if member not in registry or member in self._loaded:
+                continue
+            if len(self._loaded) < self._max_tools:
+                self._admit(member, sel)
+                continue
+            victim = self._pick_eviction_victim(protected)
+            if victim is None:
+                sel.skipped_at_cap.append(member)
+                continue
+            del self._loaded[victim]
+            sel.evicted.append(victim)
+            self._admit(member, sel)
+
+        self._load_tools_count += 1
+        logger.info(
+            "TOOL_LOADER %s",
+            json.dumps(
+                {
+                    "turn": self._turn,
+                    "event": "load_tools",
+                    "bundle": resolved_name,
+                    "admitted": sorted(sel.admitted),
+                    "evicted": sorted(sel.evicted),
+                    "skipped_at_cap": sorted(sel.skipped_at_cap),
+                    "loaded": sorted(self._loaded),
+                }
+            ),
+        )
+        return sorted(self._loaded)
 
     # ── internals ────────────────────────────────────────────────────────
 
+    def _resolve_bundle_members(self, bundle: str) -> tuple["FrozenSet[str]", str]:
+        """Resolve *bundle* to ``(members, resolved_name)``, or raise ``KeyError``.
+
+        Exact bundle-name match first; else a bare tool name resolved to the
+        union of its owning bundles' members via the reverse index.
+        ``resolved_name`` is the matched bundle name (exact match) or the owning
+        bundle name(s) joined with ``+`` (tool-name match), so the ``load_tools``
+        log line records the bundle actually pulled, not the bare tool name.
+        """
+        for b in self._bundles:
+            if b.name == bundle:
+                return b.members, b.name
+        owning = self._tool_to_bundles.get(bundle)
+        if owning:
+            members = frozenset().union(*(b.members for b in owning))
+            return members, "+".join(b.name for b in owning)
+        raise KeyError(bundle)
+
     def _admit(self, name: str, sel: _Selection) -> None:
         """Add *name* to the loaded set with fresh bookkeeping."""
         self._loaded[name] = _ToolState(loaded_at=time.time(), load_turn=self._turn)
@@ -404,6 +515,28 @@ def _log_selection(
             ),
         )
 
+    def _log_session_summary(self) -> None:
+        """Emit one ``TOOL_LOADER_SESSION`` INFO line — the τ-tuning signal.
+
+        ``escape_hatch_rate`` is per turn over both recovery paths (free
+        non-tool-calling recovery + native ``load_tools``); the two component
+        counts are reported separately so the tuner can see which path fired.
+        """
+        logger.info(
+            "TOOL_LOADER_SESSION %s",
+            json.dumps(
+                {
+                    "turns": self._turn,
+                    "escape_hatch_count": self._escape_hatch_count,
+                    "load_tools_count": self._load_tools_count,
+                    "escape_hatch_rate": (
+                        self._escape_hatch_count + self._load_tools_count
+                    )
+                    / max(self._turn, 1),
+                }
+            ),
+        )
+
 
 def _sha256(text: str) -> str:
     """Hex SHA-256 of *text* (UTF-8)."""