67 changes: 59 additions & 8 deletions docs/plans/tool-loader.mdx
@@ -9,7 +9,7 @@ title: "Dynamic Tool Loader"
<Note>
**Component:** Per-turn tool visibility for agents (issue [#688](https://github.com/amd/gaia/issues/688))
**Module:** `gaia.agents.base.tool_loader`
**Status:** **Part 0 (#1448) + Part 1 (#1449) landed.** Part 1 ships the selection mechanism behind a default-off toggle on the ChatAgent `doc` profile. Parts 2–3 (explicit escape hatch, skill signal) are still proposed.
**Status:** **Part 0 (#1448) + Part 1 (#1449) + Part 2 (#1450) landed.** Part 1 ships the selection mechanism behind a default-off toggle on the ChatAgent `doc` profile; Part 2 adds the explicit `load_tools` escape hatch (so native tool-calling models can recover a semantic miss) plus the escape-hatch activation-rate tuning signal. Part 3 (skill signal) is still proposed.
**Target agent (v1):** `ChatAgent` (`doc` profile), behind a default-off toggle.
</Note>

@@ -297,11 +297,14 @@ backend KV prefix stays warm. When a filter is active the tools block moves
**after** the response-format template (volatile content last); with no filter
the legacy order and bytes are preserved exactly.

**Native known gap (Amendment 2).** `_execute_tool` is never tightened, so a
non-tool-calling model that names an unlisted tool still runs it (free recovery)
and the loader logs `TOOL_LOADER_ESCAPE_HATCH`. Native tool-calling models have
no such hatch until Part 2's `load_tools`; on first activation the agent logs the
miss as a *known gap* rather than padding the loaded set.
**Native known gap (Amendment 2) — closed by Part 2.** `_execute_tool` is never
tightened, so a non-tool-calling model that names an unlisted tool still runs it
(free recovery) and the loader logs `TOOL_LOADER_ESCAPE_HATCH`. In Part 1 native
tool-calling models had no such hatch — a semantic miss could not self-recover.
[Part 2](#part-2-explicit-escape-hatch--tuning-1450) closes the recovery gap with
the always-on `load_tools` meta-tool (the model loads the bundle it needs and
calls the tool on its next step), and the recall gate's native exemption is
removed accordingly.

**Approved deviations from this sketch** (flagged in the #1449 PR):

@@ -323,7 +326,7 @@ baseline — meaning **CORE-only is the ~60%-reduction best case** and a full
`test_tool_loader_token_budget.py` pins these filtered costs as a static guard.
</Warning>

### Part 2 — Explicit escape hatch + tuning
### Part 2 — Explicit escape hatch + tuning ✅ landed (#1450)

- Add bundle re-surfacing + a discoverability menu of bundle names, and the
`load_tools` meta-tool that native tool-calling models need (the free recovery
@@ -340,6 +343,50 @@ baseline — meaning **CORE-only is the ~60%-reduction best case** and a full
- **Escape-hatch activation rate** is logged per session and usable as the
threshold-tuning signal (rising rate ⇒ τ too strict).

#### How Part 2 shipped (implementation reference)

**`load_tools` is always-on via CORE.** `load_tools` is added to
[`DOC_CORE_TOOLS`](https://github.com/amd/gaia/blob/main/src/gaia/agents/chat/tool_bundles.py)
(CORE = 11), so once registered it renders in **both** the text prompt and the
native `tools=` schema every active turn and is cap-/eviction-exempt. It is
registered **only when the loader is active** (`self.tool_loader is not None`),
so the default-off `doc` path stays byte-identical — the unfiltered 37-tool
baseline is unchanged.

**Recovery lands on the next model *step*, not the next user turn.** The
`load_tools(bundle)` handler calls `ToolLoader.load_bundle`, then
`Agent._apply_tool_filter` — the one place the active filter and the cached
system prompt move together. Because `system_prompt` and `_openai_tools` are
read live at every LLM call, the expanded set is visible to the very next step
in the same query, which is what lets `smart_discovery` recover on turn 1.
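
The "filter and prompt move together" invariant can be illustrated with a minimal sketch. `MiniAgent`, its registry, and the tool names `read_file` and `summarize` are hypothetical stand-ins for illustration, not GAIA's `Agent` class or its real tools; only the method name `_apply_tool_filter` is taken from the change itself.

```python
from typing import Dict, List, Optional


class MiniAgent:
    """Hypothetical stand-in for an agent whose prompt is derived from a tool filter."""

    def __init__(self, registry: Dict[str, str]) -> None:
        self.registry = registry
        self._active_tool_filter: Optional[List[str]] = None
        self._system_prompt_cache = self._compose_system_prompt()

    def _compose_system_prompt(self) -> str:
        # No filter means every registered tool is rendered.
        if self._active_tool_filter is None:
            names = sorted(self.registry)
        else:
            names = sorted(n for n in self._active_tool_filter if n in self.registry)
        return "Tools: " + ", ".join(names)

    def _apply_tool_filter(self, new_filter: Optional[List[str]]) -> None:
        # Filter and cached prompt are always updated as a pair.
        self._active_tool_filter = new_filter
        self._system_prompt_cache = self._compose_system_prompt()

    @property
    def system_prompt(self) -> str:
        # Read live at every step, so a mid-loop change is picked up immediately.
        return self._system_prompt_cache


agent = MiniAgent({"load_tools": "...", "read_file": "...", "summarize": "..."})

# Start of the user turn: only a narrow set is visible.
agent._apply_tool_filter(["load_tools", "read_file"])
step_1_prompt = agent.system_prompt

# Mid-loop: a load_tools-style handler widens the filter before the next step.
agent._apply_tool_filter(["load_tools", "read_file", "summarize"])
step_2_prompt = agent.system_prompt
```

Because the prompt is read through the property on each step, the second step sees the widened set without waiting for a new user turn.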

**`load_bundle` is cap-aware.** It resolves a bundle name (or a bare tool name,
via the reverse index) and admits members with the same LRU-evict path `select()`
uses — protecting CORE and the members being loaded now — so `max_tools` holds at
all times. It emits a same-turn `TOOL_LOADER {…, "event": "load_tools", …}`
superset line.
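
As a rough illustration of cap-aware admission with LRU eviction, the sketch below uses a plain dict of last-use timestamps. The function name `admit_with_cap` and its signature are assumptions made for this example; the real `load_bundle` uses the loader's own bookkeeping and helpers.

```python
from typing import Dict, Iterable, List, Set, Tuple


def admit_with_cap(
    loaded: Dict[str, float],
    members: Iterable[str],
    core: Set[str],
    max_tools: int,
    now: float,
) -> Tuple[List[str], List[str], List[str]]:
    """Admit members into `loaded` (name -> last-use time) without exceeding max_tools."""
    members = set(members)
    # CORE tools and the members being loaded right now are never evicted.
    protected = set(core) | members
    admitted: List[str] = []
    evicted: List[str] = []
    skipped: List[str] = []
    for name in sorted(members):
        if name in loaded:
            continue
        if len(loaded) >= max_tools:
            candidates = [n for n in loaded if n not in protected]
            if not candidates:
                # Nothing evictable: skip rather than break the cap.
                skipped.append(name)
                continue
            # Least recently used = smallest timestamp.
            victim = min(candidates, key=lambda n: loaded[n])
            del loaded[victim]
            evicted.append(victim)
        loaded[name] = now
        admitted.append(name)
    return admitted, evicted, skipped


loaded = {"a": 1.0, "b": 1.0, "x": 2.0, "y": 3.0}
result = admit_with_cap(loaded, {"m1", "m2"}, core={"a", "b"}, max_tools=4, now=10.0)
```

In this run both non-CORE tools are evicted oldest-first to make room, and the loaded set never grows past the cap.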

**Menu is stable and native-only.** A compact bundle menu (name + one-line
description, from `ToolBundle.description`) is injected into the **stable** prefix
of the doc system prompt (before the volatile tools tail → no KV thrash), and
**only for native tool-calling models** — non-native models already have free
recovery and are the TTFT-sensitive path.

**Tuning signal is log-derived.** The loader counts escape-hatch (free) and
`load_tools` (explicit) activations per session and emits a `TOOL_LOADER_SESSION`
summary on `reset_session()` (`escape_hatch_rate = (escape_hatch + load_tools) /
turns`). `gaia.eval.tool_recall` aggregates these from the server log and reports
the per-turn rate alongside recall — no UI-DB migration.
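
A minimal sketch of deriving the per-turn rate from log lines follows. It assumes each summary appears as `TOOL_LOADER_SESSION` followed by a JSON object with `turns`, `escape_hatch_count`, and `load_tools_count` (the fields the loader emits). The function name and the choice to pool counts across sessions are assumptions, not necessarily how `gaia.eval.tool_recall` aggregates.

```python
import json
from typing import Iterable

MARKER = "TOOL_LOADER_SESSION "


def aggregate_escape_hatch_rate(lines: Iterable[str]) -> float:
    """Pool escape-hatch activations over all sessions and divide by total turns."""
    activations = 0
    turns = 0
    for line in lines:
        idx = line.find(MARKER)
        if idx == -1:
            continue  # not a session summary line
        payload = json.loads(line[idx + len(MARKER):])
        activations += payload["escape_hatch_count"] + payload["load_tools_count"]
        turns += payload["turns"]
    # Guard against an empty log, mirroring the loader's max(turns, 1).
    return activations / max(turns, 1)


sample_log = [
    'INFO TOOL_LOADER_SESSION {"turns": 4, "escape_hatch_count": 1, "load_tools_count": 1}',
    "INFO unrelated line",
    'INFO TOOL_LOADER_SESSION {"turns": 6, "escape_hatch_count": 0, "load_tools_count": 1}',
]
rate = aggregate_escape_hatch_rate(sample_log)
```

A rate that rises between eval runs would suggest, per the plan, that the threshold τ is too strict.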

**Recall gate flipped correctly.** `tool_recall.py` unions same-turn
`load_tools` superset lines into that turn's loaded set and treats `load_tools`
as always-satisfied; **only then** is the native "known gap" exemption removed,
so a successful recovery passes the gate and a genuinely unrecovered miss fails
it on every model.
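
The gate logic described above might look roughly like this. The record shape (`turn`, `loaded`) follows the `TOOL_LOADER` log payload. The helper names and the tool names `read_file` and `query_documents` are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable, List, Set


def loaded_for_turn(records: Iterable[Dict], turn: int) -> Set[str]:
    """Union every loaded set logged for `turn`, including mid-loop load_tools lines."""
    loaded: Set[str] = set()
    for record in records:
        if record["turn"] == turn:
            loaded |= set(record["loaded"])
    return loaded


def turn_passes(records: List[Dict], turn: int, called: Iterable[str]) -> bool:
    """A turn passes if every called tool was loaded; load_tools is always satisfied."""
    loaded = loaded_for_turn(records, turn)
    return all(tool == "load_tools" or tool in loaded for tool in called)


records = [
    {"turn": 1, "loaded": ["read_file"]},
    # Same-turn superset line emitted after the model called load_tools.
    {"turn": 1, "event": "load_tools", "loaded": ["read_file", "query_documents"]},
    {"turn": 2, "loaded": ["read_file"]},
]
recovered = turn_passes(records, 1, ["load_tools", "query_documents"])
unrecovered = turn_passes(records, 2, ["query_documents"])
```

With the union in place, the recovered turn passes and the unrecovered one fails, which is the behavior that makes a model-specific exemption unnecessary.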

**Cap unchanged at 14** (→ 3 dynamic slots now that CORE = 11). The eval gates
recall; bump the default only if recall or the escape-hatch rate regresses.
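
The slot arithmetic can be checked in a few lines. The constants are the figures quoted in this plan (cap 14, CORE 11, 37-tool baseline); the variable names are illustrative only.

```python
CORE_TOOLS = 11       # doc-profile CORE size, including load_tools
MAX_TOOLS = 14        # default cap
BASELINE_TOOLS = 37   # unfiltered doc-profile tool count

# Slots left for dynamically selected tools once CORE is admitted.
dynamic_slots = MAX_TOOLS - CORE_TOOLS

# Fraction of the baseline tool list that a full loaded set leaves out.
shrink = 1 - MAX_TOOLS / BASELINE_TOOLS
shrink_percent = round(shrink * 100)
```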

### Part 3 — Skill-driven signal (gated on #887)

A third selection signal, added **only after** [#887](https://github.com/amd/gaia/issues/887)
@@ -407,7 +454,11 @@ via the base `_select_tools_for_turn` hook, and both render paths filter from th
same selection. The old keyword/bundle-policy skeleton was removed; the class name
`ToolLoader` and `reset_session()` were kept so the existing (guarded) call sites
in `cli.py` / `chat/app.py` needed no change. Recall recovery for native
tool-calling models (the `load_tools` meta-tool) is still **Part 2**.
tool-calling models has shipped (Part 2, #1450): the loader exposes
`bundle_names` / `format_bundle_menu` / `load_bundle` and per-session escape-hatch
counters; `ChatAgent` registers the `load_tools` meta-tool and injects the
native-only bundle menu; and `gaia.eval.tool_recall` unions mid-loop `load_tools`
lines, drops the native exemption, and reports the escape-hatch activation rate.

## Dependencies

15 changes: 13 additions & 2 deletions src/gaia/agents/base/agent.py
@@ -817,8 +817,19 @@ def _refresh_active_tool_filter(self, user_input: str) -> None:
        # pylint: disable-next=assignment-from-none
        new_filter = self._select_tools_for_turn(user_input)
        if new_filter != self._active_tool_filter:
            self._active_tool_filter = new_filter
            self._system_prompt_cache = self._compose_system_prompt()
            self._apply_tool_filter(new_filter)

    def _apply_tool_filter(self, new_filter: Optional[List[str]]) -> None:
        """Swap the active tool filter and recompute the cached system prompt.

        The single place the "filter and prompt move together" invariant lives.
        Called from :meth:`_refresh_active_tool_filter` (per user turn) and from
        the ``load_tools`` escape-hatch handler (mid-loop), so a mid-query
        expansion is visible to the very next model step — both render paths
        (``system_prompt`` and ``_openai_tools``) read these live.
        """
        self._active_tool_filter = new_filter
        self._system_prompt_cache = self._compose_system_prompt()

    def rebuild_system_prompt(self) -> None:
        """Rebuild system prompt with current tools from _TOOL_REGISTRY.
149 changes: 141 additions & 8 deletions src/gaia/agents/base/tool_loader.py
@@ -57,8 +57,9 @@
# tools (index/summarize/RAG) for doc-oriented turns while excluding lower-
# scoring noise; plain content questions fall back to the CORE set. Overridable.
DEFAULT_THRESHOLD = 0.20
# Default cap: 10 CORE + 4 dynamic slots = 14 (≈62% shrink on the 37-tool doc
# profile, clears the ≥60% Part-0 TTFT-reduction gate). See the plan deviations.
# Default cap: 11 CORE (doc profile, incl. the load_tools escape hatch) + 3
# dynamic slots = 14 (≈62% shrink on the 37-tool doc profile, clears the
# ≥60% Part-0 TTFT-reduction gate). See the plan deviations.
DEFAULT_MAX_TOOLS = 14


@@ -156,6 +157,13 @@ def __init__(
        self._loaded: Dict[str, _ToolState] = {}
        self._turn = 0
        self._session_disabled = False
        # Escape-hatch activation counters (Part 2, #1450). Both recovery paths
        # feed the τ-tuning signal: the non-tool-calling free recovery
        # (record_tool_use on an unlisted tool) and the native explicit recovery
        # (load_bundle). Summarized on reset_session(), aggregated from logs by
        # the eval. A rising per-turn rate ⇒ τ too strict.
        self._escape_hatch_count = 0
        self._load_tools_count = 0

    # ── public API ───────────────────────────────────────────────────────

@@ -284,37 +292,140 @@ def record_tool_use(self, tool_name: str) -> None:

        If the tool is loaded, refresh its ``last_call_ts``. If it is **not**
        loaded, the model reached a tool the prompt didn't list (a free
        non-tool-calling recovery via the full registry); log it as the
        escape-hatch signal. This does *not* auto-load the tool — that is
        Part 2's job.
        non-tool-calling recovery via the full registry); count and log it as the
        escape-hatch signal. This does *not* auto-load the tool; a native model
        re-surfaces a missed tool through the explicit :meth:`load_bundle` path
        (the ``load_tools`` meta-tool).
        """
        state = self._loaded.get(tool_name)
        if state is not None:
            state.last_call_ts = time.time()
            return
        self._escape_hatch_count += 1
        logger.info(
            json.dumps(
                {
                    "event": "TOOL_LOADER_ESCAPE_HATCH",
                    "tool": tool_name,
                    "turn": self._turn,
                    "note": "executed unlisted tool via full registry (Part-2 gap)",
                    "note": "executed unlisted tool via full registry (free recovery)",
                }
            )
        )

    def reset_session(self) -> None:
        """Clear per-session state for a new conversation.

        The content-keyed embedding cache survives — embeddings depend only on
        the tool docs, not on the conversation.
        Emits the per-session escape-hatch summary (the τ-tuning signal) for the
        conversation just ending **before** clearing, then zeroes the counters
        alongside the existing state clears. The content-keyed embedding cache
        survives — embeddings depend only on the tool docs, not the conversation.
        """
        if self._turn > 0:
            self._log_session_summary()
        self._loaded.clear()
        self._turn = 0
        self._session_disabled = False
        self._escape_hatch_count = 0
        self._load_tools_count = 0

    def bundle_names(self) -> List[str]:
        """Return the configured bundle names, sorted (the ``load_tools`` menu)."""
        return sorted(b.name for b in self._bundles)

    def format_bundle_menu(self) -> str:
        """Return a compact ``"- {name}: {description}"`` menu over all bundles.

        Used both for the native-model system-prompt menu and for the
        unknown-bundle error text, so the model always sees the same valid names.
        """
        return "\n".join(
            f"- {b.name}: {b.description}" if b.description else f"- {b.name}"
            for b in self._bundles
        )

    def load_bundle(self, bundle: str, registry: Dict[str, dict]) -> List[str]:
        """Admit a bundle's tools into the loaded set (the explicit escape hatch).

        Resolves *bundle* to a :class:`ToolBundle` — exact bundle-name match
        first, else (robustness nicety) a bare tool name resolved to its
        bundle(s) via the reverse index — and admits each member present in
        *registry* and not already loaded, **cap-aware**: under the cap via
        :meth:`_admit`; at the cap by LRU-evicting a non-CORE tool that is not
        being loaded right now (or skipping + logging if nothing is evictable),
        mirroring :meth:`select`'s admission loop. So ``max_tools`` holds at all
        times. Emits a same-turn ``TOOL_LOADER`` *loaded superset* line so the
        recall parser sees the mid-loop expansion.

        Args:
            bundle: A bundle name from the menu, or a bare tool name to resolve
                to its owning bundle(s).
            registry: The live tool registry (same object passed to
                :meth:`select`); members absent from it are not admitted.

        Returns:
            The sorted loaded set after admission.

        Raises:
            KeyError: *bundle* is neither a known bundle name nor a known tool
                name — the caller turns this into an actionable error listing the
                valid bundle names.
        """
        members, resolved_name = self._resolve_bundle_members(bundle)

        protected = set(self._core) | set(members)
        sel = _Selection()
        for member in sorted(members):
            if member not in registry or member in self._loaded:
                continue
            if len(self._loaded) < self._max_tools:
                self._admit(member, sel)
                continue
            victim = self._pick_eviction_victim(protected)
            if victim is None:
                sel.skipped_at_cap.append(member)
                continue
            del self._loaded[victim]
            sel.evicted.append(victim)
            self._admit(member, sel)

        self._load_tools_count += 1
        logger.info(
            "TOOL_LOADER %s",
            json.dumps(
                {
                    "turn": self._turn,
                    "event": "load_tools",
                    "bundle": resolved_name,
                    "admitted": sorted(sel.admitted),
                    "evicted": sorted(sel.evicted),
                    "skipped_at_cap": sorted(sel.skipped_at_cap),
                    "loaded": sorted(self._loaded),
                }
            ),
        )
        return sorted(self._loaded)

    # ── internals ────────────────────────────────────────────────────────

    def _resolve_bundle_members(self, bundle: str) -> tuple["FrozenSet[str]", str]:
        """Resolve *bundle* to ``(members, resolved_name)``, or raise ``KeyError``.

        Exact bundle-name match first; else a bare tool name resolved to the
        union of its owning bundles' members via the reverse index.
        ``resolved_name`` is the matched bundle name (exact match) or the owning
        bundle name(s) joined with ``+`` (tool-name match), so the ``load_tools``
        log line records the bundle actually pulled, not the bare tool name.
        """
        for b in self._bundles:
            if b.name == bundle:
                return b.members, b.name
        owning = self._tool_to_bundles.get(bundle)
        if owning:
            members = frozenset().union(*(b.members for b in owning))
            return members, "+".join(b.name for b in owning)
        raise KeyError(bundle)

    def _admit(self, name: str, sel: _Selection) -> None:
        """Add *name* to the loaded set with fresh bookkeeping."""
        self._loaded[name] = _ToolState(loaded_at=time.time(), load_turn=self._turn)
@@ -404,6 +515,28 @@ def _log_selection(
            ),
        )

    def _log_session_summary(self) -> None:
        """Emit one ``TOOL_LOADER_SESSION`` INFO line — the τ-tuning signal.

        ``escape_hatch_rate`` is per turn over both recovery paths (free
        non-tool-calling recovery + native ``load_tools``); the two component
        counts are reported separately so the tuner can see which path fired.
        """
        logger.info(
            "TOOL_LOADER_SESSION %s",
            json.dumps(
                {
                    "turns": self._turn,
                    "escape_hatch_count": self._escape_hatch_count,
                    "load_tools_count": self._load_tools_count,
                    "escape_hatch_rate": (
                        self._escape_hatch_count + self._load_tools_count
                    )
                    / max(self._turn, 1),
                }
            ),
        )


def _sha256(text: str) -> str:
    """Hex SHA-256 of *text* (UTF-8)."""