Add CodeMode dynamic_catalog flag for cache-stable tool disclosure#243

Open
DouweM wants to merge 7 commits into main from capability/code-mode-dynamic-catalog

DouweM commented May 15, 2026

Tier-2 follow-up to #232 (Tier-1 fix merged in #240). Implements the structural cache reshape described in the tool-search × code-mode grapple, directly on CodeMode rather than as a separate capability.

Summary

Adds a dynamic_catalog: bool flag to CodeMode. When set, it moves the tool-disclosure surface out of run_code.description (which lives in the prompt-cache-keyed tool-definitions block) into a cache-friendly shape:

  1. run_code.description becomes static -- just the base prose (sandbox restrictions, return-value contract) plus the cache-stable search addendum. Never re-rendered → the tool-defs block stays byte-stable across Tool Search discoveries / per-step toolset swaps.
  2. Catalog moves to instructions as a dynamic InstructionPart. Anthropic/Bedrock split static vs. dynamic instructions, so the static instruction prefix survives discoveries.
  3. Newly-discovered tools are announced via ctx.enqueue (the pending message queue added in pydantic/pydantic-ai#4980) as a SystemPromptPart in a fresh ModelRequest. Append-only -- never touches a cached prefix.
```python
from pydantic_ai import Agent
from pydantic_ai.capabilities import ToolSearch
from pydantic_ai_harness import CodeMode

agent = Agent(
    'anthropic:claude-sonnet-4-5',
    capabilities=[ToolSearch(), CodeMode(dynamic_catalog=True)],
)
```

Opt-in: default CodeMode (dynamic_catalog=False) keeps the catalog in run_code.description (slightly cheaper on prompt size when the toolset never changes). Pair with ToolSearch or a churning toolset for the cache win.
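
To make the cache argument concrete, here is a minimal stdlib-only sketch of why a static run_code.description keeps the tool-definitions block byte-stable. It does not use pydantic-ai; `BASE_PROSE`, `render_catalog`, and `tool_defs_block` are hypothetical stand-ins, and the JSON shape is only an assumption about what a provider might hash.

```python
import hashlib
import json

# Hypothetical base prose; the real description text is not shown in this PR.
BASE_PROSE = "Run Python in a sandbox. Return a value from the last expression."


def render_catalog(tool_names: list[str]) -> str:
    """Render a toy catalog: one stub signature per tool."""
    return "\n".join(f"def {name}(...): ..." for name in sorted(tool_names))


def tool_defs_block(tool_names: list[str], dynamic_catalog: bool) -> str:
    """Serialize a toy tool-definitions block, the part a prompt cache keys on."""
    description = BASE_PROSE
    if not dynamic_catalog:
        # Default mode: the catalog is embedded in run_code.description.
        description += "\n\n" + render_catalog(tool_names)
    return json.dumps([{"name": "run_code", "description": description}], sort_keys=True)


def digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


before = ["get_weather"]
after = ["get_weather", "send_email"]  # one tool discovered mid-run

# Compare the block's hash before and after the discovery in each mode.
default_stable = digest(tool_defs_block(before, False)) == digest(tool_defs_block(after, False))
dynamic_stable = digest(tool_defs_block(before, True)) == digest(tool_defs_block(after, True))
print(default_stable, dynamic_stable)  # False True
```

In the default mode any discovery changes the serialized block, so a cache keyed on it would miss; with the flag set, the block is identical and only the instructions and appended messages change.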

Why a flag, not a separate capability

The earlier revision shipped this as a CodeModeDynamicCatalog capability that surgically wrapped CodeMode's assembled toolset and rewrote its run_code ToolDefinition. That coupling (importing CodeMode's private toolset internals, matching its outermost ordering tier to avoid a wraps cycle) made it a shim on top of CodeMode rather than part of it. Folding it into a flag removes the shim, the cross-capability private imports, and the ordering dance -- and lets the search addendum stay in run_code.description in both modes (it's cache-stable), which the wrapper previously dropped.

Implementation notes

  • CodeModeToolset gains dynamic_catalog. In get_tools, when set, the description is just the base prose and the rendered catalog is stashed (_last_catalog); get_instructions returns it as InstructionPart(dynamic=True), merged after any upstream instructions. for_run_step carries the stash alongside _repl / _warned_deferred.
  • CodeMode gains the announcement hooks: after_tool_execute reads ToolSearchReturnContent from local-path search_tools returns; after_model_request does the same for native-path NativeToolSearchReturnParts. Both are inert unless dynamic_catalog is set. _announced_tools: set[str] is per-run (for_run returns a fresh instance when the flag is on) and idempotent across repeat discoveries. The announcement is enqueued as a bare SystemPromptPart (no ModelRequest wrapper).
  • _render_catalog(callable_defs) -> str (extracted in the Tier-1 prep) renders the catalog for both the default description path and the new instructions path, so there's no duplicated rendering logic.
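
The per-run, idempotent announcement bookkeeping described above can be sketched roughly as follows. This is an illustration only: `DiscoveryAnnouncer`, `on_discovery`, and the message wording are hypothetical, and a plain list stands in for the pending message queue rather than the real ctx.enqueue.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryAnnouncer:
    """Toy stand-in for the per-run announcement state (not the real CodeMode class)."""

    announced: set[str] = field(default_factory=set)
    queue: list[str] = field(default_factory=list)  # stands in for the pending message queue

    def on_discovery(self, names: list[str]) -> None:
        # Keep first-seen order and skip names already announced in this run.
        new = [n for n in dict.fromkeys(names) if n not in self.announced]
        if not new:
            return  # repeat discovery: nothing is enqueued
        self.announced.update(new)
        self.queue.append("New tools available in run_code: " + ", ".join(new))

    def for_run(self) -> "DiscoveryAnnouncer":
        # A fresh instance per run, so announcements do not leak across runs.
        return DiscoveryAnnouncer()


announcer = DiscoveryAnnouncer()
announcer.on_discovery(["send_email", "get_weather"])
announcer.on_discovery(["send_email"])  # already announced, ignored
announcer.on_discovery(["send_email", "search_docs"])  # only search_docs is new
print(announcer.queue)
```

Tracking announced names in a set is what makes repeat discoveries a no-op, and returning a new instance from `for_run` mirrors the per-run reset the notes describe.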

Dependency note

[tool.uv.sources] points pydantic-ai-slim at the background-tools branch (where the pending message queue lives) until pydantic/pydantic-ai#4980 lands. Switch back to a published version once it's released.

Cross-provider cache safety

With pydantic/pydantic-ai#5509 (the #5437 fix) merged into background-tools, mid-conversation SystemPromptParts render inline (XML-wrapped on providers without native mid-conversation system support) instead of being hoisted to the top-level system block. The announcement is therefore cache-safe across providers: it never mutates the cached prefix, and run_code.description stays byte-stable across discoveries.
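
The difference between hoisting and inline rendering can be illustrated with a small stdlib-only sketch. The two render functions, the `role: text` wire format, and the `<system>` tag name are all assumptions made for the example; the actual provider mapping in pydantic-ai may differ.

```python
def render_hoisted(system: str, turns: list[tuple[str, str]]) -> list[str]:
    """Hoist every mid-conversation system part into the top-level system block."""
    extra = [text for role, text in turns if role == "system"]
    head = "\n".join([system, *extra])
    rest = [f"{role}: {text}" for role, text in turns if role != "system"]
    return [f"system: {head}", *rest]


def render_inline(system: str, turns: list[tuple[str, str]]) -> list[str]:
    """Keep mid-conversation system parts in place, wrapped as a user message."""
    out = [f"system: {system}"]
    for role, text in turns:
        if role == "system":
            out.append(f"user: <system>{text}</system>")  # tag name is an assumption
        else:
            out.append(f"{role}: {text}")
    return out


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count leading messages that are identical in both renderings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


turns = [("user", "Find me a tool for email."), ("assistant", "Searching.")]
grown = turns + [("system", "New tool available: send_email")]

hoisted_prefix = shared_prefix_len(
    render_hoisted("Be helpful.", turns), render_hoisted("Be helpful.", grown)
)
inline_prefix = shared_prefix_len(
    render_inline("Be helpful.", turns), render_inline("Be helpful.", grown)
)
print(hoisted_prefix, inline_prefix)  # 0 3
```

Hoisting rewrites the very first message, so nothing before the announcement can be reused; inline rendering only appends, so every earlier message remains a valid cached prefix.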

Test plan

  • Tests merged into tests/code_mode/test_code_mode.py (TestDynamicCatalog): catalog leaves run_code.description and surfaces as a dynamic InstructionPart, appends to upstream str / Sequence; default keeps it in the description with no instructions; empty catalog emits nothing; search addendum stays in the description; for_run_step preserves the stash; per-run _announced_tools reset (and for_run returns self when disabled); local-search announcement (success / disabled / empty / non-search / idempotent); native-search announcement (success / unrelated parts); _extract_discovered_names edge cases; two end-to-end Agent.run exercises with FunctionModel (catalog-in-instructions + discovery announcement, and an eager tool callable via run_code).
  • 100% line + branch coverage on code_mode/_capability.py and code_mode/_toolset.py.
  • make lint && make typecheck clean; full suite (109 tests) passes.

Related

DouweM added 5 commits May 15, 2026 23:55
…closure

Moves CodeMode's per-tool signature catalog out of `run_code.description`
(which lives in the prompt-cache-keyed tool-definitions block) and into
agent instructions as a dynamic `InstructionPart`, then announces newly-
discovered tools via `RunContext.enqueue` rather than by mutating the
cached description. The result: the tool-defs block stays byte-stable
across Tool Search discoveries / per-step toolset swaps; only the dynamic
instructions and append-only system-prompt announcements grow.

This is the Tier-2 reshape from #232. Opt-in
(default `CodeMode` keeps the catalog in `run_code.description`, which is
slightly cheaper on prompt size when the toolset never changes); pair
with `ToolSearch` or a churning toolset for the cache win.

Depends on pydantic/pydantic-ai#4980 for the pending message queue;
pinned to the `background-tools` branch of pydantic-ai-slim until that
lands. Cross-provider cache safety completes once
pydantic/pydantic-ai#5437 (XML-wrapped mid-conversation SystemPromptPart
mapping) ships.

Replace the separate `CodeModeDynamicCatalog` capability (which surgically
wrapped `CodeMode`'s assembled toolset) with a `dynamic_catalog: bool` flag on
`CodeMode` itself. When set, `CodeModeToolset` keeps only the static base prose
in `run_code.description` and surfaces the sandboxed-tool catalog as a dynamic
`InstructionPart` via `get_instructions`, and `CodeMode` announces newly
discovered tools by enqueuing a `SystemPromptPart` through `ctx.enqueue`.

This drops the shim package and its dedicated test module, merging the behavior
and its tests into `code_mode`. The search addendum now stays in
`run_code.description` in both modes (it's cache-stable), an improvement over
the prior wrapper which dropped it.
DouweM changed the title from "Add CodeModeDynamicCatalog for cache-stable tool disclosure" to "Add CodeMode dynamic_catalog flag for cache-stable tool disclosure" on May 21, 2026
DouweM added 2 commits May 21, 2026 22:07
Re-lock pydantic-ai-slim to the latest background-tools HEAD, which now accepts
`ModelRequestPart`s directly in `enqueue` and renders mid-conversation
`SystemPromptPart`s inline rather than hoisting them (pydantic/pydantic-ai#5509).
Drop the `ModelRequest` wrapper workaround and enqueue the `SystemPromptPart`
directly; on the wire it renders as an XML-wrapped user prompt, so the catalog
cache stays intact across providers.

…essage queue

`RunContext.enqueue` now raises when the context has no pending-message queue (pydantic-ai
removed the silent-drop path). Pass `pending_messages=[]` in the test helper so the
discovery-announcement unit tests still exercise enqueue, and narrow `ctx.pending_messages`
before reading it. Real runs wire the queue automatically.
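
The test-helper change in this commit can be pictured with a toy context. `FakeRunContext` and `make_test_context` are hypothetical and do not use pydantic-ai; the `RuntimeError` is an assumption, since the commit message only says that enqueue raises without naming the exception type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRunContext:
    """Hypothetical stand-in for a run context; not the pydantic-ai class."""

    pending_messages: Optional[list[str]] = None

    def enqueue(self, part: str) -> None:
        # Assumption: the exception type is illustrative only.
        if self.pending_messages is None:
            raise RuntimeError("context has no pending-message queue")
        self.pending_messages.append(part)


def make_test_context() -> FakeRunContext:
    # Test helper: always wire an empty queue so enqueue can be exercised.
    return FakeRunContext(pending_messages=[])


ctx = make_test_context()
ctx.enqueue("New tool available: send_email")

# Narrow the optional queue before reading it, as a type checker would require.
pending = ctx.pending_messages
assert pending is not None
first_message = pending[0]

# Without a queue, enqueue fails loudly instead of silently dropping the part.
try:
    FakeRunContext().enqueue("dropped?")
    raised = False
except RuntimeError:
    raised = True
```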

DouweM commented May 22, 2026

@adtyavrdhn @dsfaccini Feel free to take over while I'm out! I think the dynamic catalog could be the default.
