Add CodeMode `dynamic_catalog` flag for cache-stable tool disclosure by DouweM · Pull Request #243 · pydantic/pydantic-ai-harness

DouweM · 2026-05-15T23:55:43Z

Tier-2 follow-up to #232 (Tier-1 fix merged in #240). Implements the structural cache reshape described in the tool-search × code-mode grapple, directly on CodeMode rather than as a separate capability.

Summary

Adds a dynamic_catalog: bool flag to CodeMode. When set, it moves the tool-disclosure surface out of run_code.description (which lives in the prompt-cache-keyed tool-definitions block) into a cache-friendly shape:

- run_code.description becomes static -- just the base prose (sandbox restrictions, return-value contract) plus the cache-stable search addendum. It is never re-rendered → the tool-defs block stays byte-stable across Tool Search discoveries / per-step toolset swaps.
- The catalog moves to instructions as a dynamic InstructionPart. Anthropic/Bedrock split static vs. dynamic instructions, so the static instruction prefix survives discoveries.
- Newly discovered tools are announced via ctx.enqueue (pydantic/pydantic-ai#4980, the pending message queue) as a SystemPromptPart in a fresh ModelRequest. Append-only -- never touches a cached prefix.

from pydantic_ai import Agent
from pydantic_ai.capabilities import ToolSearch
from pydantic_ai_harness import CodeMode

agent = Agent(
    'anthropic:claude-sonnet-4-5',
    capabilities=[ToolSearch(), CodeMode(dynamic_catalog=True)],
)

Opt-in: default CodeMode (dynamic_catalog=False) keeps the catalog in run_code.description (slightly cheaper on prompt size when the toolset never changes). Pair with ToolSearch or a churning toolset for the cache win.
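To make the trade-off concrete, here is a minimal sketch in plain Python (no pydantic-ai imports). It assumes the provider keys its prompt cache on the serialized tool definitions; `BASE_PROSE`, `tool_defs_fingerprint`, `render_description`, and the catalog strings are hypothetical names for illustration, not CodeMode internals.

```python
import hashlib
import json

BASE_PROSE = "Runs Python in a sandbox. Return the final value."  # hypothetical base prose


def tool_defs_fingerprint(description: str) -> str:
    """Hash a serialized tool-definitions block, standing in for a cache key."""
    block = json.dumps([{"name": "run_code", "description": description}], sort_keys=True)
    return hashlib.sha256(block.encode()).hexdigest()


def render_description(catalog: list[str], dynamic_catalog: bool) -> str:
    """Default mode embeds the catalog; dynamic mode keeps only the base prose."""
    if dynamic_catalog:
        return BASE_PROSE
    return BASE_PROSE + "\n\n" + "\n".join(catalog)


before = ["def get_weather(city: str) -> str"]
after = before + ["def get_stock(ticker: str) -> float"]  # a tool discovered mid-run

# Compare the fingerprint before and after the discovery in each mode.
default_changed = (
    tool_defs_fingerprint(render_description(before, False))
    != tool_defs_fingerprint(render_description(after, False))
)
dynamic_changed = (
    tool_defs_fingerprint(render_description(before, True))
    != tool_defs_fingerprint(render_description(after, True))
)
print(default_changed, dynamic_changed)  # True False
```

In the default mode every discovery changes the fingerprint; with the flag set the description, and therefore the fingerprint, is unchanged.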

Why a flag, not a separate capability

The earlier revision shipped this as a CodeModeDynamicCatalog capability that surgically wrapped CodeMode's assembled toolset and rewrote its run_code ToolDefinition. That coupling (importing CodeMode's private toolset internals, matching its outermost ordering tier to avoid a wraps cycle) made it a shim on top of CodeMode rather than part of it. Folding it into a flag removes the shim, the cross-capability private imports, and the ordering dance -- and lets the search addendum stay in run_code.description in both modes (it's cache-stable), which the wrapper previously dropped.

Implementation notes

- CodeModeToolset gains dynamic_catalog. In get_tools, when set, the description is just the base prose and the rendered catalog is stashed (_last_catalog); get_instructions returns it as InstructionPart(dynamic=True), merged after any upstream instructions. for_run_step carries the stash alongside _repl / _warned_deferred.
- CodeMode gains the announcement hooks: after_tool_execute reads ToolSearchReturnContent from local-path search_tools returns; after_model_request does the same for native-path NativeToolSearchReturnParts. Both are inert unless dynamic_catalog is set. _announced_tools: set[str] is per-run (for_run returns a fresh instance when the flag is on) and idempotent across repeat discoveries. The announcement is enqueued as a bare SystemPromptPart (no ModelRequest wrapper).
- _render_catalog(callable_defs) -> str (extracted in the Tier-1 prep) renders the catalog for both the default description path and the new instructions path, so there's no duplicated rendering logic.
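The stash-then-surface flow and the idempotent announcement set can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in this PR: `InstructionSketch`, `ToolsetSketch`, and `AnnouncerSketch` are hypothetical stand-ins, and the method signatures are deliberately reduced so the example runs without pydantic-ai.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InstructionSketch:
    """Hypothetical stand-in for an instruction part with a dynamic flag."""
    content: str
    dynamic: bool = False


@dataclass
class ToolsetSketch:
    """Hypothetical model of the stash-then-surface flow; not CodeModeToolset itself."""
    dynamic_catalog: bool = False
    base_prose: str = "Run Python in the sandbox."
    _last_catalog: Optional[str] = None

    def get_tools(self, signatures: list[str]) -> str:
        """Return the run_code description, stashing the catalog when the flag is set."""
        catalog = "\n".join(signatures)
        if self.dynamic_catalog:
            self._last_catalog = catalog or None
            return self.base_prose
        return f"{self.base_prose}\n\n{catalog}" if catalog else self.base_prose

    def get_instructions(self, upstream: list[InstructionSketch]) -> list[InstructionSketch]:
        """Append the stashed catalog after any upstream instructions."""
        if not self._last_catalog:
            return list(upstream)  # empty catalog emits nothing
        return [*upstream, InstructionSketch(self._last_catalog, dynamic=True)]


@dataclass
class AnnouncerSketch:
    """Hypothetical per-run announcer: each tool name is announced at most once."""
    announced: set[str] = field(default_factory=set)
    queue: list[str] = field(default_factory=list)

    def announce(self, discovered: list[str]) -> None:
        new = [name for name in discovered if name not in self.announced]
        if not new:
            return
        self.announced.update(new)
        self.queue.append("New tools available: " + ", ".join(new))


toolset = ToolsetSketch(dynamic_catalog=True)
description = toolset.get_tools(["def get_weather(city: str) -> str"])
parts = toolset.get_instructions([InstructionSketch("Be concise.")])

announcer = AnnouncerSketch()
announcer.announce(["get_stock"])
announcer.announce(["get_stock", "get_news"])  # get_stock is not re-announced
```

Creating a fresh `AnnouncerSketch` per run mirrors the per-run reset described above; sharing one across runs would suppress announcements the model has never seen in the new conversation.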

Dependency note

[tool.uv.sources] points pydantic-ai-slim at the background-tools branch (where the pending message queue lives) until pydantic/pydantic-ai#4980 lands. Switch back to a published version once it's released.

Cross-provider cache safety

With pydantic/pydantic-ai#5509 (the #5437 fix) merged into background-tools, mid-conversation SystemPromptParts render inline (XML-wrapped on providers without native mid-conversation system support) instead of being hoisted to the top-level system block. The announcement is therefore cache-safe across providers: it never mutates the cached prefix, and run_code.description stays byte-stable across discoveries.
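The difference between hoisting and inline rendering can be illustrated with a toy renderer. This is a sketch under assumptions, not any provider adapter in pydantic-ai: messages are modeled as `(kind, text)` tuples, and the `<system>` wrapper tag is a hypothetical placeholder for whatever wrapping the real mapping uses.

```python
Message = tuple[str, str]  # (kind, text); a simplified stand-in for message parts


def render_hoisted(messages: list[Message]) -> tuple[str, list[Message]]:
    """Old behavior sketch: every system part is lifted into the top-level system block."""
    system = "\n".join(text for kind, text in messages if kind == "system")
    turns = [(kind, text) for kind, text in messages if kind != "system"]
    return system, turns


def render_inline(messages: list[Message]) -> tuple[str, list[Message]]:
    """New behavior sketch: only leading system parts form the system block;
    later ones stay in place as XML-wrapped user turns."""
    system_lines: list[str] = []
    turns: list[Message] = []
    for kind, text in messages:
        if kind == "system" and not turns:
            system_lines.append(text)
        elif kind == "system":
            turns.append(("user", f"<system>{text}</system>"))  # hypothetical tag
        else:
            turns.append((kind, text))
    return "\n".join(system_lines), turns


history = [
    ("system", "You are helpful."),
    ("user", "Find a stock tool."),
    ("assistant", "Searching."),
]
extended = history + [("system", "New tool available: get_stock")]

hoisted_before, _ = render_hoisted(history)
hoisted_after, _ = render_hoisted(extended)
inline_before, turns_before = render_inline(history)
inline_after, turns_after = render_inline(extended)
```

With hoisting, the announcement rewrites the system block at the very start of the prompt; with inline rendering, the system block and all earlier turns are an unchanged prefix and the announcement is only appended.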

Test plan

- Tests merged into tests/code_mode/test_code_mode.py (TestDynamicCatalog):
  - catalog leaves run_code.description and surfaces as a dynamic InstructionPart, appending to upstream str / Sequence instructions
  - default keeps it in the description with no instructions
  - empty catalog emits nothing
  - search addendum stays in the description
  - for_run_step preserves the stash
  - per-run _announced_tools reset (and for_run returns self when disabled)
  - local-search announcement (success / disabled / empty / non-search / idempotent)
  - native-search announcement (success / unrelated parts)
  - _extract_discovered_names edge cases
  - two end-to-end Agent.run exercises with FunctionModel (catalog-in-instructions + discovery announcement, and an eager tool callable via run_code)
- 100% line + branch coverage on code_mode/_capability.py and code_mode/_toolset.py.
- make lint && make typecheck clean; full suite (109 tests) passes.

Related

- #232 ("Code Mode flattens Tool Search deferral — filter defer_loading=True from sandboxing") -- Tier-1 fix (merged in #240, "fix(code_mode): honor Tool Search's deferred-loading contract") and the Tier-2 outline this implements
- pydantic/pydantic-ai#4980 ("Add pending message queue (ctx.enqueue / agent_run.enqueue)") -- pending message queue (the enqueue primitive this rides on)
- pydantic/pydantic-ai#5437 ("Render mid-conversation SystemPromptParts as XML-wrapped UserPromptParts for Anthropic/Google") -- mid-conversation SystemPromptPart mapping fix (closes the cross-provider cache loop)

…closure

Moves CodeMode's per-tool signature catalog out of `run_code.description` (which lives in the prompt-cache-keyed tool-definitions block) and into agent instructions as a dynamic `InstructionPart`, then announces newly discovered tools via `RunContext.enqueue` rather than by mutating the cached description. The result: the tool-defs block stays byte-stable across Tool Search discoveries / per-step toolset swaps; only the dynamic instructions and append-only system-prompt announcements grow.

This is the Tier-2 reshape from #232. Opt-in (default `CodeMode` keeps the catalog in `run_code.description`, which is slightly cheaper on prompt size when the toolset never changes); pair with `ToolSearch` or a churning toolset for the cache win.

Depends on pydantic/pydantic-ai#4980 for the pending message queue; pinned to the `background-tools` branch of pydantic-ai-slim until that lands. Cross-provider cache safety completes once pydantic/pydantic-ai#5437 (XML-wrapped mid-conversation SystemPromptPart mapping) ships.

…gap)

Replace the separate `CodeModeDynamicCatalog` capability (which surgically wrapped `CodeMode`'s assembled toolset) with a `dynamic_catalog: bool` flag on `CodeMode` itself. When set, `CodeModeToolset` keeps only the static base prose in `run_code.description` and surfaces the sandboxed-tool catalog as a dynamic `InstructionPart` via `get_instructions`, and `CodeMode` announces newly discovered tools by enqueuing a `SystemPromptPart` through `ctx.enqueue`. This drops the shim package and its dedicated test module, merging the behavior and its tests into `code_mode`. The search addendum now stays in `run_code.description` in both modes (it's cache-stable), an improvement over the prior wrapper which dropped it.

Re-lock pydantic-ai-slim to the latest background-tools HEAD, which now accepts `ModelRequestPart`s directly in `enqueue` and renders mid-conversation `SystemPromptPart`s inline rather than hoisting them (pydantic/pydantic-ai#5509). Drop the `ModelRequest` wrapper workaround and enqueue the `SystemPromptPart` directly; on the wire it renders as an XML-wrapped user prompt, so the catalog cache stays intact across providers.

…essage queue

`RunContext.enqueue` now raises when the context has no pending-message queue (pydantic-ai removed the silent-drop path). Pass `pending_messages=[]` in the test helper so the discovery-announcement unit tests still exercise enqueue, and narrow `ctx.pending_messages` before reading it. Real runs wire the queue automatically.
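The test-helper change can be pictured with a small model. This is only a sketch: `ContextSketch` and `make_test_context` are hypothetical stand-ins rather than pydantic-ai's `RunContext` or the repository's helper, and the `RuntimeError` is an assumed exception type since the commit message does not say which error is raised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextSketch:
    """Hypothetical stand-in for a run context; not pydantic-ai's RunContext."""
    pending_messages: Optional[list[str]] = None

    def enqueue(self, message: str) -> None:
        # Assumed behavior: raise instead of silently dropping the message.
        if self.pending_messages is None:
            raise RuntimeError("no pending-message queue on this context")
        self.pending_messages.append(message)


def make_test_context() -> ContextSketch:
    """Test helper: supply an empty queue so enqueue has somewhere to write."""
    return ContextSketch(pending_messages=[])


ctx = make_test_context()
ctx.enqueue("New tool available: get_stock")
assert ctx.pending_messages is not None  # narrow the Optional before reading
queued = list(ctx.pending_messages)

try:
    ContextSketch().enqueue("dropped?")
    raised = False
except RuntimeError:
    raised = True
```

Raising makes a missing queue visible in tests, where a silent drop would let an announcement test pass without ever exercising the enqueue path.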

DouweM · 2026-05-22T22:24:01Z

@adtyavrdhn @dsfaccini Feel free to take over while I'm out! I think the dynamic catalog could be the default.

DouweM added 5 commits May 15, 2026 23:55

Re-lock pydantic-ai-slim to background-tools HEAD after main merge

4e88f97

Adapt to pydantic-ai refactor: PendingMessage.payload→request

b3783a7

Add e2e test exercising eager tool through run_code (closes coverage …

4b6f6ab

…gap)

DouweM changed the title ~~Add CodeModeDynamicCatalog for cache-stable tool disclosure~~ Add CodeMode dynamic_catalog flag for cache-stable tool disclosure May 21, 2026

DouweM added 2 commits May 21, 2026 22:07

DouweM wants to merge 7 commits into main from capability/code-mode-dynamic-catalog
