Add CodeMode `dynamic_catalog` flag for cache-stable tool disclosure #243
Open
DouweM wants to merge 7 commits into
…closure Moves CodeMode's per-tool signature catalog out of `run_code.description` (which lives in the prompt-cache-keyed tool-definitions block) and into agent instructions as a dynamic `InstructionPart`, then announces newly discovered tools via `RunContext.enqueue` rather than by mutating the cached description. The result: the tool-defs block stays byte-stable across Tool Search discoveries / per-step toolset swaps; only the dynamic instructions and the append-only system-prompt announcements grow. This is the Tier-2 reshape from #232. Opt-in: the default `CodeMode` keeps the catalog in `run_code.description`, which is slightly cheaper on prompt size when the toolset never changes; pair the flag with `ToolSearch` or a churning toolset for the cache win. Depends on pydantic/pydantic-ai#4980 for the pending message queue; pinned to the `background-tools` branch of pydantic-ai-slim until that lands. Cross-provider cache safety completes once pydantic/pydantic-ai#5437 (XML-wrapped mid-conversation SystemPromptPart mapping) ships.
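To make the cache argument concrete, here is a minimal toy model of it, not the PR's code. It assumes a provider keys its prompt cache on the serialized tool-definitions block; `block_hash`, `build_request`, `render_catalog`, and `BASE_PROSE` are hypothetical names invented for this illustration.

```python
import hashlib
import json

# Hypothetical static description text; the real base prose lives in CodeMode.
BASE_PROSE = "Run Python in a sandbox."


def block_hash(tool_defs: list[dict]) -> str:
    """Hash a tool-definitions block, standing in for a prefix-cache key (toy model)."""
    payload = json.dumps(tool_defs, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def render_catalog(tools: list[str]) -> str:
    """Render one stub signature per sandboxed tool (illustrative format only)."""
    return "\n".join(f"def {name}(...): ..." for name in tools)


def build_request(tools: list[str], dynamic_catalog: bool) -> tuple[list[dict], str]:
    """Return (tool_defs, instructions) for one model request."""
    catalog = render_catalog(tools)
    if dynamic_catalog:
        # Catalog travels in the instructions; the description never changes.
        return [{"name": "run_code", "description": BASE_PROSE}], catalog
    # Default mode: catalog is embedded in the tool description.
    return [{"name": "run_code", "description": f"{BASE_PROSE}\n\n{catalog}"}], ""


before = ["get_weather"]
after = ["get_weather", "send_email"]  # a tool discovered mid-run

default_stable = block_hash(build_request(before, False)[0]) == block_hash(build_request(after, False)[0])
dynamic_stable = block_hash(build_request(before, True)[0]) == block_hash(build_request(after, True)[0])
print(default_stable)  # False: the description changed, so the block hash changed
print(dynamic_stable)  # True: only the instructions grew
```

In the dynamic mode the newly discovered tool still reaches the model, but through the instructions string rather than through the hashed block.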
Replace the separate `CodeModeDynamicCatalog` capability (which surgically wrapped `CodeMode`'s assembled toolset) with a `dynamic_catalog: bool` flag on `CodeMode` itself. When set, `CodeModeToolset` keeps only the static base prose in `run_code.description` and surfaces the sandboxed-tool catalog as a dynamic `InstructionPart` via `get_instructions`, and `CodeMode` announces newly discovered tools by enqueuing a `SystemPromptPart` through `ctx.enqueue`. This drops the shim package and its dedicated test module, merging the behavior and its tests into `code_mode`. The search addendum now stays in `run_code.description` in both modes (it's cache-stable), an improvement over the prior wrapper which dropped it.
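The "surface the catalog via `get_instructions`" step can be sketched roughly as follows. This is a standalone illustration under stated assumptions: `ToyInstructionPart` and `merge_instructions` are hypothetical stand-ins, not the pydantic-ai `InstructionPart` type or the toolset's actual method, and the merge rules (catalog appended last, empty catalog contributes nothing) are taken from the behavior this PR describes.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyInstructionPart:
    """Hypothetical stand-in for an instruction part with a static/dynamic marker."""

    content: str
    dynamic: bool = False


def merge_instructions(
    upstream: str | Sequence[str | ToyInstructionPart] | None,
    catalog: str,
) -> list[ToyInstructionPart]:
    """Append the rendered catalog, as a dynamic part, after any upstream instructions."""
    # A bare string is itself a Sequence, so normalize it first.
    if isinstance(upstream, str):
        upstream = [upstream]
    parts = [
        ToyInstructionPart(item) if isinstance(item, str) else item
        for item in (upstream or ())
    ]
    # An empty catalog emits nothing, leaving upstream instructions untouched.
    if catalog:
        parts.append(ToyInstructionPart(catalog, dynamic=True))
    return parts


parts = merge_instructions("Be brief.", "def get_weather(city: str) -> str: ...")
```

Keeping the catalog as the last, dynamic part means providers that split static from dynamic instructions can still cache everything ahead of it.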
Re-lock pydantic-ai-slim to the latest background-tools HEAD, which now accepts `ModelRequestPart`s directly in `enqueue` and renders mid-conversation `SystemPromptPart`s inline rather than hoisting them (pydantic/pydantic-ai#5509). Drop the `ModelRequest` wrapper workaround and enqueue the `SystemPromptPart` directly; on the wire it renders as an XML-wrapped user prompt, so the catalog cache stays intact across providers.
…essage queue `RunContext.enqueue` now raises when the context has no pending-message queue (pydantic-ai removed the silent-drop path). Pass `pending_messages=[]` in the test helper so the discovery-announcement unit tests still exercise enqueue, and narrow `ctx.pending_messages` before reading it. Real runs wire the queue automatically.
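A toy model of why the test helper now has to supply a queue, and of the idempotent discovery announcement that rides on it. Everything here is an assumption made for illustration: `ToyRunContext`, `ToyAnnouncer`, the exception type, and the announcement wording are hypothetical and do not mirror the real `RunContext` or `CodeMode` internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRunContext:
    """Hypothetical context: enqueue fails loudly when no queue is wired in."""

    pending_messages: list[str] | None = None

    def enqueue(self, message: str) -> None:
        if self.pending_messages is None:
            # No silent drop: a missing queue is an error.
            raise RuntimeError("no pending-message queue on this context")
        self.pending_messages.append(message)


@dataclass
class ToyAnnouncer:
    """Hypothetical per-run bookkeeping: each tool name is announced at most once."""

    announced: set[str] = field(default_factory=set)

    def announce(self, ctx: ToyRunContext, discovered: list[str]) -> None:
        new = [name for name in discovered if name not in self.announced]
        if not new:
            return  # repeat discovery: nothing to enqueue
        ctx.enqueue("New tools available in run_code: " + ", ".join(new))
        self.announced.update(new)


# Test-helper style: pass an empty queue so enqueue has somewhere to write.
ctx = ToyRunContext(pending_messages=[])
announcer = ToyAnnouncer()
announcer.announce(ctx, ["send_email"])
announcer.announce(ctx, ["send_email"])  # idempotent: no second message

# Without a queue, the same call raises instead of dropping the message.
try:
    ToyAnnouncer().announce(ToyRunContext(), ["send_email"])
    raised = False
except RuntimeError:
    raised = True
```

Recording a name only after `enqueue` succeeds means a failed enqueue does not mark the tool as announced.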
@adtyavrdhn @dsfaccini Feel free to take over while I'm out! I think the dynamic catalog could be the default.
Tier-2 follow-up to #232 (Tier-1 fix merged in #240). Implements the structural cache reshape described in the tool-search × code-mode grapple, directly on `CodeMode` rather than as a separate capability.

## Summary
Adds a `dynamic_catalog: bool` flag to `CodeMode`. When set, it moves the tool-disclosure surface out of `run_code.description` (which lives in the prompt-cache-keyed tool-definitions block) into a cache-friendly shape:

- `run_code.description` becomes static: just the base prose (sandbox restrictions, return-value contract) plus the cache-stable search addendum. It is never re-rendered, so the tool-defs block stays byte-stable across Tool Search discoveries / per-step toolset swaps.
- The catalog moves into agent instructions as a dynamic `InstructionPart`. Anthropic/Bedrock split static vs. dynamic instructions, so the static instruction prefix survives discoveries.
- Newly discovered tools are announced via `ctx.enqueue` (pydantic/pydantic-ai#4980, "Add pending message queue (`ctx.enqueue`/`agent_run.enqueue`)") as a bare `SystemPromptPart`. Append-only: it never touches a cached prefix.

Opt-in: the default `CodeMode` (`dynamic_catalog=False`) keeps the catalog in `run_code.description` (slightly cheaper on prompt size when the toolset never changes). Pair the flag with `ToolSearch` or a churning toolset for the cache win.

## Why a flag, not a separate capability
The earlier revision shipped this as a `CodeModeDynamicCatalog` capability that surgically wrapped `CodeMode`'s assembled toolset and rewrote its `run_code` `ToolDefinition`. That coupling (importing `CodeMode`'s private toolset internals, matching its `outermost` ordering tier to avoid a `wraps` cycle) made it a shim on top of `CodeMode` rather than part of it. Folding it into a flag removes the shim, the cross-capability private imports, and the ordering dance, and lets the search addendum stay in `run_code.description` in both modes (it's cache-stable), which the wrapper previously dropped.

## Implementation notes
- `CodeModeToolset` gains `dynamic_catalog`. In `get_tools`, when set, the description is just the base prose and the rendered catalog is stashed (`_last_catalog`); `get_instructions` returns it as `InstructionPart(dynamic=True)`, merged after any upstream instructions. `for_run_step` carries the stash alongside `_repl`/`_warned_deferred`.
- `CodeMode` gains the announcement hooks: `after_tool_execute` reads `ToolSearchReturnContent` from local-path `search_tools` returns; `after_model_request` does the same for native-path `NativeToolSearchReturnPart`s. Both are inert unless `dynamic_catalog` is set. `_announced_tools: set[str]` is per-run (`for_run` returns a fresh instance when the flag is on) and idempotent across repeat discoveries. The announcement is enqueued as a bare `SystemPromptPart` (no `ModelRequest` wrapper).
- `_render_catalog(callable_defs) -> str` (extracted in the Tier-1 prep) renders the catalog for both the default description path and the new instructions path, so there's no duplicated rendering logic.

## Dependency note
`[tool.uv.sources]` points `pydantic-ai-slim` at the `background-tools` branch (where the pending message queue lives) until pydantic/pydantic-ai#4980 lands. Switch back to a published version once it's released.

## Cross-provider cache safety
With pydantic/pydantic-ai#5509 (the #5437 fix) merged into `background-tools`, mid-conversation `SystemPromptPart`s render inline (XML-wrapped on providers without native mid-conversation system support) instead of being hoisted to the top-level system block. The announcement is therefore cache-safe across providers: it never mutates the cached prefix, and `run_code.description` stays byte-stable across discoveries.

## Test plan
- `tests/code_mode/test_code_mode.py` (`TestDynamicCatalog`):
  - catalog leaves `run_code.description` and surfaces as a dynamic `InstructionPart`, appends to upstream `str`/`Sequence`;
  - default keeps it in the description with no instructions;
  - empty catalog emits nothing;
  - search addendum stays in the description;
  - `for_run_step` preserves the stash;
  - per-run `_announced_tools` reset (and `for_run` returns self when disabled);
  - local-search announcement (success / disabled / empty / non-search / idempotent);
  - native-search announcement (success / unrelated parts);
  - `_extract_discovered_names` edge cases;
  - two end-to-end `Agent.run` exercises with `FunctionModel` (catalog-in-instructions + discovery announcement, and an eager tool callable via `run_code`).
- `code_mode/_capability.py` and `code_mode/_toolset.py`.
- `make lint && make typecheck` clean; full suite (109 tests) passes.

## Related
- #232: Tier-1 fix (merged in #240, "fix(code_mode): honor Tool Search's deferred-loading contract") and the Tier-2 outline this implements
- pydantic/pydantic-ai#4980: pending message queue (the `enqueue` primitive this rides on)
- pydantic/pydantic-ai#5437: mid-conversation `SystemPromptPart` mapping fix (closes the cross-provider cache loop)