Skip to content

feat: add Planning capability with cache-safe plan reminders#266

Open
strawgate wants to merge 3 commits into
mainfrom
claude/planning-capability
Open

feat: add Planning capability with cache-safe plan reminders#266
strawgate wants to merge 3 commits into
mainfrom
claude/planning-capability

Conversation

@strawgate
Copy link
Copy Markdown
Contributor

Summary

Adds a Planning capability: a model-owned task plan that's surfaced back to the model every turn without ever invalidating the prompt cache.

This is a from-scratch design (informed by reviewing #180), built on existing core primitives — no Pydantic AI core change required.

Design

  • One tool, write_plan(items) — whole-plan replacement. The model passes the full ordered list every call. This eliminates the index/parent_index bookkeeping that incremental-edit designs need (and the bugs that come with it), is concurrency-safe, and matches the most battle-tested agent todo tools. Status enum: pending / in_progress / completed / cancelled. No get_plan tool — the plan is already in context via the reminder.

  • Cache-safe surfacing. The current plan is injected as an ephemeral tail reminder in wrap_model_request, which runs after core persists the durable history (_agent_graph.py:911) and whose per-request message list is never written back. So the reminder reaches the model but never enters message_history — no accumulation, no stale copies. A CachePoint is placed immediately before the reminder, so the cached prefix (tools + system + real conversation) stays byte-identical turn over turn and only the small reminder falls outside the cache.

  • The mutable plan is never put in instructions/system prompt (that's the cache-busting path). Only static usage guidance goes there, which is cache-stable.

Why not dynamic instructions

Injecting the plan into the system prompt re-reads the whole conversation at full token price on every plan edit. Even with core's static/dynamic instruction split (which protects the tools+static-system prefix on Anthropic), the system precedes messages in cache order, so the message-history cache still busts. The ephemeral tail reminder avoids that entirely.

Files

  • pydantic_ai_harness/planning/{_capability,_toolset,__init__}.py + README.md
  • tests/planning/test_planning.py
  • public export from pydantic_ai_harness

Tests

  • 22 tests, 100% branch coverage of the package.
  • Covers whole-plan replacement, per-run isolation, and — end-to-end via FunctionModel — that the reminder reaches the model with a CachePoint in front of it and is absent from result.all_messages() (i.e. genuinely ephemeral, never persisted).

Validation

make lint clean; pyright clean on the package; full suite green.

https://claude.ai/code/session_01LQ9NTr8q95A99UnVcq6Nzf


Generated by Claude Code

claude added 3 commits June 2, 2026 00:01
Gives agents a model-owned task plan via a single whole-plan-replacement
`write_plan` tool, avoiding the index-bookkeeping fragility of incremental
edits.

The current plan is surfaced back to the model as an ephemeral reminder
appended to the tail of each request in `wrap_model_request` (which runs after
the durable history is persisted, so the reminder never enters
`message_history`), with a `CachePoint` placed before it. The cached prefix
therefore stays byte-identical across turns -- the plan is always fresh in the
model's view without ever invalidating previously-cached tokens. The mutable
plan is deliberately never injected into instructions/system prompt, which
would bust the cache; only static guidance goes there.

Includes a README, agent-spec serialization, and tests (100% branch coverage)
covering the ephemerality and cache-breakpoint placement end-to-end.
… coverage

The CachePoint-not-persisted test could never execute its assertion under
TestModel (no list-content UserPromptParts survive in durable history),
leaving line 300 uncovered. Ephemerality is already proven by the FunctionModel
test asserting the reminder is absent from all_messages(); the CachePoint only
ever rides that reminder, so the separate check was redundant.
Mutation testing surfaced 4 weak assertions that used substring checks where
exact output mattered: the render_plan line separator and the write_plan
in-progress note (empty vs present, wording, and wrapping). Assert exact
strings / exact tails instead. Kill rate 30/36 -> 34/36; the 2 remaining
survivors are equivalent mutants (add_function name= falls back to the
method's __name__, which equals the explicit name).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants