feat: add Planning capability with cache-safe plan reminders by strawgate · Pull Request #266 · pydantic/pydantic-ai-harness

strawgate · 2026-06-02T00:01:27Z

Summary

Adds a Planning capability: a model-owned task plan that's surfaced back to the model every turn without ever invalidating the prompt cache.

This is a from-scratch design (informed by reviewing #180), built on existing core primitives — no Pydantic AI core change required.

Design

One tool, write_plan(items) — whole-plan replacement. The model passes the full ordered list every call. This eliminates the index/parent_index bookkeeping that incremental-edit designs need (and the bugs that come with it), is concurrency-safe, and matches the most battle-tested agent todo tools. Status enum: pending / in_progress / completed / cancelled. No get_plan tool — the plan is already in context via the reminder.
Cache-safe surfacing. The current plan is injected as an ephemeral tail reminder in wrap_model_request, which runs after core persists the durable history (_agent_graph.py:911) and whose per-request message list is never written back. So the reminder reaches the model but never enters message_history — no accumulation, no stale copies. A CachePoint is placed immediately before the reminder, so the cached prefix (tools + system + real conversation) stays byte-identical turn over turn and only the small reminder falls outside the cache.
The mutable plan is never put in instructions/system prompt (that's the cache-busting path). Only static usage guidance goes there, which is cache-stable.

Why not dynamic instructions

Injecting the plan into the system prompt re-reads the whole conversation at full token price on every plan edit. Even with core's static/dynamic instruction split (which protects the tools+static-system prefix on Anthropic), the system precedes messages in cache order, so the message-history cache still busts. The ephemeral tail reminder avoids that entirely.

Files

pydantic_ai_harness/planning/{_capability,_toolset,__init__}.py + README.md
tests/planning/test_planning.py
public export from pydantic_ai_harness

Tests

22 tests, 100% branch coverage of the package.
Covers whole-plan replacement, per-run isolation, and — end-to-end via FunctionModel — that the reminder reaches the model with a CachePoint in front of it and is absent from result.all_messages() (i.e. genuinely ephemeral, never persisted).

Validation

make lint clean; pyright clean on the package; full suite green.

https://claude.ai/code/session_01LQ9NTr8q95A99UnVcq6Nzf

Generated by Claude Code

Gives agents a model-owned task plan via a single whole-plan-replacement `write_plan` tool, avoiding the index-bookkeeping fragility of incremental edits. The current plan is surfaced back to the model as an ephemeral reminder appended to the tail of each request in `wrap_model_request` (which runs after the durable history is persisted, so the reminder never enters `message_history`), with a `CachePoint` placed before it. The cached prefix therefore stays byte-identical across turns -- the plan is always fresh in the model's view without ever invalidating previously-cached tokens. The mutable plan is deliberately never injected into instructions/system prompt, which would bust the cache; only static guidance goes there. Includes a README, agent-spec serialization, and tests (100% branch coverage) covering the ephemerality and cache-breakpoint placement end-to-end.

… coverage The CachePoint-not-persisted test could never execute its assertion under TestModel (no list-content UserPromptParts survive in durable history), leaving line 300 uncovered. Ephemerality is already proven by the FunctionModel test asserting the reminder is absent from all_messages(); the CachePoint only ever rides that reminder, so the separate check was redundant.

Mutation testing surfaced 4 weak assertions that used substring checks where exact output mattered: the render_plan line separator and the write_plan in-progress note (empty vs present, wording, and wrapping). Assert exact strings / exact tails instead. Kill rate 30/36 -> 34/36; the 2 remaining survivors are equivalent mutants (add_function name= falls back to the method's __name__, which equals the explicit name).

claude added 3 commits June 2, 2026 00:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Planning capability with cache-safe plan reminders#266

feat: add Planning capability with cache-safe plan reminders#266
strawgate wants to merge 3 commits into
mainfrom
claude/planning-capability

strawgate commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

strawgate commented Jun 2, 2026

Summary

Design

Why not dynamic instructions

Files

Tests

Validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants