Add StepPersistence capability for step-event durability across delegates #251
dsfaccini wants to merge 19 commits
…ates

Supersedes PR #176 (SessionPersistence): orchestrators like pydanty need visible event trails for delegate runs that may time out before a "save full session after the run" hook can fire, and need to continue or fork a delegate's prior investigation without rediscovering context. A single after-run snapshot is too coarse for that use case.

The capability now records:
- (a) append-only StepEvents at every boundary (run/model-request/tool-call start, completion, failure);
- (b) a ContinuableSnapshot only when message history is provider-valid (every ToolCallPart has a matching ToolReturnPart / RetryPromptPart), saved mid-run after CallToolsNode and at after_run;
- (c) a ToolEffectRecord ledger so a run killed between before_tool_execute and after_tool_execute leaves an `unknown_after_crash`-style record rather than a falsely-continuable snapshot.

Lineage metadata (parent_run_id, agent_name) ties delegate runs back to their orchestrator. `continue_run` / `fork_run` helpers load the latest continuable snapshot for a run.

Backends: InMemoryStepStore (tests) and FileStepStore (JSONL events + JSON snapshots, with run_id path-traversal validation and anyio.to_thread for blocking I/O).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
StepPersistence capability for step-event durability across delegates
… model
Correctness fixes from pydanty's PR review:
- FileStepStore: snapshot filenames are now a per-run monotonic counter,
not `ctx.run_step` — `run_step` resets each Agent.run, so re-using a
`run_id` across calls would let an earlier run's higher step-index
snapshot mask a later run's lower-step-index one.
- StepStore.get_tool_effect now takes both `run_id` and `tool_call_id`.
TestModel and other providers can reuse deterministic tool-call ids
across runs; the previous unscoped lookup let one run's effect leak
into another's record (including `started_at`).
- is_provider_valid now rejects orphan, duplicate, and out-of-order
tool returns — the old `set.discard` pattern silently accepted any
return regardless of whether a matching call was open.
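The stricter validity rule can be illustrated with a minimal sketch. `Call` and `Return` are hypothetical stand-ins for pydantic_ai's tool-call and tool-return parts, and this is not the PR's implementation; it only shows why tracking an explicit set of open calls rejects orphan, duplicate, and out-of-order returns where a bare `set.discard` would not.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for a tool-call part."""
    tool_call_id: str


@dataclass(frozen=True)
class Return:
    """Hypothetical stand-in for a tool-return (or tool-bound retry) part."""
    tool_call_id: str


def is_provider_valid(parts: list[Call | Return]) -> bool:
    """Return True only if every call is answered exactly once, after it opens."""
    open_calls: set[str] = set()
    for part in parts:
        if isinstance(part, Call):
            open_calls.add(part.tool_call_id)
        elif part.tool_call_id not in open_calls:
            # Orphan (never called), duplicate (already answered) and
            # out-of-order (answered before the call) all land here.
            return False
        else:
            open_calls.remove(part.tool_call_id)
    # Any call still open means the history ends mid-tool-call.
    return not open_calls
```

Using `remove` behind an explicit membership test, instead of `discard`, is what turns a silently accepted stray return into a rejection.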
Identity model:
- `run_id` resolution: explicit > `{agent_name}-{8-char-hex}` > UUID.
Materialised per Agent.run in `for_run`, so reusing one capability
instance never silently merges runs.
- `parent_run_id` auto-inferred via a module-level ContextVar set in
`wrap_run`, so an orchestrator's tool that synchronously calls
`delegate.run(...)` produces a delegate `RunRecord.parent_run_id`
pointing at the orchestrator's `run_id` with zero threading. Explicit
`parent_run_id=` still wins.
- `conversation_id` propagated to `StepEvent` and `RunRecord`;
`store.list_runs(conversation_id=..., parent_run_id=...)` supports
filtering by either or both. Mirrors pydantic_ai's three-level
identity (conversation -> run -> step) so "run 1, run 2, run 3" of
one dialogue is queryable as a group via `conversation_id`.
- `continue_from=` field dropped from the capability. Continuation is
now only via `continue_run(store, run_id=...)` -> standard
`Agent.run(message_history=...)`. One way to pass history into
pydantic_ai, no parallel capability flag.
README rewritten around the final API. New sections: three-level
identity, run lineage with auto-inferred parent, inspecting a run tree,
failure recovery.
Tests: 168 total (up from 64), 100% branch coverage on the package.
New coverage for the snapshot seq counter, cross-run tool-effect
isolation, orphan/duplicate/out-of-order return rejection, ContextVar
parent inference across nested agent.run, conversation_id propagation,
and the agent_name-derived run_id default.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
David's AICA here: pushed Correctness (P1/P2/P3 from pydanty):
Identity model:
Tests: 168 total (was 64), 100% branch coverage on the package. README rewritten around the final API: new sections on the three-level identity model, run-tree inspection, failure recovery, and the OpenTelemetry: the prior research agent confirmed every boundary we touch is already spanned by pydantic_ai's |
… metadata

Correctness:
- is_provider_valid no longer rejects non-tool RetryPromptParts. Pydantic AI emits `RetryPromptPart(tool_name=None)` for output-validation failures and providers map those as plain user messages, not tool results. The previous check required every RetryPromptPart to resolve an open tool call, so a run with one output retry produced no final continuable snapshot despite being fully valid.
- StepStore.list_runs now guarantees chronological (started_at ascending) ordering across both backends. FileStepStore was previously returning directory-name order (lexicographic), so the README's `[-1]` pattern for "latest run in conversation" could pick the older run when run ids did not sort by recency.
- after_tool_execute and on_tool_execute_error preserve idempotency_key and effect_summary from the prior `started` record. Previously the terminal record was written without those fields, so any annotation the tool body wrote was lost on completion.
- from_spec raises ValueError for unknown backends instead of silently falling back to in-memory storage. For a persistence capability, turning a typo into accidental non-durability is the wrong failure mode.

API:
- New annotate_tool_effect(store, ctx, *, idempotency_key=None, effect_summary=None) helper. Tool bodies that write external state call it to attach idempotency + effect metadata to the in-flight ToolEffectRecord without knowing the (run_id, tool_call_id) plumbing. Resolves run_id from a ContextVar set by wrap_run; reads tool_call_id / tool_name from RunContext.
- ContextVar moved from `_capability.py` into a new `_context.py` module so the helper and the capability can share it without circular imports and without crossing the private-name barrier.

Docs: README fixes a non-existent `list_runs(agent_name=None)` call, documents the chronological-ordering guarantee, and replaces the hand-wavy "populate fields on the ToolEffectRecord" line with a concrete `annotate_tool_effect` example.
Tests: 178 total (was 168), 100% branch coverage on the package. Added coverage for non-tool retry acceptance, chronological list_runs on both backends, metadata preservation across completed/failed transitions, annotate_tool_effect under realistic agent.tool, and from_spec backend validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
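The metadata-preservation fix is easiest to see on a toy ledger. The `ToolEffect` and `Ledger` classes below are hypothetical and much simpler than the PR's `ToolEffectRecord` and stores; the sketch assumes only what the commit message states: records are keyed by `(run_id, tool_call_id)`, and a terminal transition must carry forward whatever the tool body annotated.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToolEffect:
    """Hypothetical, simplified effect record."""
    run_id: str
    tool_call_id: str
    status: str  # 'started' | 'completed' | 'failed'
    idempotency_key: str | None = None
    effect_summary: str | None = None


class Ledger:
    def __init__(self) -> None:
        self._records: dict[tuple[str, str], ToolEffect] = {}

    def start(self, run_id: str, tool_call_id: str) -> None:
        self._records[(run_id, tool_call_id)] = ToolEffect(run_id, tool_call_id, 'started')

    def annotate(self, run_id: str, tool_call_id: str, *,
                 idempotency_key: str | None = None,
                 effect_summary: str | None = None) -> None:
        key = (run_id, tool_call_id)
        rec = self._records[key]
        self._records[key] = replace(
            rec,
            idempotency_key=idempotency_key if idempotency_key is not None else rec.idempotency_key,
            effect_summary=effect_summary if effect_summary is not None else rec.effect_summary,
        )

    def finish(self, run_id: str, tool_call_id: str, status: str) -> None:
        key = (run_id, tool_call_id)
        # replace() copies every other field, so annotations survive the transition.
        self._records[key] = replace(self._records[key], status=status)

    def get(self, run_id: str, tool_call_id: str) -> ToolEffect | None:
        # Scoped by run_id: identical tool-call ids in other runs never match.
        return self._records.get((run_id, tool_call_id))


ledger = Ledger()
ledger.start('run-1', 'call-1')
ledger.annotate('run-1', 'call-1', idempotency_key='charge-42', effect_summary='charged card')
ledger.finish('run-1', 'call-1', 'completed')
ledger.start('run-1', 'call-2')  # never finished: what a crash would leave behind
```

A record that is still `started` when the process is gone is the signal an orchestrator can check before replaying a side-effectful tool.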
|
David's AICA here: pushed Correctness:
API:
README: fixed the non-existent Tests: 178 total (was 168), 100% branch coverage. New coverage for non-tool retry acceptance, chronological |
|
Are you planning to add other backends like SQL or DynamoDB? And how would you handle files (e.g. BinaryContent) in the message history? @dsfaccini |
…r layout The class docstring still showed snapshots/<step_index>.json from the pre-fix layout, but both the README and _next_snapshot_seq document the monotonic counter. Bring the class docstring in line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
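As a rough illustration of why the counter matters, the sketch below derives the next sequence number from the files already on disk, so a later call that reuses the same run directory always sorts after an earlier one even if its step index is lower. The helper names and the zero-padded `<seq>.json` layout are assumptions made for this example, not the store's actual file format.

```python
import json
import tempfile
from pathlib import Path


def next_snapshot_seq(snap_dir: Path) -> int:
    """Next per-run sequence number, derived from the files already present."""
    seqs = [int(p.stem) for p in snap_dir.glob('*.json') if p.stem.isdigit()]
    return max(seqs, default=-1) + 1


def save_snapshot(run_dir: Path, step_index: int, messages: list) -> Path:
    snap_dir = run_dir / 'snapshots'
    snap_dir.mkdir(parents=True, exist_ok=True)
    seq = next_snapshot_seq(snap_dir)
    # Zero-padded so lexicographic order equals numeric order.
    path = snap_dir / f'{seq:06d}.json'
    path.write_text(json.dumps({'seq': seq, 'step_index': step_index, 'messages': messages}))
    return path


def latest_snapshot(run_dir: Path) -> dict:
    files = sorted((run_dir / 'snapshots').glob('*.json'))
    return json.loads(files[-1].read_text())


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / 'run-1'
    save_snapshot(run_dir, step_index=5, messages=['first call'])
    # A later call restarts its step index; the counter keeps increasing.
    save_snapshot(run_dir, step_index=1, messages=['second call'])
    latest = latest_snapshot(run_dir)
```

Had the filename been the step index, the first call's `5` would have sorted after the second call's `1` and masked it.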
…haring

Pydanty round-3 review:
- README continuation and lineage examples queried `list_runs(conversation_id=...)` on conversations the earlier `.run(...)` calls never set, so the examples crashed with IndexError on `[-1]`. Pass the conversation_id to the earlier calls so the lookup actually works.
- The capability docstring claimed reusing a `StepPersistence` instance across `Agent.run` calls does NOT share the id. That is true only for the auto-derived (`agent_name`-prefixed or `ctx.run_id`) cases: an explicit `run_id=` is shared across every `.run()` by design, since that is the orchestrator pattern where the caller owns one logical identity across turns. Rewrite the resolution-order docs to spell out which cases share and which don't, and when to pick each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
David's AICA here: round-3 cleanup pushed in
All checks still green. |
…call, not shared)
Pydanty round-4 review: the prior round documented explicit `run_id` as
shared across `.run()` calls on one capability instance — that framing
caused real correctness gaps. The `ToolEffectRecord` ledger is keyed by
`(run_id, tool_call_id)` and providers reuse deterministic tool-call ids
(e.g. `TestModel` emits `pyd_ai_tool_call_id__{name}`), so a second
`.run()` overwrites the first's effect record under the same key — the
`unknown_after_crash` signal from turn 1 disappears when turn 2 lands.
Realign:
- `run_id` is per-`Agent.run`, matching `pydantic_ai.RunContext.run_id`.
- For multi-turn logical grouping, use `conversation_id=` on
`Agent.run(...)` — that is the pyai-native primitive. The orchestrator
pattern is `conversation_id='orch'` with each turn auto-deriving its
own `run_id`.
- Explicit `run_id=` remains supported but is documented as single-shot
(testing, replay, debugging). Reusing it across calls is a caller
contract violation, not an implementation feature.
Code is unchanged — the implementation was already correct under the
right contract. Only the docs were misleading.
Tests:
- `TestRunIdIsPerCall::test_multi_turn_orchestrator_uses_conversation_id`
exercises the recommended pattern: three turns sharing a
`conversation_id`, three distinct auto-derived `run_id`s, all
queryable as a group.
- `TestRunIdIsPerCall::test_explicit_run_id_reuse_collides_ledger` locks
down the misuse contract: reusing one explicit `run_id` across two
`.run()` calls produces colliding effect records under the
`(run_id, tool_call_id)` key. The behavior is documented; the test
exists so a future refactor cannot silently change it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
David's AICA here: round-4 cleanup pushed in Pydanty caught a real correctness gap in my prior framing: the README said explicit Aligned with pydantic_ai's identity semantics:
Code is unchanged — the implementation was already correct under the right contract. Only the README + capability docstring were misleading. Two new tests:
94 step-persistence tests, 100% branch coverage on the package. Lint + typecheck + CI all green. |
Pyai-aligned review flagged this as a P3 explainer: pydantic_ai already has three single-slot cross-run signals (RUN_ID_BAGGAGE_KEY, ctx.run_id, _CURRENT_RUN_CONTEXT). All three get overwritten by the inner Instrumentation.wrap_run before any nested capability can see the parent identity. A separate harness-local ContextVar, snapshotted before our own wrap_run rebinds it, is the only correct mechanism today. Spell this out so the next reader doesn't try to 'simplify' it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
David's AICA here: pyai-aligned subagent review came back. Verdict: none blocking merge. The subagent read both trees (
Follow-ups opened (none merge-blocking):
Plus the explainer commit Identity model + 178/180 test suite still green. Ready for review. |
Pydanty round-5 review accepted the docs-only contract but flagged that "documented but not enforced" is a soft spot.

Enforce it: `before_run` calls `store.get_run(run_id=...)` when the user supplied an explicit `run_id`, and raises `ValueError` if a record with that id already exists. The auto-derived cases cannot trigger this check (each call materialises a fresh id in `for_run`). The check costs one extra store read per Agent.run, and only when an explicit run_id is set. The error message points the caller at `conversation_id` for multi-turn grouping.

Test renamed from `test_explicit_run_id_reuse_collides_ledger` to `test_explicit_run_id_reuse_raises`: it asserts the second `.run()` raises and the first run's records survive untouched.

README + capability docstring updated: the misuse path is now "raises" not "caller's contract."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
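The enforcement described here amounts to one guarded read before the run record is written. The sketch below uses a hypothetical in-memory store and a free-standing `before_run` function; the real hook signature and store protocol are not shown in this thread, so treat the names and the error wording as illustrative only.

```python
class InMemoryRuns:
    """Hypothetical minimal run store."""

    def __init__(self) -> None:
        self._runs: dict[str, dict] = {}

    def get_run(self, run_id: str):
        return self._runs.get(run_id)

    def save_run(self, run_id: str, record: dict) -> None:
        self._runs[run_id] = record


def before_run(store: InMemoryRuns, run_id: str, *, explicit: bool) -> None:
    # Only explicit ids are checked; auto-derived ids are fresh on every call.
    if explicit and store.get_run(run_id) is not None:
        raise ValueError(
            f'run_id {run_id!r} already exists; '
            'use conversation_id to group multiple runs'
        )
    store.save_run(run_id, {'status': 'started'})
```

Raising before any write is what lets the first run's records survive the rejected second call untouched.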
|
David's AICA here: pushed The reviewer accepted the docs-only contract but noted "documented but not enforced." Enforce: Error message points the caller at One extra store read per 94 step-persistence tests, 100% branch coverage, lint + typecheck + CI green. Also: PR #176 closed in favour of this one (with your earlier permission). |
Two patterns that match the existing CLAUDE.local.md ignore convention:
- AGENTS.local.md: canonical local-instructions file (CLAUDE.local.md is symlinked to it where the worktree follows the same AGENTS.md/CLAUDE.md symlink pattern).
- .agents/skills/branch-context/: per-worktree decisions log (`pr-decisions.md`) and the skill's local SKILL.md. Pattern lifted from `~/pydantic/ai/base/.claude/skills/branch-context/` where pyai uses an identical setup.

Neither is intended to land in PRs; they record cross-iteration design calls so future AICA sessions in this worktree don't silently undo them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new `pydantic_ai_harness.media` package (MediaStore protocol + DiskMediaStore / SqliteMediaStore / S3MediaStore) and wires it into the file/sqlite step-persistence backends so large BinaryContent payloads get externalized out of snapshot JSON / table rows by default.

Defaults are zero-config: FileStepStore writes blobs under `<root>/media/<sha256>.bin`; SqliteStepStore writes them to a sibling `media` table in the same DB. Threshold is 64 KiB and URI scheme is `media+sha256://<hex>` so blobs are content-addressed across stores. Pass `media_store=None` to keep bytes inline, or a custom `MediaStore` to redirect (e.g. `S3MediaStore` for R2 / AWS / MinIO).

S3MediaStore hand-rolls SigV4 over httpx to avoid a botocore/boto3 dependency. Verified working against Cloudflare R2.

`StepPersistence.from_spec(backend='sqlite', database=...)` now resolves.

180 → 261 tests, 100% branch coverage maintained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
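A minimal sketch of the externalize/restore round trip follows, assuming a serialized shape in which a binary part is a dict with `kind='binary'` and base64 `data`. That shape, the `media_uri` marker key, and the plain-dict blob store are all assumptions made for illustration; only the 64 KiB threshold and the `media+sha256://<hex>` scheme come from the commit message.

```python
import base64
import hashlib

THRESHOLD = 64 * 1024        # 64 KiB, as stated in the commit message
SCHEME = 'media+sha256://'   # URI scheme from the commit message


def externalize(node, store: dict, threshold: int = THRESHOLD):
    """Swap large inline blobs for content-addressed markers."""
    if isinstance(node, list):
        return [externalize(item, store, threshold) for item in node]
    if isinstance(node, dict):
        if node.get('kind') == 'binary' and isinstance(node.get('data'), str):
            raw = base64.b64decode(node['data'])
            if len(raw) >= threshold:
                digest = hashlib.sha256(raw).hexdigest()
                store.setdefault(digest, raw)  # identical bytes are stored once
                out = dict(node)               # copy so other fields survive
                del out['data']
                out['media_uri'] = SCHEME + digest
                return out
        return {key: externalize(value, store, threshold) for key, value in node.items()}
    return node


def restore(node, store: dict):
    """Inverse of externalize: swap markers back for inline base64 data."""
    if isinstance(node, list):
        return [restore(item, store) for item in node]
    if isinstance(node, dict):
        uri = node.get('media_uri')
        if isinstance(uri, str) and uri.startswith(SCHEME):
            out = dict(node)
            del out['media_uri']
            out['data'] = base64.b64encode(store[uri[len(SCHEME):]]).decode('ascii')
            return out
        return {key: restore(value, store) for key, value in node.items()}
    return node


big = base64.b64encode(b'x' * THRESHOLD).decode('ascii')
small = base64.b64encode(b'tiny').decode('ascii')
messages = [{'parts': [
    {'kind': 'binary', 'data': big, 'media_type': 'image/png'},
    {'kind': 'binary', 'data': big, 'media_type': 'image/png'},
    {'kind': 'binary', 'data': small, 'media_type': 'text/plain'},
]}]
blobs: dict = {}
external = externalize(messages, blobs)
```

Because the key is the digest of the bytes, two parts carrying the same image share one stored blob, and payloads below the threshold stay inline.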
|
David's AICA here: pushed New: New: Wired into existing stores by default: S3-compatible store works against AWS / R2 / MinIO. Handrolled SigV4 (no

```python
# Zero-config: both backends Just Work for media
StepPersistence.from_spec(backend='sqlite', database='runs.db')

# Override the store (e.g. ship media to R2, keep the SQLite ledger local)
from pydantic_ai_harness.media import S3MediaStore

SqliteStepStore(
    database='runs.db',
    media_store=S3MediaStore(
        bucket='...',
        endpoint='https://<acc>.r2.cloudflarestorage.com',
        region='auto',
        access_key_id='...',
        secret_access_key='...',
    ),
)
```

What's deliberately not in this PR:
180 → 261 tests, 100% branch coverage. Lint + typecheck + CI green. Updated Branch-context skill logged the load-bearing decisions (URI scheme, threshold, sqlite schema, S3 v1 scope, deferred MediaExternalizer). cc @aristide1997 — does this design land where you needed it? |
Adds replay-driven tests under `tests/media/test_s3_cassettes.py` that exercise `S3MediaStore.put/get/exists` against pre-recorded Cloudflare R2 responses. CI runs them without any S3 creds via the committed cassettes under `tests/media/cassettes/`.

Sanitisation policy:
- `before_record_request`/`before_record_response` swap the real R2 account-id subdomain and bucket name for fixed placeholders (`account.r2.cloudflarestorage.com`, `harness-test-bucket`)
- `Authorization` and `x-amz-date` filtered to `REDACTED`
- CF-RAY, x-amz-version-id, x-amz-checksum-*, x-amz-request-id headers dropped (none load-bearing for tests; some carry identifying info)
- Non-2xx response bodies blanked (R2's gzipped XML error envelope leaks the bucket name; our code only checks status code)

The `s3_credentials` fixture uses `os.environ.get(NAME, PLACEHOLDER)` per field, so real R2 creds are used when recording locally with `.env` loaded, and the placeholder constants match the scrubbed cassettes during replay. Because the placeholders are fixed, any scrubber miss during a future re-record shows up as a replay URL mismatch: a built-in canary against credential / private-data leakage in committed cassettes.

Adds `pytest-recording` (pulls `vcrpy`) to the dev deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
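Two pieces of this policy can be sketched as plain functions. The response scrubber operates on the dict shape vcrpy hands to `before_record_response` (`status`, `headers`, `body`); the environment variable names `S3_ENDPOINT` and `S3_BUCKET` and both function names are invented for this example, and the test suite's real hooks may differ.

```python
import os

# Fixed placeholders taken from the sanitisation policy.
PLACEHOLDER_ENDPOINT = 'https://account.r2.cloudflarestorage.com'
PLACEHOLDER_BUCKET = 'harness-test-bucket'

# Lower-cased header names (or prefixes) that are dropped from recordings.
DROPPED_PREFIXES = ('cf-ray', 'x-amz-version-id', 'x-amz-checksum-', 'x-amz-request-id')


def scrub_response(response: dict) -> dict:
    """Drop identifying headers and blank the body of non-2xx responses."""
    response['headers'] = {
        name: value
        for name, value in response['headers'].items()
        if not name.lower().startswith(DROPPED_PREFIXES)
    }
    if not 200 <= response['status']['code'] < 300:
        # Error envelopes can leak the bucket name; only the status is needed.
        response['body']['string'] = b''
    return response


def s3_settings(environ=os.environ) -> dict:
    """Real values when recording locally, fixed placeholders during replay."""
    return {
        'endpoint': environ.get('S3_ENDPOINT', PLACEHOLDER_ENDPOINT),
        'bucket': environ.get('S3_BUCKET', PLACEHOLDER_BUCKET),
    }
```

Since replay always requests the placeholder host and bucket, a cassette that still contains a real hostname would fail to match, which is the canary the commit message describes.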
|
David's AICA here: pushed Verified live first. Before recording cassettes I ran a manual smoke against your R2 bucket (PUT → HEAD → GET → bytes-back) to confirm the SigV4 implementation actually works end-to-end. The previous round only had mock-transport tests, which is too weak a signal for a handrolled signer. Now in CI. Five cassettes under Sanitisation:
The To re-record (when SigV4 / store behavior changes): (R2 only accepts Coverage: still 100% branch (266/266 tests passing, 2 skipped — the original Docs/HTML touched as you asked, not exploded:
Re your question on blobs without |
Adds `MediaStore.public_url(uri) -> str | None` plus a `public_url=` constructor parameter on every concrete store. The parameter accepts a sync or async callable; the store auto-detects and awaits if needed.

This is the bottom-layer primitive for the forthcoming `MediaExternalizer` capability: that capability will call `store.public_url(...)` per externalized blob and swap `BinaryContent` for `ImageUrl` / `AudioUrl` parts before the model sees the message. The callable shape covers both static URLs (public bucket / CDN: use the `make_static_public_url` helper) and dynamic URLs (presigned, per-request signing: pass any async callable with TTL captured in its closure).

Why a callable rather than a static config: a public bucket's URL host is not derivable from the bucket creds (R2 public buckets use `pub-<hash>.r2.dev`, AWS public buckets use a different scheme than the path-style endpoint we sign for). The URL is always user-supplied information, so a callable is the right primitive: same shape for the static and presigned cases, and `get` stays untouched (it serves the harness's internal byte fetch, not the model's external HTTP fetch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
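The sync-or-async detection can be done with `inspect.isawaitable` on the resolver's return value. The `Store` class and both resolvers below are hypothetical, and the sketch assumes `public_url` is itself a coroutine method, which the commit message implies but does not state.

```python
from __future__ import annotations

import asyncio
import inspect
from typing import Awaitable, Callable, Optional, Union

# A resolver may return the URL directly or return an awaitable producing it.
Resolver = Callable[[str], Union[Optional[str], Awaitable[Optional[str]]]]

SCHEME = 'media+sha256://'


class Store:
    """Hypothetical store exposing only the public_url hook."""

    def __init__(self, public_url: Resolver | None = None) -> None:
        self._resolver = public_url

    async def public_url(self, uri: str) -> str | None:
        if self._resolver is None:
            return None
        result = self._resolver(uri)
        if inspect.isawaitable(result):
            result = await result  # async resolver, e.g. presigning
        return result


def static_resolver(base_url: str) -> Resolver:
    """Sync resolver for a public bucket or CDN (illustrative key layout)."""
    def resolve(uri: str) -> str:
        return f'{base_url}/{uri[len(SCHEME):]}.bin'
    return resolve


async def presigned_resolver(uri: str) -> str:
    """Async resolver; a real one would sign the URL with a TTL."""
    await asyncio.sleep(0)
    return f'https://signed.example.com/{uri[len(SCHEME):]}?ttl=60'
```

Checking the return value rather than the callable itself also covers sync functions that happen to return a coroutine or future.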
… operations

Adds `MediaContext` (frozen, kw-only dataclass with `media_type`, `filename`, `metadata`) and threads it through every `MediaStore` method and both user callables (`PublicUrlResolver`, `KeyStrategy`). New context fields can be added non-breakingly; existing call sites and resolvers keep working.

Also adds:
- `KeyStrategy = Callable[[str, MediaContext], str]` for per-store layout control. Default `default_key_strategy` produces `<sha256>.bin`. Disk store validates the result against `..` traversal.
- `metadata` persistence on `SqliteMediaStore` (new JSON column) and `S3MediaStore` (signed `x-amz-meta-*` headers, ASCII key validation). Disk store explicitly does NOT persist metadata in v1: sidecar / xattr options each have load-bearing drawbacks; we ship nothing rather than a half-true persistence promise.
- `make_static_public_url(...)` updated to the new `(uri, ctx)` signature.

The shift is motivated by the same principle as pydantic_ai's `RunContext`: extension via fields on a context bag rather than via breaking signature changes. Every new requirement (TTL hints for presigned URLs, audit ids, response-header overrides, etc.) becomes a field addition, not an API revision.

Cassettes from the previous commit replay unchanged: match-on does not include the signed headers and the request URLs are stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
David's AICA here: pushed What changed
What this implies for the
|
|
hey @aristide1997 that was a good question, I included some storage backends and the ability to roundtrip media automatically. doing my best here to get a good abstraction in. would you feel confident using this branch in your workflows and providing feedback while we release it? I'll merge it when I've done a thorough final review but I'll personally be using it for a while before we actually release it anyway. |
…ategy
Adds `MediaStore.get_metadata(uri) -> Mapping[str, str]` to the protocol
and implements it on all three concrete stores:
- `DiskMediaStore`: writes a sidecar `<resolved>.meta.json` alongside
the blob on put (atomic tmp + rename), reads it back on
`get_metadata`. Returns `{}` when no metadata was supplied. v1 had
documented this as a deliberate gap — sidecar JSON is straightforward
and the xattr / ADS drawbacks don't apply.
- `SqliteMediaStore`: `SELECT metadata FROM <table> WHERE sha256=?` +
`json.loads`. Raises `FileNotFoundError` for unknown URIs.
- `S3MediaStore`: HEAD + collects `x-amz-meta-*` response headers,
strips the prefix. Reuses the existing 404 / non-2xx error shape.
Drops `key_strategy=` from `SqliteMediaStore`. The digest is the primary
key by content-addressing construction — a user-chosen key would either
break dedup or be a no-op. Kept on Disk + S3 where bucket / directory
layout is a real concern.
README + branch-context entries updated to reflect: all three stores
round-trip metadata; key_strategy is Disk + S3 only.
Coverage stays at 100% branch.
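The "atomic tmp + rename" sidecar write can be sketched with the standard library. The function names are hypothetical and the `.meta.json` suffix simply follows the commit message; the real `DiskMediaStore` may structure this differently.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def _sidecar_for(blob_path: Path) -> Path:
    return blob_path.with_name(blob_path.name + '.meta.json')


def write_metadata(blob_path: Path, metadata: dict) -> None:
    """Write '<blob>.meta.json' atomically; write nothing for empty metadata."""
    if not metadata:
        return
    sidecar = _sidecar_for(blob_path)
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=sidecar.parent, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as handle:
            json.dump(metadata, handle)
        os.replace(tmp_name, sidecar)  # readers see the old file or the new one, never a partial write
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def read_metadata(blob_path: Path) -> dict:
    """Return the stored metadata, or {} when no sidecar was written."""
    sidecar = _sidecar_for(blob_path)
    if not sidecar.exists():
        return {}
    return json.loads(sidecar.read_text(encoding='utf-8'))
```

Skipping the write for empty metadata is what makes `{}` the natural answer when no sidecar exists.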
|
David's AICA here: post-merge-prep bloat review trimmed Closed (
Dropped:
README + branch-context decision entries updated. Coverage stays at 100% branch; cassettes replay unchanged (0.13s). PR is ready to merge from my end. |
|
David's AICA here: integration review from pydanty after wiring this branch into Verdict: #251 is the right replacement for #176 for pydanty's failure mode. Step events + provider-valid snapshots during the run solve the timeout-before-after-run gap, and the identity model now matches pydantic-ai well: One non-blocking implementation note I found while reading the current branch: Pydanty-side integration notes I am handling in
After those land, the remaining product-level step is to actually use |
|
Thanks for adding the S3 store for binary content, that would undoubtedly come in very handy for chat applications. Regarding supporting any sort of cloud/database storage for text history though: is that something you’d ever consider adding as first-party (e.g. psql), or something you’d want devs to implement themselves? I’m happy to give this a shot if you think the surface is not going to change, is it something you’re not thinking of shipping soon? @dsfaccini |
|
hey @aristide1997, so:

Postgres and any other RDBMS should be simple enough to use with the SQLite one. Maybe I can do another iteration to abstract it enough that people can simply pass in their own engine (SQLAlchemy). I just really wanted to avoid extra deps, so people may need to install that separately.

On DynamoDB (and non-SQL storage backends more broadly): I don't use them, so I don't feel confident implementing one without knowing what I'm doing or battle-testing it in my own usage. I may have Claude include an example snippet for people to create their own NoSQL storage adapter.
|
Hi @dsfaccini That makes sense, DynamoDB has quirks that make it slightly harder to implement, like a 400KB snapshot ceiling. What about using S3 as the whole backend, like AWS’s Strands Agents does? Cheap storage, very little infra to manage. https://github.com/strands-agents/sdk-python/blob/main/src/strands/session/s3_session_manager.py |
|
hey @aristide1997 I already added an s3 backed media externalizer so having an s3 storage is no problem at all lol, let me get back to you |
`Path(root) / absolute_path` returns `absolute_path` — the root is silently discarded — so a custom `key_strategy` returning `/etc/passwd` (or similar) escapes the store directory even though the previous check only blocked `..`. Tighten the validator to reject both shapes. Caught by pydanty during its #251 integration review.
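The pitfall and the tightened check are easy to reproduce. The sketch uses `PurePosixPath` so it behaves the same on any platform, and `resolve_key` is a hypothetical validator rather than the store's actual code.

```python
from pathlib import PurePosixPath


def resolve_key(root: str, key: str) -> PurePosixPath:
    """Join a strategy-produced key onto root, rejecting escapes."""
    candidate = PurePosixPath(key)
    if candidate.is_absolute():
        # Joining an absolute path silently discards the root.
        raise ValueError(f'key must be relative: {key!r}')
    if '..' in candidate.parts:
        raise ValueError(f'key must not contain "..": {key!r}')
    return PurePosixPath(root) / candidate


# The behaviour the fix guards against: the root is dropped entirely.
escaped = PurePosixPath('/srv/media') / '/etc/passwd'
```

Checking `parts` rather than searching the raw string keeps harmless names such as `a..b.bin` valid while still rejecting a real `..` segment.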
|
David's AICA here: pushed
Still 100% branch coverage. Doesn't affect |
# Conflicts: # pydantic_ai_harness/__init__.py # pyproject.toml # uv.lock
…ad-affinity

The terminal CallToolsNode already saves the final provider-valid snapshot with the correct step_index. after_run was re-saving the same tail stamped with step_index=0 (ctx.run_step is reset by then), so latest_snapshot reported a misleading step and every run wrote a duplicate. Track whether a node snapshot was taken via a task-local ContextVar and make after_run a fallback that only fires when the run reached no provider-valid boundary.

Also document that a caller-owned sqlite connection= must set check_same_thread=False (store SQL runs on anyio worker threads), on both SqliteStepStore and SqliteMediaStore, and correct the WAL-on-every-connection claim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
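The thread-affinity requirement is standard `sqlite3` behaviour and can be reproduced without the harness. The sketch below uses a plain `ThreadPoolExecutor` in place of anyio's worker threads, and the `runs` table is invented for the example.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def insert_from_worker(conn: sqlite3.Connection) -> None:
    """Run one INSERT on a worker thread, as a thread-dispatching store would."""
    def work() -> None:
        conn.execute('INSERT INTO runs (run_id) VALUES (?)', ('run-1',))
        conn.commit()

    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(work).result()  # re-raises any exception from the worker


# Default connection: bound to the thread that created it.
strict = sqlite3.connect(':memory:')
strict.execute('CREATE TABLE runs (run_id TEXT PRIMARY KEY)')
try:
    insert_from_worker(strict)
    strict_error = None
except sqlite3.ProgrammingError as exc:
    strict_error = exc

# Caller-owned connection prepared for use from worker threads.
shared = sqlite3.connect(':memory:', check_same_thread=False)
shared.execute('CREATE TABLE runs (run_id TEXT PRIMARY KEY)')
insert_from_worker(shared)
rows = shared.execute('SELECT run_id FROM runs').fetchall()
```

Disabling the check only removes the guard; a connection shared this way still needs its callers to avoid using it from several threads at once.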
…nown fields

S3MediaStore signed `_canonical_uri(path)` (each segment percent-encoded) but sent the raw path, letting httpx apply looser encoding. A custom key_prefix / key_strategy emitting reserved chars (`@`, `(`, `=`, ...) diverged from the signed path -> SignatureDoesNotMatch. Send the canonical bytes via httpx `raw_path` so signer and sender agree. Default `<hex>.bin` keys are unaffected.

The externalize/restore walker hardcoded the BinaryContent key set, silently dropping any field pydantic_ai adds upstream. Copy the node and swap only `data` <-> marker keys so unknown fields round-trip.

Adds tests for reserved-char path agreement, unknown-field preservation, and restore over a pruned blob.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
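Per-segment canonical encoding can be sketched with `urllib.parse.quote`. This is an illustrative `canonical_uri`, not the store's `_canonical_uri`, and it assumes the usual SigV4 rule for S3 paths: every byte is percent-encoded except the unreserved characters, with `/` kept as the segment separator.

```python
from urllib.parse import quote


def canonical_uri(path: str) -> str:
    """Percent-encode each path segment, leaving only unreserved characters."""
    return '/'.join(quote(segment, safe='-_.~') for segment in path.split('/'))


default_key = canonical_uri('/harness-test-bucket/0a1b2c.bin')
reserved_key = canonical_uri('/harness-test-bucket/media/a@b(c)=d.bin')
```

The signature is computed over this exact string, so the request must carry the same bytes on the wire; any client-side re-encoding of `@`, `(` or `=` produces a path the server hashes differently.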
# Conflicts: # pydantic_ai_harness/__init__.py # uv.lock
Summary

Supersedes #176 (`SessionPersistence`). Pivots from "save the full session after each run" to step-event persistence + provider-valid continuable snapshots + a tool-effect ledger, and adds a content-addressed media subsystem so snapshots stay small when messages carry large `BinaryContent`. Design per the discussion on #176 (comment 1, comment 2).

pydanty dogfooding (`pydantic/pydantic-ai#5612`) surfaced two failure modes the coarse "load-then-save" design cannot address:
- […] `after_run` fires lose all event trail.
- […] `Agent.run` is treated as independent.

`step_persistence/`: what this delivers

- `StepPersistence` capability emitting `StepEvent`s at every boundary: `run_started/completed/failed`, `model_request_started/completed/failed`, `tool_call_started/completed/failed`.
- `ContinuableSnapshot` saved only when the message history is provider-valid (every `ToolCallPart` has a matching `ToolReturnPart` or tool-bound `RetryPromptPart`; orphan / duplicate / out-of-order returns rejected). Saved at the end of each `CallToolsNode`; `after_run` saves a fallback snapshot only when the run reached no provider-valid boundary.
- `ToolEffectRecord` ledger keyed by `(run_id, tool_call_id)` (`started` / `completed` / `failed`). A run killed between `before_tool_execute` and `after_tool_execute` leaves a `started` record with no terminal update: the `unknown_after_crash` signal an orchestrator needs before blindly replaying a side-effectful tool.
- `annotate_tool_effect(store, ctx, *, idempotency_key=None, effect_summary=None)` for tool bodies that write external state. Resolves `run_id` from a `ContextVar` and `tool_call_id` / `tool_name` from `RunContext`, then merges metadata into the in-flight record.
- `run_id` is per-`Agent.run` (matches `RunContext.run_id`). For multi-turn logical grouping use `conversation_id=` on `Agent.run(...)`.
- `parent_run_id` is auto-inferred for in-process delegate runs via a `ContextVar` set in `wrap_run`.
- `store.list_runs(parent_run_id=..., conversation_id=...)` filters by either or both and returns chronological order as a protocol guarantee.
- `continue_run(store, run_id=...)` / `fork_run(store, run_id=...)` return the latest provider-valid snapshot's messages; pass to `Agent.run(message_history=...)`.
- `StepStore` protocol with three backends: `InMemoryStepStore`, `FileStepStore` (JSONL events + per-run monotonic snapshot counter), and `SqliteStepStore` (single-file: runs/events/snapshots/tool-effects + sibling `media` table; WAL; upsert tool-effects).
- `FileStepStore` validates `run_id` against `[A-Za-z0-9_.-]{1,200}` (rejecting `..`) and dispatches blocking I/O via `anyio.to_thread`.
- `from_spec(backend='memory'|'file'|'sqlite')` with explicit `ValueError` on unknown backends (no silent fallback to memory).

`media/`: content-addressed offload subsystem

Keeps large media out of snapshot JSON; shared URI scheme `media+sha256://<hex>` (content-addressed → automatic dedup). Reused later by a planned `MediaExternalizer` capability for in-flight wire-payload reduction.

- `MediaStore` protocol + `MediaContext` (extensible per-operation bag: `media_type`, `filename`, `metadata`).
- `DiskMediaStore`: one file per blob, atomic writes, metadata sidecar, traversal-safe `key_strategy`. Default backend for `FileStepStore`.
- `SqliteMediaStore`: one row per blob, `INSERT OR IGNORE` dedup, metadata JSON column. Default backend for `SqliteStepStore`.
- `S3MediaStore`: S3 / R2 / MinIO via path-style URLs + hand-rolled AWS SigV4 (no botocore/boto3), `x-amz-meta-*` metadata. PUT/GET/HEAD only.
- `externalize_media` / `restore_media`: swap inline `BinaryContent` ≥ threshold (default 64 KiB) for `media+sha256://` markers and back, operating on the serialized JSON shape.
- `public_url` resolvers (`make_static_public_url`, custom sync/async callables) and pluggable `key_strategy`.

What this PR explicitly does not deliver

- `GraphAgentState` restore (GraphAgentState serialization + resume_from_state() for durable checkpointing #149, Checkpointing capability (save, rewind, and fork conversation state) #196)
- […] `idempotency_key` / `effect_summary`)

Acceptance test

`TestCrashMidToolCallContract::test_visible_trail_no_false_continuation_point`: a run killed after a tool starts but before a tool return leaves a visible event trail (`tool_call_started` with no `tool_call_completed`, a `started` `ToolEffectRecord`) but does not expose that point as a valid `message_history` continuation; `latest_snapshot` returns the prior provider-valid snapshot.

Review-comment status from #176
- `FileSessionStore` […]: `_validate_id` rejects `..`, separators, empty, oversized IDs
- […] `list_sessions` drops `.meta`-suffixed IDs
- […] `StepStore` protocol is async; file/sqlite dispatch via `anyio.to_thread`
- `self.session_id` ignores per-call ctx: […] `conversation_id` / `run_id` / `step_index`) mirrors pydantic_ai

Test plan

- `make lint && make typecheck && make testcov` clean, 100% branch coverage
- `Agent(..., capabilities=[StepPersistence(...)])` with `TestModel` (no real model calls)
- `S3MediaStore` exercised through VCR cassettes recorded against Cloudflare R2 (`pytest-recording`); credentials/signatures scrubbed, with a replay leak-canary

Closes #176.
🤖 Generated with Claude Code