Add Memory capability with pluggable storage backends by DouweM · Pull Request #179 · pydantic/pydantic-ai-harness

DouweM · 2026-04-10T01:02:32Z

Summary

Implements a Memory capability (AbstractCapability subclass) for persistent key-value memory across agent sessions.

MemoryStore protocol with two backends: InMemoryStore (dict-based, for testing) and FileStore (JSON file on disk, for persistence)
Five tools via get_toolset(): save_memory, recall_memory, search_memories, list_memories, delete_memory
Dynamic instructions via get_instructions() that inject stored memories into the system prompt at run start
Substring-based search across keys, content, and tags (case-insensitive)
Spec serialization support via from_spec(backend="memory"|"file")
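The protocol-plus-backend shape described above might look roughly like the sketch below. The method names (get, put, delete, list_all, search) come from this PR; the exact signatures, the `bool` return of `delete`, and the reduced `MemoryEntry` are assumptions made for illustration, not the PR's actual code.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol, runtime_checkable


@dataclass
class MemoryEntry:
    key: str
    content: str
    tags: list[str] = field(default_factory=list)


@runtime_checkable
class MemoryStore(Protocol):
    """Method names follow the PR; the signatures here are assumed."""

    def get(self, key: str) -> Optional[MemoryEntry]: ...
    def put(self, entry: MemoryEntry) -> None: ...
    def delete(self, key: str) -> bool: ...
    def list_all(self) -> list[MemoryEntry]: ...
    def search(self, query: str) -> list[MemoryEntry]: ...


class InMemoryStore:
    """Dict-backed store; satisfies MemoryStore structurally, without inheriting."""

    def __init__(self) -> None:
        self._entries: dict[str, MemoryEntry] = {}

    def get(self, key: str) -> Optional[MemoryEntry]:
        return self._entries.get(key)

    def put(self, entry: MemoryEntry) -> None:
        self._entries[entry.key] = entry

    def delete(self, key: str) -> bool:
        # True when something was actually removed.
        return self._entries.pop(key, None) is not None

    def list_all(self) -> list[MemoryEntry]:
        return list(self._entries.values())

    def search(self, query: str) -> list[MemoryEntry]:
        # Case-insensitive substring match across key, content, and tags.
        needle = query.lower()
        return [
            e
            for e in self._entries.values()
            if needle in e.key.lower()
            or needle in e.content.lower()
            or any(needle in tag.lower() for tag in e.tags)
        ]
```

Because the protocol is `runtime_checkable`, `isinstance(InMemoryStore(), MemoryStore)` only checks that the methods exist; it does not validate their signatures.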

Closes #30

Test plan

48 tests covering all code paths (MemoryEntry, InMemoryStore, FileStore, Memory capability, tool functions, instructions, protocol conformance)
ruff check and ruff format pass
pyright strict mode passes with 0 errors
All existing tests still pass

🤖 Generated with Claude Code

Implements a Memory capability (AbstractCapability subclass) for persistent key-value memory across agent sessions, addressing #30.

- MemoryStore protocol with InMemoryStore (dict-based, for testing) and FileStore (JSON file on disk, for persistence) backends
- Five tools via get_toolset(): save_memory, recall_memory, search_memories, list_memories, delete_memory
- Dynamic instructions via get_instructions() that inject stored memories into the system prompt at run start
- Substring-based search across keys, content, and tags
- Spec serialization support (Memory.from_spec with backend="memory"|"file")
- 48 tests covering all code paths, passing lint, format, and typecheck

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Address audit findings from PR review:

- Better search: word-boundary matching with relevance scoring (count of matching words across key/content/tags, sorted by score descending). Underscores and hyphens treated as word separators.
- Memory scoping: `scope: str = 'global'` field on MemoryEntry, with optional `scope` parameter on `search_memories` and `list_memories` tools and `list_all`/`search` store methods.
- TTL/expiration: `expires_at: str | None = None` on MemoryEntry with `is_expired()` method. Stores filter out expired entries automatically. `save_memory` tool accepts optional `ttl_minutes` parameter.
- Dedup warning: when saving a memory whose key is very similar to an existing key (same 10-char prefix, Levenshtein distance <= 2), log a warning via the `pydantic_harness.memory` logger.

Tests: 48 -> 99, all passing with 100% coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
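As a rough illustration of the search and dedup rules in this commit message, the sketch below scores entries by whole-word matches and flags near-duplicate keys. The function names are hypothetical, and two details are assumptions: the score counts (query word, field) pairs, and keys shorter than 10 characters are never flagged.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase and split on any non-alphanumeric run, so `_` and `-` separate words."""
    return {w for w in re.split(r"[^a-z0-9]+", text.lower()) if w}


def score(query: str, key: str, content: str, tags: list[str]) -> int:
    """Count (query word, field) pairs where the word appears whole in the field."""
    fields = [words(key), words(content), *(words(t) for t in tags)]
    return sum(1 for q in words(query) for f in fields if q in f)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def looks_like_duplicate(new_key: str, existing_key: str) -> bool:
    """Same 10-char prefix and edit distance <= 2; identical keys are plain updates."""
    if new_key == existing_key:
        return False
    if len(new_key) < 10 or len(existing_key) < 10:
        return False  # assumption: short keys are below the comparison threshold
    return new_key[:10] == existing_key[:10] and levenshtein(new_key, existing_key) <= 2
```

A query such as `py` scores zero against the key `python_style` here, which is the behavioural difference from the earlier substring search.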

- personal_assistant.py: FileStore persistence, preferences, instructions injection
- study_coach.py: TTL/spaced repetition, tags, search
- coding_assistant.py: procedural memory, rules, search, delete

All examples assert on memory state and are instrumented with logfire spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

devin-ai-integration · 2026-04-10T01:07:06Z

+    def get(self, key: str) -> MemoryEntry | None:
+        """Retrieve a memory entry by key."""
+        return self._entries.get(key)


🚩 Expired entries are never cleaned up from storage

The store-level get() method (_BaseDictStore.get at line 186) returns entries regardless of expiration status. Filtering is only done in list_all, search, and the recall_memory tool. This means expired entries accumulate indefinitely in both InMemoryStore (memory leak) and FileStore (disk bloat). For short-lived processes this is fine, but long-running agents with TTL-based entries will see unbounded growth. A periodic or lazy cleanup strategy (e.g., purging expired entries on list_all/search or on a timer) would be worth considering.
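One way to read the lazy-cleanup suggestion is sketched below: expired entries are dropped whenever they are encountered on read. The `PurgingStore` class and the injectable `now` argument are hypothetical, added so the behaviour can be checked deterministically; only `expires_at` (an ISO 8601 string) and `is_expired()` come from the PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MemoryEntry:
    key: str
    content: str
    expires_at: Optional[str] = None  # ISO 8601 timestamp, as in the PR

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        if self.expires_at is None:
            return False
        current = now or datetime.now(timezone.utc)
        # `<=` means an entry saved with a zero TTL is expired immediately.
        return datetime.fromisoformat(self.expires_at) <= current


class PurgingStore:
    """Hypothetical store that deletes expired entries lazily, on read."""

    def __init__(self) -> None:
        self._entries: dict[str, MemoryEntry] = {}

    def put(self, entry: MemoryEntry) -> None:
        self._entries[entry.key] = entry

    def get(self, key: str, now: Optional[datetime] = None) -> Optional[MemoryEntry]:
        entry = self._entries.get(key)
        if entry is not None and entry.is_expired(now):
            del self._entries[key]  # purge instead of merely hiding
            return None
        return entry

    def list_all(self, now: Optional[datetime] = None) -> list[MemoryEntry]:
        expired = [k for k, e in self._entries.items() if e.is_expired(now)]
        for key in expired:
            del self._entries[key]
        return list(self._entries.values())
```

Purging on read keeps the store free of timers or background threads, at the cost of expired entries lingering until the next access.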

DouweM · 2026-04-10T15:07:54Z

Originally posted by @DouweM in #137 comment (PR was recreated)

Audit vs prior art: Memory

Worth adding now:

Word-boundary search with relevance scoring (substring is too primitive)
Memory scoping/namespaces: scope field on entries + search filtering
TTL/expiration: expires_at on entries
Dedup on save (warn if very similar key/content exists)

Follow-up opportunities:

Vector/embedding backends, SQLite/Redis stores, auto-summarization

DouweM · 2026-04-10T15:07:55Z

Originally posted by @dsfaccini in #137 comment (PR was recreated)

Claude here: We reviewed this PR and pushed several improvements. Here's what changed:

Code Quality (7 commits)

Type Safety

MemoryEntryDict TypedDict — replaced dict[str, Any] in to_dict/from_dict with a proper TypedDict. Eliminated all avoidable Any types (Any remains only in from_spec return Memory[Any], an unavoidable framework constraint).
Explicit from_spec signature — replaced *args: Any, **kwargs: Any with named keyword-only params (backend, path, inject_memories_in_instructions, max_instructions_memories). Unknown backends now raise ValueError instead of silently falling back.

Code Deduplication

Extracted _BaseDictStore base class — InMemoryStore and FileStore shared identical get, list_all, search methods (~40 lines of duplication). Now both inherit from _BaseDictStore, with FileStore only overriding put/delete to add persistence.

Robustness

Graceful FileStore._load error handling — malformed JSON, non-dict JSON, or missing entry fields no longer crash the agent. Logs a warning and starts with an empty store instead.
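The fallback behaviour described here could be sketched as follows. The on-disk layout (a JSON object mapping each key to an entry dict), the function name `load_entries`, and the logger name are assumptions; the PR only states that malformed JSON, non-dict JSON, and missing fields produce a warning and an empty store.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("memory_example")  # hypothetical logger name


def load_entries(path: Path) -> dict[str, dict]:
    """Return stored entries, or an empty dict if the file is absent or unusable."""
    if not path.exists():
        return {}
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(raw, dict):
            raise ValueError("top-level JSON value must be an object")
        # Indexing raises KeyError/TypeError when required fields are missing.
        return {
            key: {"key": value["key"], "content": value["content"]}
            for key, value in raw.items()
        }
    except (ValueError, KeyError, TypeError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so it is covered too.
        logger.warning("Could not load %s (%s); starting with an empty store", path, exc)
        return {}
```

Note that this sketch discards the whole file when a single entry is bad; skipping only the broken entries would be an equally reasonable design.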

Style

Replaced all RST-style double backticks with markdown single backticks in docstrings.
Fixed default_factory=list[str] (a GenericAlias, not a callable) to a proper form that satisfies pyright strict.

Tests (48 → 119)

Added edge case tests for:

_score_entry: regex metacharacters in queries, underscore/hyphen word boundaries, partial word matches, empty word list
_simple_similarity: edit distance boundary (exactly 3 = rejected), 9-char keys (below threshold), 10-char keys
format_entry: empty key, empty content
build_instructions: exact max boundary (overflow == 0)
save_memory: TTL=0 immediate expiration
from_spec: unknown backend raises, explicit backend='memory', forwarded options
FileStore._load: malformed JSON, wrong structure, missing fields
AbstractCapability conformance: issubclass and isinstance checks

Examples (3 scripts)

All instrumented with Logfire, assert on memory state:

examples/memory/personal_assistant.py — FileStore persistence across sessions, preferences with tags/scoping, instructions injection, preference updates
examples/memory/study_coach.py — TTL/spaced repetition (facts expire after 1 min), tag-based search, list filtering
examples/memory/coding_assistant.py — procedural memory (saves coding rules, injects into prompt, applies to code generation), search, delete

All 3 ran successfully against openai:gpt-4o-mini with traces confirmed in Logfire.

The example scripts use logfire (not in any dep group) and other runtime-only imports. Strict pyright on them blocks pre-commit hooks without adding value — examples are illustrative, not part of the typed library surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

_BaseDictStore.get() now filters expired entries (consistency with list_all/search) and FileStore._save() drops expired entries before writing, so long-running file-backed agents no longer accumulate dead records on disk. Tightens is_expired docstring with wall-clock semantics. TTL=0 now correctly results in an immediately-invisible entry (test updated accordingly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eMemoryStore Avoids collision with PR #176's SessionPersistence (InMemorySessionStore / FileSessionStore) and pydantic-ai's binary file-store branch. The <Implementation>MemoryStore convention makes the domain explicit at import sites — `DictMemoryStore` describes what backs the store (`dict`), `FileMemoryStore` describes its persistence target. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e to MemoryEntry

Five additive fields aligning MemoryEntry with the structured-record conventions in Mem0/LangGraph/Letta:

- summary: short version preferred over content for system-prompt injection (used by build_instructions in a follow-up commit)
- metadata: JSON-serializable structured attributes for filterable search (used by search filter= in a follow-up commit)
- read_only: pin entry against modification by save_memory/delete_memory tools, useful for system-curated facts (persona, policies)
- char_limit: hard cap on content length, enforced at MemoryEntry construction; raises ValueError when exceeded
- importance: relevance booster for search scoring

The save_memory tool now accepts summary and importance as optional LLM-facing parameters; metadata, read_only, and char_limit are dev-only (set via direct MemoryEntry construction). save_memory and delete_memory refuse to modify read_only entries with a clear message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Borrowed from LangGraph BaseStore. Hierarchical namespaces enable patterns like ('users', 'alice', 'prefs') and ('agents', 'planner', 'facts') that flat scope strings can't represent. Filters use prefix matching: namespace=('users',) matches all entries under that root.

Changes:

- MemoryEntry.scope (str, default 'global') → namespace (tuple[str, ...], default ('global',))
- MemoryStore.list_all/search now take namespace=tuple[str, ...] | None
- New MemoryStore.list_namespaces(prefix, suffix, max_depth) returns unique namespaces in the store, sorted
- save_memory/search_memories/list_memories tools accept list[str] and coerce to tuple internally
- format_entry shows nested namespaces as 'a/b/c'; the default ('global',) is still omitted for brevity
- Added _namespace_matches helper for prefix-match logic
- Tests cover prefix matching, max_depth truncation, suffix filtering, and persistence of nested namespaces

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
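The prefix-matching rule from this commit can be illustrated with a small sketch. These helpers are stand-ins written for this example, not the PR's `_namespace_matches` or `list_namespaces`; in particular, the `suffix` filter is left out and the functions operate on a plain list of tuples rather than a store.

```python
from typing import Optional

Namespace = tuple[str, ...]


def namespace_matches(namespace: Namespace, prefix: Optional[Namespace]) -> bool:
    """True when `namespace` sits at or below `prefix`; None matches everything."""
    if prefix is None:
        return True
    return namespace[: len(prefix)] == prefix


def list_namespaces(
    namespaces: list[Namespace],
    prefix: Optional[Namespace] = None,
    max_depth: Optional[int] = None,
) -> list[Namespace]:
    """Unique matching namespaces, truncated to max_depth levels and sorted."""
    found = {ns[:max_depth] for ns in namespaces if namespace_matches(ns, prefix)}
    return sorted(found)


def format_namespace(namespace: Namespace) -> str:
    """Render 'a/b/c'; the default ('global',) is shown as an empty string."""
    return "" if namespace == ("global",) else "/".join(namespace)
```

Slicing with `ns[:None]` returns the whole tuple, so an unset `max_depth` needs no special case.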

…earch

Three additions to MemoryStore.search and list_all (filter only on list_all):

- filter (dict[str, object] | None): equality match against MemoryEntry.metadata. Drops entries that don't match all keys.
- recency_scorer (Callable[[MemoryEntry], float]): pluggable per-call recency boost added to the base keyword-match score. Built-in exponential_decay(half_life_days, weight) factory ships as the Memory capability default (30-day half-life, weight 0.5). Set to None to disable.
- entry.importance (float | None): when set, added to the search score unconditionally so devs/agents can pin entries above keyword matches.

Score formula: keyword_match_count + (importance or 0) + (recency or 0). Entries with zero keyword match are still excluded; recency and importance only re-rank within the matched set.

Also adds RecencyScorer type alias, both exported from pydantic_harness for custom scorers (e.g. linear decay, importance-only ranking).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
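The score formula in this commit can be worked through with a short sketch. The decay expression `weight * 0.5 ** (age_days / half_life_days)` is an assumption about what an exponential half-life scorer computes, and the `updated_at` field and the `now` parameter are hypothetical additions that make the example deterministic.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class MemoryEntry:
    key: str
    content: str
    updated_at: datetime  # hypothetical timestamp field used for recency
    importance: Optional[float] = None


RecencyScorer = Callable[[MemoryEntry], float]


def exponential_decay(
    half_life_days: float = 30.0,
    weight: float = 0.5,
    now: Optional[datetime] = None,
) -> RecencyScorer:
    """Build a scorer whose boost halves every `half_life_days` (assumed formula)."""

    def scorer(entry: MemoryEntry) -> float:
        current = now or datetime.now(timezone.utc)
        age_days = max((current - entry.updated_at).total_seconds() / 86400, 0.0)
        return weight * 0.5 ** (age_days / half_life_days)

    return scorer


def final_score(
    keyword_matches: int,
    entry: MemoryEntry,
    recency_scorer: Optional[RecencyScorer],
) -> Optional[float]:
    """keyword_match_count + (importance or 0) + (recency or 0); None means excluded."""
    if keyword_matches == 0:
        return None  # recency and importance never rescue a non-matching entry
    recency = recency_scorer(entry) if recency_scorer is not None else 0.0
    return keyword_matches + (entry.importance or 0.0) + recency
```

With the defaults, an entry updated exactly one half-life ago receives a boost of 0.25, so recency can reorder entries with equal keyword counts but cannot outrank an extra keyword match on its own.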

…in instructions

build_instructions now:

- Prefers entry.summary over entry.content when injecting into the system prompt (via format_entry's new prefer_summary flag), keeping token budgets predictable for entries with long bodies.
- Respects an optional byte_budget: int | None on Memory. Non-pinned entries are skipped once adding the next would exceed the cap (UTF-8 bytes).
- Always injects read_only=True entries regardless of count cap or byte budget. Pinned entries are listed first.

Selection precedence: pinned (always) → up to max_instructions_memories non-pinned, capped by byte_budget when set. Overflow is reported with a "... and N more" suffix nudging the LLM toward search/list tools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
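The selection precedence described in this commit might be sketched as below. `build_lines` is a hypothetical stand-in for build_instructions, and two points are assumptions: pinned entries do not count against the byte budget, and once one entry fails to fit, every later non-pinned entry is reported as overflow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryEntry:
    key: str
    content: str
    summary: Optional[str] = None
    read_only: bool = False


def format_entry(entry: MemoryEntry, prefer_summary: bool = False) -> str:
    body = entry.summary if prefer_summary and entry.summary else entry.content
    return f"- {entry.key}: {body}"


def build_lines(
    entries: list[MemoryEntry],
    max_memories: int,
    byte_budget: Optional[int] = None,
) -> list[str]:
    # Pinned (read_only) entries always go first and are never dropped.
    lines = [format_entry(e, prefer_summary=True) for e in entries if e.read_only]
    used_bytes = 0
    shown = 0
    overflow = 0
    full = False
    for entry in entries:
        if entry.read_only:
            continue
        line = format_entry(entry, prefer_summary=True)
        size = len(line.encode("utf-8"))
        over_budget = byte_budget is not None and used_bytes + size > byte_budget
        if shown >= max_memories or over_budget:
            full = True  # assumption: stop admitting entries after the first miss
        if full:
            overflow += 1
            continue
        lines.append(line)
        used_bytes += size
        shown += 1
    if overflow:
        lines.append(f"... and {overflow} more")
    return lines
```

Measuring the encoded line rather than `len(line)` matters for non-ASCII content, where one character can take several UTF-8 bytes.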

Per-tool description override at toolset construction. Keys are tool names (save_memory, recall_memory, search_memories, list_memories, delete_memory); values replace the docstring used by the LLM. Useful for nudging agent behavior toward project-specific conventions (e.g., "Save aggressively, even small facts"). Borrowed from pydantic-deepagents AgentMemoryToolset. Tools without an override fall back to their docstring as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…res_store.py Reference implementation of MemoryStore against Postgres via psycopg. ~150 LOC. Schema: key TEXT PK, namespace TEXT[], data JSONB. Implements all six Protocol methods (get, put, delete, list_all, search, list_namespaces). Filtering happens in SQL (namespace prefix via array slicing, metadata equality via JSONB ops); keyword scoring runs in Python after the DB pre-filter, matching DictMemoryStore semantics. Documented as a starting point — production users will want connection pooling, schema migrations, and full-text or pgvector ranking. Also exempts examples/ from the ruff D (pydocstyle) ruleset, mirroring the existing tests/ exemption — example scripts are illustrative, not API surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- docs/capabilities/memory.md: full user-facing capability docs covering quick start, MemoryEntry shape, built-in vs custom backends, namespaces, multi-agent shared store, search/recency, prompt-cache trade-off, tool description overrides, custom backends, and the known followups list (semantic retrieval, dedup hook).
- Memory class docstring gains a multi-agent shared-store example so the pattern is discoverable from autocomplete without reading docs.
- PLAN.md fully refreshed: reflects all current fields, the namespace tuple, recency scoring, byte budget, read_only pinning, the Postgres reference, and the Future Work followups.
- docs/capabilities/index.md links the new memory page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

build_instructions now skips re-injecting an entry when the LLM has already seen its current value via a save_memory call in this run's tool history. Default ON (dedup_recent_saves=True); disable per-Memory instance via the flag. The check is content-aware: scans ctx.messages for ToolCallParts named 'save_memory', tracks the last (key, content) saved per key, and only suppresses an entry when entry.content == last_saved_content. If something updated the entry externally (another agent, manual store mutation, etc.), the saved content no longer matches the store, so we inject to let the LLM see the current value. read_only entries always inject regardless of dedup. This is the previously-deferred "tool-history dedup" followup; the content-equality safeguard resolves the in-run-update gotcha. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
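The content-aware check described in this commit can be sketched without the framework. `ToolCall` below is a plain stand-in for a tool-call part in the message history, not a pydantic-ai type, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Stand-in for a tool-call part found in the run's message history."""

    tool_name: str
    args: dict[str, str]


def last_saved_content(calls: list[ToolCall]) -> dict[str, str]:
    """Map each key to the content of its most recent save_memory call."""
    saved: dict[str, str] = {}
    for call in calls:
        if call.tool_name == "save_memory":
            saved[call.args["key"]] = call.args["content"]
    return saved


def should_inject(key: str, content: str, read_only: bool, saved: dict[str, str]) -> bool:
    """Skip only when the model already saw exactly this content in this run."""
    if read_only:
        return True  # pinned entries always inject
    return saved.get(key) != content
```

Comparing content rather than just the key is what lets an external update (another agent, a manual store mutation) become visible again in the prompt.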

Adapt to the pydantic_harness → pydantic_ai_harness package rename: relocate memory.py to pydantic_ai_harness/memory/_capability.py with __init__.py re-exports, move docs/capabilities/memory.md to the package's README.md, wire Memory into the top-level __init__'s lazy loader alongside CodeMode, and update test/example imports. Pyproject exclude list aligned with main; logfire.instrument_openai() typing gaps in examples silenced with targeted pyright ignores. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hartungstenio · 2026-05-22T16:28:45Z

+
+
+@runtime_checkable
+class MemoryStore(Protocol):


Great addition!

One thing I'd love to see here is async support for MemoryStore.

Each method could optionally return an Awaitable (or have another protocol, AsyncMemoryStore), following the same pattern already used elsewhere in Pydantic AI:

    existing = memory_store.get(...)
    if is_awaitable(existing):
        existing = await existing

This would let users plug in async backends (Redis, a database, a vector store) without blocking the event loop, while keeping sync implementations like DictMemoryStore working as-is.
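A self-contained sketch of that pattern is shown below, using the standard library's `inspect.isawaitable` in place of the `is_awaitable` helper named in the comment. `SyncStore`, `AsyncStore`, and `recall` are hypothetical names for illustration only.

```python
import asyncio
import inspect
from typing import Optional


class SyncStore:
    """A blocking backend, comparable to a dict-backed store."""

    def get(self, key: str) -> Optional[str]:
        return {"k": "sync value"}.get(key)


class AsyncStore:
    """A non-blocking backend, e.g. one wrapping a network client."""

    async def get(self, key: str) -> Optional[str]:
        await asyncio.sleep(0)  # placeholder for real I/O
        return {"k": "async value"}.get(key)


async def recall(store, key: str) -> Optional[str]:
    """Call get() and await the result only when the backend returned an awaitable."""
    result = store.get(key)
    if inspect.isawaitable(result):
        result = await result
    return result
```

The caller stays a single code path, so existing synchronous stores keep working unchanged while async stores no longer block the event loop.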

hartungstenio · 2026-05-22T16:31:32Z

+class MemoryStore(Protocol):
+    """Protocol for pluggable memory storage backends."""
+
+    def get(self, key: str) -> MemoryEntry | None:  # pragma: no cover


It would also be useful if the protocol received the run context (or deps), so the store can scope memories per user, tenant, or session without the caller having to manage namespaces manually.

I guess I could do this in my own MemoryStore, like

capabilities = [Memory(store=ScopedMemoryStore(scope=my_user))]

🤔
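The wrapper idea floated in this comment could look something like the sketch below. Everything here is hypothetical: `DictStore` is a toy backend keyed by (namespace, key), and the `inner` parameter and the `('users', scope)` prefix are assumptions that do not match the PR's actual MemoryStore signatures.

```python
from typing import Optional

Namespace = tuple[str, ...]


class DictStore:
    """Toy backend keyed by (namespace, key)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[Namespace, str], str] = {}

    def put(self, namespace: Namespace, key: str, content: str) -> None:
        self._entries[(namespace, key)] = content

    def get(self, namespace: Namespace, key: str) -> Optional[str]:
        return self._entries.get((namespace, key))


class ScopedMemoryStore:
    """Prefixes every namespace with a fixed scope so callers cannot cross it."""

    def __init__(self, inner: DictStore, scope: str) -> None:
        self._inner = inner
        self._prefix: Namespace = ("users", scope)

    def put(self, key: str, content: str, namespace: Namespace = ()) -> None:
        self._inner.put(self._prefix + tuple(namespace), key, content)

    def get(self, key: str, namespace: Namespace = ()) -> Optional[str]:
        return self._inner.get(self._prefix + tuple(namespace), key)
```

Two scoped views over one shared backend then behave as isolated stores, which is the per-user or per-tenant separation the comment asks about.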

hartungstenio · 2026-05-22T16:36:26Z

+class _MemoryEntryDictRequired(TypedDict):
+    """Required fields for MemoryEntryDict."""
+
+    key: str
+    content: str


If feasible, using a single TypedDict with typing.Required (or typing_extensions.Required) instead of the _MemoryEntryDictRequired base class would be more idiomatic.
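For reference, the single-class form suggested here might look like the sketch below. The field set is reduced to names mentioned in this PR (key, content, tags, expires_at), and the choice of `total=False` with `Required[...]` markers is one of two equivalent spellings; the other keeps the default `total=True` and marks optional fields with `NotRequired[...]`.

```python
from typing import Optional

try:
    from typing import Required, TypedDict  # Python 3.11+
except ImportError:
    from typing_extensions import Required, TypedDict


class MemoryEntryDict(TypedDict, total=False):
    """One TypedDict: keys are optional by default, Required[...] opts back in."""

    key: Required[str]
    content: Required[str]
    tags: list[str]
    expires_at: Optional[str]


minimal: MemoryEntryDict = {"key": "editor", "content": "Prefers Vim"}
full: MemoryEntryDict = {
    "key": "editor",
    "content": "Prefers Vim",
    "tags": ["tools"],
    "expires_at": None,
}
```

Type checkers accept `minimal` because only `key` and `content` are required, and the runtime metadata on the class reflects the same split.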

Add persistent memory capability for Pydantic AI agents.

This capability provides five tools: memory_store, memory_retrieve, memory_list, memory_delete, and memory_compact.

Key features:

- SQLite backend with FTS5 full-text search
- AbstractMemoryBackend interface for custom storage engines
- Tag-based filtering and glob pattern matching
- Access tracking and automatic compaction
- 18 tests covering models, backend, and edge cases

Implements: pydantic#179 (Persistent key-value memory)

DouweM and others added 10 commits April 2, 2026 05:28

refactor(memory): add MemoryEntryDict TypedDict, eliminate avoidable …

d9ce688

…Any types Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor(memory): extract _BaseDictStore to deduplicate InMemoryStore…

63cd254

… and FileStore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(memory): handle malformed JSON gracefully in FileStore._load

f9b1066

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor(memory): make from_spec signature explicit, raise on unknown…

7ddf098

… backend Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test(memory): add edge case tests for scoring, similarity, format, TT…

11e944c

…L, and conformance Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore(memory): update exports and plan to reflect review changes

c9dc52c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: remove settings.local.json from tracking, restore original deps

58c70a7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

DouweM requested review from Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin as code owners April 10, 2026 01:02

devin-ai-integration Bot reviewed Apr 10, 2026

DouweM assigned dsfaccini Apr 10, 2026

DouweM removed request for Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin April 10, 2026 15:12

DouweM added this to the 2026-05 milestone Apr 23, 2026

dsfaccini and others added 5 commits May 7, 2026 09:23

dsfaccini and others added 7 commits May 7, 2026 09:40

hartungstenio reviewed May 22, 2026

This was referenced May 31, 2026

feat(memory): add MemoryCapability with SQLite backend mustafabozkaya/pydantic-ai-harness#1

Open

feat(memory): add MemoryCapability with SQLite backend #263

Open

DouweM wants to merge 22 commits into main from capability/memory


3 participants
