Add Memory capability with pluggable storage backends #179
Implements a Memory capability (AbstractCapability subclass) for persistent key-value memory across agent sessions, addressing #30.
- MemoryStore protocol with InMemoryStore (dict-based, for testing) and FileStore (JSON file on disk, for persistence) backends
- Five tools via get_toolset(): save_memory, recall_memory, search_memories, list_memories, delete_memory
- Dynamic instructions via get_instructions() that inject stored memories into the system prompt at run start
- Substring-based search across keys, content, and tags
- Spec serialization support (Memory.from_spec with backend="memory"|"file")
- 48 tests covering all code paths, passing lint, format, and typecheck
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
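For readers skimming the thread, here is a minimal sketch of what a protocol-plus-dict-backend pairing like the one described above might look like. The method names come from later commits in this PR (get, put, delete, list_all, search); the signatures, the reduced `MemoryEntry` shape, and the case-insensitive matching are assumptions for illustration, not the PR's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class MemoryEntry:
    # Reduced, illustrative shape; the real entry carries more fields.
    key: str
    content: str
    tags: list[str] = field(default_factory=list)


@runtime_checkable
class MemoryStore(Protocol):
    # Signatures are assumptions; only the method names appear in the PR text.
    def get(self, key: str) -> MemoryEntry | None: ...
    def put(self, entry: MemoryEntry) -> None: ...
    def delete(self, key: str) -> bool: ...
    def list_all(self) -> list[MemoryEntry]: ...
    def search(self, query: str) -> list[MemoryEntry]: ...


class InMemoryStore:
    """Dict-backed store that satisfies MemoryStore structurally."""

    def __init__(self) -> None:
        self._entries: dict[str, MemoryEntry] = {}

    def get(self, key: str) -> MemoryEntry | None:
        return self._entries.get(key)

    def put(self, entry: MemoryEntry) -> None:
        self._entries[entry.key] = entry

    def delete(self, key: str) -> bool:
        # Report whether anything was actually removed.
        return self._entries.pop(key, None) is not None

    def list_all(self) -> list[MemoryEntry]:
        return list(self._entries.values())

    def search(self, query: str) -> list[MemoryEntry]:
        # Substring match across key, content, and tags
        # (case-insensitive here, which is an assumption).
        needle = query.lower()
        return [
            e
            for e in self._entries.values()
            if needle in e.key.lower()
            or needle in e.content.lower()
            or any(needle in t.lower() for t in e.tags)
        ]
```

Because the protocol is `runtime_checkable`, `isinstance(store, MemoryStore)` only checks that the methods exist, not their signatures.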
Address audit findings from PR review:
- Better search: word-boundary matching with relevance scoring (count of matching words across key/content/tags, sorted by score descending). Underscores and hyphens treated as word separators.
- Memory scoping: `scope: str = 'global'` field on MemoryEntry, with optional `scope` parameter on `search_memories` and `list_memories` tools and `list_all`/`search` store methods.
- TTL/expiration: `expires_at: str | None = None` on MemoryEntry with `is_expired()` method. Stores filter out expired entries automatically. `save_memory` tool accepts optional `ttl_minutes` parameter.
- Dedup warning: when saving a memory whose key is very similar to an existing key (same 10-char prefix, Levenshtein distance <= 2), log a warning via the `pydantic_harness.memory` logger.
Tests: 48 -> 99, all passing with 100% coverage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
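The word-boundary scoring described in this commit can be sketched roughly as follows. This assumes the score is the number of query words found in each of the three fields, summed; the helper names (`tokenize`, `score`, `rank`) and the tuple representation of an entry are hypothetical.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    # Split on anything that is not a letter or digit, so underscores
    # and hyphens act as word separators.
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def score(query: str, key: str, content: str, tags: list[str]) -> int:
    # Count query words that appear as whole words in each field.
    words = tokenize(query)
    fields = [tokenize(key), tokenize(content), tokenize(" ".join(tags))]
    return sum(len(words & f) for f in fields)


def rank(query: str, entries: list[tuple[str, str, list[str]]]) -> list[tuple[str, str, list[str]]]:
    # Keep only entries with at least one match, highest score first.
    scored = [(score(query, *e), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for s, e in scored if s > 0]
```

Whole-word matching means a query for `dark` does not match a key like `darkness`, which is the main behavioral difference from the earlier substring search.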
…Any types Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… and FileStore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… backend Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…L, and conformance Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- personal_assistant.py: FileStore persistence, preferences, instructions injection - study_coach.py: TTL/spaced repetition, tags, search - coding_assistant.py: procedural memory, rules, search, delete All examples assert on memory state and are instrumented with logfire spans. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
def get(self, key: str) -> MemoryEntry | None:
    """Retrieve a memory entry by key."""
    return self._entries.get(key)
🚩 Expired entries are never cleaned up from storage
The store-level get() method (_BaseDictStore.get at line 186) returns entries regardless of expiration status. Filtering is only done in list_all, search, and the recall_memory tool. This means expired entries accumulate indefinitely in both InMemoryStore (memory leak) and FileStore (disk bloat). For short-lived processes this is fine, but long-running agents with TTL-based entries will see unbounded growth. A periodic or lazy cleanup strategy (e.g., purging expired entries on list_all/search or on a timer) would be worth considering.
Audit vs prior art: Memory
Worth adding now:
Follow-up opportunities:
Claude here: We reviewed this PR and pushed several improvements. Here's what changed:
Code Quality (7 commits)
Type Safety
Code Deduplication
Robustness
Style
Tests (48 → 119)
Added edge case tests for:
Examples (3 scripts)
All instrumented with Logfire, assert on memory state:
All 3 ran successfully against |
The example scripts use logfire (not in any dep group) and other runtime-only imports. Strict pyright on them blocks pre-commit hooks without adding value — examples are illustrative, not part of the typed library surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_BaseDictStore.get() now filters expired entries (consistency with list_all/search) and FileStore._save() drops expired entries before writing, so long-running file-backed agents no longer accumulate dead records on disk. Tightens is_expired docstring with wall-clock semantics. TTL=0 now correctly results in an immediately-invisible entry (test updated accordingly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
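A minimal sketch of the expiry behavior this commit describes, assuming `expires_at` holds an ISO 8601 timestamp and that "expired" means the current wall-clock time is at or past it. The `DictStore` class, `expiry_from_ttl` helper, and `purge` method are hypothetical names; the real store drops expired entries inside `FileStore._save()`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryEntry:
    key: str
    content: str
    expires_at: str | None = None  # ISO 8601 timestamp, or None for no expiry

    def is_expired(self, now: datetime | None = None) -> bool:
        if self.expires_at is None:
            return False
        now = now or datetime.now(timezone.utc)
        # ">=" makes a TTL of 0 expire immediately.
        return now >= datetime.fromisoformat(self.expires_at)


def expiry_from_ttl(ttl_minutes: float) -> str:
    # Hypothetical helper: convert a TTL into an absolute timestamp.
    return (datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes)).isoformat()


class DictStore:
    def __init__(self) -> None:
        self._entries: dict[str, MemoryEntry] = {}

    def put(self, entry: MemoryEntry) -> None:
        self._entries[entry.key] = entry

    def get(self, key: str) -> MemoryEntry | None:
        # Filter on read, consistent with list_all/search.
        entry = self._entries.get(key)
        if entry is None or entry.is_expired():
            return None
        return entry

    def purge(self) -> None:
        # Drop dead records so storage does not grow without bound.
        self._entries = {k: e for k, e in self._entries.items() if not e.is_expired()}
```

Filtering on read keeps `get` consistent with the other lookups, while the separate purge step addresses the unbounded-growth concern raised in the review.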
…eMemoryStore Avoids collision with PR #176's SessionPersistence (InMemorySessionStore / FileSessionStore) and pydantic-ai's binary file-store branch. The <Implementation>MemoryStore convention makes the domain explicit at import sites — `DictMemoryStore` describes what backs the store (`dict`), `FileMemoryStore` describes its persistence target. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e to MemoryEntry
Five additive fields aligning MemoryEntry with the structured-record conventions in Mem0/LangGraph/Letta:
- summary: short version preferred over content for system-prompt injection (used by build_instructions in a follow-up commit)
- metadata: JSON-serializable structured attributes for filterable search (used by search filter= in a follow-up commit)
- read_only: pin entry against modification by save_memory/delete_memory tools; useful for system-curated facts (persona, policies)
- char_limit: hard cap on content length, enforced at MemoryEntry construction; raises ValueError when exceeded
- importance: relevance booster for search scoring
The save_memory tool now accepts summary and importance as optional LLM-facing parameters; metadata, read_only, and char_limit are dev-only (set via direct MemoryEntry construction). save_memory and delete_memory refuse to modify read_only entries with a clear message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Borrowed from LangGraph BaseStore. Hierarchical namespaces enable
patterns like ('users', 'alice', 'prefs') and ('agents', 'planner',
'facts') that flat scope strings can't represent. Filters use prefix
matching: namespace=('users',) matches all entries under that root.
Changes:
- MemoryEntry.scope (str, default 'global') → namespace (tuple[str, ...],
default ('global',))
- MemoryStore.list_all/search now take namespace=tuple[str, ...] | None
- New MemoryStore.list_namespaces(prefix, suffix, max_depth) returns
unique namespaces in the store, sorted
- save_memory/search_memories/list_memories tools accept list[str] and
coerce to tuple internally
- format_entry shows nested namespaces as 'a/b/c'; the default
('global',) is still omitted for brevity
- Added _namespace_matches helper for prefix-match logic
- Tests cover prefix matching, max_depth truncation, suffix filtering,
and persistence of nested namespaces
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
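The prefix-match rule and `max_depth` truncation described in this commit can be illustrated with a short sketch. The function bodies below are assumptions written from the commit message (the `suffix` filter is left out for brevity), not the PR's `_namespace_matches` or `list_namespaces` implementations.

```python
from __future__ import annotations

Namespace = tuple[str, ...]


def namespace_matches(namespace: Namespace, prefix: Namespace | None) -> bool:
    # None means "no filter"; otherwise the namespace must start with prefix.
    if prefix is None:
        return True
    return namespace[: len(prefix)] == prefix


def list_namespaces(
    namespaces: list[Namespace],
    prefix: Namespace | None = None,
    max_depth: int | None = None,
) -> list[Namespace]:
    found: set[Namespace] = set()
    for ns in namespaces:
        if not namespace_matches(ns, prefix):
            continue
        # Truncate to max_depth so deep namespaces collapse to their ancestors.
        found.add(ns if max_depth is None else ns[:max_depth])
    return sorted(found)
```

With this rule, `('users',)` matches every entry under that root, and a prefix longer than the namespace itself never matches.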
…earch
Three additions to MemoryStore.search and list_all (filter only on list_all):
- filter: dict[str, object] | None. Equality match against MemoryEntry.metadata. Drops entries that don't match all keys.
- recency_scorer: Callable[[MemoryEntry], float]. Pluggable per-call recency boost added to the base keyword-match score. Built-in exponential_decay(half_life_days, weight) factory ships as the Memory capability default (30-day half-life, weight 0.5). Set to None to disable.
- entry.importance: float | None. When set, added to the search score unconditionally so devs/agents can pin entries above keyword matches.
Score formula: keyword_match_count + (importance or 0) + (recency or 0). Entries with zero keyword match are still excluded; recency and importance only re-rank within the matched set.
Also adds RecencyScorer type alias, both exported from pydantic_harness for custom scorers (e.g. linear decay, importance-only ranking).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
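To make the score formula concrete, here is a rough sketch of an exponential-decay recency scorer combined with keyword count and importance. The `Entry` class, its `updated_at` field, the `now` parameter (added for deterministic testing), and `final_score` are all hypothetical; only the `exponential_decay(half_life_days, weight)` name and the formula itself come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Entry:
    key: str
    updated_at: datetime  # hypothetical field used to compute age
    importance: float | None = None


RecencyScorer = Callable[[Entry], float]


def exponential_decay(
    half_life_days: float = 30.0,
    weight: float = 0.5,
    now: datetime | None = None,  # assumption: injectable clock for testing
) -> RecencyScorer:
    def scorer(entry: Entry) -> float:
        current = now or datetime.now(timezone.utc)
        age_days = max((current - entry.updated_at).total_seconds() / 86400, 0.0)
        # The boost halves every half_life_days.
        return weight * 0.5 ** (age_days / half_life_days)

    return scorer


def final_score(keyword_matches: int, entry: Entry, recency: RecencyScorer | None) -> float | None:
    # Zero keyword matches excludes the entry; boosts only re-rank matches.
    if keyword_matches == 0:
        return None
    boost = recency(entry) if recency is not None else 0.0
    return keyword_matches + (entry.importance or 0.0) + boost
```

Under these assumptions, an entry exactly one half-life old receives half the configured weight as its boost.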
…in instructions
build_instructions now:
- Prefers entry.summary over entry.content when injecting into the system prompt (via format_entry's new prefer_summary flag), keeping token budgets predictable for entries with long bodies.
- Respects an optional byte_budget: int | None on Memory; non-pinned entries are skipped once adding the next would exceed the cap (UTF-8 bytes).
- Always injects read_only=True entries regardless of count cap or byte budget. Pinned entries are listed first.
Selection precedence: pinned (always) → up to max_instructions_memories non-pinned, capped by byte_budget when set. Overflow is reported with a "... and N more" suffix nudging the LLM toward search/list tools.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
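The selection precedence can be sketched as below. Several details are assumptions not confirmed by the commit message: that the byte budget counts only non-pinned lines, that selection stops at the first entry that would overflow, and the exact line format. `Entry`, `render`, and `select_for_prompt` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    content: str
    summary: str | None = None
    read_only: bool = False


def render(entry: Entry) -> str:
    # Prefer the short summary when one exists.
    return f"- {entry.key}: {entry.summary or entry.content}"


def select_for_prompt(
    entries: list[Entry],
    max_memories: int,
    byte_budget: int | None = None,
) -> list[str]:
    pinned = [e for e in entries if e.read_only]
    others = [e for e in entries if not e.read_only]

    # Pinned entries are always injected and listed first.
    lines = [render(e) for e in pinned]

    used = 0
    shown = 0
    for entry in others[:max_memories]:
        line = render(entry)
        size = len(line.encode("utf-8"))
        # Assumption: stop at the first entry that would exceed the budget.
        if byte_budget is not None and used + size > byte_budget:
            break
        lines.append(line)
        used += size
        shown += 1

    hidden = len(others) - shown
    if hidden:
        lines.append(f"... and {hidden} more")
    return lines
```

Measuring the encoded UTF-8 length rather than `len(line)` matters for non-ASCII content, where one character can occupy several bytes.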
Per-tool description override at toolset construction. Keys are tool names (save_memory, recall_memory, search_memories, list_memories, delete_memory); values replace the docstring used by the LLM. Useful for nudging agent behavior toward project-specific conventions (e.g., "Save aggressively, even small facts"). Borrowed from pydantic-deepagents AgentMemoryToolset. Tools without an override fall back to their docstring as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
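As a rough illustration of the fallback behavior, the sketch below resolves each tool's description from an override mapping or, failing that, from its docstring. The stub tools, `TOOLS` registry, and `resolve_descriptions` are hypothetical, and rejecting unknown tool names is an assumption about sensible behavior rather than something the commit states.

```python
from __future__ import annotations

import inspect


def save_memory(key: str, content: str) -> str:
    """Save a memory under the given key."""
    return f"saved {key}"


def recall_memory(key: str) -> str:
    """Recall a memory by key."""
    return f"recalled {key}"


# Hypothetical registry; the real toolset exposes five tools.
TOOLS = {fn.__name__: fn for fn in (save_memory, recall_memory)}


def resolve_descriptions(overrides: dict[str, str] | None = None) -> dict[str, str]:
    overrides = overrides or {}
    unknown = set(overrides) - set(TOOLS)
    if unknown:
        # Assumption: fail loudly on typos instead of silently ignoring them.
        raise ValueError(f"Unknown tool names: {sorted(unknown)}")
    # Fall back to the function docstring when no override is given.
    return {
        name: overrides.get(name, inspect.getdoc(fn) or "")
        for name, fn in TOOLS.items()
    }
```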
…res_store.py Reference implementation of MemoryStore against Postgres via psycopg. ~150 LOC. Schema: key TEXT PK, namespace TEXT[], data JSONB. Implements all six Protocol methods (get, put, delete, list_all, search, list_namespaces). Filtering happens in SQL (namespace prefix via array slicing, metadata equality via JSONB ops); keyword scoring runs in Python after the DB pre-filter, matching DictMemoryStore semantics. Documented as a starting point — production users will want connection pooling, schema migrations, and full-text or pgvector ranking. Also exempts examples/ from the ruff D (pydocstyle) ruleset, mirroring the existing tests/ exemption — example scripts are illustrative, not API surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/capabilities/memory.md: full user-facing capability docs covering quick start, MemoryEntry shape, built-in vs custom backends, namespaces, multi-agent shared store, search/recency, prompt-cache trade-off, tool description overrides, custom backends, and the known followups list (semantic retrieval, dedup hook).
- Memory class docstring gains a multi-agent shared-store example so the pattern is discoverable from autocomplete without reading docs.
- PLAN.md fully refreshed: reflects all current fields, the namespace tuple, recency scoring, byte budget, read_only pinning, the Postgres reference, and the Future Work followups.
- docs/capabilities/index.md links the new memory page.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_instructions now skips re-injecting an entry when the LLM has already seen its current value via a save_memory call in this run's tool history. Default ON (dedup_recent_saves=True); disable per-Memory instance via the flag. The check is content-aware: scans ctx.messages for ToolCallParts named 'save_memory', tracks the last (key, content) saved per key, and only suppresses an entry when entry.content == last_saved_content. If something updated the entry externally (another agent, manual store mutation, etc.), the saved content no longer matches the store, so we inject to let the LLM see the current value. read_only entries always inject regardless of dedup. This is the previously-deferred "tool-history dedup" followup; the content-equality safeguard resolves the in-run-update gotcha. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
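The content-aware check reads more easily as code. This sketch uses a plain `ToolCall` dataclass as a hypothetical stand-in for the tool-call parts found in `ctx.messages`; the function names are also illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolCall:
    # Hypothetical stand-in for a tool-call part in the run's message history.
    tool_name: str
    args: dict[str, str]


def last_saved_content(history: list[ToolCall]) -> dict[str, str]:
    # Later saves overwrite earlier ones, so each key maps to its latest content.
    saved: dict[str, str] = {}
    for call in history:
        if call.tool_name == "save_memory":
            saved[call.args["key"]] = call.args["content"]
    return saved


def should_inject(key: str, content: str, read_only: bool, history: list[ToolCall]) -> bool:
    if read_only:
        return True  # pinned entries always inject
    # Suppress only when the model already saw exactly this value in this run.
    return last_saved_content(history).get(key) != content
```

Comparing content rather than just the key is what lets an external update (another agent, a manual store mutation) still reach the model.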
Adapt to the pydantic_harness → pydantic_ai_harness package rename: relocate memory.py to pydantic_ai_harness/memory/_capability.py with __init__.py re-exports, move docs/capabilities/memory.md to the package's README.md, wire Memory into the top-level __init__'s lazy loader alongside CodeMode, and update test/example imports. Pyproject exclude list aligned with main; logfire.instrument_openai() typing gaps in examples silenced with targeted pyright ignores. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@runtime_checkable
class MemoryStore(Protocol):
Great addition!
One thing I'd love to see here is async support for MemoryStore.
Each method could optionally return an Awaitable (or have another protocol, AsyncMemoryStore), following the same pattern already used elsewhere in Pydantic AI:
existing = memory_store.get(...)
if is_awaitable(existing):
    existing = await existing
This would let users plug in async backends (Redis, a database, a vector store) without blocking the event loop, while keeping sync implementations like DictMemoryStore working as-is.
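A self-contained sketch of the optional-awaitable pattern suggested here, using the standard library's `inspect.isawaitable` in place of the `is_awaitable` helper named in the comment. `SyncStore`, `AsyncStore`, and `recall` are hypothetical names for illustration.

```python
import asyncio
import inspect


class SyncStore:
    """Synchronous backend: get() returns the value directly."""

    def __init__(self) -> None:
        self._data = {"a": "sync value"}

    def get(self, key: str):
        return self._data.get(key)


class AsyncStore:
    """Asynchronous backend: get() returns a coroutine."""

    async def get(self, key: str):
        await asyncio.sleep(0)  # stand-in for real I/O
        return "async value" if key == "a" else None


async def recall(store, key: str):
    # Call once, then await only if the backend handed back an awaitable.
    result = store.get(key)
    if inspect.isawaitable(result):
        result = await result
    return result
```

The caller stays a single code path, and sync backends pay no extra cost beyond one `isawaitable` check per call.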
class MemoryStore(Protocol):
    """Protocol for pluggable memory storage backends."""

    def get(self, key: str) -> MemoryEntry | None:  # pragma: no cover
It would also be useful if the protocol received the run context (or deps), so the store can scope memories per user, tenant, or session without the caller having to manage namespaces manually.
I guess I could do this in my own MemoryStore, like:
capabilities = [Memory(store=ScopedMemoryStore(scope=my_user))]
🤔
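One way the wrapper idea floated here might look, as a sketch only: a store that prefixes every key with a scope before delegating to a shared inner store. `ScopedMemoryStore` is the commenter's hypothetical name; the simplified `DictStore` (string values, three methods) and the `scope/key` prefix scheme are assumptions, and the scheme would need escaping if scopes can contain `/`.

```python
from __future__ import annotations


class DictStore:
    """Tiny stand-in for a dict-backed store (str keys, str values)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._data.get(key)

    def put(self, key: str, content: str) -> None:
        self._data[key] = content

    def keys(self) -> list[str]:
        return sorted(self._data)


class ScopedMemoryStore:
    """Wraps a store and isolates keys per user, tenant, or session."""

    def __init__(self, inner: DictStore, scope: str) -> None:
        self._inner = inner
        self._prefix = f"{scope}/"

    def get(self, key: str) -> str | None:
        return self._inner.get(self._prefix + key)

    def put(self, key: str, content: str) -> None:
        self._inner.put(self._prefix + key, content)

    def keys(self) -> list[str]:
        # Expose only this scope's keys, with the prefix stripped.
        return [
            k[len(self._prefix):]
            for k in self._inner.keys()
            if k.startswith(self._prefix)
        ]
```

Two wrappers over the same inner store can then hold different values under the same key without seeing each other's data.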
class _MemoryEntryDictRequired(TypedDict):
    """Required fields for MemoryEntryDict."""

    key: str
    content: str
If feasible, using a single TypedDict with typing.Required (or typing_extensions.Required) instead of the _MemoryEntryDictRequired base class would be more idiomatic.
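For reference, the single-TypedDict form suggested here could look like the sketch below. The two required keys are taken from the diff above; the optional fields shown (`tags`, `expires_at`) are illustrative guesses at what the full dict contains.

```python
from typing import List, Optional

try:
    from typing import Required, TypedDict  # Python 3.11+
except ImportError:  # older interpreters
    from typing_extensions import Required, TypedDict


class MemoryEntryDict(TypedDict, total=False):
    """Serialized memory entry; only key and content are mandatory."""

    # Required[...] marks mandatory keys inside a total=False TypedDict.
    key: Required[str]
    content: Required[str]
    # Everything else is optional by virtue of total=False.
    tags: List[str]
    expires_at: Optional[str]
```

This removes the private base class while keeping the same required/optional split, which type checkers read from `Required` directly.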
Add persistent memory capability for Pydantic AI agents. This capability provides five tools: memory_store, memory_retrieve, memory_list, memory_delete, and memory_compact.
Key features:
- SQLite backend with FTS5 full-text search
- AbstractMemoryBackend interface for custom storage engines
- Tag-based filtering and glob pattern matching
- Access tracking and automatic compaction
- 18 tests covering models, backend, and edge cases
Implements: pydantic#179 (Persistent key-value memory)
Summary
Implements a Memory capability (AbstractCapability subclass) for persistent key-value memory across agent sessions.
- MemoryStore protocol with two backends: InMemoryStore (dict-based, for testing) and FileStore (JSON file on disk, for persistence)
- Five tools via get_toolset(): save_memory, recall_memory, search_memories, list_memories, delete_memory
- Dynamic instructions via get_instructions() that inject stored memories into the system prompt at run start
- Spec serialization via from_spec(backend="memory"|"file")
Closes #30
Test plan
- ruff check and ruff format pass
- pyright strict mode passes with 0 errors
🤖 Generated with Claude Code