Add Memory capability with pluggable storage backends #179

Open
DouweM wants to merge 22 commits into main from capability/memory

Conversation

Contributor

@DouweM DouweM commented Apr 10, 2026

Summary

Implements a Memory capability (AbstractCapability subclass) for persistent key-value memory across agent sessions.

  • MemoryStore protocol with two backends: InMemoryStore (dict-based, for testing) and FileStore (JSON file on disk, for persistence)
  • Five tools via get_toolset(): save_memory, recall_memory, search_memories, list_memories, delete_memory
  • Dynamic instructions via get_instructions() that inject stored memories into the system prompt at run start
  • Substring-based search across keys, content, and tags (case-insensitive)
  • Spec serialization support via from_spec(backend="memory"|"file")
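
To make the shape of the design concrete, here is a minimal sketch of a store protocol with a dict-backed implementation and case-insensitive substring search. It is an illustration only, not the PR's code: the field set on `MemoryEntry` and the exact method signatures are assumptions, and the real `MemoryStore` protocol has more methods.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol, runtime_checkable


@dataclass
class MemoryEntry:
    # Simplified stand-in; the real entry has more fields.
    key: str
    content: str
    tags: List[str] = field(default_factory=list)


@runtime_checkable
class MemoryStore(Protocol):
    """Hypothetical, reduced version of the storage protocol."""

    def get(self, key: str) -> Optional[MemoryEntry]: ...
    def put(self, entry: MemoryEntry) -> None: ...
    def delete(self, key: str) -> bool: ...
    def search(self, query: str) -> List[MemoryEntry]: ...


class InMemoryStore:
    """Dict-backed store, suitable for tests."""

    def __init__(self) -> None:
        self._entries: Dict[str, MemoryEntry] = {}

    def get(self, key: str) -> Optional[MemoryEntry]:
        return self._entries.get(key)

    def put(self, entry: MemoryEntry) -> None:
        self._entries[entry.key] = entry

    def delete(self, key: str) -> bool:
        return self._entries.pop(key, None) is not None

    def search(self, query: str) -> List[MemoryEntry]:
        # Case-insensitive substring match across key, content, and tags.
        needle = query.lower()
        return [
            e
            for e in self._entries.values()
            if needle in e.key.lower()
            or needle in e.content.lower()
            or any(needle in tag.lower() for tag in e.tags)
        ]
```

Because the protocol is structural, `InMemoryStore` satisfies it without inheriting from it, which is what keeps the backends pluggable.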

Closes #30

Test plan

  • 48 tests covering all code paths (MemoryEntry, InMemoryStore, FileStore, Memory capability, tool functions, instructions, protocol conformance)
  • ruff check and ruff format pass
  • pyright strict mode passes with 0 errors
  • All existing tests still pass

🤖 Generated with Claude Code

DouweM and others added 10 commits April 2, 2026 05:28
Implements a Memory capability (AbstractCapability subclass) for persistent
key-value memory across agent sessions, addressing #30.

- MemoryStore protocol with InMemoryStore (dict-based, for testing) and
  FileStore (JSON file on disk, for persistence) backends
- Five tools via get_toolset(): save_memory, recall_memory, search_memories,
  list_memories, delete_memory
- Dynamic instructions via get_instructions() that inject stored memories
  into the system prompt at run start
- Substring-based search across keys, content, and tags
- Spec serialization support (Memory.from_spec with backend="memory"|"file")
- 48 tests covering all code paths, passing lint, format, and typecheck

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address audit findings from PR review:

- Better search: word-boundary matching with relevance scoring (count of
  matching words across key/content/tags, sorted by score descending).
  Underscores and hyphens treated as word separators.
- Memory scoping: `scope: str = 'global'` field on MemoryEntry, with
  optional `scope` parameter on `search_memories` and `list_memories`
  tools and `list_all`/`search` store methods.
- TTL/expiration: `expires_at: str | None = None` on MemoryEntry with
  `is_expired()` method. Stores filter out expired entries automatically.
  `save_memory` tool accepts optional `ttl_minutes` parameter.
- Dedup warning: when saving a memory whose key is very similar to an
  existing key (same 10-char prefix, Levenshtein distance <= 2), log a
  warning via the `pydantic_harness.memory` logger.

Tests: 48 -> 99, all passing with 100% coverage.
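
The dedup rule above (same 10-char prefix, Levenshtein distance <= 2) could look roughly like the sketch below. The function names are hypothetical, and two details are assumptions rather than facts from the commit: keys shorter than the prefix length never match, and an identical key is treated as an update rather than a near-duplicate.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def keys_look_similar(
    new_key: str,
    existing_key: str,
    prefix_len: int = 10,
    max_distance: int = 2,
) -> bool:
    """Return True when two distinct keys are close enough to warn about."""
    if new_key == existing_key:
        return False  # assumption: same key means overwrite, not duplicate
    if len(new_key) < prefix_len or len(existing_key) < prefix_len:
        return False  # assumption: short keys are below the threshold
    if new_key[:prefix_len] != existing_key[:prefix_len]:
        return False
    return levenshtein(new_key, existing_key) <= max_distance
```

The prefix check is a cheap filter that avoids computing the edit distance for most key pairs.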

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Any types

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… and FileStore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… backend

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…L, and conformance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- personal_assistant.py: FileStore persistence, preferences, instructions injection
- study_coach.py: TTL/spaced repetition, tags, search
- coding_assistant.py: procedural memory, rules, search, delete

All examples assert on memory state and are instrumented with logfire spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.


Comment thread src/pydantic_harness/memory.py Outdated
Comment on lines +186 to +188
def get(self, key: str) -> MemoryEntry | None:
    """Retrieve a memory entry by key."""
    return self._entries.get(key)

🚩 Expired entries are never cleaned up from storage

The store-level get() method (_BaseDictStore.get at line 186) returns entries regardless of expiration status. Filtering is only done in list_all, search, and the recall_memory tool. This means expired entries accumulate indefinitely in both InMemoryStore (memory leak) and FileStore (disk bloat). For short-lived processes this is fine, but long-running agents with TTL-based entries will see unbounded growth. A periodic or lazy cleanup strategy (e.g., purging expired entries on list_all/search or on a timer) would be worth considering.
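
One way to read the lazy-cleanup suggestion is sketched below: `get()` drops an expired entry the moment it is looked up, and a separate `purge_expired()` can be called from listing or search paths. This is an assumption-laden illustration, not the PR's implementation. The class names are hypothetical, and `expires_at` is modelled as a `datetime` here for brevity, whereas the PR stores it as a string.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Entry:
    key: str
    content: str
    expires_at: Optional[datetime] = None

    def is_expired(self, now: datetime) -> bool:
        # An entry whose expiry equals "now" already counts as expired.
        return self.expires_at is not None and now >= self.expires_at


class LazyPurgingStore:
    """Dict-backed store that deletes expired entries when it touches them."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def put(self, entry: Entry) -> None:
        self._entries[entry.key] = entry

    def get(self, key: str, now: Optional[datetime] = None) -> Optional[Entry]:
        now = now or datetime.now(timezone.utc)
        entry = self._entries.get(key)
        if entry is None:
            return None
        if entry.is_expired(now):
            del self._entries[key]  # lazy cleanup on read
            return None
        return entry

    def purge_expired(self, now: Optional[datetime] = None) -> int:
        """Remove every expired entry and return how many were dropped."""
        now = now or datetime.now(timezone.utc)
        expired = [k for k, e in self._entries.items() if e.is_expired(now)]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

Passing `now` explicitly keeps the expiry logic testable without patching the clock.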


Contributor Author

DouweM commented Apr 10, 2026

Originally posted by @DouweM in #137 comment (PR was recreated)

Audit vs prior art: Memory

Worth adding now:

  • Word-boundary search with relevance scoring (substring is too primitive)
  • Memory scoping/namespaces: scope field on entries + search filtering
  • TTL/expiration: expires_at on entries
  • Dedup on save (warn if very similar key/content exists)

Follow-up opportunities:

  • Vector/embedding backends, SQLite/Redis stores, auto-summarization

Contributor Author

DouweM commented Apr 10, 2026

Originally posted by @dsfaccini in #137 comment (PR was recreated)

Claude here: We reviewed this PR and pushed several improvements. Here's what changed:

Code Quality (7 commits)

Type Safety

  • MemoryEntryDict TypedDict — replaced dict[str, Any] in to_dict/from_dict with a proper TypedDict. Eliminated all avoidable Any types (Any remains only in from_spec return Memory[Any], an unavoidable framework constraint).
  • Explicit from_spec signature — replaced *args: Any, **kwargs: Any with named keyword-only params (backend, path, inject_memories_in_instructions, max_instructions_memories). Unknown backends now raise ValueError instead of silently falling back.

Code Deduplication

  • Extracted _BaseDictStore base class — InMemoryStore and FileStore shared identical get, list_all, search methods (~40 lines of duplication). Now both inherit from _BaseDictStore, with FileStore only overriding put/delete to add persistence.

Robustness

  • Graceful FileStore._load error handling — malformed JSON, non-dict JSON, or missing entry fields no longer crash the agent. Logs a warning and starts with an empty store instead.

Style

  • Replaced all RST-style double backticks with markdown single backticks in docstrings.
  • Replaced default_factory=list[str] (a GenericAlias rather than a plain class) with a form that satisfies pyright strict.

Tests (48 → 119)

Added edge case tests for:

  • _score_entry: regex metacharacters in queries, underscore/hyphen word boundaries, partial word matches, empty word list
  • _simple_similarity: edit distance boundary (exactly 3 = rejected), 9-char keys (below threshold), 10-char keys
  • format_entry: empty key, empty content
  • build_instructions: exact max boundary (overflow == 0)
  • save_memory: TTL=0 immediate expiration
  • from_spec: unknown backend raises, explicit backend='memory', forwarded options
  • FileStore._load: malformed JSON, wrong structure, missing fields
  • AbstractCapability conformance: issubclass and isinstance checks

Examples (3 scripts)

All are instrumented with Logfire and assert on memory state:

  1. examples/memory/personal_assistant.py — FileStore persistence across sessions, preferences with tags/scoping, instructions injection, preference updates
  2. examples/memory/study_coach.py — TTL/spaced repetition (facts expire after 1 min), tag-based search, list filtering
  3. examples/memory/coding_assistant.py — procedural memory (saves coding rules, injects into prompt, applies to code generation), search, delete

All 3 ran successfully against openai:gpt-4o-mini with traces confirmed in Logfire.

@DouweM DouweM added this to the 2026-05 milestone Apr 23, 2026
dsfaccini and others added 5 commits May 7, 2026 09:23
The example scripts use logfire (not in any dep group) and other
runtime-only imports. Strict pyright on them blocks pre-commit hooks
without adding value — examples are illustrative, not part of the
typed library surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_BaseDictStore.get() now filters expired entries (consistency with
list_all/search) and FileStore._save() drops expired entries before
writing, so long-running file-backed agents no longer accumulate dead
records on disk. Tightens is_expired docstring with wall-clock
semantics. TTL=0 now correctly results in an immediately-invisible
entry (test updated accordingly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eMemoryStore

Avoids collision with PR #176's SessionPersistence (InMemorySessionStore /
FileSessionStore) and pydantic-ai's binary file-store branch. The
<Implementation>MemoryStore convention makes the domain explicit at
import sites — `DictMemoryStore` describes what backs the store
(`dict`), `FileMemoryStore` describes its persistence target. No
behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e to MemoryEntry

Five additive fields aligning MemoryEntry with the structured-record
conventions in Mem0/LangGraph/Letta:

- summary: short version preferred over content for system-prompt
  injection (used by build_instructions in a follow-up commit)
- metadata: JSON-serializable structured attributes for filterable
  search (used by search filter= in a follow-up commit)
- read_only: pin entry against modification by save_memory/delete_memory
  tools — useful for system-curated facts (persona, policies)
- char_limit: hard cap on content length, enforced at MemoryEntry
  construction; raises ValueError when exceeded
- importance: relevance booster for search scoring

The save_memory tool now accepts summary and importance as optional
LLM-facing parameters; metadata, read_only, and char_limit are dev-only
(set via direct MemoryEntry construction). save_memory and
delete_memory refuse to modify read_only entries with a clear message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Borrowed from LangGraph BaseStore. Hierarchical namespaces enable
patterns like ('users', 'alice', 'prefs') and ('agents', 'planner',
'facts') that flat scope strings can't represent. Filters use prefix
matching: namespace=('users',) matches all entries under that root.

Changes:
- MemoryEntry.scope (str, default 'global') → namespace (tuple[str, ...],
  default ('global',))
- MemoryStore.list_all/search now take namespace=tuple[str, ...] | None
- New MemoryStore.list_namespaces(prefix, suffix, max_depth) returns
  unique namespaces in the store, sorted
- save_memory/search_memories/list_memories tools accept list[str] and
  coerce to tuple internally
- format_entry shows nested namespaces as 'a/b/c'; the default
  ('global',) is still omitted for brevity
- Added _namespace_matches helper for prefix-match logic
- Tests cover prefix matching, max_depth truncation, suffix filtering,
  and persistence of nested namespaces
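
The prefix-match rule for hierarchical namespaces is small enough to sketch directly. The helper below is a guess at what `_namespace_matches` does based on the description above, not a copy of it; `filter_by_namespace` is a hypothetical convenience for the example.

```python
from typing import Iterable, List, Optional, Tuple

Namespace = Tuple[str, ...]


def namespace_matches(entry_namespace: Namespace, prefix: Optional[Namespace]) -> bool:
    """True when `prefix` is None or is a leading slice of `entry_namespace`."""
    if prefix is None:
        return True  # no filter requested
    return entry_namespace[: len(prefix)] == prefix


def filter_by_namespace(
    namespaces: Iterable[Namespace], prefix: Optional[Namespace]
) -> List[Namespace]:
    """Keep only the namespaces that live under `prefix`, sorted."""
    return sorted(ns for ns in namespaces if namespace_matches(ns, prefix))
```

Comparing tuple slices means `('users',)` matches `('users', 'alice', 'prefs')` but not `('users_archive',)`, which a string prefix check on `'a/b/c'` paths would get wrong.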

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dsfaccini and others added 7 commits May 7, 2026 09:40
…earch

Three additions to MemoryStore.search and list_all (filter only on
list_all):

- filter: dict[str, object] | None — equality match against
  MemoryEntry.metadata. Drops entries that don't match all keys.
- recency_scorer: Callable[[MemoryEntry], float] — pluggable per-call
  recency boost added to the base keyword-match score. Built-in
  exponential_decay(half_life_days, weight) factory ships as the
  Memory capability default (30-day half-life, weight 0.5). Set to
  None to disable.
- entry.importance: float | None — when set, added to the search score
  unconditionally so devs/agents can pin entries above keyword matches.

Score formula: keyword_match_count + (importance or 0) + (recency or 0).
Entries with zero keyword match are still excluded — recency and
importance only re-rank within the matched set.

Also adds a RecencyScorer type alias; both it and exponential_decay are
exported from pydantic_harness for custom scorers (e.g. linear decay,
importance-only ranking).
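
The score formula can be sketched as follows. The commit names `exponential_decay(half_life_days, weight)` and the additive formula, but not the decay curve itself, so the `weight * 0.5 ** (age / half_life)` form, the `updated_at` field, and the `now` parameter (added so the example is deterministic) are all assumptions.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Entry:
    key: str
    content: str
    updated_at: datetime  # hypothetical timestamp field
    importance: Optional[float] = None


RecencyScorer = Callable[[Entry], float]


def exponential_decay(
    half_life_days: float = 30.0,
    weight: float = 0.5,
    now: Optional[datetime] = None,
) -> RecencyScorer:
    """Build a scorer whose boost halves every `half_life_days`."""

    def scorer(entry: Entry) -> float:
        reference = now or datetime.now(timezone.utc)
        age_days = max((reference - entry.updated_at).total_seconds() / 86400, 0.0)
        return weight * 0.5 ** (age_days / half_life_days)

    return scorer


def keyword_match_count(entry: Entry, query: str) -> int:
    # Split on non-word characters and underscores so both act as separators.
    words = set(re.split(r"[\W_]+", f"{entry.key} {entry.content}".lower()))
    return sum(1 for w in re.split(r"[\W_]+", query.lower()) if w and w in words)


def score(
    entry: Entry, query: str, recency_scorer: Optional[RecencyScorer] = None
) -> Optional[float]:
    """keyword_match_count + (importance or 0) + (recency or 0), or None if no match."""
    matches = keyword_match_count(entry, query)
    if matches == 0:
        return None  # recency and importance only re-rank matched entries
    recency = recency_scorer(entry) if recency_scorer else 0.0
    return matches + (entry.importance or 0.0) + recency
```

Returning `None` for unmatched entries keeps the "excluded" case distinct from a legitimately low score.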

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…in instructions

build_instructions now:
- Prefers entry.summary over entry.content when injecting into the
  system prompt (via format_entry's new prefer_summary flag), keeping
  token budgets predictable for entries with long bodies.
- Respects an optional byte_budget: int | None on Memory — non-pinned
  entries are skipped once adding the next would exceed the cap (UTF-8
  bytes).
- Always injects read_only=True entries regardless of count cap or byte
  budget. Pinned entries are listed first.

Selection precedence: pinned (always) → up to max_instructions_memories
non-pinned, capped by byte_budget when set. Overflow is reported with
a "... and N more" suffix nudging the LLM toward search/list tools.
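
A rough sketch of that selection precedence follows. Several points are assumptions because the commit message leaves them open: the byte budget here counts only non-pinned lines, selection stops at the first entry that does not fit rather than trying smaller later entries, and the line format is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    key: str
    content: str
    summary: Optional[str] = None
    read_only: bool = False


def format_entry(entry: Entry, prefer_summary: bool = True) -> str:
    # Use the short summary when one exists, otherwise the full content.
    body = entry.summary if prefer_summary and entry.summary else entry.content
    return f"- {entry.key}: {body}"


def select_for_instructions(
    entries: List[Entry],
    max_memories: int,
    byte_budget: Optional[int] = None,
) -> List[str]:
    """Pinned entries first and always; then others up to the count and byte caps."""
    pinned = [e for e in entries if e.read_only]
    others = [e for e in entries if not e.read_only]

    lines = [format_entry(e) for e in pinned]
    used_bytes = 0
    taken = 0
    for entry in others:
        line = format_entry(entry)
        size = len(line.encode("utf-8"))
        if taken >= max_memories:
            break
        if byte_budget is not None and used_bytes + size > byte_budget:
            break
        lines.append(line)
        used_bytes += size
        taken += 1

    overflow = len(others) - taken
    if overflow:
        lines.append(f"... and {overflow} more")
    return lines
```

Measuring the encoded line rather than `len(line)` matters for non-ASCII content, where one character can take several bytes.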

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-tool description override at toolset construction. Keys are tool
names (save_memory, recall_memory, search_memories, list_memories,
delete_memory); values replace the docstring used by the LLM. Useful
for nudging agent behavior toward project-specific conventions
(e.g., "Save aggressively, even small facts").

Borrowed from pydantic-deepagents AgentMemoryToolset. Tools without
an override fall back to their docstring as before.
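
The override-with-fallback behaviour can be illustrated without the real toolset. In the sketch below, `resolve_descriptions` and the two stub tools are hypothetical; only the tool names and the "fall back to the docstring" rule come from the commit message.

```python
import inspect
from typing import Callable, Dict, Iterable, Mapping, Optional


def save_memory(key: str, content: str) -> str:
    """Save a memory under the given key."""
    return f"saved {key}"


def recall_memory(key: str) -> str:
    """Recall a memory by key."""
    return f"recalled {key}"


def resolve_descriptions(
    tools: Iterable[Callable[..., object]],
    overrides: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Map each tool name to its override, or to its docstring if none is given."""
    overrides = overrides or {}
    resolved: Dict[str, str] = {}
    for tool in tools:
        name = tool.__name__
        # inspect.getdoc returns the cleaned docstring, or None if there is none.
        resolved[name] = overrides.get(name) or inspect.getdoc(tool) or ""
    return resolved
```

For example, passing `{"save_memory": "Save aggressively, even small facts."}` changes only that tool's description and leaves `recall_memory` on its docstring.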

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…res_store.py

Reference implementation of MemoryStore against Postgres via psycopg.
~150 LOC. Schema: key TEXT PK, namespace TEXT[], data JSONB. Implements
all six Protocol methods (get, put, delete, list_all, search,
list_namespaces). Filtering happens in SQL (namespace prefix via array
slicing, metadata equality via JSONB ops); keyword scoring runs in
Python after the DB pre-filter, matching DictMemoryStore semantics.

Documented as a starting point — production users will want connection
pooling, schema migrations, and full-text or pgvector ranking.

Also exempts examples/ from the ruff D (pydocstyle) ruleset, mirroring
the existing tests/ exemption — example scripts are illustrative, not
API surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/capabilities/memory.md — full user-facing capability docs
  covering quick start, MemoryEntry shape, built-in vs custom backends,
  namespaces, multi-agent shared store, search/recency, prompt-cache
  trade-off, tool description overrides, custom backends, and the
  known followups list (semantic retrieval, dedup hook).
- Memory class docstring gains a multi-agent shared-store example so
  the pattern is discoverable from autocomplete without reading docs.
- PLAN.md fully refreshed: reflects all current fields, the namespace
  tuple, recency scoring, byte budget, read_only pinning, the Postgres
  reference, and the Future Work followups.
- docs/capabilities/index.md links the new memory page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_instructions now skips re-injecting an entry when the LLM has
already seen its current value via a save_memory call in this run's
tool history. Default ON (dedup_recent_saves=True); disable per-Memory
instance via the flag.

The check is content-aware: scans ctx.messages for ToolCallParts named
'save_memory', tracks the last (key, content) saved per key, and only
suppresses an entry when entry.content == last_saved_content. If
something updated the entry externally (another agent, manual store
mutation, etc.), the saved content no longer matches the store, so we
inject to let the LLM see the current value. read_only entries always
inject regardless of dedup.

This is the previously-deferred "tool-history dedup" followup; the
content-equality safeguard resolves the in-run-update gotcha.
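
The content-aware check can be sketched with plain stand-ins. `ToolCall` below is a simplified placeholder for the tool-call parts found in the run's message history, and the argument names `key` and `content` are assumed; the real code reads them from `ctx.messages`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ToolCall:
    # Simplified placeholder for a tool call recorded in the message history.
    tool_name: str
    args: Dict[str, object]


@dataclass
class Entry:
    key: str
    content: str
    read_only: bool = False


def last_saved_contents(tool_calls: Iterable[ToolCall]) -> Dict[str, str]:
    """Track the most recent content saved per key during this run."""
    saved: Dict[str, str] = {}
    for call in tool_calls:
        if call.tool_name != "save_memory":
            continue
        key = call.args.get("key")
        content = call.args.get("content")
        if isinstance(key, str) and isinstance(content, str):
            saved[key] = content  # later saves overwrite earlier ones
    return saved


def should_inject(entry: Entry, saved: Dict[str, str]) -> bool:
    """Inject unless the LLM already saw exactly this content via save_memory."""
    if entry.read_only:
        return True  # pinned entries always inject
    return saved.get(entry.key) != entry.content
```

Comparing content rather than just the key is what handles external updates: if another writer changed the entry, the stored value differs from what the model saved and the entry is injected again.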

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adapt to the pydantic_harness → pydantic_ai_harness package rename:
relocate memory.py to pydantic_ai_harness/memory/_capability.py with
__init__.py re-exports, move docs/capabilities/memory.md to the
package's README.md, wire Memory into the top-level __init__'s lazy
loader alongside CodeMode, and update test/example imports. Pyproject
exclude list aligned with main; logfire.instrument_openai() typing
gaps in examples silenced with targeted pyright ignores.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


@runtime_checkable
class MemoryStore(Protocol):

Great addition!

One thing I'd love to see here is async support for MemoryStore.

Each method could optionally return an Awaitable (or there could be a separate AsyncMemoryStore protocol), following the same pattern already used elsewhere in Pydantic AI:

existing = memory_store.get(...)
if is_awaitable(existing):
    existing = await existing

This would let users plug in async backends (Redis, a database, a vector store) without blocking the event loop, while keeping sync implementations like DictMemoryStore working as-is.
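
A self-contained version of the maybe-awaitable pattern is sketched below using the standard library's `inspect.isawaitable`. The store classes and the `recall` helper are hypothetical, and whichever helper Pydantic AI uses internally may differ.

```python
import asyncio
import inspect
from typing import Dict, Optional


class SyncDictStore:
    """Synchronous backend: get() returns the value directly."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class AsyncDictStore:
    """Asynchronous backend: get() returns a coroutine."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)

    async def get(self, key: str) -> Optional[str]:
        await asyncio.sleep(0)  # stands in for network or disk I/O
        return self._data.get(key)


async def recall(store, key: str) -> Optional[str]:
    """Call get() and await the result only when the backend is async."""
    result = store.get(key)
    if inspect.isawaitable(result):
        result = await result
    return result
```

Both `asyncio.run(recall(SyncDictStore({"a": "1"}), "a"))` and the same call with `AsyncDictStore` go through one code path, so existing sync stores keep working unchanged.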

class MemoryStore(Protocol):
    """Protocol for pluggable memory storage backends."""

    def get(self, key: str) -> MemoryEntry | None:  # pragma: no cover

It would also be useful if the protocol received the run context (or deps), so the store can scope memories per user, tenant, or session without the caller having to manage namespaces manually.


I guess I could do this in my own MemoryStore, like

capabilities = [Memory(store=ScopedMemoryStore(scope=my_user))]

🤔
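
The wrapper idea could be sketched like this. `ScopedMemoryStore` exists only in the comment above, so everything here is hypothetical, including the choice to scope by prefixing keys and the simplified `DictStore` it wraps.

```python
from typing import Dict, Optional


class DictStore:
    """Minimal shared backend used only for this illustration."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, content: str) -> None:
        self._data[key] = content

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None


class ScopedMemoryStore:
    """Wraps another store and confines every key to one scope."""

    def __init__(self, inner: DictStore, scope: str) -> None:
        self._inner = inner
        self._scope = scope

    def _scoped(self, key: str) -> str:
        # Assumption: scoping by key prefix; a real store might use namespaces.
        return f"{self._scope}:{key}"

    def get(self, key: str) -> Optional[str]:
        return self._inner.get(self._scoped(key))

    def put(self, key: str, content: str) -> None:
        self._inner.put(self._scoped(key), content)

    def delete(self, key: str) -> bool:
        return self._inner.delete(self._scoped(key))
```

Two wrappers over the same backend then see disjoint data, which gives per-user isolation without the protocol needing the run context.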

Comment on lines +29 to +33
class _MemoryEntryDictRequired(TypedDict):
    """Required fields for MemoryEntryDict."""

    key: str
    content: str

If feasible, using a single TypedDict with typing.Required (or typing_extensions.Required) instead of the _MemoryEntryDictRequired base class would be more idiomatic.
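
For reference, the single-class form suggested here might look like the sketch below. The optional fields shown (`tags`, `expires_at`) are taken from elsewhere in this PR, but the full field list of `MemoryEntryDict` is not visible in the thread, so treat the shape as illustrative; `to_dict` is a hypothetical helper.

```python
from typing import List, Optional

try:  # typing.Required is available from Python 3.11
    from typing import Required, TypedDict
except ImportError:  # older interpreters need the typing_extensions backport
    from typing_extensions import Required, TypedDict


class MemoryEntryDict(TypedDict, total=False):
    """One TypedDict: optional by default, with required keys marked explicitly."""

    key: Required[str]
    content: Required[str]
    tags: List[str]
    expires_at: Optional[str]


def to_dict(key: str, content: str, tags: Optional[List[str]] = None) -> MemoryEntryDict:
    """Build a serialized entry, omitting optional keys that are unset."""
    data: MemoryEntryDict = {"key": key, "content": content}
    if tags:
        data["tags"] = list(tags)
    return data
```

With `total=False` plus `Required[...]`, the required/optional split lives in one class, so the private base class is no longer needed.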

mustafabozkaya pushed a commit to mustafabozkaya/pydantic-ai-harness that referenced this pull request May 31, 2026
Add persistent memory capability for Pydantic AI agents. This capability
provides five tools: memory_store, memory_retrieve, memory_list,
memory_delete, and memory_compact.

Key features:
- SQLite backend with FTS5 full-text search
- AbstractMemoryBackend interface for custom storage engines
- Tag-based filtering and glob pattern matching
- Access tracking and automatic compaction
- 18 tests covering models, backend, and edge cases

Implements: pydantic#179 (Persistent key-value memory)

Development

Successfully merging this pull request may close these issues.

Memory capability

3 participants