pydantic · dsfaccini · Jun 5, 2026 · Apr 2, 2026
diff --git a/PLAN.md b/PLAN.md
@@ -0,0 +1,63 @@
+# Compaction Capability — Implementation Plan
+
+Closes #21
+
+## Overview
+
+This PR adds three compaction-related capabilities to `pydantic-harness`:
+
+1. **`SlidingWindow`** — Zero-cost message trimming via a configurable sliding window.
+2. **`LimitWarner`** — Injects warning messages when the agent approaches iteration, context-window, or total-token limits.
+3. **`Compaction`** — LLM-powered summarization that replaces older messages with a compact summary.
+
+All three are `AbstractCapability` subclasses that operate via the `before_model_request` hook, modifying `request_context.messages` before each model call.
+
+## Design Decisions
+
+### Tool-call / tool-return pair safety
+
+The most critical invariant: trimming or compacting must **never** orphan a `ToolCallPart` without its corresponding `ToolReturnPart` (or vice versa). Doing so causes HTTP 400 errors from LLM providers.
+
+The implementation uses a `_is_safe_cutoff()` function that searches around a proposed cutoff point for tool-call pairs that would be split. If a cutoff is unsafe, it walks backward to find a safe one. This approach is adapted from [vstorm-co/summarization-pydantic-ai](https://github.com/vstorm-co/summarization-pydantic-ai)'s `_cutoff.py`.
+
+### Trigger and retention modes
+
+Both `SlidingWindow` and `Compaction` support two trigger modes:
+- `max_messages` — fire when message count exceeds threshold
+- `max_tokens` — fire when estimated token count exceeds threshold
+
+Both also support two retention modes:
+- `keep_messages` — retain N tail messages
+- `keep_tokens` — retain messages fitting within a token budget
+
+### Token estimation
+
+A simple `estimate_token_count()` function approximates the token count as roughly one token per 4 characters. This avoids requiring a tokenizer dependency while providing reasonable estimates for threshold detection.
+
+### LimitWarner design
+
+Warnings are injected as a trailing `ModelRequest` with a `UserPromptPart` (not a system message), because models tend to pay more attention to user messages. A `[LimitWarner]` marker enables stripping previous warnings before injecting new ones, preventing warning accumulation.
+
+### Compaction summarization
+
+The `Compaction` capability creates a temporary `pydantic_ai.Agent` with the configured summarization model. System prompts from the beginning of the conversation are preserved and prepended to the summary message.
+
+## Dependencies
+
+- Requires `pydantic-ai-slim` with the capabilities branch (not yet on PyPI).
+- For local development, add a `[tool.uv.sources]` override pointing to the capabilities branch checkout.
+
+## Files
+
+- `src/pydantic_harness/compaction.py` — All three capabilities plus helpers
+- `src/pydantic_harness/__init__.py` — Package exports
+- `tests/test_compaction.py` — 81 tests covering all code paths
+- `pyproject.toml` — Coverage threshold adjustment (98% due to branch coverage of elif chains)
+
+## References
+
+- [pydantic/pydantic-ai#4137](https://github.com/pydantic/pydantic-ai/issues/4137) — First-class Context Compaction API
+- [pydantic/pydantic-ai#4267](https://github.com/pydantic/pydantic-ai/issues/4267) — Anthropic Compactions
+- [pydantic/pydantic-ai#4013](https://github.com/pydantic/pydantic-ai/issues/4013) — OpenAI Compactions
+- [pydantic/pydantic-harness#35](https://github.com/pydantic/pydantic-harness/issues/35) — Expose context window size on ModelProfile
+- [vstorm-co/summarization-pydantic-ai](https://github.com/vstorm-co/summarization-pydantic-ai) — Prior art for cutoff logic
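The pair-safety invariant described under "Tool-call / tool-return pair safety" in `PLAN.md` can be sketched with stand-in types. This is a minimal illustration, not the PR's `_is_safe_cutoff()`: the `Msg` dataclass and the names `is_safe_cutoff` and `find_safe_cutoff` are hypothetical, and real messages carry parts rather than plain ID sets.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Stand-in for a model message (hypothetical, not a pydantic-ai type)."""

    text: str = ''
    call_ids: set[str] = field(default_factory=set)  # tool calls issued in this message
    return_ids: set[str] = field(default_factory=set)  # tool returns delivered in this message


def is_safe_cutoff(messages: list[Msg], cutoff: int) -> bool:
    """Return True if dropping messages[:cutoff] splits no call/return pair."""
    dropped, kept = messages[:cutoff], messages[cutoff:]
    dropped_calls = set().union(*(m.call_ids for m in dropped))
    dropped_returns = set().union(*(m.return_ids for m in dropped))
    kept_calls = set().union(*(m.call_ids for m in kept))
    kept_returns = set().union(*(m.return_ids for m in kept))
    # Unsafe if a dropped call has its return kept, or a dropped return has its call kept.
    return not (dropped_calls & kept_returns) and not (dropped_returns & kept_calls)


def find_safe_cutoff(messages: list[Msg], proposed: int) -> int:
    """Walk backward from `proposed` to the nearest safe cutoff (0 drops nothing, so it is always safe)."""
    cutoff = max(0, min(proposed, len(messages)))
    while cutoff > 0 and not is_safe_cutoff(messages, cutoff):
        cutoff -= 1
    return cutoff


history = [
    Msg('user: list files'),
    Msg('assistant: calling ls', call_ids={'c1'}),
    Msg('tool: a.py b.py', return_ids={'c1'}),
    Msg('assistant: done'),
]
# Cutting at index 2 would drop the call but keep its return, so the cutoff moves back to 1.
safe = find_safe_cutoff(history, 2)
```

Walking backward keeps more history than requested rather than less, so the pair stays intact at the cost of a slightly larger tail.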
diff --git a/pydantic_ai_harness/experimental/__init__.py b/pydantic_ai_harness/experimental/__init__.py
@@ -0,0 +1,13 @@
+"""Experimental pydantic-ai-harness capabilities.
+
+Anything under `pydantic_ai_harness.experimental` may change or be removed in any release,
+without a deprecation period.  Importing an experimental capability emits a
+`HarnessExperimentalWarning` that tells you how to silence the whole category at once.
+
+Importing this module on its own does **not** emit a warning, so you can pull in
+`HarnessExperimentalWarning` to silence the warnings before importing a capability.
+"""
+
+from pydantic_ai_harness.experimental._warn import HarnessExperimentalWarning
+
+__all__ = ['HarnessExperimentalWarning']
diff --git a/pydantic_ai_harness/experimental/_warn.py b/pydantic_ai_harness/experimental/_warn.py
@@ -0,0 +1,40 @@
+"""Experimental-feature warning machinery for pydantic-ai-harness."""
+
+from __future__ import annotations
+
+import warnings
+
+
+class HarnessExperimentalWarning(UserWarning):
+    """Signals that a pydantic-ai-harness feature is experimental.
+
+    Experimental features may change or be removed in any release, without a deprecation
+    period.  Silence every experimental-harness warning at once with::
+
+        import warnings
+        from pydantic_ai_harness.experimental import HarnessExperimentalWarning
+
+        warnings.filterwarnings('ignore', category=HarnessExperimentalWarning)
+    """
+
+
+_SILENCE_HINT = (
+    '    import warnings\n'
+    '    from pydantic_ai_harness.experimental import HarnessExperimentalWarning\n'
+    "    warnings.filterwarnings('ignore', category=HarnessExperimentalWarning)"
+)
+
+
+def warn_experimental(feature: str) -> None:
+    """Emit a `HarnessExperimentalWarning` for *feature*, including how to silence all of them.
+
+    One filter silences the whole category — every experimental capability — so users never
+    need a suppression line per capability.
+    """
+    warnings.warn(
+        f'`pydantic_ai_harness.experimental.{feature}` is experimental: its API may change or be '
+        'removed in any release, without a deprecation period.\n\n'
+        f'Silence all pydantic-ai-harness experimental warnings with:\n\n{_SILENCE_HINT}\n',
+        category=HarnessExperimentalWarning,
+        stacklevel=2,
+    )
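The docstrings in `_warn.py` say that one category filter silences every experimental warning, provided it is installed before the warning fires. The sketch below shows that behaviour using only the standard library. `DemoExperimentalWarning`, `warn_demo`, and `count_emitted` are hypothetical stand-ins so the snippet runs without the package installed.

```python
import warnings


class DemoExperimentalWarning(UserWarning):
    """Stand-in for HarnessExperimentalWarning (hypothetical, defined here so the sketch runs alone)."""


def warn_demo(feature: str) -> None:
    """Emit the stand-in warning for one feature."""
    warnings.warn(f'{feature} is experimental', category=DemoExperimentalWarning, stacklevel=2)


def count_emitted(silence: bool) -> int:
    """Emit two feature warnings and report how many got past the active filters."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # start from a known filter state
        if silence:
            # A single filter on the category, installed before any warning fires.
            warnings.filterwarnings('ignore', category=DemoExperimentalWarning)
        warn_demo('compaction')
        warn_demo('another_feature')
    return len(caught)


without_filter = count_emitted(silence=False)
with_filter = count_emitted(silence=True)
```

`filterwarnings` inserts its rule at the front of the filter list, which is why the `ignore` rule takes precedence over the earlier `always` rule.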
diff --git a/pydantic_ai_harness/experimental/compaction/README.md b/pydantic_ai_harness/experimental/compaction/README.md
@@ -0,0 +1,124 @@
+# Compaction capabilities
+
+> [!WARNING]
+> **Experimental.** These capabilities live under `pydantic_ai_harness.experimental` and may
+> change or be removed in any release, without a deprecation period. Import them from the
+> experimental path — there is no top-level export:
+>
+> ```python
+> from pydantic_ai_harness.experimental.compaction import TieredCompaction
+> ```
+>
+> Importing any experimental capability emits a `HarnessExperimentalWarning`. Silence **all**
+> harness experimental warnings with a single filter (no per-capability lines needed):
+>
+> ```python
+> import warnings
+> from pydantic_ai_harness.experimental import HarnessExperimentalWarning
+>
+> warnings.filterwarnings('ignore', category=HarnessExperimentalWarning)
+> ```
+
+A menu of strategies for keeping an agent's conversation history within a model's context
+window. Each is a Pydantic AI `Capability` that runs in the `before_model_request` hook; edits
+**persist** into the run's message history, so a trim/clear/summary carries forward to later
+steps (it is not recomputed from the full history every turn).
+
+All strategies preserve tool-call / tool-return **pairing**: core does not validate this, and a
+provider rejects a request containing an orphaned call or return. The zero-LLM strategies never call a model.
+
+## The menu
+
+| Capability | Cost | What it does | Reach for it when |
+|---|---|---|---|
+| `SlidingWindow` | zero-LLM | Drops the oldest whole messages down to a tail | You only need the recent turns and can discard old context entirely |
+| `ClearToolResults` | zero-LLM | Blanks the content of old tool *results* in place, keeping the last `keep_pairs` | Tool outputs dominate context and can be re-fetched on demand (the cheap first tier) |
+| `DeduplicateFileReads` | zero-LLM | Blanks every file read superseded by a newer read of the same file | The agent re-reads files and only the latest version matters |
+| `SummarizingCompaction` | one LLM call | Summarizes older messages into a structured summary, keeping the recent tail | Old context still matters but must be compressed; use behind the cheap tiers |
+| `TieredCompaction` | escalates | Runs cheap passes first, summarizes only if still over `target_tokens` | You want the SOTA default: spend the expensive summary only when needed |
+| `LimitWarner` | zero-LLM | Injects an URGENT/CRITICAL warning as limits approach | You want the agent to wrap up rather than have its history rewritten |
+
+## Triggers
+
+Every size-based strategy triggers on `max_messages` and/or `max_tokens` (estimated). Token counts
+use a ~4-chars-per-token heuristic by default; pass a `tokenizer` callable (e.g. one built on `tiktoken`) for
+accuracy. `DeduplicateFileReads` runs on every request when no trigger is set (it is cheap and
+near-lossless). `TieredCompaction` triggers and stops on a single `target_tokens` budget.
+
+## Cost: why summarization is the last resort
+
+Summarization turns input tokens into output tokens, which are billed at a premium and generated
+serially — so it is genuinely expensive. The zero-LLM strategies touch only the cheaper input side.
+The field consensus (Anthropic, OpenCode, Letta) is to clear/dedupe first and summarize only when
+that is not enough — which is exactly what `TieredCompaction` encodes:
+
+```python
+from pydantic_ai import Agent
+from pydantic_ai_harness.experimental.compaction import (
+    ClearToolResults,
+    DeduplicateFileReads,
+    SummarizingCompaction,
+    TieredCompaction,
+)
+
+agent = Agent(
+    'openai:gpt-4o',
+    capabilities=[
+        TieredCompaction(
+            tiers=[
+                DeduplicateFileReads(file_key=my_file_key),
+                ClearToolResults(max_tokens=1, keep_pairs=3),
+                SummarizingCompaction(max_messages=1, keep_messages=20),  # model=None inherits the run's model
+            ],
+            target_tokens=120_000,
+        )
+    ],
+)
+```
+
+A tier inside `TieredCompaction` is driven directly by the orchestrator, which re-measures the history after
+each tier and stops once it is under `target_tokens`, so a tier's own `max_*` trigger is irrelevant there (set it to
+anything valid). Any object with `async def compact(messages, ctx) -> list[ModelMessage]`
+(`CompactionStrategy`) can be a tier, so you can plug in your own.
+
+## Cache tradeoff (read before using `ClearToolResults`)
+
+Clearing or deduplicating rewrites message content, which invalidates the provider's prompt cache
+from the edit point onward — the next request pays a cache-write. Use `ClearToolResults`'
+`min_clear_tokens` to skip clearing that reclaims too little to be worth busting the cache.
+
+## Model inheritance
+
+`SummarizingCompaction(model=...)` accepts a model name or `Model`; when left `None` it inherits the
+running agent's model. No token caps are imposed on the summary call.
+
+## Usage accounting
+
+The summary call is a real request to the model, so its full usage — tokens **and** the request
+itself — is folded into the run's `ctx.usage`. This is deliberate: it keeps cost honest, keeps the
+request count consistent (a model request that didn't count as one would be the surprise), and lets a
+`UsageLimits` request limit catch a runaway compaction. A run-request / iteration limiter will
+therefore see compaction calls among its requests.
+
+## `DeduplicateFileReads.file_key`
+
+There is no default `file_key`: identifying a file read is agent-specific, and a wrong guess would
+drop live data. Supply a callable mapping a `ToolCallPart` to a stable file key, or `None` when the
+call is not a file read:
+
+```python
+from pydantic_ai.messages import ToolCallPart
+
+
+def my_file_key(call: ToolCallPart) -> str | None:
+    if call.tool_name != 'read_file':
+        return None
+    # `call.args` may be a JSON string or a dict; `args_as_dict()` normalizes both.
+    path = call.args_as_dict().get('path')
+    return path if isinstance(path, str) else None
+```
+
+## Out of scope
+
+These strategies compress or drop context *inside* the window. Moving large tool outputs *out* of the
+window — overflowing them to a file the agent (or a subagent) can query on demand — is a separate
+capability, not lossy truncation. Prefer it over capping individual tool outputs.
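The escalation described in the README (run cheap tiers first, re-measure after each, stop once under `target_tokens`) can be sketched with stand-in types. This is an illustration under stated assumptions, not the `TieredCompaction` implementation: messages are plain strings, `estimate_tokens`, `DropDuplicates`, `KeepTail`, and `run_tiers` are hypothetical names, and treating a history exactly at the budget as acceptable is a choice made here.

```python
import asyncio
from typing import Protocol

Message = str  # stand-in for ModelMessage


def estimate_tokens(messages: list[Message]) -> int:
    """Estimate tokens at roughly 4 characters per token."""
    return sum(len(m) for m in messages) // 4


class Strategy(Protocol):
    """Shape of a tier: an object with an async `compact` method."""

    async def compact(self, messages: list[Message], ctx: object) -> list[Message]: ...


class DropDuplicates:
    """Cheap tier: keep only the last occurrence of each identical message."""

    async def compact(self, messages: list[Message], ctx: object) -> list[Message]:
        seen: set[Message] = set()
        kept: list[Message] = []
        for message in reversed(messages):
            if message not in seen:
                seen.add(message)
                kept.append(message)
        return kept[::-1]


class KeepTail:
    """Stand-in for an expensive tier; counts how often it is actually invoked."""

    def __init__(self, keep: int) -> None:
        self.keep = keep
        self.calls = 0

    async def compact(self, messages: list[Message], ctx: object) -> list[Message]:
        self.calls += 1
        return messages[-self.keep:]


async def run_tiers(
    messages: list[Message], tiers: list[Strategy], target_tokens: int, ctx: object = None
) -> list[Message]:
    """Apply tiers in order, re-measuring before each one and stopping once within budget."""
    for tier in tiers:
        if estimate_tokens(messages) <= target_tokens:
            break
        messages = await tier.compact(messages, ctx)
    return messages


expensive = KeepTail(keep=1)
history = ['x' * 400] * 3 + ['y' * 400]  # about 400 estimated tokens
result = asyncio.run(run_tiers(history, [DropDuplicates(), expensive], target_tokens=250))
```

In this run the cheap tier alone brings the history under budget, so the expensive tier is never called. Lowering `target_tokens` below the deduplicated size makes the loop fall through to it.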
diff --git a/pydantic_ai_harness/experimental/compaction/__init__.py b/pydantic_ai_harness/experimental/compaction/__init__.py
@@ -0,0 +1,28 @@
+"""Compaction capabilities: keep an agent's conversation history within the context window.
+
+Each capability lives in its own module; shared utilities (token estimation, the
+`CompactionStrategy` protocol, tool-pair-safe cutoffs, in-place clearing) live in `_shared`.
+"""
+
+from pydantic_ai_harness.experimental._warn import warn_experimental
+from pydantic_ai_harness.experimental.compaction._clear_tool_results import ClearToolResults
+from pydantic_ai_harness.experimental.compaction._deduplicate_file_reads import DeduplicateFileReads
+from pydantic_ai_harness.experimental.compaction._limit_warner import LimitWarner, WarningKind
+from pydantic_ai_harness.experimental.compaction._shared import CompactionStrategy, estimate_token_count
+from pydantic_ai_harness.experimental.compaction._sliding_window import SlidingWindow
+from pydantic_ai_harness.experimental.compaction._summarizing_compaction import SummarizingCompaction
+from pydantic_ai_harness.experimental.compaction._tiered_compaction import TieredCompaction
+
+warn_experimental('compaction')
+
+__all__ = [
+    'ClearToolResults',
+    'CompactionStrategy',
+    'DeduplicateFileReads',
+    'LimitWarner',
+    'SlidingWindow',
+    'SummarizingCompaction',
+    'TieredCompaction',
+    'WarningKind',
+    'estimate_token_count',
+]
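The package exports `estimate_token_count`, which `PLAN.md` and the README describe as a ~4-characters-per-token heuristic with an optional tokenizer for accuracy. A minimal sketch of that idea follows. The name `approx_token_count`, the `str -> int` tokenizer signature, and the ceiling rounding are assumptions made for illustration, not the exported function's actual behaviour.

```python
from collections.abc import Callable, Sequence
from typing import Optional

CHARS_PER_TOKEN = 4  # heuristic ratio from the plan, not an exact figure


def approx_token_count(
    texts: Sequence[str], tokenizer: Optional[Callable[[str], int]] = None
) -> int:
    """Estimate the token count of `texts`, using `tokenizer` when one is supplied."""
    if tokenizer is not None:
        # Assumed signature: the callable returns a token count for one string.
        return sum(tokenizer(text) for text in texts)
    total_chars = sum(len(text) for text in texts)
    # Ceiling division (an assumption here) so short non-empty text still counts as one token.
    return -(-total_chars // CHARS_PER_TOKEN)


def over_budget(
    texts: Sequence[str], max_tokens: int, tokenizer: Optional[Callable[[str], int]] = None
) -> bool:
    """Return True when the estimate exceeds `max_tokens`, as a `max_tokens` trigger would check."""
    return approx_token_count(texts, tokenizer) > max_tokens


heuristic = approx_token_count(['abcd', 'ab'])  # 6 characters in total
word_based = approx_token_count(['a b c', 'd'], tokenizer=lambda text: len(text.split()))
```

A heuristic like this only needs to be good enough to decide when a threshold is crossed. Swapping in a real tokenizer changes the estimate without changing the trigger logic.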