Add compaction menu: `ClearToolResults`, `DeduplicateFileReads`, `TieredCompaction`; rename `Compaction` → `SummarizingCompaction` by DouweM · Pull Request #191 · pydantic/pydantic-ai-harness

DouweM · 2026-04-10T15:13:25Z

Extends the compaction capability set to the SOTA-aligned menu and aligns naming with the compaction umbrella term.

Capabilities

| Class | Cost | What it does |
| --- | --- | --- |
| `SlidingWindow` | zero-LLM | Drop oldest whole messages down to a tail |
| `ClearToolResults` | zero-LLM | Blank old tool results in place, keep last `keep_pairs` (Anthropic `clear_tool_uses`) |
| `DeduplicateFileReads` | zero-LLM | Keep only the latest read per file (via a required `file_key` seam) |
| `SummarizingCompaction` | one LLM call | Structured-section summary of older messages (renamed from `Compaction`) |
| `TieredCompaction` | escalates | Cheap passes first; summarize only if still over `target_tokens` |
| `LimitWarner` | zero-LLM | Inject URGENT/CRITICAL warnings as limits approach |
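To make the two zero-LLM tool-result strategies concrete, here is a minimal sketch of what "keep last `keep_pairs`" and "keep only the latest read per file" could look like. It is not the PR's implementation: `ToolResult`, `CLEARED`, `clear_old_results` and `dedupe_file_reads` are hypothetical stand-ins, and the sketch assumes superseded reads are blanked in place rather than removed, so every tool call still has a matching return.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

CLEARED = '[cleared]'  # hypothetical placeholder text, not the PR's wording


@dataclass(frozen=True)
class ToolResult:
    """Stand-in for a tool return; not the pydantic-ai part type."""
    tool_name: str
    args: dict
    content: str


def clear_old_results(results: List[ToolResult], keep_pairs: int) -> List[ToolResult]:
    """Blank all but the last `keep_pairs` results, preserving length and order."""
    cutoff = max(len(results) - keep_pairs, 0)
    return [replace(r, content=CLEARED) if i < cutoff else r for i, r in enumerate(results)]


def dedupe_file_reads(
    results: List[ToolResult],
    file_key: Callable[[ToolResult], Optional[str]],
) -> List[ToolResult]:
    """Blank every read of a file except the latest one.

    `file_key` returns a key for file reads and None for anything else.
    """
    latest = {}
    for i, r in enumerate(results):
        key = file_key(r)
        if key is not None:
            latest[key] = i  # later reads overwrite earlier ones
    out = []
    for i, r in enumerate(results):
        key = file_key(r)
        superseded = key is not None and latest[key] != i
        out.append(replace(r, content=CLEARED) if superseded else r)
    return out
```

The `file_key` callable plays the role of the "required seam": only the caller knows which tools read files and which argument names the file.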

Design

- Shared `CompactionStrategy.compact(messages, ctx)` seam (exported). The trigger lives in `before_model_request` and the transform is unconditional, so `TieredCompaction` drives tiers directly and re-measures between them (true escalation, not threshold-stacking). Users can plug in custom tiers.
- All strategies preserve tool-call/return pairing (core does not validate this; an orphaned pair is rejected by the provider).
- `SummarizingCompaction.model` is now optional: it inherits the run's model when unset; no token caps.
- `before_model_request` edits persist into the run history (verified against core), so trims/clears/summaries carry forward and summarization is not recomputed each step.
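The escalation described above (run a tier, re-measure, stop as soon as the history fits) can be sketched as follows. This is an illustration under simplifying assumptions, not the PR's code: messages are plain strings, `compact` is synchronous and takes no `ctx`, and `Strategy`, `DropOldest`, `FakeSummarize`, `estimate_tokens` and `tiered_compact` are hypothetical names.

```python
from typing import List, Protocol, Sequence

Message = str  # stand-in for a real message type


class Strategy(Protocol):
    """Unconditional transform; the caller decides when to invoke it."""

    def compact(self, messages: List[Message]) -> List[Message]: ...


def estimate_tokens(messages: Sequence[Message], chars_per_token: int = 4) -> int:
    """Rough character-based estimate, used to re-measure between tiers."""
    return sum(len(m) for m in messages) // chars_per_token


class DropOldest:
    """Cheap tier: keep only the last `keep` messages."""

    def __init__(self, keep: int) -> None:
        self.keep = keep

    def compact(self, messages: List[Message]) -> List[Message]:
        return messages[max(len(messages) - self.keep, 0):]


class FakeSummarize:
    """Expensive tier: pretend to summarize, and count how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def compact(self, messages: List[Message]) -> List[Message]:
        self.calls += 1
        return ['summary', *messages[-1:]]


def tiered_compact(
    messages: List[Message], tiers: Sequence[Strategy], target_tokens: int
) -> List[Message]:
    """Run tiers in order, re-measuring before each one."""
    for tier in tiers:
        if estimate_tokens(messages) <= target_tokens:
            break  # already under budget: skip the remaining (costlier) tiers
        messages = tier.compact(messages)
    return messages
```

Because each tier is measured against the output of the previous one, the summarizing tier only runs when the cheap tiers did not free enough room, which is the difference from stacking independent thresholds.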

From pyai-expert review

- `clear_tool_inputs` clears args to JSON-valid `'{}'` (a bare string would reach providers as malformed function-args via `args_as_json_str()`).
- `SummarizingCompaction` threads `ctx.usage` into the summary run so its tokens fold into the parent run (this also counts the summary as a request; documented).
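The reason for clearing args to `'{}'` rather than a placeholder string can be checked with the standard library alone. The sketch below assumes that providers expect function arguments to parse as a JSON object; `CLEARED_ARGS` and `is_valid_function_args` are hypothetical names used only for illustration.

```python
import json

CLEARED_ARGS = '{}'  # JSON-valid empty object used in place of the original args


def is_valid_function_args(raw: str) -> bool:
    """Return True if `raw` parses as a JSON object (assumed provider requirement)."""
    try:
        return isinstance(json.loads(raw), dict)
    except json.JSONDecodeError:
        return False
```

A placeholder such as `'[cleared]'` fails this check, while `'{}'` passes and still carries no payload.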

Not included

Per-result truncation (TruncateToolOutputs) was intentionally dropped in favour of a future capability that overflows large tool outputs to a file the agent/subagent can query — lossy truncation is unnecessary.

Quality

compaction.py at 100% branch coverage; ruff + pyright strict clean; tests use TestModel/mocks only. Adds capability README (compaction.md).

🤖 Generated with Claude Code

Implements three compaction-related capabilities for managing conversation context in long-running agents:
- SlidingWindow: zero-cost message trimming that preserves tool-call pairs
- LimitWarner: injects warnings when approaching iteration/token limits
- Compaction: LLM-powered summarization of older messages

All three use the before_model_request hook to modify request_context.messages transparently. The safe cutoff logic ensures tool-call / tool-return pairs are never orphaned, preventing HTTP 400 errors from LLM providers.

Closes #21

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add explicit `set[str]` type annotations and replace unnecessary `isinstance` checks with plain `else` branches. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tion

Implements three improvements from the audit findings on PR #140:
- Optional `tokenizer: Callable[[str], int] | None` parameter on SlidingWindow, Compaction, estimate_token_count, and _find_token_cutoff. When provided, enables accurate token counting; the 4-chars/token heuristic stays as fallback.
- `preserve_first_user_message: bool = True` on SlidingWindow and Compaction. When True, the first ModelRequest containing a UserPromptPart is always retained after trimming/compaction, preserving the original task context.
- `incremental: bool = True` on Compaction. When True and a prior compaction summary exists in the message history, it is included in the summarization prompt via a <previous_summary> tag so the LLM extends it rather than regenerating from scratch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

DouweM · 2026-04-10T15:13:45Z

Originally posted by @DouweM in #140 comment (PR was recreated)

Note: This PR implements client-side compaction (LLM summarization + sliding window). Provider-side compaction (OpenAI/Anthropic) additionally requires the core primitive in #141 (CompactionPart message type + compact_messages on Model).

DouweM · 2026-04-10T15:13:45Z

Originally posted by @DouweM in #140 comment (PR was recreated)

Audit vs prior art: Compaction

devin-ai-integration

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

devin-ai-integration · 2026-04-10T15:20:18Z

+        system_parts = _extract_system_prompts(messages)
+        to_summarize = messages[:cutoff]
+        preserved = messages[cutoff:]
+
+        previous_summary = _extract_previous_summary(messages) if self.incremental else None
+        summary = await self._summarize(to_summarize, previous_summary=previous_summary)
+
+        summary_part = SystemPromptPart(content=f'{_SUMMARY_PREFIX}{summary}')
+        summary_message = ModelRequest(parts=[*system_parts, summary_part])


🔴 Old compaction summaries accumulate as SystemPromptParts across multiple compaction cycles

After the first compaction, the summary message contains [SystemPromptPart('original sys prompt'), SystemPromptPart('Summary of previous conversation:\n\n...')]. When a second compaction triggers, _extract_system_prompts(messages) at line 726 extracts ALL leading SystemPromptParts from this message — including the old summary part (since it's also a SystemPromptPart). The old summary is then re-included in the new summary message at line 734 alongside the new summary. After N compactions, the summary message contains N stale summary parts plus the new one, growing the context unboundedly and defeating the purpose of compaction.

Trace through two compaction cycles

After first compaction, result.messages[0] = ModelRequest(parts=[SystemPromptPart('sys'), SystemPromptPart('Summary of previous conversation:\n\nfirst summary')]).

When second compaction triggers, _extract_system_prompts (src/pydantic_harness/compaction.py:594-605) sees both parts are SystemPromptPart, extracts both. Then line 734 creates ModelRequest(parts=[SystemPromptPart('sys'), SystemPromptPart('...first summary'), SystemPromptPart('...second summary')]). The old summary is never removed.

Suggested change

-        system_parts = _extract_system_prompts(messages)
+        system_parts = [
+            p for p in _extract_system_prompts(messages)
+            if not p.content.startswith(_SUMMARY_PREFIX)
+        ]
         to_summarize = messages[:cutoff]
         preserved = messages[cutoff:]

         previous_summary = _extract_previous_summary(messages) if self.incremental else None
         summary = await self._summarize(to_summarize, previous_summary=previous_summary)

         summary_part = SystemPromptPart(content=f'{_SUMMARY_PREFIX}{summary}')
         summary_message = ModelRequest(parts=[*system_parts, summary_part])
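The accumulation described in this finding, and the effect of the prefix filter, can be reproduced without the library. The sketch below is a simplified model, not the PR's code: `Part` stands in for `SystemPromptPart`, and `compact_naive` / `compact_filtered` are hypothetical helpers that only rebuild the leading parts of the summary message.

```python
from dataclasses import dataclass
from typing import List

SUMMARY_PREFIX = 'Summary of previous conversation:\n\n'


@dataclass
class Part:
    """Stand-in for a system prompt part."""
    content: str


def compact_naive(parts: List[Part], new_summary: str) -> List[Part]:
    """Carry over every leading part, including stale summaries."""
    return [*parts, Part(SUMMARY_PREFIX + new_summary)]


def compact_filtered(parts: List[Part], new_summary: str) -> List[Part]:
    """Drop parts that are themselves earlier summaries before appending the new one."""
    kept = [p for p in parts if not p.content.startswith(SUMMARY_PREFIX)]
    return [*kept, Part(SUMMARY_PREFIX + new_summary)]


start = [Part('sys')]
naive = compact_naive(compact_naive(start, 'first summary'), 'second summary')
fixed = compact_filtered(compact_filtered(start, 'first summary'), 'second summary')
```

After two cycles `naive` holds three parts (the system prompt plus both summaries), while `fixed` holds two: the system prompt and only the latest summary.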


devin-ai-integration · 2026-04-10T15:20:20Z


 [tool.coverage.report]
-fail_under = 100
+fail_under = 98


🚩 Coverage threshold lowered from 100% to 98%

The fail_under threshold in pyproject.toml:96 was reduced from 100 to 98, with the commit noting 'due to branch coverage of elif chains'. This permanently lowers the bar for the entire project. Consider using # pragma: no branch on specific elif chains instead of lowering the global threshold.


mpfaffenberger · 2026-04-13T13:09:05Z

+# Token estimation
+# ---------------------------------------------------------------------------
+
+_CHARS_PER_TOKEN = 4


This will underestimate for Anthropic models unfortunately. It works pretty well for OpenAI ones. I settled on 2.5 in Code Puppy to give a lot of slack (to avoid the errors in Vertex).

Coming back to this, I suggest we make it configurable somehow? Perhaps an environment var (yucky, but there should be a way to override it for power users)
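One way the ratio could be made overridable is sketched below: an explicit argument wins, then an environment variable, then the default of 4. This is only an illustration of the suggestion, not an API from the PR; the variable name `COMPACTION_CHARS_PER_TOKEN`, the helper `resolve_chars_per_token` and the exact signature of `estimate_token_count` are assumptions.

```python
import math
import os
from typing import Callable, Optional

DEFAULT_CHARS_PER_TOKEN = 4.0
ENV_VAR = 'COMPACTION_CHARS_PER_TOKEN'  # hypothetical override for power users


def resolve_chars_per_token(explicit: Optional[float] = None) -> float:
    """Pick the ratio: explicit argument, then environment variable, then default."""
    if explicit is not None:
        value = float(explicit)
    else:
        value = float(os.environ.get(ENV_VAR, DEFAULT_CHARS_PER_TOKEN))
    if value <= 0:
        raise ValueError('chars per token must be positive')
    return value


def estimate_token_count(
    text: str,
    *,
    tokenizer: Optional[Callable[[str], int]] = None,
    chars_per_token: Optional[float] = None,
) -> int:
    """Use a real tokenizer when given; otherwise fall back to the character heuristic."""
    if tokenizer is not None:
        return tokenizer(text)
    return math.ceil(len(text) / resolve_chars_per_token(chars_per_token))
```

A lower ratio such as 2.5 yields a higher estimate for the same text, which errs on the side of compacting earlier.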

mpfaffenberger · 2026-04-13T13:09:45Z

+                    segments.append(str(part.content))
+        else:
+            for part in msg.parts:
+                if isinstance(part, TextPart):


You don't want to include ThinkingPart?

mpfaffenberger · 2026-04-13T13:10:54Z

+# Safe cutoff logic — preserves tool-call / tool-return pairs
+# ---------------------------------------------------------------------------
+
+_TOOL_PAIR_SEARCH_RANGE = 5


If I'm understanding this correctly, it could fail if your model makes more than 5 parallel tool calls.
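A cutoff search that does not depend on a fixed look-back window could match calls to returns by tool-call id instead. The sketch below is one possible approach, not the PR's logic: `Msg` is a hypothetical stand-in that records only the call and return ids a message carries, and `safe_cutoff` moves the cut earlier until no kept return refers to a dropped call.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Msg:
    """Stand-in message: only the tool-call ids it issues or answers."""
    call_ids: FrozenSet[str] = frozenset()
    return_ids: FrozenSet[str] = frozenset()


def safe_cutoff(messages: List[Msg], desired: int) -> int:
    """Return the largest index <= `desired` such that keeping messages[index:]
    leaves no tool return whose call was dropped."""
    cutoff = min(max(desired, 0), len(messages))
    while cutoff > 0:
        dropped_calls = set().union(*(m.call_ids for m in messages[:cutoff]))
        kept_returns = set().union(*(m.return_ids for m in messages[cutoff:]))
        if not (dropped_calls & kept_returns):
            return cutoff  # no pair is split at this position
        cutoff -= 1  # keep one more message and re-check
    return 0
```

The loop is quadratic in the number of messages, which is acceptable for a sketch; it works for any number of parallel calls because it never assumes how far a return sits from its call.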

mpfaffenberger · 2026-04-13T13:12:02Z

+    """Number of tail messages to preserve after compaction (message-count trigger)."""
+
+    keep_tokens: int | None = None
+    """Target token budget to preserve after compaction (token-count trigger).


Love this <3 - I used this strategy in Code Puppy and the agent keeps coherence very nicely. It can get expensive though.

mpfaffenberger

Left a few comments. Hope they're helpful.

grahamcracker1234 · 2026-04-14T18:35:30Z

+    model: str
+    """Model to use for generating summaries (e.g. ``'openai:gpt-4o-mini'``)."""


This should likely include KnownModelName and Model, and use infer_model under the hood.

…Compaction → SummarizingCompaction

Extend the compaction menu with the SOTA-aligned strategies missing from the initial cut, and align naming with the "compaction" umbrella term.
- ClearToolResults: zero-LLM in-place clearing of old tool results (Anthropic clear_tool_uses); keep_pairs, exclude_tools, clear_tool_inputs (JSON-valid args), min_clear_tokens to protect the prompt cache.
- DeduplicateFileReads: zero-LLM; keep only the latest read per file via a required file_key seam.
- TieredCompaction: escalation orchestrator; cheap passes first, summarize only if still over target_tokens. CompactionStrategy protocol exported for custom tiers.
- Rename Compaction → SummarizingCompaction; structured-section summary prompt; model now optional (inherits the run's model); summary-call usage folded into the parent run.

All strategies preserve tool-call/return pairing. compaction.py at 100% branch coverage; pyright strict + ruff clean. Adds capability README (compaction.md).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Resolve the repo restructure (package renamed pydantic_harness → pydantic_ai_harness, moved to per-capability subpackages). Migrate compaction into the new layout:
- pydantic_ai_harness/compaction/{__init__,_capability}.py + README.md
- root __init__.py exposes the 6 compaction capabilities via lazy __getattr__
- tests moved to tests/compaction/; imports split public vs ._capability
- drop now-unused noqa (main's ruff config does not enforce D102/D105)
- widen _call_args test helper for the new ToolCallPart.args union (ToolSearchArgs)
- pyproject coverage config kept identical to main (protocol stub excluded via pragma)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The summary call's tokens and request are folded into the run's usage by design — consistent, cost-honest, and runaway-safe. Make that explicit in the README. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Split the monolithic _capability.py into a module per capability plus a _shared module for cross-capability utilities (token estimation, the CompactionStrategy protocol, tool-pair-safe cutoffs, first-user preservation, in-place tool-result clearing).
- _shared.py exposes its package-internal API without leading underscores (the module itself is private); genuinely module-local helpers keep their underscore.
- _sliding_window / _clear_tool_results / _deduplicate_file_reads / _limit_warner / _summarizing_compaction / _tiered_compaction each own one capability.
- __init__.py re-exports the public API; tests import privates from their new homes.

No behavior change. Every compaction module at 100% branch coverage; ruff + pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

DouweM and others added 3 commits April 2, 2026 05:28

Fix pyright strict mode errors in test_compaction

af347ff


DouweM requested review from Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin as code owners April 10, 2026 15:13

DouweM removed request for Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin April 10, 2026 15:13

DouweM marked this pull request as draft April 10, 2026 15:13

devin-ai-integration Bot reviewed Apr 10, 2026

mpfaffenberger reviewed Apr 13, 2026

grahamcracker1234 reviewed Apr 14, 2026

DouweM added this to the 2026-05 milestone Apr 23, 2026

dsfaccini changed the title ~~Add compaction capabilities: SlidingWindow, LimitWarner, Compaction~~ Add compaction capabilities: SlidingWindow, LimitWarner, Compaction Jun 2, 2026

dsfaccini changed the title ~~Add compaction capabilities: SlidingWindow, LimitWarner, Compaction~~ Add compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename Compaction → SummarizingCompaction Jun 3, 2026

dsfaccini changed the title ~~Add compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename Compaction → SummarizingCompaction~~ Add compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename Compaction → SummarizingCompaction Jun 3, 2026

docs(compaction): document summary-call usage accounting

fe9ccb6


DouweM wants to merge 7 commits into main from capability/compaction
