Add compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename Compaction → SummarizingCompaction#191
DouweM wants to merge 7 commits into
Implements three compaction-related capabilities for managing conversation context in long-running agents:

- SlidingWindow: zero-cost message trimming that preserves tool-call pairs
- LimitWarner: injects warnings when approaching iteration/token limits
- Compaction: LLM-powered summarization of older messages

All three use the `before_model_request` hook to modify `request_context.messages` transparently. The safe cutoff logic ensures tool-call / tool-return pairs are never orphaned, preventing HTTP 400 errors from LLM providers.

Closes #21

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
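A minimal sketch of the threshold check a LimitWarner-style capability might perform. The function name, the `warn_at` fraction, and the message wording are all hypothetical; the real capability injects its warning through the `before_model_request` hook, which this sketch leaves out.

```python
from __future__ import annotations


def limit_warning(used: int, limit: int, *, warn_at: float = 0.8, what: str = 'iterations') -> str | None:
    """Return a warning once `used` reaches `warn_at` of `limit`, else None.

    Hypothetical helper: it illustrates the threshold check only, not the
    capability's actual API or wording.
    """
    if limit <= 0:
        raise ValueError('limit must be positive')
    if used < warn_at * limit:
        return None  # still comfortably below the limit
    remaining = max(limit - used, 0)
    return f'Warning: {used}/{limit} {what} used; {remaining} remaining. Wrap up soon.'
```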
Add explicit `set[str]` type annotations and replace unnecessary `isinstance` checks with plain `else` branches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion

Implements three improvements from the audit findings on PR #140:

- Optional `tokenizer: Callable[[str], int] | None` parameter on SlidingWindow, Compaction, `estimate_token_count`, and `_find_token_cutoff`. When provided, it enables accurate token counting; the 4-chars/token heuristic stays as the fallback.
- `preserve_first_user_message: bool = True` on SlidingWindow and Compaction. When True, the first ModelRequest containing a UserPromptPart is always retained after trimming/compaction, preserving the original task context.
- `incremental: bool = True` on Compaction. When True and a prior compaction summary exists in the message history, it is included in the summarization prompt via a `<previous_summary>` tag so the LLM extends it rather than regenerating it from scratch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
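A sketch of how the incremental mode could assemble its summarization prompt, assuming the prior summary is passed in as a plain string. Only the `<previous_summary>` tag comes from the commit message; `build_summary_prompt`, the `<messages>` wrapper, and the instruction wording are hypothetical.

```python
from __future__ import annotations


def build_summary_prompt(transcript: str, previous_summary: str | None = None) -> str:
    """Build a summarization prompt, extending a prior summary when one exists.

    Hypothetical helper: the instruction text and the <messages> wrapper are
    illustrative; only the <previous_summary> tag comes from the commit.
    """
    sections: list[str] = []
    if previous_summary:
        sections.append(
            'Extend the existing summary with the new messages below; '
            'keep facts that are still relevant instead of starting over.'
        )
        sections.append(f'<previous_summary>\n{previous_summary}\n</previous_summary>')
    else:
        sections.append('Summarize the conversation below.')
    sections.append(f'<messages>\n{transcript}\n</messages>')
    return '\n\n'.join(sections)
```

Putting the previous summary before the new messages lets the model treat it as the base document to extend, which is the point of incremental mode.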
Note: This PR implements client-side compaction (LLM summarization + sliding window). Provider-side compaction (OpenAI/Anthropic) additionally requires the core primitive in #141 (`CompactionPart` message type + `compact_messages` on `Model`).
Audit vs prior art: Compaction
```python
system_parts = _extract_system_prompts(messages)
to_summarize = messages[:cutoff]
preserved = messages[cutoff:]

previous_summary = _extract_previous_summary(messages) if self.incremental else None
summary = await self._summarize(to_summarize, previous_summary=previous_summary)

summary_part = SystemPromptPart(content=f'{_SUMMARY_PREFIX}{summary}')
summary_message = ModelRequest(parts=[*system_parts, summary_part])
```
🔴 Old compaction summaries accumulate as SystemPromptParts across multiple compaction cycles
After the first compaction, the summary message contains [SystemPromptPart('original sys prompt'), SystemPromptPart('Summary of previous conversation:\n\n...')]. When a second compaction triggers, _extract_system_prompts(messages) at line 726 extracts ALL leading SystemPromptParts from this message — including the old summary part (since it's also a SystemPromptPart). The old summary is then re-included in the new summary message at line 734 alongside the new summary. After N compactions, the summary message contains N stale summary parts plus the new one, growing the context unboundedly and defeating the purpose of compaction.
Trace through two compaction cycles
After first compaction, result.messages[0] = ModelRequest(parts=[SystemPromptPart('sys'), SystemPromptPart('Summary of previous conversation:\n\nfirst summary')]).
When second compaction triggers, _extract_system_prompts (src/pydantic_harness/compaction.py:594-605) sees both parts are SystemPromptPart, extracts both. Then line 734 creates ModelRequest(parts=[SystemPromptPart('sys'), SystemPromptPart('...first summary'), SystemPromptPart('...second summary')]). The old summary is never removed.
Suggested change:

```diff
-system_parts = _extract_system_prompts(messages)
+system_parts = [
+    p for p in _extract_system_prompts(messages)
+    if not p.content.startswith(_SUMMARY_PREFIX)
+]
 to_summarize = messages[:cutoff]
 preserved = messages[cutoff:]
 previous_summary = _extract_previous_summary(messages) if self.incremental else None
 summary = await self._summarize(to_summarize, previous_summary=previous_summary)
 summary_part = SystemPromptPart(content=f'{_SUMMARY_PREFIX}{summary}')
 summary_message = ModelRequest(parts=[*system_parts, summary_part])
```
```diff
 [tool.coverage.report]
-fail_under = 100
+fail_under = 98
```
🚩 Coverage threshold lowered from 100% to 98%
The fail_under threshold in pyproject.toml:96 was reduced from 100 to 98, with the commit noting 'due to branch coverage of elif chains'. This permanently lowers the bar for the entire project. Consider using # pragma: no branch on specific elif chains instead of lowering the global threshold.
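For reference, coverage.py's `# pragma: no branch` comment excludes a single branch from branch measurement, which is the scoped alternative suggested here. The function below is a hypothetical exhaustive `elif` chain: its last condition can never be false, so only that fall-through branch is excluded instead of lowering `fail_under` for the whole project.

```python
from __future__ import annotations

from typing import Literal

PartKind = Literal['text', 'thinking', 'tool-call']


def describe(kind: PartKind) -> str:
    label = ''
    if kind == 'text':
        label = 'text'
    elif kind == 'thinking':
        label = 'thinking'
    elif kind == 'tool-call':  # pragma: no branch
        # The type checker guarantees this is the last possibility, so the
        # "condition is false" branch can never run; exclude just this branch
        # from branch coverage rather than lowering fail_under globally.
        label = 'tool call'
    return label
```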
```python
# Token estimation
# ---------------------------------------------------------------------------

_CHARS_PER_TOKEN = 4
```
This will underestimate token counts for Anthropic models, unfortunately. It works pretty well for OpenAI ones. I settled on 2.5 in Code Puppy to give a lot of slack (to avoid the errors in Vertex).
Coming back to this, I suggest we make it configurable somehow. Perhaps via an environment variable? (Yucky, but there should be a way for power users to override it.)
```python
    segments.append(str(part.content))
else:
    for part in msg.parts:
        if isinstance(part, TextPart):
```
You don't want to include ThinkingPart?
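If thinking content should count toward the estimate, the response branch could treat `ThinkingPart` like `TextPart`. This is a sketch with minimal stand-in dataclasses, not the pydantic-ai types (whose fields differ; for example, the real `ToolCallPart.args` can also be a dict), and `response_segments` is a hypothetical helper. Whether thinking is actually resent to the model depends on the provider, so counting it errs on the conservative side.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Minimal stand-ins for pydantic_ai.messages types, for illustration only.
@dataclass
class TextPart:
    content: str


@dataclass
class ThinkingPart:
    content: str


@dataclass
class ToolCallPart:
    tool_name: str
    args: str  # JSON-encoded arguments in this sketch


@dataclass
class ModelResponse:
    parts: list[TextPart | ThinkingPart | ToolCallPart] = field(default_factory=list)


def response_segments(msg: ModelResponse) -> list[str]:
    """Collect the text of a response that counts toward the context size."""
    segments: list[str] = []
    for part in msg.parts:
        if isinstance(part, (TextPart, ThinkingPart)):
            # Thinking is text too; skipping it undercounts on providers
            # that replay reasoning content.
            segments.append(part.content)
        elif isinstance(part, ToolCallPart):
            segments.append(part.tool_name)
            segments.append(part.args)
    return segments
```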
```python
# Safe cutoff logic — preserves tool-call / tool-return pairs
# ---------------------------------------------------------------------------

_TOOL_PAIR_SEARCH_RANGE = 5
```
If I'm understanding this correctly, it could fail if your model makes more than 5 parallel tool calls.
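One way to avoid a fixed search window is to key the search on tool-call IDs: keep moving the cutoff earlier until no tool return in the kept tail refers to a call in the dropped prefix. A sketch, using a hypothetical `Msg` stand-in that records the call and return IDs of each message; returns whose call is absent from the history are ignored. The loop terminates because each pass strictly lowers the cutoff.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Msg:
    """Stand-in message: tool-call IDs it issues and tool-return IDs it answers."""
    calls: set[str] = field(default_factory=set)
    returns: set[str] = field(default_factory=set)


def safe_cutoff(messages: list[Msg], cutoff: int) -> int:
    """Move `cutoff` earlier until messages[cutoff:] holds no orphaned tool return.

    The search follows tool-call IDs rather than a fixed window, so any number
    of parallel calls in one response is handled.
    """
    call_index = {cid: i for i, m in enumerate(messages) for cid in m.calls}
    cutoff = max(0, min(cutoff, len(messages)))
    while True:
        # Indices of calls that were dropped but whose returns are still kept.
        needed = [
            call_index[rid]
            for m in messages[cutoff:]
            for rid in m.returns
            if rid in call_index and call_index[rid] < cutoff
        ]
        if not needed:
            return cutoff
        cutoff = min(needed)
```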
```python
"""Number of tail messages to preserve after compaction (message-count trigger)."""

keep_tokens: int | None = None
"""Target token budget to preserve after compaction (token-count trigger).
```
Love this <3 - I used this strategy in Code Puppy and the agent keeps coherence very nicely. It can get expensive though.
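A sketch of how a `keep_tokens` budget could choose where the preserved tail starts: walk backward from the newest message and stop before the first message that would overflow the budget. Messages are plain strings and `count_tokens` is any per-message counter (a tokenizer or the chars-per-token heuristic); both are simplifications, and a real implementation would still pass the result through the tool-pair-safe cutoff.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence


def token_budget_cutoff(
    messages: Sequence[str],
    keep_tokens: int,
    count_tokens: Callable[[str], int],
) -> int:
    """Return the index where the kept tail starts so the tail fits in `keep_tokens`.

    The newest message is always kept, even if it alone exceeds the budget.
    """
    total = 0
    cutoff = len(messages)
    for i in range(len(messages) - 1, -1, -1):
        cost = count_tokens(messages[i])
        if total + cost > keep_tokens and cutoff < len(messages):
            break  # adding this message would overflow the budget
        total += cost
        cutoff = i
    return cutoff
```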
mpfaffenberger left a comment

Left a few comments. Hope they're helpful.
```python
model: str
"""Model to use for generating summaries (e.g. ``'openai:gpt-4o-mini'``)."""
```
This should likely include KnownModelName and Model, and use infer_model under the hood.
SlidingWindow, LimitWarner, Compaction
…Compaction → SummarizingCompaction

Extend the compaction menu with the SOTA-aligned strategies missing from the initial cut, and align naming with the "compaction" umbrella term.

- ClearToolResults: zero-LLM in-place clearing of old tool results (Anthropic clear_tool_uses); keep_pairs, exclude_tools, clear_tool_inputs (JSON-valid args), min_clear_tokens to protect the prompt cache.
- DeduplicateFileReads: zero-LLM; keep only the latest read per file via a required file_key seam.
- TieredCompaction: escalation orchestrator (cheap passes first; summarize only if still over target_tokens). CompactionStrategy protocol exported for custom tiers.
- Rename Compaction → SummarizingCompaction; structured-section summary prompt; model now optional (inherits the run's model); summary-call usage folded into the parent run.

All strategies preserve tool-call/return pairing. compaction.py at 100% branch coverage; pyright strict + ruff clean. Adds capability README (compaction.md).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
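A rough sketch of the ClearToolResults semantics listed above, over a hypothetical `ToolPair` stand-in that bundles a call with its return. The placeholder text and the 4-chars/token savings estimate are assumptions. Results are cleared rather than removed, so every call keeps its return, and `clear_tool_inputs` writes `'{}'` so the arguments stay valid JSON.

```python
from __future__ import annotations

from dataclasses import dataclass

CLEARED = '[tool result cleared]'  # placeholder text is an assumption


@dataclass
class ToolPair:
    """Stand-in for a matched tool call / tool return in the history."""
    tool_name: str
    args: str     # JSON-encoded call arguments
    content: str  # tool return content


def clear_tool_results(
    pairs: list[ToolPair],
    *,
    keep_pairs: int = 3,
    exclude_tools: frozenset[str] = frozenset(),
    clear_tool_inputs: bool = False,
    min_clear_tokens: int = 0,
    chars_per_token: int = 4,
) -> list[ToolPair]:
    """Clear old tool results in place, keeping every call/return pair intact.

    Only the newest `keep_pairs` results survive verbatim. Nothing is cleared
    unless it frees at least `min_clear_tokens` (estimated), so small savings
    do not invalidate a cached prompt prefix.
    """
    old = pairs[: max(len(pairs) - keep_pairs, 0)]
    candidates = [p for p in old if p.tool_name not in exclude_tools and p.content != CLEARED]
    freed = sum(len(p.content) for p in candidates) // chars_per_token
    if not candidates or freed < min_clear_tokens:
        return pairs
    for p in candidates:
        p.content = CLEARED
        if clear_tool_inputs:
            p.args = '{}'  # still valid JSON, unlike a bare placeholder string
    return pairs
```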
SlidingWindow, LimitWarner, Compaction

Resolve the repo restructure (package renamed pydantic_harness → pydantic_ai_harness,
moved to per-capability subpackages). Migrate compaction into the new layout:
- pydantic_ai_harness/compaction/{__init__,_capability}.py + README.md
- root __init__.py exposes the 6 compaction capabilities via lazy __getattr__
- tests moved to tests/compaction/; imports split public vs ._capability
- drop now-unused noqa (main's ruff config does not enforce D102/D105)
- widen _call_args test helper for the new ToolCallPart.args union (ToolSearchArgs)
- pyproject coverage config kept identical to main (protocol stub excluded via pragma)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ClearToolResults, DeduplicateFileReads, TieredCompaction; rename Compaction → SummarizingCompaction
The summary call's tokens and request are folded into the run's usage by design: consistent, cost-honest, and runaway-safe. Make that explicit in the README.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Split the monolithic `_capability.py` into a module per capability plus a `_shared` module for cross-capability utilities (token estimation, the CompactionStrategy protocol, tool-pair-safe cutoffs, first-user preservation, in-place tool-result clearing).

- `_shared.py` exposes its package-internal API without leading underscores (the module itself is private); genuinely module-local helpers keep their underscore.
- `_sliding_window` / `_clear_tool_results` / `_deduplicate_file_reads` / `_limit_warner` / `_summarizing_compaction` / `_tiered_compaction` each own one capability.
- `__init__.py` re-exports the public API; tests import privates from their new homes.

No behavior change. Every compaction module at 100% branch coverage; ruff + pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
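Extends the compaction capability set to the SOTA-aligned menu and aligns naming with the "compaction" umbrella term.

Capabilities

| Capability | Summary |
|---|---|
| `SlidingWindow` | Zero-cost message trimming that preserves tool-call pairs |
| `ClearToolResults` | Zero-LLM in-place clearing of old tool results; keeps the most recent `keep_pairs` (Anthropic `clear_tool_uses`) |
| `DeduplicateFileReads` | Zero-LLM; keeps only the latest read per file (via a required `file_key` seam) |
| `SummarizingCompaction` | LLM-powered summarization of older messages (renamed from `Compaction`) |
| `TieredCompaction` | Cheap tiers first; summarizes only if still over `target_tokens` |
| `LimitWarner` | Injects warnings when approaching iteration/token limits |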
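To make the `file_key` seam concrete, here is a sketch of deduplicating file reads. `ToolResult` is a stand-in, and the placeholder text and the example key function are assumptions. Older reads are overwritten with a placeholder instead of being dropped, which keeps tool-call/return pairing intact.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass

STALE = '[superseded by a later read of the same file]'  # placeholder is an assumption


@dataclass
class ToolResult:
    """Stand-in for a tool return: which tool ran, its args, and its output."""
    tool_name: str
    args: dict[str, str]
    content: str


def deduplicate_file_reads(
    results: list[ToolResult],
    file_key: Callable[[ToolResult], str | None],
) -> list[ToolResult]:
    """Keep only the newest read of each file; older reads get a placeholder.

    `file_key` returns a file identifier for file-read results and None for
    everything else, so the capability never guesses which tools read files.
    """
    seen: set[str] = set()
    for result in reversed(results):  # newest first
        key = file_key(result)
        if key is None:
            continue
        if key in seen:
            result.content = STALE
        else:
            seen.add(key)
    return results
```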
Design

- `CompactionStrategy.compact(messages, ctx)` seam (exported): the trigger lives in `before_model_request` and the transform is unconditional, so `TieredCompaction` drives tiers directly and re-measures between them (true escalation, not threshold-stacking). Users can plug in custom tiers.
- `SummarizingCompaction.model` is now optional: it inherits the run's model when unset; no token caps.
- `before_model_request` edits persist into the run history (verified against core), so trims/clears/summaries carry forward and summarization is not recomputed each step.
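A sketch of the escalation loop described here: tiers run cheapest-first, and the size is re-measured before each one, so later tiers only run while the history is still over `target_tokens`. This simplifies the real seam: messages are strings, `compact` takes no `ctx`, and `DropOldest`/`Summarize` are hypothetical tiers.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import Protocol

Messages = list[str]  # stand-in for list[ModelMessage]


class CompactionStrategy(Protocol):
    """Unconditional transform: compacts whatever history it is given."""

    def compact(self, messages: Messages) -> Messages: ...


def tiered_compact(
    messages: Messages,
    tiers: Sequence[CompactionStrategy],
    target_tokens: int,
    count_tokens: Callable[[Messages], int],
) -> Messages:
    """Run tiers cheapest-first, re-measuring before each; stop once under target."""
    for tier in tiers:
        if count_tokens(messages) <= target_tokens:
            break
        messages = tier.compact(messages)
    return messages


class DropOldest:
    """Cheap tier: drop the oldest message."""

    def compact(self, messages: Messages) -> Messages:
        return messages[1:]


class Summarize:
    """Expensive tier stand-in: collapse everything into one summary message."""

    def __init__(self) -> None:
        self.calls = 0

    def compact(self, messages: Messages) -> Messages:
        self.calls += 1
        return [f'summary of {len(messages)} messages']
```

When the cheap tier alone reaches the target, the summarizing tier is never invoked.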
From pyai-expert review

- `clear_tool_inputs` clears `args` to JSON-valid `'{}'` (a bare string would reach providers as malformed function args via `args_as_json_str()`).
- `SummarizingCompaction` threads `ctx.usage` into the summary run so its tokens fold into the parent run (this also counts the summary as a request; documented).
Not included

Per-result truncation (`TruncateToolOutputs`) was intentionally dropped in favour of a future capability that overflows large tool outputs to a file the agent/subagent can query; lossy truncation is unnecessary.

Quality

`compaction.py` at 100% branch coverage; ruff + pyright strict clean; tests use `TestModel`/mocks only. Adds capability README (`compaction.md`).

🤖 Generated with Claude Code