pydantic · dsfaccini · Jun 2, 2026 · May 26, 2026 · May 27, 2026 · May 28, 2026
diff --git a/.gitignore b/.gitignore
@@ -25,3 +25,4 @@ wheels/
 # Hypothesis
 .hypothesis/
 .vscode/
+mutants/
diff --git a/README.md b/README.md
@@ -103,8 +103,8 @@ We studied leading coding agents, agent frameworks, and Claw-style assistants to
 |---|---|---|---|---|
 | **Tools &&nbsp;execution** | **Code mode** | Sandboxed Python execution via [Monty](https://github.com/pydantic/monty) -- one `run_code` call replaces N tool calls | :white_check_mark: [Docs](pydantic_ai_harness/code_mode/) | |
 | | **Tool search** | Progressive tool discovery for large tool sets | :white_check_mark: [Pydantic&nbsp;AI](https://pydantic.dev/docs/ai/tools-toolsets/toolsets/#deferred-loading) | |
-| | **File system** | Read, write, edit, search files with path traversal prevention | :construction: [PR&nbsp;#177](https://github.com/pydantic/pydantic-ai-harness/pull/177) | [pydantic-ai-backend](https://github.com/vstorm-co/pydantic-ai-backend) (vstorm&#8209;co) |
-| | **Shell** | Execute commands with allowlists, denylists, and timeouts | :construction: [PR&nbsp;#177](https://github.com/pydantic/pydantic-ai-harness/pull/177) | [pydantic-ai-backend](https://github.com/vstorm-co/pydantic-ai-backend) (vstorm&#8209;co) |
+| | **File system** | Read, write, edit, search files with path traversal prevention | :white_check_mark: [Docs](pydantic_ai_harness/filesystem/) | [pydantic-ai-backend](https://github.com/vstorm-co/pydantic-ai-backend) (vstorm&#8209;co) |
+| | **Shell** | Execute commands with allowlists, denylists, and timeouts | :white_check_mark: [Docs](pydantic_ai_harness/shell/) | [pydantic-ai-backend](https://github.com/vstorm-co/pydantic-ai-backend) (vstorm&#8209;co) |
 | | **Repo context injection** | Auto-load CLAUDE.md/AGENTS.md and repo structure | :construction: [PR&nbsp;#175](https://github.com/pydantic/pydantic-ai-harness/pull/175) | [pydantic-deep](https://github.com/vstorm-co/pydantic-deepagents) (vstorm&#8209;co) |
 | | **Verification loop** | Run tests after edits, auto-fix failures | :construction: [PR&nbsp;#169](https://github.com/pydantic/pydantic-ai-harness/pull/169) | |
 | **Context management** | **Sliding window** | Trim conversation history to stay within token limits | :construction: [PR&nbsp;#191](https://github.com/pydantic/pydantic-ai-harness/pull/191) | [summarization-pydantic-ai](https://github.com/vstorm-co/summarization-pydantic-ai) (vstorm&#8209;co) |

diff --git a/docs/mutation-testing.md b/docs/mutation-testing.md
@@ -0,0 +1,47 @@
+# Mutation Testing
+
+Mutation testing complements the 100% branch-coverage requirement: coverage
+shows that every line and branch runs, while mutation testing shows that the
+assertions actually pin the behavior down.
+
+Mutation testing covers `pydantic_ai_harness/filesystem/_toolset.py` and
+`pydantic_ai_harness/shell/_toolset.py`.
+
+Run with [mutmut](https://mutmut.readthedocs.io/) v3 via `scripts/run-mutmut.sh`,
+which installs mutmut ephemerally with `uv run --with` — no dev dependency
+required.
+
+```bash
+scripts/run-mutmut.sh run --max-children 1
+scripts/run-mutmut.sh results
+scripts/run-mutmut.sh show <mutant-name>
+```
+
+## Interpreting survivors
+
+A surviving mutant is either a missing test or an equivalent mutant — a change
+that produces behavior no test could distinguish from the original. Triage each
+survivor; the recurring equivalent-mutant categories in this codebase are:
+
+- **Trampoline default params** — mutmut v3 wraps functions, and the wrapper
+  keeps the original defaults, so a mutated default is never observed.
+- **Omitted `name=` in `add_function()`** — pydantic-ai falls back to
+  `method.__name__`, which equals the explicit name being mutated away.
+- **`'utf-8'` encoding mutations** — Python's codec lookup is case-insensitive
+  and UTF-8 is the default text encoding, so case/omission changes are no-ops.
+- **`errors='replace'` mutations** — exercised only by invalid bytes; valid
+  UTF-8 test data never invokes the error handler.
+- **Unreachable `except` blocks** (marked `pragma: no cover`) — paths that
+  can't be triggered in the test environment.
+- **`CancelScope(shield=True)` flips** — require an outer cancellation during
+  the near-instant cleanup window.
+
+Anything outside these categories should be treated as a real gap and killed
+with a new test.
+
+## Limitations
+
+Trio-parametrized tests are excluded during mutation testing (`-k 'not trio'`
+in `pyproject.toml [tool.mutmut]`) because trio segfaults in mutmut's
+subprocess environment on Python 3.14 / macOS. The kill rate is unaffected —
+the trio tests exercise the same code paths as the asyncio tests.
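Two of the equivalent-mutant categories listed in `docs/mutation-testing.md` (the `'utf-8'` and `errors='replace'` mutations) rest on standard-library behavior that can be checked directly. The sketch below is illustrative only: it does not reproduce the toolset's actual calls, which this part of the diff does not show, and the helper names `same_codec` and `replace_is_observable` are hypothetical.

```python
import codecs


def same_codec(name_a: str, name_b: str) -> bool:
    """True when two encoding names resolve to the same registered codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


def replace_is_observable(data: bytes) -> bool:
    """True when errors='replace' changes the outcome compared with strict decoding."""
    try:
        strict = data.decode('utf-8', errors='strict')
    except UnicodeDecodeError:
        # Strict decoding fails where 'replace' would succeed, so the handler matters.
        return True
    return strict != data.decode('utf-8', errors='replace')


valid = 'héllo'.encode('utf-8')

print(same_codec('utf-8', 'UTF-8'))             # True: lookup ignores case
print(valid.decode() == valid.decode('utf-8'))  # True: bytes.decode defaults to UTF-8
print(replace_is_observable(valid))             # False: valid input never reaches the handler
print(replace_is_observable(b'\xff\xfe'))       # True: invalid bytes do
```

If the read path can be reached with invalid bytes in a test, the `errors='replace'` mutants would stop being equivalent and could be killed; whether that is practical depends on the toolset code.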
diff --git a/pydantic_ai_harness/__init__.py b/pydantic_ai_harness/__init__.py
@@ -1,16 +1,26 @@
-"""The batteries for your Pydantic AI agent -- the official capability library."""
+"""Pydantic AI capability library."""
 
 from typing import TYPE_CHECKING
 
 if TYPE_CHECKING:
     from .code_mode import CodeMode
+    from .filesystem import FileSystem
+    from .shell import Shell
 
-__all__ = ['CodeMode']
+__all__ = ['CodeMode', 'FileSystem', 'Shell']
 
 
 def __getattr__(name: str) -> object:
     if name == 'CodeMode':
         from .code_mode import CodeMode
 
         return CodeMode
+    elif name == 'FileSystem':
+        from .filesystem import FileSystem
+
+        return FileSystem
+    elif name == 'Shell':
+        from .shell import Shell
+
+        return Shell
     raise AttributeError(f'module {__name__!r} has no attribute {name!r}')
diff --git a/pydantic_ai_harness/filesystem/README.md b/pydantic_ai_harness/filesystem/README.md
@@ -0,0 +1,136 @@
+# FileSystem
+
+Give an agent sandboxed, pattern-filtered access to a directory tree.
+
+## The problem
+
+Letting an agent touch the filesystem directly is risky: path traversal
+(`../../etc/passwd`), symlinks that escape the project, clobbering `.git`, or
+leaking `.env` secrets. Hand-rolling the guards around every tool call is
+repetitive and easy to get subtly wrong.
+
+## The solution
+
+`FileSystem` exposes a fixed set of file tools, all scoped to a single
+`root_dir`. Every path is resolved and containment-checked (symlinks included)
+before any I/O, and access is filtered through allow / deny / protected glob
+patterns.
+
+```python
+from pydantic_ai import Agent
+from pydantic_ai_harness import FileSystem
+
+agent = Agent(
+    'anthropic:claude-sonnet-4-6',
+    capabilities=[FileSystem(root_dir='./workspace')],
+)
+
+result = agent.run_sync('Read config.toml and tell me the package name.')
+print(result.output)
+```
+
+## Tools
+
+| Tool | Purpose |
+|---|---|
+| `read_file` | Read a text file with line numbers and a content hash. Binary files are detected and not dumped. |
+| `write_file` | Create or overwrite a file. Optional `expected_hash` rejects stale writes (optimistic concurrency). |
+| `edit_file` | Exact-string replacement; `old_text` must match exactly once. Optional `expected_hash`. |
+| `list_directory` | List a directory's entries with type indicators and sizes. |
+| `search_files` | Regex search over file contents, optionally narrowed by an `include_glob`. |
+| `find_files` | Glob search over file names (e.g. `*.py`, `**/*.json`). |
+| `create_directory` | Create a directory and any missing parents. |
+| `file_info` | Metadata for a file or directory (size, type, line count, hash, symlink target). |
+
+## Security model
+
+- **Containment.** Paths resolve relative to `root_dir`; anything resolving
+  outside — via `..`, an absolute path, or a symlink — is rejected. Symlinks
+  are resolved with `os.path.realpath` *before* the containment check, so the
+  check applies to the real target and the TOCTOU window is narrowed.
+- **Binary detection.** `read_file` returns a placeholder instead of dumping
+  binary bytes into the model context.
+- **Optimistic concurrency.** `write_file`/`edit_file` accept an
+  `expected_hash` so an agent operating on a stale read is told to re-read
+  rather than silently overwriting newer content.
+
+## Pattern filtering
+
+Three independent glob lists control access. Patterns are matched with
+`fnmatch`, whose `*` spans `/`, so `*.py` matches `src/main.py` and you rarely
+need `**`.
+
+| Field | Effect |
+|---|---|
+| `allowed_patterns` | If non-empty, only matching paths are accessible (allowlist). |
+| `denied_patterns` | Matching paths are always rejected (denylist). |
+| `protected_patterns` | Matching paths are read-only — reads succeed, writes are rejected. |
+
+`protected_patterns` defaults to `.git/*`, `.env`, `.env.*`, `*.pem`, `*.key`,
+and `**/secrets*`. Pass an empty list to disable protection.
+
+### Direct access vs. walkers
+
+The three rules apply at two different granularities:
+
+- **Direct access** (`read_file`, `write_file`, `edit_file`, `file_info`,
+  `create_directory`) gates the operation's target path. You must name a path
+  that the patterns permit.
+- **Walkers** (`list_directory`, `search_files`, `find_files`) gate their root
+  by deny/protected patterns, but **not** by `allowed_patterns` — a directory
+  root like `.` never matches a file pattern such as `src/*.py`, so requiring
+  it to would make every listing fail. Instead, the root is always walked and
+  each **entry** is filtered against all three lists. A directory listing can
+  never surface a path the agent couldn't otherwise read or write.
+
+So with `allowed_patterns=['*.py']`, `list_directory('.')` succeeds and shows
+only the `.py` entries; `read_file('notes.md')` is rejected.
+
+> Dotfiles and dot-directories (`.git`, `.env`, `.github`, …) are skipped by
+> all three walkers — `list_directory`, `search_files`, and `find_files` —
+> regardless of patterns.
+
+## Configuration
+
+```python
+FileSystem(
+    root_dir='.',                  # str | Path — sandbox root
+    allowed_patterns=[],           # allowlist globs (empty = allow all)
+    denied_patterns=[],            # denylist globs
+    protected_patterns=[...],      # read-only globs (defaults to secrets/.git)
+    max_read_lines=2000,           # cap for a single read_file
+    max_search_results=1000,       # cap for search_files
+    max_find_results=1000,         # cap for find_files
+)
+```
+
+The integer limits must be positive; they are validated at construction.
+
+## Agent spec (YAML/JSON)
+
+`FileSystem` works with Pydantic AI's
+[agent spec](https://ai.pydantic.dev/agent-spec/):
+
+```yaml
+# agent.yaml
+model: anthropic:claude-sonnet-4-6
+capabilities:
+  - FileSystem:
+      root_dir: ./workspace
+      allowed_patterns: ['*.py', '*.toml']
+```
+
+```python
+from pydantic_ai import Agent
+from pydantic_ai_harness import FileSystem
+
+agent = Agent.from_file('agent.yaml', custom_capability_types=[FileSystem])
+```
+
+Pass `custom_capability_types` so the spec loader knows how to instantiate
+`FileSystem`.
+
+## Further reading
+
+- [Pydantic AI capabilities](https://ai.pydantic.dev/capabilities/)
+- [Toolsets](https://ai.pydantic.dev/toolsets/)
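The "Pattern filtering" section of the FileSystem README above describes three glob lists matched with `fnmatch`. A minimal sketch of how such rules could combine, assuming deny wins over allow and protection only affects writes, as the README's table states. The helpers `matches_any`, `can_read`, and `can_write` are hypothetical and are not the toolset's implementation.

```python
from collections.abc import Sequence
from fnmatch import fnmatchcase


def matches_any(path: str, patterns: Sequence[str]) -> bool:
    """True when the path matches at least one glob; fnmatch's `*` also spans `/`."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def can_read(path: str, allowed: Sequence[str], denied: Sequence[str]) -> bool:
    """Deny patterns always reject; an empty allowlist permits everything else."""
    if matches_any(path, denied):
        return False
    return not allowed or matches_any(path, allowed)


def can_write(
    path: str, allowed: Sequence[str], denied: Sequence[str], protected: Sequence[str]
) -> bool:
    """Writes need read access and must not hit a protected (read-only) pattern."""
    return can_read(path, allowed, denied) and not matches_any(path, protected)


print(fnmatchcase('src/main.py', '*.py'))                   # True
print(can_read('notes.md', ['*.py'], []))                   # False
print(can_write('.env.local', [], [], ['.env', '.env.*']))  # False
print(fnmatchcase('secrets.txt', '**/secrets*'))            # False
```

Under plain `fnmatch`, `**/secrets*` needs at least one `/` in the path, so whether a top-level `secrets.txt` is protected depends on how the toolset normalizes paths before matching, which this diff does not show.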
diff --git a/pydantic_ai_harness/filesystem/__init__.py b/pydantic_ai_harness/filesystem/__init__.py
@@ -0,0 +1,6 @@
+"""Filesystem capability: gives agents configurable, sandboxed file system access."""
+
+from pydantic_ai_harness.filesystem._capability import FileSystem
+from pydantic_ai_harness.filesystem._toolset import FileSystemToolset
+
+__all__ = ['FileSystem', 'FileSystemToolset']
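The FileSystem README describes an optional `expected_hash` on `write_file` and `edit_file` that rejects writes based on a stale read. A minimal sketch of that optimistic-concurrency check follows, under stated assumptions: the hash is taken to be SHA-256 (the README only says "content hash"), files live in an in-memory dict, and `StaleWriteError`, `content_hash`, and this `write_file` signature are hypothetical stand-ins rather than the toolset's API.

```python
from __future__ import annotations

import hashlib


class StaleWriteError(Exception):
    """Raised when the caller's expected hash no longer matches the stored content."""


def content_hash(data: bytes) -> str:
    # Assumption: SHA-256 hex digest; the real toolset may use another hash.
    return hashlib.sha256(data).hexdigest()


def write_file(
    files: dict[str, bytes], path: str, data: bytes, expected_hash: str | None = None
) -> str:
    """Write data, refusing when expected_hash is given and does not match."""
    if expected_hash is not None:
        current = files.get(path)
        if current is None or content_hash(current) != expected_hash:
            raise StaleWriteError(f'{path} changed since it was read; re-read it first')
    files[path] = data
    return content_hash(data)


files = {'config.toml': b'name = "a"\n'}
seen = content_hash(files['config.toml'])  # hash the agent got from its read
files['config.toml'] = b'name = "b"\n'     # another writer changes the file

try:
    write_file(files, 'config.toml', b'name = "c"\n', expected_hash=seen)
except StaleWriteError as exc:
    print(f'rejected: {exc}')
```

Omitting `expected_hash` keeps the unconditional overwrite behavior, which matches the README's description of the argument as optional.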
diff --git a/pydantic_ai_harness/filesystem/_capability.py b/pydantic_ai_harness/filesystem/_capability.py
@@ -0,0 +1,81 @@
+"""Filesystem capability that provides sandboxed file system access."""
+
+from __future__ import annotations
+
+from collections.abc import Sequence
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any
+
+from pydantic_ai.capabilities import AbstractCapability
+from pydantic_ai.tools import AgentDepsT
+from pydantic_ai.toolsets import AgentToolset
+
+from pydantic_ai_harness.filesystem._toolset import FileSystemToolset
+
+_DEFAULT_PROTECTED: list[str] = [
+    '.git/*',
+    '.env',
+    '.env.*',
+    '*.pem',
+    '*.key',
+    '**/secrets*',
+]
+
+
+@dataclass
+class FileSystem(AbstractCapability[AgentDepsT]):
+    """File system access scoped to a root directory.
+
+    All paths are resolved relative to `root_dir`. Traversal above the root
+    is rejected. Symlinks are resolved before authorization.
+    """
+
+    root_dir: str | Path = '.'
+    """Root directory for all file operations. Defaults to the current directory."""
+
+    allowed_patterns: Sequence[str] = field(default_factory=list[str])
+    """If non-empty, only paths matching at least one glob pattern are accessible."""
+
+    denied_patterns: Sequence[str] = field(default_factory=list[str])
+    """Paths matching any of these glob patterns are rejected."""
+
+    protected_patterns: Sequence[str] = field(default_factory=lambda: list(_DEFAULT_PROTECTED))
+    """Paths matching these patterns are read-only (writes are rejected).
+
+    Defaults to protecting `.git/`, `.env`, key files, and secrets.
+    Set to an empty list to disable protection.
+    """
+
+    max_read_lines: int = 2000
+    """Maximum number of lines returned by a single `read_file` call."""
+
+    max_search_results: int = 1000
+    """Maximum number of matches returned by `search_files`."""
+
+    max_find_results: int = 1000
+    """Maximum number of matches returned by `find_files`."""
+
+    def __post_init__(self) -> None:
+        # Runtime validation: dataclass field annotations are advisory, not enforced.
+        # A config-driven caller could pass a string, which would otherwise reach the toolset unchecked.
+        values: dict[str, Any] = {
+            'max_read_lines': self.max_read_lines,
+            'max_search_results': self.max_search_results,
+            'max_find_results': self.max_find_results,
+        }
+        for name, value in values.items():
+            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
+                raise ValueError(f'{name} must be a positive integer, got {value!r}')
+
+    def get_toolset(self) -> AgentToolset[AgentDepsT]:
+        """Build and return the filesystem toolset."""
+        return FileSystemToolset[AgentDepsT](
+            root_dir=Path(self.root_dir),
+            allowed_patterns=self.allowed_patterns,
+            denied_patterns=self.denied_patterns,
+            protected_patterns=self.protected_patterns,
+            max_read_lines=self.max_read_lines,
+            max_search_results=self.max_search_results,
+            max_find_results=self.max_find_results,
+        )
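The `FileSystem` docstring says traversal above the root is rejected and symlinks are resolved before authorization. A minimal sketch of that containment check follows, assuming `os.path.realpath` as named in the README. The function `resolve_inside` is hypothetical; the actual logic lives in `FileSystemToolset`, which this part of the diff does not show.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_inside(root: str | Path, user_path: str) -> Path:
    """Resolve user_path against root and reject anything that lands outside it."""
    root_real = Path(os.path.realpath(root))
    # Joining with an absolute user_path discards root_real, so absolute paths
    # go through the same containment check as '..' segments and symlinks.
    candidate = Path(os.path.realpath(root_real / user_path))
    if not candidate.is_relative_to(root_real):
        raise PermissionError(f'{user_path!r} resolves outside the root directory')
    return candidate


for attempt in ('src/main.py', '../outside.txt', '/etc/passwd'):
    try:
        print(attempt, '->', resolve_inside('.', attempt))
    except PermissionError as exc:
        print(attempt, '-> rejected:', exc)
```

Resolving first means the check sees the link's real target. A link swapped between the check and the later open is a separate concern that this sketch does not address.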