feat: add FileSystem and Shell capabilities #260
Merged
11 commits
bd268c8
feat: add FileSystem and Shell capabilities with exhaustive testing
strawgate 592ba63
Clean-up Filesystem and Shell capabilities
strawgate dd01ea6
Address non-controversial reviewer feedback on PR #260
strawgate a269e6c
Move mutmut out of dev deps into a one-off script
strawgate 961452a
Replace getattr with direct field access in FileSystem.__post_init__
strawgate a7eed3d
Replace field-name references with literal defaults in docstrings
strawgate d6a6ee5
filter recursive listings/searches by protected and denied patterns
strawgate ea12712
fix(filesystem): let walkers list under a file-shaped allowlist; add …
claude b8a5fdc
fix(filesystem): hide dotfiles in list_directory for walker consistency
claude bb9974f
fix(shell): keep the tail when truncating command output
claude c9542ba
fix(filesystem,shell): recoverable errors, per-run isolation, cwd har…
dsfaccini
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,3 +25,4 @@ wheels/ | |
| # Hypothesis | ||
| .hypothesis/ | ||
| .vscode/ | ||
| mutants/ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| # Mutation Testing | ||
|
|
||
| Mutation testing complements the 100% branch-coverage requirement: coverage | ||
| proves every line and branch runs; mutation testing proves the assertions | ||
| actually pin the behavior down. | ||
|
|
||
| Covers `pydantic_ai_harness/filesystem/_toolset.py` and | ||
| `pydantic_ai_harness/shell/_toolset.py`. | ||
|
|
||
| Run with [mutmut](https://mutmut.readthedocs.io/) v3 via `scripts/run-mutmut.sh`, | ||
| which installs mutmut ephemerally with `uv run --with` — no dev dependency | ||
| required. | ||
|
|
||
| ```bash | ||
| scripts/run-mutmut.sh run --max-children 1 | ||
| scripts/run-mutmut.sh results | ||
| scripts/run-mutmut.sh show <mutant-name> | ||
| ``` | ||
|
|
||
| ## Interpreting survivors | ||
|
|
||
| A surviving mutant is either a missing test or an equivalent mutant — a change | ||
| that produces behavior no test could distinguish from the original. Triage each | ||
| survivor; the recurring equivalent-mutant categories in this codebase are: | ||
|
|
||
| - **Trampoline default params** — mutmut v3 wraps functions, and the wrapper | ||
| keeps the original defaults, so a mutated default is never observed. | ||
| - **Omitted `name=` in `add_function()`** — pydantic-ai falls back to | ||
| `method.__name__`, which equals the explicit name being mutated away. | ||
| - **`'utf-8'` encoding mutations** — Python's codec lookup is case-insensitive | ||
| and UTF-8 is the default text encoding, so case/omission changes are no-ops. | ||
| - **`errors='replace'` mutations** — exercised only by invalid bytes; valid | ||
| UTF-8 test data never invokes the error handler. | ||
| - **Unreachable `except` blocks** (marked `pragma: no cover`) — paths that | ||
| can't be triggered in the test environment. | ||
| - **`CancelScope(shield=True)` flips** — require an outer cancellation during | ||
| the near-instant cleanup window. | ||
|
|
||
| Anything outside these categories should be treated as a real gap and killed | ||
| with a new test. | ||
|
|
||
| ## Limitations | ||
|
|
||
| Trio-parametrized tests are excluded during mutation testing (`-k 'not trio'` | ||
| in `pyproject.toml [tool.mutmut]`) because trio segfaults in mutmut's | ||
| subprocess environment on Python 3.14 / macOS. The kill rate is unaffected — | ||
| the trio tests exercise the same code paths as the asyncio tests. |
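
The two string-related categories above can be checked directly in the interpreter. The following is a minimal sketch that is independent of mutmut and of the harness code; the sample strings are made up for illustration, and it only demonstrates `str.encode` / `bytes.decode`, which may or may not be the exact calls the toolsets use.

```python
# Illustrative only: why 'utf-8' and errors='replace' mutations can be equivalent.
import codecs

# Codec lookup normalizes the name, so differently cased spellings resolve to one codec.
same_codec = codecs.lookup('utf-8').name == codecs.lookup('UTF-8').name

text = 'héllo'
encoded_lower = text.encode('utf-8')
encoded_upper = text.encode('UTF-8')
encoded_default = text.encode()  # str.encode defaults to UTF-8

# With valid UTF-8 input the error handler is never invoked,
# so strict and replace decoding give the same result.
strict_valid = encoded_lower.decode('utf-8')
replace_valid = encoded_lower.decode('utf-8', errors='replace')

# Only invalid bytes make errors='replace' observable.
invalid = b'caf\xff'
replaced = invalid.decode('utf-8', errors='replace')
```

A test that feeds invalid bytes such as `b'caf\xff'` through the code path would turn the `errors='replace'` category from equivalent into killable, if that input is reachable in practice.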
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,16 +1,26 @@ | ||
| -"""The batteries for your Pydantic AI agent -- the official capability library.""" | ||
| +"""Pydantic AI capability library.""" | ||
|
|
||
|  from typing import TYPE_CHECKING | ||
|
|
||
|  if TYPE_CHECKING: | ||
|      from .code_mode import CodeMode | ||
| +    from .filesystem import FileSystem | ||
| +    from .shell import Shell | ||
|
|
||
| -__all__ = ['CodeMode'] | ||
| +__all__ = ['CodeMode', 'FileSystem', 'Shell'] | ||
|
|
||
|
|
||
|  def __getattr__(name: str) -> object: | ||
|      if name == 'CodeMode': | ||
|          from .code_mode import CodeMode | ||
|
|
||
|          return CodeMode | ||
| +    elif name == 'FileSystem': | ||
| +        from .filesystem import FileSystem | ||
|
|
||
| +        return FileSystem | ||
| +    elif name == 'Shell': | ||
| +        from .shell import Shell | ||
|
|
||
| +        return Shell | ||
|      raise AttributeError(f'module {__name__!r} has no attribute {name!r}') |
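
The module-level `__getattr__` above defers each submodule import until the name is first requested. A self-contained sketch of the same pattern follows; the module name `lazy_demo` and the use of `collections.OrderedDict` as the lazily imported object are stand-ins chosen for illustration, not part of this PR.

```python
# Sketch of a lazily resolving module attribute (the PEP 562 mechanism).
import types

load_log: list[str] = []  # records each time the lazy branch runs


def make_lazy_module() -> types.ModuleType:
    module = types.ModuleType('lazy_demo')  # hypothetical module name

    def __getattr__(name: str) -> object:
        # Called only when normal attribute lookup on the module fails.
        if name == 'OrderedDict':
            load_log.append(name)
            from collections import OrderedDict

            return OrderedDict
        raise AttributeError(f'module {module.__name__!r} has no attribute {name!r}')

    # Storing the function in the module namespace is what enables the hook.
    module.__getattr__ = __getattr__
    return module


lazy_demo = make_lazy_module()
resolved = lazy_demo.OrderedDict  # triggers the deferred import
```

As in the diff, the result is not cached on the module, so every access goes through `__getattr__` again; the import itself is cheap after the first time because Python caches loaded modules.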
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,136 @@ | ||
| # FileSystem | ||
|
|
||
| Give an agent sandboxed, pattern-filtered access to a directory tree. | ||
|
|
||
| ## The problem | ||
|
|
||
| Letting an agent touch the filesystem directly is risky: path traversal | ||
| (`../../etc/passwd`), symlinks that escape the project, clobbering `.git`, or | ||
| leaking `.env` secrets. Hand-rolling the guards around every tool call is | ||
| repetitive and easy to get subtly wrong. | ||
|
|
||
| ## The solution | ||
|
|
||
| `FileSystem` exposes a fixed set of file tools, all scoped to a single | ||
| `root_dir`. Every path is resolved and containment-checked (symlinks included) | ||
| before any I/O, and access is filtered through allow / deny / protected glob | ||
| patterns. | ||
|
|
||
| ```python | ||
| from pydantic_ai import Agent | ||
| from pydantic_ai_harness import FileSystem | ||
|
|
||
| agent = Agent( | ||
|     'anthropic:claude-sonnet-4-6', | ||
|     capabilities=[FileSystem(root_dir='./workspace')], | ||
| ) | ||
|
|
||
| result = agent.run_sync('Read config.toml and tell me the package name.') | ||
| print(result.output) | ||
| ``` | ||
|
|
||
| ## Tools | ||
|
|
||
| | Tool | Purpose | | ||
| |---|---| | ||
| | `read_file` | Read a text file with line numbers and a content hash. Binary files are detected and not dumped. | | ||
| | `write_file` | Create or overwrite a file. Optional `expected_hash` rejects stale writes (optimistic concurrency). | | ||
| | `edit_file` | Exact-string replacement; `old_text` must match exactly once. Optional `expected_hash`. | | ||
| | `list_directory` | List a directory's entries with type indicators and sizes. | | ||
| | `search_files` | Regex search over file contents, optionally narrowed by an `include_glob`. | | ||
| | `find_files` | Glob search over file names (e.g. `*.py`, `**/*.json`). | | ||
| | `create_directory` | Create a directory and any missing parents. | | ||
| | `file_info` | Metadata for a file or directory (size, type, line count, hash, symlink target). | | ||
|
|
||
| ## Security model | ||
|
|
||
| - **Containment.** Paths resolve relative to `root_dir`; anything resolving | ||
| outside — via `..`, an absolute path, or a symlink — is rejected. Symlinks | ||
| are resolved with `os.path.realpath` *before* the containment check, which | ||
| narrows the TOCTTOU (time-of-check-to-time-of-use) window. | ||
| - **Binary detection.** `read_file` returns a placeholder instead of dumping | ||
| binary bytes into the model context. | ||
| - **Optimistic concurrency.** `write_file`/`edit_file` accept an | ||
| `expected_hash` so an agent operating on a stale read is told to re-read | ||
| rather than silently overwriting newer content. | ||
|
|
||
| ## Pattern filtering | ||
|
|
||
| Three independent glob lists control access. Patterns are matched with | ||
| `fnmatch`, whose `*` spans `/`, so `*.py` matches `src/main.py` and you rarely | ||
| need `**`. | ||
|
|
||
| | Field | Effect | | ||
| |---|---| | ||
| | `allowed_patterns` | If non-empty, only matching paths are accessible (allowlist). | | ||
| | `denied_patterns` | Matching paths are always rejected (denylist). | | ||
| | `protected_patterns` | Matching paths are read-only — reads succeed, writes are rejected. | | ||
|
|
||
| `protected_patterns` defaults to `.git/*`, `.env`/`.env.*`, `*.pem`, `*.key`, | ||
| and `**/secrets*`. Pass an empty list to disable protection. | ||
|
|
||
| ### Direct access vs. walkers | ||
|
|
||
| The three rules apply at two different granularities: | ||
|
|
||
| - **Direct access** (`read_file`, `write_file`, `edit_file`, `file_info`, | ||
| `create_directory`) gates the operation's target path. You must name a path | ||
| that the patterns permit. | ||
| - **Walkers** (`list_directory`, `search_files`, `find_files`) gate their root | ||
| by deny/protected patterns, but **not** by `allowed_patterns` — a directory | ||
| root like `.` never matches a file pattern such as `src/*.py`, so requiring | ||
| it to would make every listing fail. Instead, the root is always walked and | ||
| each **entry** is filtered against all three lists. A directory listing can | ||
| never surface a path the agent couldn't otherwise read or write. | ||
|
|
||
| So with `allowed_patterns=['*.py']`, `list_directory('.')` succeeds and shows | ||
| only the `.py` entries; `read_file('notes.md')` is rejected. | ||
|
|
||
| > Dotfiles and dot-directories (`.git`, `.env`, `.github`, …) are skipped by | ||
| > all three walkers — `list_directory`, `search_files`, and `find_files` — | ||
| > regardless of patterns. | ||
|
|
||
| ## Configuration | ||
|
|
||
| ```python | ||
| FileSystem( | ||
|     root_dir='.',  # str | Path: sandbox root | ||
|     allowed_patterns=[],  # allowlist globs (empty = allow all) | ||
|     denied_patterns=[],  # denylist globs | ||
|     protected_patterns=[...],  # read-only globs (defaults to secrets/.git) | ||
|     max_read_lines=2000,  # cap for a single read_file | ||
|     max_search_results=1000,  # cap for search_files | ||
|     max_find_results=1000,  # cap for find_files | ||
| ) | ||
| ``` | ||
|
|
||
| The integer limits must be positive; they are validated at construction. | ||
|
|
||
| ## Agent spec (YAML/JSON) | ||
|
|
||
| `FileSystem` works with Pydantic AI's | ||
| [agent spec](https://ai.pydantic.dev/agent-spec/): | ||
|
|
||
| ```yaml | ||
| # agent.yaml | ||
| model: anthropic:claude-sonnet-4-6 | ||
| capabilities: | ||
|   - FileSystem: | ||
|       root_dir: ./workspace | ||
|       allowed_patterns: ['*.py', '*.toml'] | ||
| ``` | ||
|
|
||
| ```python | ||
| from pydantic_ai import Agent | ||
| from pydantic_ai_harness import FileSystem | ||
|
|
||
| agent = Agent.from_file('agent.yaml', custom_capability_types=[FileSystem]) | ||
| ``` | ||
|
|
||
| Pass `custom_capability_types` so the spec loader knows how to instantiate | ||
| `FileSystem`. | ||
|
|
||
| ## Further reading | ||
|
|
||
| - [Pydantic AI capabilities](https://ai.pydantic.dev/capabilities/) | ||
| - [Toolsets](https://ai.pydantic.dev/toolsets/) |
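
The README's pattern-filtering rules for direct access can be sketched in a few lines. This is a rough model under stated assumptions, not the toolset's code: it assumes patterns are matched against the path relative to `root_dir` with plain `fnmatch` semantics, and the helper names `matches_any`, `can_read`, and `can_write` are hypothetical. Walker behavior is not modelled.

```python
# Hypothetical model of allow / deny / protected filtering for direct access.
from collections.abc import Sequence
from fnmatch import fnmatchcase

# Mirrors the defaults listed in the README; treated here as plain data.
DEFAULT_PROTECTED = ['.git/*', '.env', '.env.*', '*.pem', '*.key', '**/secrets*']


def matches_any(path: str, patterns: Sequence[str]) -> bool:
    """Return True if the relative path matches at least one glob."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def can_read(path: str, allowed: Sequence[str] = (), denied: Sequence[str] = ()) -> bool:
    """Deny wins; an empty allowlist means everything else is readable."""
    if matches_any(path, denied):
        return False
    return not allowed or matches_any(path, allowed)


def can_write(
    path: str,
    allowed: Sequence[str] = (),
    denied: Sequence[str] = (),
    protected: Sequence[str] = DEFAULT_PROTECTED,
) -> bool:
    """Writable paths must be readable and must not be protected."""
    return can_read(path, allowed, denied) and not matches_any(path, protected)


# fnmatch's '*' spans '/', so a bare '*.py' reaches into subdirectories.
star_spans_slash = fnmatchcase('src/main.py', '*.py')
env_readable = can_read('.env')
env_writable = can_write('.env')
```

One consequence of plain `fnmatch` matching is that `**/secrets*` needs a literal `/` in the path, so a top-level `secrets.txt` would not match it in this model. Whether the real toolset handles that case separately is not visible in this diff.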
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| """Filesystem capability: gives agents configurable, sandboxed file system access.""" | ||
|
|
||
| from pydantic_ai_harness.filesystem._capability import FileSystem | ||
| from pydantic_ai_harness.filesystem._toolset import FileSystemToolset | ||
|
|
||
| __all__ = ['FileSystem', 'FileSystemToolset'] |
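
The "sandboxed" access this package advertises rests on a containment check. `_toolset.py` is not shown on this page, so the following is only a sketch of the general technique (resolve with `os.path.realpath`, then verify the result is still under the root); `resolve_inside` and `is_rejected` are hypothetical helper names. The symlink part of the demo assumes a platform where creating symlinks is permitted.

```python
# Sketch of a resolve-then-check containment guard; not the PR's implementation.
import os
import tempfile
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path against root and reject anything that escapes it."""
    root_real = Path(os.path.realpath(root))
    # Joining with an absolute user_path discards root, which the check below catches.
    candidate = Path(os.path.realpath(root_real / user_path))
    if not candidate.is_relative_to(root_real):
        raise PermissionError(f'{user_path!r} resolves outside the sandbox root')
    return candidate


def is_rejected(root: Path, user_path: str) -> bool:
    try:
        resolve_inside(root, user_path)
    except PermissionError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'workspace'
    root.mkdir()
    (root / 'config.toml').write_text('name = "demo"\n')

    inside_name = resolve_inside(root, 'config.toml').name
    traversal_rejected = is_rejected(root, '../outside.txt')
    absolute_rejected = is_rejected(root, os.path.abspath(os.sep))

    # A symlink inside the root that points at the parent directory.
    os.symlink(tmp, root / 'escape', target_is_directory=True)
    symlink_rejected = is_rejected(root, 'escape/outside.txt')
```

A check like this is only as good as the moment it runs: if a path component is swapped for a symlink between the check and the subsequent open, the guard has already passed.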
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| """Filesystem capability that provides sandboxed file system access.""" | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from collections.abc import Sequence | ||
| from dataclasses import dataclass, field | ||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
| from pydantic_ai.capabilities import AbstractCapability | ||
| from pydantic_ai.tools import AgentDepsT | ||
| from pydantic_ai.toolsets import AgentToolset | ||
|
|
||
| from pydantic_ai_harness.filesystem._toolset import FileSystemToolset | ||
|
|
||
| _DEFAULT_PROTECTED: list[str] = [ | ||
|     '.git/*', | ||
|     '.env', | ||
|     '.env.*', | ||
|     '*.pem', | ||
|     '*.key', | ||
|     '**/secrets*', | ||
| ] | ||
|
|
||
|
|
||
| @dataclass | ||
| class FileSystem(AbstractCapability[AgentDepsT]): | ||
|     """File system access scoped to a root directory. | ||
|
|
||
|     All paths are resolved relative to `root_dir`. Traversal above the root | ||
|     is rejected. Symlinks are resolved before authorization. | ||
|     """ | ||
|
|
||
|     root_dir: str | Path = '.' | ||
|     """Root directory for all file operations. Defaults to the current directory.""" | ||
|
|
||
|     allowed_patterns: Sequence[str] = field(default_factory=list[str]) | ||
|     """If non-empty, only paths matching at least one glob pattern are accessible.""" | ||
|
|
||
|     denied_patterns: Sequence[str] = field(default_factory=list[str]) | ||
|     """Paths matching any of these glob patterns are rejected.""" | ||
|
|
||
|     protected_patterns: Sequence[str] = field(default_factory=lambda: list(_DEFAULT_PROTECTED)) | ||
|     """Paths matching these patterns are read-only (writes are rejected). | ||
|
|
||
|     Defaults to protecting `.git/`, `.env`, key files, and secrets. | ||
|     Set to an empty list to disable protection. | ||
|     """ | ||
|
|
||
|     max_read_lines: int = 2000 | ||
|     """Maximum number of lines returned by a single `read_file` call.""" | ||
|
|
||
|     max_search_results: int = 1000 | ||
|     """Maximum number of matches returned by `search_files`.""" | ||
|
|
||
|     max_find_results: int = 1000 | ||
|     """Maximum number of matches returned by `find_files`.""" | ||
|
|
||
|     def __post_init__(self) -> None: | ||
|         # Runtime validation: dataclass field annotations are advisory, not enforced. | ||
|         # A config-driven caller could pass a string that would otherwise propagate. | ||
|         values: dict[str, Any] = { | ||
|             'max_read_lines': self.max_read_lines, | ||
|             'max_search_results': self.max_search_results, | ||
|             'max_find_results': self.max_find_results, | ||
|         } | ||
|         for name, value in values.items(): | ||
|             if not isinstance(value, int) or value <= 0: | ||
|                 raise ValueError(f'{name} must be a positive integer, got {value!r}') | ||
|
|
||
|     def get_toolset(self) -> AgentToolset[AgentDepsT]: | ||
|         """Build and return the filesystem toolset.""" | ||
|         return FileSystemToolset[AgentDepsT]( | ||
|             root_dir=Path(self.root_dir), | ||
|             allowed_patterns=self.allowed_patterns, | ||
|             denied_patterns=self.denied_patterns, | ||
|             protected_patterns=self.protected_patterns, | ||
|             max_read_lines=self.max_read_lines, | ||
|             max_search_results=self.max_search_results, | ||
|             max_find_results=self.max_find_results, | ||
|         ) | ||
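
The `__post_init__` check above can be exercised in isolation. The sketch below copies only the validation condition into a stand-in dataclass; `Limits` and `rejects` are hypothetical names, and the real `FileSystem` class is not imported.

```python
# Stand-in dataclass reproducing the positive-integer check from __post_init__.
from dataclasses import dataclass
from typing import Any


@dataclass
class Limits:  # hypothetical; mirrors one validated field of FileSystem
    max_read_lines: Any = 2000

    def __post_init__(self) -> None:
        value = self.max_read_lines
        # isinstance is checked first, so a string never reaches the comparison.
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f'max_read_lines must be a positive integer, got {value!r}')


def rejects(value: Any) -> bool:
    """Return True if constructing Limits with this value raises ValueError."""
    try:
        Limits(max_read_lines=value)
    except ValueError:
        return True
    return False


string_rejected = rejects('2000')  # config-style string input
zero_rejected = rejects(0)
bool_rejected = rejects(True)  # bool is a subclass of int, so this passes the check
```

Because `bool` subclasses `int`, `True` satisfies this condition and would be accepted as a limit of 1. Whether that matters for config-driven callers is a judgment call; an explicit `isinstance(value, bool)` exclusion would close it.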