feat: add FileSystem and Shell capabilities #260
Merged
+4,104
−5
Changes from 3 commits
11 commits
bd268c8 feat: add FileSystem and Shell capabilities with exhaustive testing (strawgate)
592ba63 Clean-up Filesystem and Shell capabilities (strawgate)
dd01ea6 Address non-controversial reviewer feedback on PR #260 (strawgate)
a269e6c Move mutmut out of dev deps into a one-off script (strawgate)
961452a Replace getattr with direct field access in FileSystem.__post_init__ (strawgate)
a7eed3d Replace field-name references with literal defaults in docstrings (strawgate)
d6a6ee5 filter recursive listings/searches by protected and denied patterns (strawgate)
ea12712 fix(filesystem): let walkers list under a file-shaped allowlist; add … (claude)
b8a5fdc fix(filesystem): hide dotfiles in list_directory for walker consistency (claude)
bb9974f fix(shell): keep the tail when truncating command output (claude)
c9542ba fix(filesystem,shell): recoverable errors, per-run isolation, cwd har… (dsfaccini)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -25,3 +25,4 @@ wheels/ | ||
|   # Hypothesis | ||
|   .hypothesis/ | ||
|   .vscode/ | ||
| + mutants/ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| # Mutation Testing Results | ||
|
|
||
| > Generated from commit `bd268c8` on 2026-05-26. Results may become stale as code | ||
| > evolves — regenerate via `uv run mutmut run --max-children 1`. | ||
|
|
||
| Covers `pydantic_ai_harness/filesystem/_toolset.py` and `pydantic_ai_harness/shell/_toolset.py`. | ||
|
|
||
| Run with [mutmut](https://mutmut.readthedocs.io/) v3 (`uv run mutmut run --max-children 1`). | ||
|
|
||
| ## Summary | ||
|
|
||
| | Metric | Value | | ||
| |---|---| | ||
| | Total mutants | 584 | | ||
| | Killed | 524 | | ||
| | Survived | 60 | | ||
| | Kill rate | **89.7%** | | ||
|
|
||
| ## Equivalent Mutants (60 survivors) | ||
|
|
||
| All 60 survivors are provably equivalent — no test can distinguish them from the original. | ||
|
|
||
| | Category | Count | Why unkillable | | ||
| |---|---|---| | ||
| | Trampoline default params | 7 | mutmut v3 wraps functions; wrapper keeps original defaults, so mutated defaults are never observed | | ||
| | `name=None` / omitted in `add_function()` | 18 | pydantic-ai falls back to `method.__name__`, which equals the original explicit name | | ||
| | Encoding case `'utf-8'` → `'UTF-8'` | 10 | Python's codec lookup is case-insensitive | | ||
| | Encoding omit/`None` (`utf-8` is default) | 11 | Default text encoding is UTF-8 on all supported platforms | | ||
| | Unreachable `except` blocks (`pragma: no cover`) | 6 | `except ValueError/OSError` paths can't be triggered in the test environment | | ||
| | `replace()` count removed/changed | 2 | Count is pre-validated as exactly 1 before the call | | ||
| | `CancelScope(shield=True)` → `False`/`None` | 2 | Requires an outer cancellation to fire during the ~instant cleanup window | | ||
| | Dead `returncode` branch | 1 | `proc.returncode` is never `None` after `await proc.wait()` | | ||
| | `errors='replace'` mutations | 3 | Test data is valid UTF-8; the error handler is never invoked | | ||
|
|
||
| ## Limitations | ||
|
|
||
| Trio-parametrized tests are excluded during mutation testing (`-k 'not trio'` in | ||
| `pyproject.toml [tool.mutmut]`) because trio segfaults in mutmut's subprocess | ||
| environment on Python 3.14 / macOS. This does not affect the kill rate — trio | ||
| tests exercise the same code paths as the asyncio tests. | ||
|
|
||
| ## Running | ||
|
|
||
| ```bash | ||
| uv run mutmut run --max-children 1 | ||
| uv run mutmut results | ||
| uv run mutmut show <mutant-name> | ||
| ``` |
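Several rows of the survivor table rest on standard-library behaviour that can be checked directly. The sketch below is illustrative and not part of the PR: the helper names (`kill_rate`, `same_codec`, `replace_is_count_insensitive`, `decode_ignores_error_handler`) are hypothetical, and it covers only the stdlib claims plus the arithmetic from the summary table.

```python
import codecs

# Counts copied from the "Equivalent Mutants" table above.
SURVIVOR_CATEGORIES = {
    'trampoline default params': 7,
    'name=None / omitted in add_function()': 18,
    "encoding case 'utf-8' -> 'UTF-8'": 10,
    'encoding omitted / None': 11,
    'unreachable except blocks': 6,
    'replace() count removed/changed': 2,
    'CancelScope(shield=True) -> False/None': 2,
    'dead returncode branch': 1,
    "errors='replace' mutations": 3,
}


def kill_rate(killed: int, total: int) -> float:
    """Percentage of mutants killed, rounded to one decimal place."""
    return round(100 * killed / total, 1)


def same_codec(a: str, b: str) -> bool:
    """True when two encoding names resolve to the same codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name


def replace_is_count_insensitive(text: str, old: str, new: str) -> bool:
    """With exactly one occurrence, count=1 and 'replace all' give the same result."""
    if text.count(old) != 1:
        raise ValueError('precondition: exactly one occurrence of `old`')
    return text.replace(old, new, 1) == text.replace(old, new)


def decode_ignores_error_handler(data: bytes) -> bool:
    """For valid UTF-8 input, the `errors` argument does not change the result."""
    # Raises UnicodeDecodeError under 'strict' if `data` is not valid UTF-8.
    return data.decode('utf-8', errors='replace') == data.decode('utf-8', errors='strict')


if __name__ == '__main__':
    print(sum(SURVIVOR_CATEGORIES.values()), kill_rate(524, 584))
    print(same_codec('utf-8', 'UTF-8'))
    print(replace_is_count_insensitive('a-b', '-', '+'))
    print(decode_ignores_error_handler('héllo'.encode('utf-8')))
```

The last helper holds only for valid UTF-8 input, so the `errors='replace'` row depends on the test data rather than on the code under test.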
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,16 +1,26 @@ | ||
| - """The batteries for your Pydantic AI agent -- the official capability library.""" | ||
| + """Pydantic AI capability library.""" | ||
| | ||
|   from typing import TYPE_CHECKING | ||
| | ||
|   if TYPE_CHECKING: | ||
|       from .code_mode import CodeMode | ||
| +     from .filesystem import FileSystem | ||
| +     from .shell import Shell | ||
| | ||
| - __all__ = ['CodeMode'] | ||
| + __all__ = ['CodeMode', 'FileSystem', 'Shell'] | ||
| | ||
| | ||
|   def __getattr__(name: str) -> object: | ||
|       if name == 'CodeMode': | ||
|           from .code_mode import CodeMode | ||
| | ||
|           return CodeMode | ||
| +     elif name == 'FileSystem': | ||
| +         from .filesystem import FileSystem | ||
| + | ||
| +         return FileSystem | ||
| +     elif name == 'Shell': | ||
| +         from .shell import Shell | ||
| + | ||
| +         return Shell | ||
|       raise AttributeError(f'module {__name__!r} has no attribute {name!r}') | ||
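The `__getattr__` function in this file is the module-level attribute hook from PEP 562: the submodule import runs only when one of the exported names is first accessed. The sketch below reproduces the idea on a throwaway module object so that it runs standalone. The `make_lazy_module` helper, the lookup table, and the stdlib targets are illustrative assumptions rather than code from the PR, which uses an explicit `if`/`elif` chain and does not cache the result.

```python
import importlib
import types


def make_lazy_module(name: str, exports: dict[str, str]) -> types.ModuleType:
    """Build a module whose attributes are imported on first access.

    `exports` maps an exported attribute name to the module that defines it.
    """
    module = types.ModuleType(name)

    def __getattr__(attr: str) -> object:
        # Called only when normal attribute lookup on the module fails.
        if attr in exports:
            value = getattr(importlib.import_module(exports[attr]), attr)
            # Cache on the module so later lookups bypass this hook.
            setattr(module, attr, value)
            return value
        raise AttributeError(f'module {name!r} has no attribute {attr!r}')

    # Assigning the hook stores it in the module's __dict__, where PEP 562 looks for it.
    module.__getattr__ = __getattr__
    module.__all__ = list(exports)
    return module


# Hypothetical stand-ins for CodeMode, FileSystem and Shell.
demo = make_lazy_module('demo', {'OrderedDict': 'collections', 'JSONDecoder': 'json'})
```

Accessing `demo.JSONDecoder` triggers the import of `json`; any name outside the table raises `AttributeError` with the same message format as the PR code.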
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| """Filesystem capability: gives agents configurable, sandboxed file system access.""" | ||
|
|
||
| from pydantic_ai_harness.filesystem._capability import FileSystem | ||
| from pydantic_ai_harness.filesystem._toolset import FileSystemToolset | ||
|
|
||
| __all__ = ['FileSystem', 'FileSystemToolset'] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| """Filesystem capability that provides sandboxed file system access.""" | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from collections.abc import Sequence | ||
| from dataclasses import dataclass, field | ||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
| from pydantic_ai.capabilities import AbstractCapability | ||
| from pydantic_ai.toolsets import AgentToolset | ||
|
|
||
| from pydantic_ai_harness.filesystem._toolset import FileSystemToolset | ||
|
|
||
| _DEFAULT_PROTECTED: list[str] = [ | ||
| '.git/*', | ||
| '.env', | ||
| '.env.*', | ||
| '*.pem', | ||
| '*.key', | ||
| '**/secrets*', | ||
| ] | ||
|
|
||
|
|
||
| @dataclass | ||
| class FileSystem(AbstractCapability[Any]): | ||
| """File system access scoped to a root directory. | ||
|
|
||
| All paths are resolved relative to `root_dir`. Traversal above the root | ||
| is rejected. Symlinks are resolved before authorization. | ||
| """ | ||
|
|
||
| root_dir: str | Path = '.' | ||
| """Root directory for all file operations. Defaults to the current directory.""" | ||
|
|
||
| allowed_patterns: Sequence[str] = field(default_factory=list[str]) | ||
| """If non-empty, only paths matching at least one glob pattern are accessible.""" | ||
|
|
||
| denied_patterns: Sequence[str] = field(default_factory=list[str]) | ||
| """Paths matching any of these glob patterns are rejected.""" | ||
|
|
||
| protected_patterns: Sequence[str] = field(default_factory=lambda: list(_DEFAULT_PROTECTED)) | ||
| """Paths matching these patterns are read-only (writes are rejected). | ||
|
|
||
| Defaults to protecting `.git/`, `.env`, key files, and secrets. | ||
| Set to an empty list to disable protection. | ||
| """ | ||
|
|
||
| max_read_lines: int = 2000 | ||
| """Maximum number of lines returned by a single `read_file` call.""" | ||
|
|
||
| max_search_results: int = 1000 | ||
| """Maximum number of matches returned by `search_files`.""" | ||
|
|
||
| max_find_results: int = 1000 | ||
| """Maximum number of matches returned by `find_files`.""" | ||
|
|
||
| def __post_init__(self) -> None: | ||
| for name in ('max_read_lines', 'max_search_results', 'max_find_results'): | ||
| value = getattr(self, name) | ||
| if not isinstance(value, int) or value <= 0: | ||
| raise ValueError(f'{name} must be a positive integer, got {value!r}') | ||
|
|
||
| def get_toolset(self) -> AgentToolset[Any]: | ||
| """Build and return the filesystem toolset.""" | ||
| return FileSystemToolset( | ||
| root_dir=Path(self.root_dir), | ||
| allowed_patterns=self.allowed_patterns, | ||
| denied_patterns=self.denied_patterns, | ||
| protected_patterns=self.protected_patterns, | ||
| max_read_lines=self.max_read_lines, | ||
| max_search_results=self.max_search_results, | ||
| max_find_results=self.max_find_results, | ||
| ) | ||
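The class docstring states three rules: paths resolve relative to `root_dir`, traversal above the root is rejected, and symlinks are resolved before authorization. `FileSystemToolset` itself is not shown in this part of the diff, so the following is only a minimal sketch of how such checks are commonly written with `pathlib` and `fnmatch`. The function names and the glob-matching semantics are assumptions, not the PR's implementation.

```python
import fnmatch
from pathlib import Path

# Copied from `_DEFAULT_PROTECTED` in the diff above.
DEFAULT_PROTECTED = ['.git/*', '.env', '.env.*', '*.pem', '*.key', '**/secrets*']


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve `user_path` under `root`, rejecting anything that escapes it."""
    root = root.resolve()
    # resolve() follows symlinks and collapses '..' before the containment check,
    # so a symlink pointing outside the root is rejected as well.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f'{user_path!r} escapes the root directory')
    return candidate


def is_protected(root: Path, path: Path, patterns: list[str]) -> bool:
    """True when `path` (already resolved under `root`) matches a protected glob.

    Assumption: patterns are matched against the root-relative POSIX path with
    fnmatch semantics, where '*' also matches '/'. The real toolset may differ.
    """
    relative = path.relative_to(root.resolve()).as_posix()
    return any(fnmatch.fnmatchcase(relative, pattern) for pattern in patterns)


if __name__ == '__main__':
    sandbox = Path('sandbox_root_demo')  # hypothetical root; it does not need to exist
    target = resolve_inside(sandbox, '.git/config')
    print(target, is_protected(sandbox, target, DEFAULT_PROTECTED))
```

Resolving first and authorizing second means the check runs on the real target path. An absolute input such as `/etc/passwd` replaces the root when joined, so it fails the containment test in the same way as `../` traversal.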