Add host-backed OS and filesystem access to CodeMode sandbox by adtyavrdhn · Pull Request #262 · pydantic/pydantic-ai-harness

adtyavrdhn · 2026-05-30T06:35:13Z

Summary

Extends CodeMode to support host-backed OS and filesystem access inside the sandbox via two new parameters:

- os: accepts a pydantic_monty.AbstractOS instance or a raw callback (function_name, args, kwargs) -> result to handle OS calls like os.getenv(), datetime.now(), and pathlib.Path operations
- mount: accepts one or more pydantic_monty.MountDir to expose host directories inside the sandbox

When either is configured, the run_code tool description is updated to reflect that these operations are host-backed instead of unavailable. The os/mount parameters are threaded through every resume() call in the execution loop to ensure OS dispatch persists across tool call suspensions and REPL state reuse.
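A raw callback of the shape `(function_name, args, kwargs) -> result` might look like the minimal sketch below. The dispatch key `'os.getenv'` and the way arguments arrive in `args` are assumptions about how Monty labels and forwards OS calls, and `ALLOWED_ENV` and `os_callback` are hypothetical names, so check the pydantic-monty documentation before relying on the exact strings.

```python
from typing import Any, Dict, Tuple

# Hypothetical allow-list: the only environment values the sandbox may read.
ALLOWED_ENV = {'APP_REGION': 'eu-west-1'}


def os_callback(function_name: str, args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> Any:
    """Answer OS calls made by sandboxed code (sketch; dispatch key is assumed)."""
    if function_name == 'os.getenv':
        key = args[0]
        # Honour an optional default, whether passed positionally or by name.
        default = args[1] if len(args) > 1 else kwargs.get('default')
        return ALLOWED_ENV.get(key, default)
    # Anything else is outside what this sketch supports.
    raise NotImplementedError(f'unsupported OS call: {function_name}')
```

A function like this would be passed as the `os` argument described above (renamed to `os_access` later in this PR); an `AbstractOS` instance is the alternative when more than a handful of calls need handling.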

Linked Issue

Fixes #

Changes

Core implementation (`_toolset.py`)

- Added MontyOS, MontyOSCallback, and MontyMount type aliases for clarity
- Split _RUN_CODE_BASE_DESCRIPTION into head/tail with swappable restriction lines (_NO_OS_RESTRICTION vs _OS_ENABLED_NOTE)
- Added _base_description() helper to assemble descriptions based on OS enablement
- Extended CodeModeToolset with os and mount fields
- Updated _build_description() to accept os_enabled parameter and use dynamic base description
- Modified call_tool() to pass os/mount to feed_start()
- Updated _execution_loop() and all snapshot handlers (_handle_function_snapshot, _resolve_future_snapshot) to accept and forward os/mount to every resume() call
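The reason for forwarding the handler to every `resume()` call can be illustrated with a toy model. `ToySnapshot` and `execution_loop` below are hypothetical stand-ins, not Monty's snapshot types or the toolset's real loop; the sketch only assumes that each resume step sees OS dispatch when, and only when, the handler is passed in for that step.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

OSCallback = Callable[[str, Tuple[Any, ...], Dict[str, Any]], Any]


@dataclass
class ToySnapshot:
    """Stand-in for a suspended sandbox run (not Monty's real snapshot type)."""

    steps_left: int
    log: List[str]

    def resume(self, *, os: Optional[OSCallback] = None) -> 'Union[ToySnapshot, str]':
        # Record whether OS dispatch was available for this particular step.
        self.log.append('os-on' if os is not None else 'os-off')
        if self.steps_left == 0:
            return 'done'
        return ToySnapshot(self.steps_left - 1, self.log)


def execution_loop(snapshot: ToySnapshot, os_access: Optional[OSCallback]) -> List[str]:
    """Drive the run to completion, forwarding the handler on every resume."""
    current: Union[ToySnapshot, str] = snapshot
    while isinstance(current, ToySnapshot):
        current = current.resume(os=os_access)
    return snapshot.log
```

In this model, dropping `os=os_access` from the single `resume` call would log `os-off` for every step, which mirrors the failure mode the change guards against.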

Capability layer (`_capability.py`)

- Added os and mount parameters to CodeMode class with docstrings
- Updated get_wrapper_toolset() to forward these parameters to CodeModeToolset

Public API (`__init__.py`)

- Exported MontyOS, MontyOSCallback, and MontyMount type aliases

Documentation (`README.md`)

- Added "Filesystem and OS access" section with examples of MountDir, OSAccess, and raw callbacks
- Updated "Sandbox restrictions" to clarify that filesystem/clock operations are unavailable by default but become available with host-backed access
- Updated API reference to document new os and mount parameters

Tests (`test_code_mode.py`)

Added TestCodeModeOSAccess class with 10 comprehensive tests covering:
- Description changes when OS access is enabled
- OS callback dispatch through tool call suspensions and REPL reuse
- AbstractOS instance dispatch
- Exception handling in OS callbacks (surfaces as ModelRetry)
- Single and multiple directory mounts
- Parameter forwarding from CodeMode to CodeModeToolset

Checklist

- Linked issue exists and is referenced above
- Tests added/updated for new behavior (10 new tests in TestCodeModeOSAccess)
- make lint && make typecheck && make test passes locally
- No changes to pyproject.toml or uv.lock
- Docstrings use single backticks

https://claude.ai/code/session_0177ZDbsCTVRxMnE1yNJ5Fb5

Sandboxed `run_code` had no way to reach the filesystem, environment, or wall clock: Monty supports it through an OS callback / `AbstractOS` and directory mounts, but `CodeMode` never threaded `os`/`mount` into `feed_start` or the snapshot resume loop, so callers couldn't enable it. Add `os` and `mount` options on `CodeMode`/`CodeModeToolset`, thread them through `feed_start` and every `resume` site (OS auto-dispatch stops the moment a resume omits them), and make the `run_code` description reflect whether host-backed access is configured.

Add edge cases that pin the behaviours most likely to regress: OS access surviving across REPL-persisted `run_code` calls, a raising `os` callback degrading to `ModelRetry` instead of crashing the loop, and `mount` accepting a `list[MountDir]`. Hoist the never-invoked callback used by the description/forwarding assertions into one shared helper.

Trim the host-access docs to the essentials and make the example self-contained (drop the undefined helper). The snippet and the documented `mount`/callback constructions are run end-to-end to confirm they work.

`os`/`mount` are static capability fields (no per-run resolver), so the "stateful AbstractOS rooted at a per-user directory" guidance over-claimed. Reword to: build CodeMode per request to scope access. Every other doc line was re-checked empirically against pydantic-monty 0.0.17.

A `mount` only exposes filesystem paths; `os.getenv`/`os.environ` and `datetime.now()`/`date.today()` still require an `os` handler. The description used one host-access note for both, so mount-only agents were told env/clock were routed to the host and would emit calls that fail and burn run_code retries (verified against pydantic-monty 0.0.17). Split the description into three states (none / mount-only filesystem / os), and correct the README and docstrings that conflated the two.

…inst monty

Audited every statement in the run_code description, docstrings, and README against pydantic-monty 0.0.17. Two were imprecise:
- "imported at the top of your snippet" -- mid-snippet imports work, so the rule is just "before use".
- OS-enabled note said calls route "to the host environment", but an in-memory AbstractOS (e.g. OSAccess) handles them too -- it's the configured OS handler, not necessarily the host.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

The mount docs implied writes reach the host, but MountDir defaults to copy-on-write overlay mode, so writes stay in the sandbox unless mode is 'read-write'. Also tighten two awkward/redundant doc lines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tral

The public type aliases leaked the Monty backend name into a surface we can't rename later. Rename them to match the existing CodeMode/CodeModeToolset convention, and rename the os= parameter to os_access= so it stops shadowing the stdlib os module that sandboxed code itself uses.
- MontyOS -> CodeModeOS, MontyOSCallback -> CodeModeOSCallback, MontyMount -> CodeModeMount
- CodeMode/CodeModeToolset param os= -> os_access= (mount unchanged)
- internal resume()/feed_start() forwarding keeps Monty's literal os= kwarg

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The OS/mount threading named its parameter `os`, shadowing the stdlib module inside the execution-loop helpers. Rename the variable to `os_access` (matching the public field) while keeping Monty's required `os=` keyword only at the resume/feed_start call sites. Also inline the single-use restriction-line helper into `_base_description`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
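The shadowing problem is plain Python scoping and can be reproduced without any sandbox. In the sketch below, `resume_shadowed` and `resume_renamed` are hypothetical helper names, and `CODEMODE_SKETCH_UNSET` is an environment variable assumed not to be set.

```python
import os
from typing import Any, Callable, Optional

OSCallback = Callable[[str, tuple, dict], Any]


def resume_shadowed(os: Optional[OSCallback]) -> str:
    # Inside this function `os` is the handler (or None), not the stdlib module,
    # so any attempt to use the module fails with AttributeError.
    try:
        return os.getenv('HOME', '')  # type: ignore[union-attr]
    except AttributeError:
        return 'shadowed'


def resume_renamed(os_access: Optional[OSCallback]) -> str:
    # The handler has its own name, so the stdlib module stays reachable.
    # A real call site would still forward it under Monty's keyword,
    # e.g. `snapshot.resume(os=os_access)`.
    return os.getenv('CODEMODE_SKETCH_UNSET', 'module-ok')
```

Keeping `os=` only at the call sites confines the name clash to a keyword argument, where it cannot shadow anything.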

The option list keeps growing; pin tools/max_retries as the only positional args and force os_access/mount (and future config) to be passed by name via a KW_ONLY sentinel, so adding options can't silently shift positional meaning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Public docs should let a reader grasp the host-access surface without reverse-engineering it. Reframe the docstrings and README around when to reach for each primitive instead of what is switched off, drop the type-restating prose the annotations already carry, and lead with concrete tasks (share a dataset; inject just the secrets the agent needs).

Tighten the os-access test sweep so each test asserts exactly its invariant: drop redundant negative description asserts (one note is interpolated, so the positive phrase alone proves selection), drop an assertion already owned by another test, and type the tmp_path fixtures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

adtyavrdhn · 2026-06-03T06:58:11Z

+    ```python
+    from pydantic_monty import MountDir
+
+    agent = Agent('openai:gpt-5', capabilities=[CodeMode(mount=MountDir('/work', '/tmp/agent-work'))])


This example should use the latest model.

The raw-callback example claimed non-allow-listed keys "stay hidden" by returning NOT_HANDLED. Verified against Monty: NOT_HANDLED *refuses* the call (raises in the sandbox -> model retry), it does not return None. A model probing for an optional secret would crash and burn retries. Distinguish the two return modes explicitly so users don't pick the wrong one: return a value (incl. None) to answer/hide, NOT_HANDLED to refuse a capability outright. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Returning a value (including None) from an os_access callback answers the call -- a None reads back like an unset env var, so the sandbox keeps running. Returning NOT_HANDLED refuses the call, raising in the sandbox and surfacing as ModelRetry. These two paths are easy to confuse and silently regress, so pin both. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
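The two return modes can be contrasted with a toy dispatcher. Everything here is a stand-in: `NOT_HANDLED` is a local object rather than the sentinel exported by pydantic-monty, `SandboxOSError` and `dispatch` are hypothetical, and the `'os.getenv'` key is an assumption about how Monty names the call.

```python
from typing import Any, Callable, Dict, Tuple

# Local stand-in for Monty's sentinel; real code would import it from pydantic_monty.
NOT_HANDLED = object()

# Hypothetical secrets the agent is allowed to read.
SECRETS = {'API_TOKEN': 'abc123'}


class SandboxOSError(Exception):
    """Stand-in for the error raised inside the sandbox on a refused call."""


def hide_unknown(function_name: str, args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> Any:
    # Answer every getenv: unknown keys read back as None, like an unset variable.
    if function_name == 'os.getenv':
        return SECRETS.get(args[0])
    return NOT_HANDLED


def refuse_unknown(function_name: str, args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> Any:
    # Answer only allow-listed keys; refuse everything else outright.
    if function_name == 'os.getenv' and args[0] in SECRETS:
        return SECRETS[args[0]]
    return NOT_HANDLED


def dispatch(callback: Callable[..., Any], function_name: str, *args: Any) -> Any:
    """Toy model of the sandbox side: a refused call raises instead of returning."""
    result = callback(function_name, args, {})
    if result is NOT_HANDLED:
        raise SandboxOSError(f'{function_name} is not available')
    return result
```

Under this model, `hide_unknown` suits optional secrets the model may probe for, while `refuse_unknown` suits capabilities that should fail loudly.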

claude added 4 commits May 29, 2026 10:46

docs(code_mode): tighten and verify the filesystem/OS access section

93e7b0a

claude added 2 commits May 30, 2026 06:50

devin-ai-integration Bot reviewed May 30, 2026

This was referenced Jun 1, 2026

Master issue: ship the execution-environment capability (gh-aw shim parity + pluggable backend) #265

Open

feat: execution environments (Local + Docker) #261

Closed

adtyavrdhn and others added 4 commits June 2, 2026 18:49

adtyavrdhn commented Jun 2, 2026

Comment thread pydantic_ai_harness/code_mode/_capability.py Outdated

adtyavrdhn commented Jun 2, 2026

Comment thread pydantic_ai_harness/code_mode/_capability.py Outdated

adtyavrdhn requested a review from dsfaccini June 2, 2026 15:27

adtyavrdhn force-pushed the claude/codemode-os-function-GK8ew branch from c931840 to 465e8d6 Compare June 3, 2026 06:33

adtyavrdhn commented Jun 3, 2026

adtyavrdhn and others added 2 commits June 3, 2026 12:53

adtyavrdhn wants to merge 13 commits into main from claude/codemode-os-function-GK8ew
