Add host-backed OS and filesystem access to CodeMode sandbox#262

Open
adtyavrdhn wants to merge 13 commits into main from claude/codemode-os-function-GK8ew

@adtyavrdhn
Member

Summary

Extends CodeMode to support host-backed OS and filesystem access inside the sandbox via two new parameters:

  • os: accepts a pydantic_monty.AbstractOS instance or a raw callback (function_name, args, kwargs) -> result to handle OS calls like os.getenv(), datetime.now(), and pathlib.Path operations
  • mount: accepts one or more pydantic_monty.MountDir to expose host directories inside the sandbox

When either is configured, the run_code tool description is updated to reflect that these operations are host-backed instead of unavailable. The os/mount parameters are threaded through every resume() call in the execution loop to ensure OS dispatch persists across tool call suspensions and REPL state reuse.

Linked Issue

Fixes #

Changes

Core implementation (_toolset.py)

  • Added MontyOS, MontyOSCallback, and MontyMount type aliases for clarity
  • Split _RUN_CODE_BASE_DESCRIPTION into head/tail with swappable restriction lines (_NO_OS_RESTRICTION vs _OS_ENABLED_NOTE)
  • Added _base_description() helper to assemble descriptions based on OS enablement
  • Extended CodeModeToolset with os and mount fields
  • Updated _build_description() to accept os_enabled parameter and use dynamic base description
  • Modified call_tool() to pass os/mount to feed_start()
  • Updated _execution_loop() and all snapshot handlers (_handle_function_snapshot, _resolve_future_snapshot) to accept and forward os/mount to every resume() call
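
A toy model of that forwarding may help show why every resume site needs the handler. `ToyRun`, its step tuples, and `execution_loop` are hypothetical stand-ins, not Monty's snapshot types or this PR's code; only the `os=` keyword mirrors what the description above says. The sketch assumes a run pauses at tool calls and auto-dispatches OS calls in between.

```python
from typing import Any, Callable, Optional

OSCallback = Callable[[str, tuple, dict], Any]


class ToyRun:
    """Hypothetical stand-in for a paused sandbox run (not Monty's real snapshot type)."""

    def __init__(self, steps: list[tuple[str, str]]) -> None:
        self.steps = steps  # each step is ('os', name) or ('tool', name)
        self.position = 0
        self.log: list[str] = []

    def resume(self, *, os: Optional[OSCallback] = None) -> Optional[str]:
        """Run until the next tool call (returns its name) or the end (returns None)."""
        while self.position < len(self.steps):
            kind, name = self.steps[self.position]
            self.position += 1
            if kind == 'tool':
                return name  # suspend so the host can run the tool
            # kind == 'os': dispatch only works if this resume received a handler
            if os is None:
                raise RuntimeError(f'{name} is not available in the sandbox')
            self.log.append(f'{name} -> {os(name, (), {})}')
        return None


def execution_loop(run: ToyRun, os_access: Optional[OSCallback]) -> list[str]:
    """Drive the run to completion, forwarding os_access on every resume."""
    pending_tool = run.resume(os=os_access)
    while pending_tool is not None:
        run.log.append(f'tool {pending_tool}')
        # Omitting os= here would break any OS call made after the tool returns.
        pending_tool = run.resume(os=os_access)
    return run.log
```

In this model, an OS call that follows a tool call only succeeds because the second `resume()` also receives the handler, which is the property the REPL-reuse and suspension tests are described as pinning.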

Capability layer (_capability.py)

  • Added os and mount parameters to CodeMode class with docstrings
  • Updated get_wrapper_toolset() to forward these parameters to CodeModeToolset

Public API (__init__.py)

  • Exported MontyOS, MontyOSCallback, and MontyMount type aliases

Documentation (README.md)

  • Added "Filesystem and OS access" section with examples of MountDir, OSAccess, and raw callbacks
  • Updated "Sandbox restrictions" to clarify that filesystem/clock operations are unavailable by default but become available with host-backed access
  • Updated API reference to document new os and mount parameters

Tests (test_code_mode.py)

  • Added TestCodeModeOSAccess class with 10 comprehensive tests covering:
    • Description changes when OS access is enabled
    • OS callback dispatch through tool call suspensions and REPL reuse
    • AbstractOS instance dispatch
    • Exception handling in OS callbacks (surfaces as ModelRetry)
    • Single and multiple directory mounts
    • Parameter forwarding from CodeMode to CodeModeToolset

Checklist

  • Linked issue exists and is referenced above
  • Tests added/updated for new behavior (10 new tests in TestCodeModeOSAccess)
  • make lint && make typecheck && make test passes locally
  • No changes to pyproject.toml or uv.lock
  • Docstrings use single backticks

https://claude.ai/code/session_0177ZDbsCTVRxMnE1yNJ5Fb5

claude added 4 commits May 29, 2026 10:46
Sandboxed `run_code` had no way to reach the filesystem, environment, or
wall clock: Monty supports it through an OS callback / `AbstractOS` and
directory mounts, but `CodeMode` never threaded `os`/`mount` into
`feed_start` or the snapshot resume loop, so callers couldn't enable it.

Add `os` and `mount` options on `CodeMode`/`CodeModeToolset`, thread them
through `feed_start` and every `resume` site (OS auto-dispatch stops the
moment a resume omits them), and make the `run_code` description reflect
whether host-backed access is configured.

Add edge cases that pin the behaviours most likely to regress: OS access
surviving across REPL-persisted `run_code` calls, a raising `os` callback
degrading to `ModelRetry` instead of crashing the loop, and `mount`
accepting a `list[MountDir]`. Hoist the never-invoked callback used by the
description/forwarding assertions into one shared helper.

Trim the host-access docs to the essentials and make the example
self-contained (drop the undefined helper). The snippet and the documented
`mount`/callback constructions are run end-to-end to confirm they work.

`os`/`mount` are static capability fields (no per-run resolver), so the
"stateful AbstractOS rooted at a per-user directory" guidance over-claimed.
Reword to: build CodeMode per request to scope access. Every other doc line
was re-checked empirically against pydantic-monty 0.0.17.
chatgpt-codex-connector[bot]

This comment was marked as resolved.

claude added 2 commits May 30, 2026 06:50
A `mount` only exposes filesystem paths; `os.getenv`/`os.environ` and
`datetime.now()`/`date.today()` still require an `os` handler. The
description used one host-access note for both, so mount-only agents were
told env/clock were routed to the host and would emit calls that fail and
burn run_code retries (verified against pydantic-monty 0.0.17).

Split the description into three states (none / mount-only filesystem / os),
and correct the README and docstrings that conflated the two.
…inst monty

Audited every statement in the run_code description, docstrings, and README
against pydantic-monty 0.0.17. Two were imprecise:
- "imported at the top of your snippet" -- mid-snippet imports work, so the
  rule is just "before use".
- OS-enabled note said calls route "to the host environment", but an
  in-memory AbstractOS (e.g. OSAccess) handles them too -- it's the
  configured OS handler, not necessarily the host.

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


adtyavrdhn and others added 4 commits June 2, 2026 18:49
The mount docs implied writes reach the host, but MountDir defaults to
copy-on-write overlay mode, so writes stay in the sandbox unless mode is
'read-write'. Also tighten two awkward/redundant doc lines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
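
To make the overlay distinction concrete, here is a toy model of the two write behaviours. It is not Monty's implementation: `OverlayMount`, the dict-backed "host", and the default mode name `'overlay'` are hypothetical; only the `'read-write'` mode string comes from the commit message above.

```python
class OverlayMount:
    """Toy mounted directory backed by a dict instead of a real host path."""

    def __init__(self, host_files: dict[str, str], mode: str = 'overlay') -> None:
        self.host_files = host_files
        self.mode = mode
        self.overlay: dict[str, str] = {}  # sandbox-local copies of written files

    def read(self, path: str) -> str:
        # Sandbox-local writes shadow the host copy.
        if path in self.overlay:
            return self.overlay[path]
        return self.host_files[path]

    def write(self, path: str, text: str) -> None:
        if self.mode == 'read-write':
            self.host_files[path] = text  # the write reaches the host
        else:
            self.overlay[path] = text  # copy-on-write: the host stays untouched
```

In the default mode the sandbox reads back its own write while the host file is unchanged, which is the behaviour the corrected docs describe.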
…tral

The public type aliases leaked the Monty backend name into a surface we
can't rename later. Rename them to match the existing CodeMode/CodeModeToolset
convention, and rename the os= parameter to os_access= so it stops shadowing
the stdlib os module that sandboxed code itself uses.

- MontyOS -> CodeModeOS, MontyOSCallback -> CodeModeOSCallback, MontyMount -> CodeModeMount
- CodeMode/CodeModeToolset param os= -> os_access= (mount unchanged)
- internal resume()/feed_start() forwarding keeps Monty's literal os= kwarg

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The OS/mount threading named its parameter `os`, shadowing the stdlib
module inside the execution-loop helpers. Rename the variable to
`os_access` (matching the public field) while keeping Monty's required
`os=` keyword only at the resume/feed_start call sites. Also inline the
single-use restriction-line helper into `_base_description`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The option list keeps growing; pin tools/max_retries as the only
positional args and force os_access/mount (and future config) to be
passed by name via a KW_ONLY sentinel, so adding options can't silently
shift positional meaning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread pydantic_ai_harness/code_mode/_capability.py Outdated
Comment thread pydantic_ai_harness/code_mode/_capability.py Outdated
@adtyavrdhn adtyavrdhn requested a review from dsfaccini June 2, 2026 15:27
Public docs should let a reader grasp the host-access surface without
reverse-engineering it. Reframe the docstrings and README around when to
reach for each primitive instead of what is switched off, drop the
type-restating prose the annotations already carry, and lead with concrete
tasks (share a dataset; inject just the secrets the agent needs).

Tighten the os-access test sweep so each test asserts exactly its invariant:
drop redundant negative description asserts (one note is interpolated, so the
positive phrase alone proves selection), drop an assertion already owned by
another test, and type the tmp_path fixtures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@adtyavrdhn adtyavrdhn force-pushed the claude/codemode-os-function-GK8ew branch from c931840 to 465e8d6 Compare June 3, 2026 06:33
```python
from pydantic_monty import MountDir
agent = Agent('openai:gpt-5', capabilities=[CodeMode(mount=MountDir('/work', '/tmp/agent-work'))])
```
Member Author

This example should use the latest model.

adtyavrdhn and others added 2 commits June 3, 2026 12:53
The raw-callback example claimed non-allow-listed keys "stay hidden" by
returning NOT_HANDLED. Verified against Monty: NOT_HANDLED *refuses* the
call (raises in the sandbox -> model retry), it does not return None. A
model probing for an optional secret would crash and burn retries.

Distinguish the two return modes explicitly so users don't pick the
wrong one: return a value (incl. None) to answer/hide, NOT_HANDLED to
refuse a capability outright.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Returning a value (including None) from an os_access callback answers
the call -- a None reads back like an unset env var, so the sandbox
keeps running. Returning NOT_HANDLED refuses the call, raising in the
sandbox and surfacing as ModelRetry. These two paths are easy to
confuse and silently regress, so pin both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
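
The two return modes might look like this in a raw callback. This is a sketch under stated assumptions: the local `NOT_HANDLED` object is a stand-in for the sentinel Monty provides (its import path is not shown on this page), the `'os.getenv'` function-name spelling is assumed, and `ALLOWED_ENV` is a hypothetical allow-list.

```python
from typing import Any

# Stand-in for Monty's NOT_HANDLED sentinel; replace with the real one in actual use.
NOT_HANDLED = object()

# Hypothetical allow-list of environment variables the agent may read.
ALLOWED_ENV = {'API_TOKEN': 'secret-123'}


def os_access(function_name: str, args: tuple[Any, ...], kwargs: dict[str, Any]) -> Any:
    """Answer env lookups from the allow-list; refuse every other OS call."""
    if function_name == 'os.getenv':  # assumed spelling of the function name
        # Answer the call. A None result reads like an unset variable,
        # so the sandbox keeps running and the key stays hidden.
        return ALLOWED_ENV.get(args[0])
    # Refuse the capability outright. Per the commit message above, this
    # raises in the sandbox and surfaces as ModelRetry.
    return NOT_HANDLED
```

Returning `None` suits optional secrets the model may probe for; `NOT_HANDLED` suits capabilities that should never be available.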