Add host-backed OS and filesystem access to CodeMode sandbox #262
Open
adtyavrdhn wants to merge 13 commits into
Sandboxed `run_code` had no way to reach the filesystem, environment, or wall clock: Monty supports it through an OS callback / `AbstractOS` and directory mounts, but `CodeMode` never threaded `os`/`mount` into `feed_start` or the snapshot resume loop, so callers couldn't enable it. Add `os` and `mount` options on `CodeMode`/`CodeModeToolset`, thread them through `feed_start` and every `resume` site (OS auto-dispatch stops the moment a resume omits them), and make the `run_code` description reflect whether host-backed access is configured.
Add edge cases that pin the behaviours most likely to regress: OS access surviving across REPL-persisted `run_code` calls, a raising `os` callback degrading to `ModelRetry` instead of crashing the loop, and `mount` accepting a `list[MountDir]`. Hoist the never-invoked callback used by the description/forwarding assertions into one shared helper.
Trim the host-access docs to the essentials and make the example self-contained (drop the undefined helper). The snippet and the documented `mount`/callback constructions are run end-to-end to confirm they work.
`os`/`mount` are static capability fields (no per-run resolver), so the "stateful AbstractOS rooted at a per-user directory" guidance over-claimed. Reword to: build CodeMode per request to scope access. Every other doc line was re-checked empirically against pydantic-monty 0.0.17.
A `mount` only exposes filesystem paths; `os.getenv`/`os.environ` and `datetime.now()`/`date.today()` still require an `os` handler. The description used one host-access note for both, so mount-only agents were told env/clock were routed to the host and would emit calls that fail and burn run_code retries (verified against pydantic-monty 0.0.17). Split the description into three states (none / mount-only filesystem / os), and correct the README and docstrings that conflated the two.
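The three-state split described above can be sketched as a small selector. This is only an illustration: the enum, the function names, and the note wording are hypothetical and are not taken from the PR's `_toolset.py`. The one behaviour it mirrors from the commit message is that a mount alone implies filesystem access only, while an OS handler also covers environment and clock calls.

```python
from enum import Enum
from typing import Any, Optional


class HostAccess(Enum):
    """Hypothetical names for the three description states."""

    NONE = "none"
    MOUNT_ONLY = "mount-only"
    OS = "os"


# Hypothetical note texts; the real wording lives in the PR itself.
_NOTES = {
    HostAccess.NONE: "Filesystem, environment, and clock access are unavailable.",
    HostAccess.MOUNT_ONLY: (
        "Mounted filesystem paths are available; "
        "environment and clock access are not."
    ),
    HostAccess.OS: (
        "Filesystem, environment, and clock calls go to the configured OS handler."
    ),
}


def host_access_state(os_access: Optional[Any], mount: Optional[Any]) -> HostAccess:
    # An OS handler takes precedence: it covers env/clock as well as paths.
    if os_access is not None:
        return HostAccess.OS
    # A mount alone exposes filesystem paths only.
    if mount is not None:
        return HostAccess.MOUNT_ONLY
    return HostAccess.NONE


def host_access_note(os_access: Optional[Any] = None, mount: Optional[Any] = None) -> str:
    # Exactly one note is selected, so the description never over-claims.
    return _NOTES[host_access_state(os_access, mount)]
```

Selecting a single note per state is what keeps a mount-only agent from being told that `os.getenv` or `datetime.now()` will work.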
…inst monty
Audited every statement in the run_code description, docstrings, and README against pydantic-monty 0.0.17. Two were imprecise:
- "imported at the top of your snippet" -- mid-snippet imports work, so the rule is just "before use".
- OS-enabled note said calls route "to the host environment", but an in-memory AbstractOS (e.g. OSAccess) handles them too -- it's the configured OS handler, not necessarily the host.
This was referenced Jun 1, 2026
Master issue: ship the execution-environment capability (gh-aw shim parity + pluggable backend)
#265
Open
The mount docs implied writes reach the host, but MountDir defaults to copy-on-write overlay mode, so writes stay in the sandbox unless mode is 'read-write'. Also tighten two awkward/redundant doc lines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
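As an analogy for the copy-on-write overlay behaviour described above, the standard library's `collections.ChainMap` shows the same read-through, write-on-top pattern. This is not how Monty implements `MountDir`; the dictionaries and paths below are hypothetical stand-ins for a host directory and a sandbox layer.

```python
from collections import ChainMap

# Stand-in for files in the mounted host directory (hypothetical content).
host_files = {"/work/data.csv": "a,b\n1,2\n"}
# Stand-in for the sandbox-side overlay that absorbs writes.
sandbox_layer = {}

# Reads fall through to host_files; writes land in sandbox_layer only.
overlay_view = ChainMap(sandbox_layer, host_files)

overlay_view["/work/out.txt"] = "result"      # new file: stays in the overlay
overlay_view["/work/data.csv"] = "edited"     # edit: shadows the host copy

# The sandbox sees its own edits...
sandbox_sees = overlay_view["/work/data.csv"]
# ...while the host-side mapping is untouched.
host_still_has = host_files["/work/data.csv"]
```

Per the commit message, writes only reach the host when the mount's mode is 'read-write'; in the analogy that would correspond to writing into `host_files` directly.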
…tral
The public type aliases leaked the Monty backend name into a surface we can't rename later. Rename them to match the existing CodeMode/CodeModeToolset convention, and rename the os= parameter to os_access= so it stops shadowing the stdlib os module that sandboxed code itself uses.
- MontyOS -> CodeModeOS, MontyOSCallback -> CodeModeOSCallback, MontyMount -> CodeModeMount
- CodeMode/CodeModeToolset param os= -> os_access= (mount unchanged)
- internal resume()/feed_start() forwarding keeps Monty's literal os= kwarg
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The OS/mount threading named its parameter `os`, shadowing the stdlib module inside the execution-loop helpers. Rename the variable to `os_access` (matching the public field) while keeping Monty's required `os=` keyword only at the resume/feed_start call sites. Also inline the single-use restriction-line helper into `_base_description`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
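The shadowing problem that motivates the rename is easy to reproduce in plain Python. The two functions below are hypothetical and unrelated to the PR's helpers; they only show what happens when a parameter is named `os`.

```python
import os


def read_home_shadowed(os=None):
    # The parameter named `os` hides the stdlib module inside this function,
    # so `os.getenv` is looked up on whatever the caller passed (None here).
    try:
        return os.getenv("HOME")
    except AttributeError as exc:
        return f"shadowed: {exc}"


def read_home(os_access=None):
    # With the parameter renamed, `os` still refers to the stdlib module.
    return os.getenv("HOME")
```

Calling `read_home_shadowed()` hits the `AttributeError` branch because `None` has no `getenv`, while `read_home()` reaches the real `os.getenv`.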
The option list keeps growing; pin tools/max_retries as the only positional args and force os_access/mount (and future config) to be passed by name via a KW_ONLY sentinel, so adding options can't silently shift positional meaning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
adtyavrdhn
commented
Jun 2, 2026
Public docs should let a reader grasp the host-access surface without reverse-engineering it. Reframe the docstrings and README around when to reach for each primitive instead of what is switched off, drop the type-restating prose the annotations already carry, and lead with concrete tasks (share a dataset; inject just the secrets the agent needs). Tighten the os-access test sweep so each test asserts exactly its invariant: drop redundant negative description asserts (one note is interpolated, so the positive phrase alone proves selection), drop an assertion already owned by another test, and type the tmp_path fixtures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
c931840 to
465e8d6
Compare
adtyavrdhn
commented
Jun 3, 2026
```python
from pydantic_monty import MountDir
agent = Agent('openai:gpt-5', capabilities=[CodeMode(mount=MountDir('/work', '/tmp/agent-work'))])
```
Should have the latest model
The raw-callback example claimed non-allow-listed keys "stay hidden" by returning NOT_HANDLED. Verified against Monty: NOT_HANDLED *refuses* the call (raises in the sandbox -> model retry), it does not return None. A model probing for an optional secret would crash and burn retries. Distinguish the two return modes explicitly so users don't pick the wrong one: return a value (incl. None) to answer/hide, NOT_HANDLED to refuse a capability outright. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Returning a value (including None) from an os_access callback answers the call -- a None reads back like an unset env var, so the sandbox keeps running. Returning NOT_HANDLED refuses the call, raising in the sandbox and surfacing as ModelRetry. These two paths are easy to confuse and silently regress, so pin both. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
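The two return modes can be sketched with an allow-list callback. Several things here are assumptions rather than facts from the PR: the `NOT_HANDLED` object is a local stand-in so the snippet runs without pydantic-monty installed, the `'os.getenv'` function name and argument layout are guesses at how calls arrive, and `ALLOWED_ENV` is a made-up allow-list. Only the `(function_name, args, kwargs) -> result` shape comes from the PR description.

```python
from typing import Any

# Stand-in sentinel so the sketch is self-contained; in real use this
# would be the NOT_HANDLED value provided by Monty (assumption).
NOT_HANDLED = object()

# Hypothetical allow-list of environment values the agent may see.
ALLOWED_ENV = {"API_BASE_URL": "https://example.invalid"}


def os_callback(function_name: str, args: tuple[Any, ...], kwargs: dict[str, Any]) -> Any:
    # Assumption: env lookups arrive as 'os.getenv' with the key in args[0].
    if function_name == "os.getenv":
        key = args[0]
        default = args[1] if len(args) > 1 else kwargs.get("default")
        # Returning a value (including None) answers the call, so a key
        # outside the allow-list reads back like an unset variable.
        return ALLOWED_ENV.get(key, default)
    # Anything else is refused outright; per the commit message this
    # raises in the sandbox and surfaces as ModelRetry.
    return NOT_HANDLED
```

With this shape, a model probing for an optional secret gets `None` and keeps running, and only capabilities the callback does not support at all cost a retry.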
Summary
Extends `CodeMode` to support host-backed OS and filesystem access inside the sandbox via two new parameters:
- `os`: accepts a `pydantic_monty.AbstractOS` instance or a raw callback `(function_name, args, kwargs) -> result` to handle OS calls like `os.getenv()`, `datetime.now()`, and `pathlib.Path` operations
- `mount`: accepts one or more `pydantic_monty.MountDir` to expose host directories inside the sandbox

When either is configured, the `run_code` tool description is updated to reflect that these operations are host-backed instead of unavailable. The `os`/`mount` parameters are threaded through every `resume()` call in the execution loop to ensure OS dispatch persists across tool call suspensions and REPL state reuse.

Linked Issue
Fixes #

Changes
Core implementation (`_toolset.py`)
- Add `MontyOS`, `MontyOSCallback`, and `MontyMount` type aliases for clarity
- Split `_RUN_CODE_BASE_DESCRIPTION` into head/tail with swappable restriction lines (`_NO_OS_RESTRICTION` vs `_OS_ENABLED_NOTE`)
- Add `_base_description()` helper to assemble descriptions based on OS enablement
- Extend `CodeModeToolset` with `os` and `mount` fields
- Update `_build_description()` to accept `os_enabled` parameter and use dynamic base description
- Update `call_tool()` to pass `os`/`mount` to `feed_start()`
- Update `_execution_loop()` and all snapshot handlers (`_handle_function_snapshot`, `_resolve_future_snapshot`) to accept and forward `os`/`mount` to every `resume()` call

Capability layer (`_capability.py`)
- Add `os` and `mount` parameters to `CodeMode` class with docstrings
- Update `get_wrapper_toolset()` to forward these parameters to `CodeModeToolset`

Public API (`__init__.py`)
- Export `MontyOS`, `MontyOSCallback`, and `MontyMount` type aliases

Documentation (`README.md`)
- … `MountDir`, `OSAccess`, and raw callbacks
- … `os` and `mount` parameters

Tests (`test_code_mode.py`)
- Add `TestCodeModeOSAccess` class with 10 comprehensive tests covering:
  - `AbstractOS` instance dispatch
  - … (… `ModelRetry`)
  - … `CodeMode` to `CodeModeToolset`

Checklist
- … (`TestCodeModeOSAccess`)
- `make lint && make typecheck && make test` passes locally
- … `pyproject.toml` or `uv.lock`

https://claude.ai/code/session_0177ZDbsCTVRxMnE1yNJ5Fb5