fix: balance temporary sandbox leases for uploads#3261
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR ensures non-local sandboxes are properly released after file upload / sync flows, and adds lease counting so repeated acquires of the same sandbox require matching releases.
Changes:
- Add sandbox release in
upload_files()and Feishu file-receive sync paths (including failure paths). - Introduce
_lease_countsinAioSandboxProviderto prevent a “temporary” acquire/release from tearing down another caller’s active sandbox. - Expand/adjust tests to validate release/lease behavior across success and failure scenarios.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/app/gateway/routers/uploads.py | Releases acquired sandboxes in more error/success paths during upload + sync. |
| backend/app/channels/feishu.py | Ensures sandbox acquired for file sync is released via finally. |
| backend/packages/harness/deerflow/community/aio_sandbox/aio_sandbox_provider.py | Adds lease counting to make sandbox lifecycle safe across multiple acquires. |
| backend/tests/test_uploads_router.py | Adds assertions/tests to verify sandbox release behavior for uploads. |
| backend/tests/test_feishu_parser.py | Adds tests to verify Feishu file receive releases sandboxes on success/failure. |
| backend/tests/test_aio_sandbox_provider.py | Adds test coverage for lease-counted acquire/release behavior. |
| backend/tests/test_sandbox_orphan_reconciliation.py | Updates provider test fixture initialization to include _lease_counts. |
|
@BXL1015, thanks for your contribution. Please create an issue to track the idea of this change, it looks more like a feature than a bug fix for me. |
|
Thanks for the guidance. Issue #3296 has been created. |
|
@BXL1015, here are more comments for the code
The Copilot review flagged this correctly. In _reuse_in_process_sandbox, the lease count increment happens outside self._lock: But The existing code already mutates _last_activity and reads _sandboxes without self._lock in this same method, so this may be intentionally lock-free (relying on the per-thread _thread_locks for serialization). However, the PR should clarify this intent, and if the per-thread lock already serializes access, a comment would help future readers.
When a sandbox_id isn't in _lease_counts but is in _sandboxes, the code assumes a count of 1 and tears down the sandbox. This is a reasonable fallback for legacy sandboxes created before this PR, but it means a missing _lease_counts entry is silently treated as "safe to release."
The _lease_counts_map() method lazily creates the dict for test-constructed providers. This is fine for backward compatibility, but every other dict (_sandboxes, _sandbox_infos) is initialized in init — the _lease_counts is too. This method adds complexity to a case that could be caught by a simpler assertion. Minor, but worth noting. |
What
Why
In remote/provisioner deployments, acquiring a sandbox may create or reuse a Kubernetes Pod plus NodePort Service. The upload route temporarily acquires the thread sandbox so it can sync uploaded files into non-mounted sandboxes, but the lease was not released on success, 413, or failure paths.
A simple unconditional
finally: release()is risky because the same thread may already have an active sandbox from the agent runtime. In that case, releasing the temporary upload acquire should only decrement that temporary lease, not park the actively used sandbox in the warm pool.How
AioSandboxProvider.release()decrement first and only move the sandbox to the warm pool when the final lease is released.Testing
uv run ruff check app/gateway/routers/uploads.py app/channels/feishu.py packages/harness/deerflow/community/aio_sandbox/aio_sandbox_provider.py tests/test_uploads_router.py tests/test_aio_sandbox_provider.py tests/test_sandbox_orphan_reconciliation.py tests/test_feishu_parser.pyuv run ruff format --check app/gateway/routers/uploads.py app/channels/feishu.py packages/harness/deerflow/community/aio_sandbox/aio_sandbox_provider.py tests/test_uploads_router.py tests/test_aio_sandbox_provider.py tests/test_sandbox_orphan_reconciliation.py tests/test_feishu_parser.pygit diff --checkuv run pytest tests/test_uploads_router.py::test_upload_files_acquires_non_local_sandbox_before_writing tests/test_uploads_router.py::test_upload_files_fails_before_writing_when_non_local_sandbox_unavailable tests/test_uploads_router.py::test_upload_files_releases_non_local_sandbox_when_get_fails tests/test_uploads_router.py::test_upload_files_does_not_sync_non_local_sandbox_when_total_size_exceeds_limit tests/test_uploads_router.py::test_upload_files_does_not_sync_non_local_sandbox_when_conversion_fails tests/test_uploads_router.py::test_upload_files_releases_non_local_sandbox_when_sync_fails tests/test_aio_sandbox_provider.py::test_reused_active_sandbox_requires_matching_releases tests/test_feishu_parser.py::test_feishu_receive_single_file_releases_sandbox_after_sync tests/test_feishu_parser.py::test_feishu_receive_single_file_releases_sandbox_when_sync_failsuv run pytest tests/test_uploads_router.py -k "not symlink"uv run pytest tests/test_aio_sandbox_provider.py tests/test_sandbox_orphan_reconciliation.py tests/test_feishu_parser.pyuv run pytest tests/test_sandbox_middleware.py