fix(sandbox): make interactive connect resilient on stopped/resumed sandboxes #215
Draft
marc-vercel wants to merge 1 commit into
…andboxes

`sandbox connect` could hang on "Waiting for connection..." or fail when run against a stopped/resumed sandbox. Four independent issues:

- The CLI swallowed real `attach()` failures: once the connection handshake landed, the same abort signal used to stop the premature-exit check also discarded any later `attach()` error, so failures were never surfaced.
- The spinner's disposer called `ora.clear()` instead of `stop()`, leaving the render interval running and keeping the event loop (and the CLI) alive indefinitely on teardown.
- When the interactive server exited early, the generic error hid the actual cause; we now include the server's stderr.
- The in-sandbox server (`pty-tunnel-server`) trusted a leftover `/tmp/vercel/interactive/config.json` restored from a snapshot whenever its recorded PID happened to be alive, connecting to a dead socket. It now health-checks a reused server and removes the stale config before spawning a fresh one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
b04bc6b to beb73c4
Problem
`sandbox connect` (and interactive `sandbox exec`) could hang indefinitely on `Waiting for connection...`, or fail in a confusing way, when run against a stopped sandbox that has to be resumed. It worked reliably against an already-running sandbox, which is why it only showed up intermittently after a stop/resume.

Several independent issues combined to produce this:
- Real connection errors were swallowed. Once the connection handshake landed, the abort signal that stops the "did the command exit early?" check was also used to filter errors from `attach()`. So any failure that happened after the handshake (for example, the resumed session not yet exposing a route for the interactive port) was silently discarded instead of surfaced.
- The spinner kept the process alive. The progress spinner's teardown called `ora.clear()`, which only erases the current frame but leaves its render interval running. That timer keeps Node's event loop alive, so on any early teardown the CLI would sit forever on the spinner instead of exiting.
- Early server exits were opaque. When the in-sandbox interactive server exited before connecting, the CLI showed a generic "may have timed out" hint with no detail.
- The in-sandbox server trusted a stale config. `pty-tunnel-server` decided whether a server was already running purely from a leftover config file and a liveness check on its recorded PID. Across a snapshot/resume that config is restored from the snapshot while the original process is gone, so a coincidentally-reused PID made it connect to a dead socket and exit.

Solution
- Stop routing `attach()` through the connection-established abort filter, so genuine connection failures propagate instead of being swallowed.
- `stop()` the spinner on teardown (not just `clear()`), so a failure before the connection is established can no longer hang the process.
- Include the interactive server's stderr in the error when it exits early, so the actual cause is visible instead of a generic hint.
- Have `pty-tunnel-server` health-check a server before reusing it, and remove any leftover config before spawning a new one, so a stale config restored from a snapshot can no longer cause a connection to a dead socket.

Together these turn the previous silent hang into either a working connection or a fast, legible error.
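For reviewers, a minimal sketch of the swallowed-error pattern and the shape of the fix. This is not the CLI's actual code: `Origin`, `errorToSurfaceBefore`, and `errorToSurfaceAfter` are hypothetical names, and the real change may be structured differently. It only shows why one shared abort filter hides `attach()` failures once the handshake has landed.

```typescript
// Hypothetical labels for where an error came from.
type Origin = "early-exit-check" | "attach";

// Before (sketch): one filter for everything. Once the "connected" signal
// has fired, every error is dropped, including genuine attach() failures.
function errorToSurfaceBefore(
  err: Error,
  _origin: Origin,
  connected: AbortSignal,
): Error | undefined {
  return connected.aborted ? undefined : err;
}

// After (sketch): only the early-exit check is silenced by the signal;
// attach() errors always propagate.
function errorToSurfaceAfter(
  err: Error,
  origin: Origin,
  connected: AbortSignal,
): Error | undefined {
  if (origin === "early-exit-check" && connected.aborted) return undefined;
  return err;
}

const connected = new AbortController();
connected.abort(); // the connection handshake has landed

const attachError = new Error("no route for the interactive port yet");
const surfacedBefore = errorToSurfaceBefore(attachError, "attach", connected.signal);
const surfacedAfter = errorToSurfaceAfter(attachError, "attach", connected.signal);
const exitCheckAfter = errorToSurfaceAfter(
  new Error("early-exit check aborted"),
  "early-exit-check",
  connected.signal,
);
```

In this sketch `surfacedBefore` is `undefined` (the failure is lost), `surfacedAfter` is the original error, and the early-exit check is still silenced after the handshake.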
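The spinner fix can be illustrated with a small stand-in. `TinySpinner` below is hypothetical and is not ora's implementation; it only mimics the behaviour described above, where erasing the current frame leaves the render interval running and the interval is what holds Node's event loop open.

```typescript
// Hypothetical stand-in for a terminal spinner; not ora's actual code.
class TinySpinner {
  private timer: ReturnType<typeof setInterval> | undefined;
  frames = 0;

  start(): this {
    // The render interval is what keeps the event loop alive.
    this.timer = setInterval(() => {
      this.frames += 1;
    }, 80);
    return this;
  }

  // Erases the current frame only; the interval keeps running.
  clear(): this {
    return this;
  }

  // Cancels the interval, so nothing is left to hold the process open.
  stop(): this {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    return this;
  }

  get isSpinning(): boolean {
    return this.timer !== undefined;
  }
}

const cleared = new TinySpinner().start();
cleared.clear();
const spinningAfterClear = cleared.isSpinning; // timer still active
cleared.stop(); // clean up so this sketch itself can exit

const stopped = new TinySpinner().start();
stopped.stop();
const spinningAfterStop = stopped.isSpinning; // timer cancelled
```

A disposer that only calls `clear()` leaves the process in the first state; calling `stop()` reaches the second.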
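The reuse-or-spawn decision in `pty-tunnel-server` could look roughly like the sketch below. Everything here is an assumption for illustration: the `ServerConfig` fields, the `Probes` interface, and `decide` are hypothetical names, and the probes are injected so the logic can be read without any real process or socket calls.

```typescript
// Hypothetical shape of the leftover config; the real file may differ.
interface ServerConfig {
  pid: number;
  socketPath: string;
}

// Hypothetical probes, injected so the decision logic stays pure.
interface Probes {
  isPidAlive(pid: number): boolean;
  isSocketHealthy(socketPath: string): boolean;
  removeConfig(): void;
}

type Decision =
  | { action: "reuse"; config: ServerConfig }
  | { action: "spawn" };

function decide(config: ServerConfig | undefined, probes: Probes): Decision {
  // A live PID alone is not enough: after a snapshot/resume the PID may
  // belong to an unrelated process, so the server itself is health-checked.
  if (
    config !== undefined &&
    probes.isPidAlive(config.pid) &&
    probes.isSocketHealthy(config.socketPath)
  ) {
    return { action: "reuse", config };
  }
  // Remove any leftover config before spawning a fresh server.
  probes.removeConfig();
  return { action: "spawn" };
}

// Stale config restored from a snapshot: the PID was reused, the socket is dead.
let staleConfigRemoved = false;
const staleDecision = decide(
  { pid: 1234, socketPath: "/tmp/example.sock" },
  {
    isPidAlive: () => true,
    isSocketHealthy: () => false,
    removeConfig: () => {
      staleConfigRemoved = true;
    },
  },
);

// Healthy running server: reuse it and leave the config alone.
let healthyConfigRemoved = false;
const healthyDecision = decide(
  { pid: 1234, socketPath: "/tmp/example.sock" },
  {
    isPidAlive: () => true,
    isSocketHealthy: () => true,
    removeConfig: () => {
      healthyConfigRemoved = true;
    },
  },
);
```

With a PID-only check the stale case would have been treated as "reuse"; here it yields `spawn` and the config is removed first.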
🤖 Generated with Claude Code