Guard PublishClient.recv against torn-down stream socket (#66435)#69479
Open
dwoz wants to merge 2 commits into
Open
Guard PublishClient.recv against torn-down stream socket (#66435)#69479dwoz wants to merge 2 commits into
dwoz wants to merge 2 commits into
Conversation
PublishClient.recv(timeout=0) was passing self._stream.socket straight
to selectors.DefaultSelector().register() without checking whether the
IOStream's underlying socket had been concurrently torn down. Tornado
sets IOStream.socket to None once the stream is closed, and the
non-blocking peek would then raise
TypeError: argument must be an int, or have a fileno() method.
(or, after the fd>1023 cleanup in saltstack#68136, a ValueError from the
selectors backend) escaping all the way out to the salt CLI, breaking
every salt and salt-master invocation on hosts where the
publisher-side stream closed underneath the client.
Treat a missing socket as "no events pending" and return None so the
caller re-enters its connect/reconnect loop instead of crashing.
Fixes saltstack#66435
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Adds a small
None-check insalt.transport.tcp.PublishClient.recv(timeout=0)so that a stream whose underlying socket has been concurrently torn down no longer crashes the caller. The non-blocking peek now treats a missing socket as "no events pending" and returnsNone, letting the existing reconnect loop take over.What issues does this PR fix or reference?
Fixes #66435
Related (different bugs in the same module, do not fix this one):
EventListenercallback typefd > 1023select.select()failure (already merged; replacedselect.select()withselectors.DefaultSelector()but did not add theNone-socket guard)Previous Behavior
Under load — and reliably on FreeBSD 14 with the 3007.x packages and on RHEL 9.2 / Debian 12 with
ipc_mode: ipc— everysaltandsalt-masterinvocation crashed out ofPublishClient.recvwith one of:(in 3007.0, from
select.select([self._stream.socket], [], [], 0)) or, after #68136 swapped inselectors.DefaultSelector,from
selectors._fileobj_to_fd. In both cases the root cause is the same: between thewhile self._stream is None: await self.connect()check at the top ofrecv()and the selector peek a few lines later, the TornadoIOStreamfor the publish IPC socket can be closed by another task. Tornado setsIOStream.sockettoNoneon close, so the peek tries to registerNonewith the selector and dies with an unhandled exception. The error escaped all the way throughsalt.utils.asynchronous.SyncWrapperto the salt CLI, breaking every command.New Behavior
recv(timeout=0)snapshotsself._stream.socketonce. If it'sNone, the method returnsNoneimmediately — the same return value the caller already handles when no events are pending — and the existing reconnect path takes over without crashing.A regression test (
tests/pytests/unit/transport/test_publish_client.py::test_recv_timeout_zero_stream_socket_none) constructs aPublishClient, sets its_streamto a mock whose.socketisNone, and assertsrecv(timeout=0)returnsNonewithout raising. It fails on unmodified3007.xwithValueError: Invalid file object: Noneand passes with the fix.3008.x and master are unaffected: that recv path has already been rewritten to use
asyncio.ensure_future(self._read_into_unpacker())instead of a kernel-level socket peek, so no merge-forward port is needed beyond what the merge bots will do.Merge requirements satisfied?
changelog/66435.fixed.md)tests/pytests/unit/transport/test_publish_client.py)Commits signed with GPG?
No (matches surrounding non-merge commits on this branch).