-
Notifications
You must be signed in to change notification settings - Fork 1.6k
networking-calico: diagnostic instrumentation for fairy-GC-in-hub race #13022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
nelljerram
merged 6 commits into
projectcalico:master
from
nelljerram:fairy-gc-diagnostics
Jun 23, 2026
+266
−0
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
1cd3658
networking-calico: diagnostic instrumentation for fairy-GC-in-hub race
nelljerram 4522859
fairy_gc_diagnostics: hoist eventlet import out of the listener
nelljerram d3af882
Remove stored stack in check-in listener
nelljerram b1a57aa
fairy_gc_diagnostics: tighten comments on fork-inheritance behaviour
nelljerram 644fccd
Only install diagnostics when time has actually been monkey-patched
nelljerram b958ba2
fairy_gc_diagnostics: make skip idempotent and bound the stack captures
nelljerram File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
225 changes: 225 additions & 0 deletions
225
networking-calico/networking_calico/plugins/ml2/drivers/calico/fairy_gc_diagnostics.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,225 @@ | ||
| # -*- coding: utf-8 -*- | ||
| # Copyright (c) 2026 Tigera, Inc. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or | ||
| # implied. See the License for the specific language governing | ||
| # permissions and limitations under the License. | ||
|
|
||
| """Diagnostics for the SQLAlchemy-fairy GC-in-eventlet-hub race. | ||
|
|
||
| The race | ||
| -------- | ||
| We have sometimes been seeing this exception in journalctl.txt / neutron-server.log: | ||
|
|
||
| Exception ignored in: <function _ConnectionRecord.checkout.<locals>.<lambda> | ||
| at 0x72810792e290> | ||
| Traceback (most recent call last): | ||
| File ".../site-packages/sqlalchemy/pool/base.py", line 509, in <lambda> | ||
| and _finalize_fairy( | ||
| File ".../site-packages/sqlalchemy/pool/base.py", line 800, in _finalize_fairy | ||
| connection_record.checkin() | ||
| File ".../site-packages/sqlalchemy/pool/base.py", line 544, in checkin | ||
| pool.dispatch.checkin(connection, self) | ||
| File ".../site-packages/sqlalchemy/event/attr.py", line 346, in __call__ | ||
| fn(*args, **kw) | ||
| File ".../site-packages/oslo_db/sqlalchemy/engines.py", line 52, in _thread_yield | ||
| time.sleep(0) | ||
| File ".../site-packages/eventlet/greenthread.py", line 37, in sleep | ||
| hub.switch() | ||
| File ".../site-packages/eventlet/hubs/hub.py", line 310, in switch | ||
| return self.greenlet.switch() | ||
| TimeoutError: timed out | ||
|
|
||
| This occurs when a Neutron DB context uses a session (for some reason) and then leaks | ||
| it, and GC kicks in on eventlet's "hub" greenlet. An sqlalchemy connection fairy that | ||
| is GC'd while the eventlet hub greenlet is the current greenlet triggers oslo.db's | ||
| ``_thread_yield`` checkin listener to call ``time.sleep(0)`` -> ``hub.switch()``, which | ||
| deadlocks because the hub cannot switch to itself. The ``TimeoutError`` that eventually | ||
| fires is silently swallowed by Python's "Exception ignored in" finalizer-exception | ||
| mechanism, but each occurrence wedges the hub for ~10s and leaves the connection | ||
| record's pool state indeterminate. | ||
|
|
||
| I _think_ #13015 fixes the primary root cause of this, by adding a transaction wrapper | ||
| around raw `context.session.query` calls. The wrapper properly closes the session after | ||
| those calls, instead of leaking it to GC. However, in case there are any remaining | ||
| cases, e.g. because a Neutron-framework path outside our control drops a session | ||
| unclosed, or because a future change reintroduces a code path that bypasses the | ||
| ``using`` pattern -- we want a log line that points at the leaking code path rather than | ||
| just the in-hub finalizer stack that the existing "Exception ignored in" trace gives us. | ||
|
|
||
| Low-level detail | ||
| ---------------- | ||
| An SQLAlchemy connection ``_ConnectionFairy`` is the user-facing wrapper returned by | ||
| ``Pool.connect()`` (or, transitively, by ``Session.connection()`` when a session | ||
| executes its first SQL). Every fairy has a weakref finalizer (the ``<lambda>`` | ||
| registered inside ``_ConnectionRecord.checkout``) that, if the fairy is GC'd without | ||
| explicit checkin, calls ``connection_record.checkin()`` -- which dispatches the | ||
| SQLAlchemy ``checkin`` event. oslo.db registers ``_thread_yield`` on that event, which | ||
| calls ``time.sleep(0)``. | ||
|
|
||
| Under eventlet, ``time.sleep(0)`` is monkey-patched to ``hub.switch()``. If we're | ||
| already executing in the hub greenlet at the moment of GC (possible whenever GC fires | ||
| during the hub's own bookkeeping), ``hub.switch()`` from the hub to itself fails to find | ||
| a target and times out as ``TimeoutError``. Python's "Exception ignored in" mechanism | ||
| swallows the error, but the hub has been wedged for ~10s and the connection record's | ||
| state is indeterminate. | ||
|
|
||
| In current code (after PR 13015), all of our DB access is inside | ||
| ``db_api.CONTEXT_*.using`` blocks, which close the session deterministically at block | ||
| exit (oslo.db's ``_TransactionContext._session`` finally-block calls | ||
| ``session.close()``) and so never leave a fairy for GC. In principle that should be | ||
| enough. This module is the safety net for the case where it isn't -- e.g. if a | ||
| Neutron-framework path outside our control still drops a session unclosed, or if we add | ||
| a code path later that bypasses the ``using`` pattern. If the race fires again the | ||
| diagnostic output identifies which code path checked out the leaking connection, so a | ||
| targeted fix can follow. | ||
|
|
||
| What this module does | ||
| --------------------- | ||
| Two SQLAlchemy event listeners, installed on ``sqlalchemy.pool.Pool`` (so all engines in | ||
| the process inherit them): | ||
|
|
||
| 1. A ``checkout`` listener that captures a stack trace at the point the connection was | ||
| checked out, stashing it on ``connection_record.info["calico_checkout_stack"]``. | ||
| This is how we attribute a later GC to a specific code path: when the fairy is | ||
| eventually GC'd, the same connection record is used for checkin, and we can read the | ||
| stored stack to see *who* originally asked for this connection. | ||
|
|
||
| 2. A ``checkin`` listener with ``insert=True`` (so it runs *before* oslo.db's | ||
| ``_thread_yield``) that: | ||
|
|
||
| * Checks whether we're currently running in the eventlet hub greenlet. | ||
|
|
||
| * If yes -- i.e. we're about to hit the deadlock -- logs a WARNING containing both | ||
| the checkout-time stack and the current finalizer stack. | ||
|
|
||
| It does *not* attempt to suppress oslo.db's listener; the ``TimeoutError`` will still | ||
| fire and be silently caught. This keeps the module purely diagnostic (no behavioural | ||
| change), so it can be left on indefinitely. | ||
|
|
||
| Cost | ||
| ---- | ||
| Capturing a stack trace per pool checkout is not free. At rough benchmark depth (~30 | ||
| frames, plain Neutron handler chain) it adds ~50-100us per checkout. Under a heavy | ||
| scale workload this adds up to a small number of seconds of total overhead -- noticeable | ||
| but not crippling. The module is therefore default-off; enable via: | ||
|
|
||
| [calico] | ||
| fairy_gc_diagnostics = True | ||
|
|
||
| in neutron.conf when running workloads where the race is suspected. | ||
| """ | ||
|
|
||
| import traceback | ||
|
|
||
| from oslo_log import log | ||
| from sqlalchemy import event | ||
| from sqlalchemy.pool import Pool | ||
|
|
||
|
|
||
| LOG = log.getLogger(__name__) | ||
|
|
||
|
|
||
| # Module-level flag so install() runs at most once per process: subsequent calls are | ||
| # no-ops regardless of whether the first call actually registered listeners or bailed | ||
| # out early. The flag means "decision made", not "listeners active" -- without this, | ||
| # repeat calls would re-emit the early-return WARNING each time, contradicting the | ||
| # docstring's "single WARNING" promise. | ||
| _INSTALL_ATTEMPTED = False | ||
|
|
||
|
|
||
| def install(): | ||
| """Idempotently install the fairy-GC diagnostic listeners. | ||
|
|
||
| Safe to call multiple times (subsequent calls in the same process are no-ops). | ||
| Calling install() in the parent before workers are forked is sufficient: the | ||
| SQLAlchemy event listeners attach to the ``Pool`` class, so each forked worker | ||
| inherits them as part of its post-fork memory image and does not need to | ||
| re-install. | ||
|
|
||
| If eventlet is not importable -- e.g. after the planned eventlet removal in | ||
| neutron-server -- this is a no-op with a single WARNING log line. The race the | ||
| listeners detect only fires under eventlet, so there is nothing useful to install in | ||
| that case, and we deliberately do not want a per-checkin ImportError-driven log | ||
| storm. | ||
| """ | ||
| global _INSTALL_ATTEMPTED | ||
| if _INSTALL_ATTEMPTED: | ||
| return | ||
| # Commit to the decision now so any early return below still counts as | ||
| # "already attempted" and we do not re-log the early-return WARNING on | ||
| # subsequent calls. | ||
| _INSTALL_ATTEMPTED = True | ||
|
|
||
| try: | ||
| import eventlet.greenthread | ||
| import eventlet.hubs | ||
| import eventlet.patcher | ||
| except ImportError: | ||
| LOG.warning( | ||
| "Calico fairy-GC diagnostics: eventlet not importable, not " | ||
| "installing listeners. The race this module detects only " | ||
| "fires under eventlet." | ||
| ) | ||
| return | ||
|
|
||
| if not eventlet.patcher.is_monkey_patched("time"): | ||
| LOG.warning( | ||
| "Calico fairy-GC diagnostics: eventlet is importable but time is not " | ||
| "monkey-patched; not installing listeners. The race this module detects " | ||
| "requires eventlet monkey-patching." | ||
| ) | ||
| return | ||
|
|
||
| @event.listens_for(Pool, "checkout") | ||
| def _calico_track_checkout(dbapi_conn, connection_record, connection_proxy): | ||
| # Capture the call stack at the moment of checkout, so a later finalizer-fired | ||
| # checkin can report where the connection was originally given out. Format | ||
| # eagerly (rather than storing a ``sys._getframe()`` reference) because the | ||
| # frames need to outlive the current call -- the connection may sit in the pool | ||
| # for a long time before being GC'd, and lazy formatting would race the frames | ||
| # being collected themselves. | ||
| # | ||
| # Bounded at the most-recent 50 frames: a Neutron handler chain is typically | ||
| # ~30 frames, so 50 covers the full path through the driver in normal cases | ||
| # and caps the worst case (e.g. a pathological deep-recursion middleware) so | ||
| # we do not burn unbounded CPU per checkout or stash unbounded strings on | ||
| # long-lived ``connection_record.info`` entries. | ||
| connection_record.info["calico_checkout_stack"] = "".join( | ||
| traceback.format_stack(limit=50) | ||
| ) | ||
|
nelljerram marked this conversation as resolved.
|
||
|
|
||
| @event.listens_for(Pool, "checkin", insert=True) | ||
| def _calico_detect_in_hub_checkin(dbapi_conn, connection_record): | ||
| # ``insert=True`` prepends this listener so it runs before oslo.db's | ||
| # ``_thread_yield``. If we're in the hub greenlet, ``_thread_yield`` is about | ||
| # to call ``time.sleep(0)`` -> ``hub.switch()`` and deadlock. Log enough | ||
| # context to identify the originating code path before that happens. | ||
| creating_stack = connection_record.info.pop( | ||
| "calico_checkout_stack", "<not captured>" | ||
| ) | ||
| hub = eventlet.hubs.get_hub() | ||
| if eventlet.greenthread.getcurrent() is hub.greenlet: | ||
| LOG.warning( | ||
| "Calico fairy-GC diagnostic: connection checkin firing " | ||
| "in eventlet hub greenlet -- oslo.db's _thread_yield is " | ||
| "about to call time.sleep(0), which will deadlock as " | ||
| "TimeoutError.\n" | ||
| " Connection was checked out at:\n%s\n" | ||
| " Current finalizer stack:\n%s", | ||
| creating_stack, | ||
| # Finalizer stack is typically shallower than the checkout-time | ||
| # stack (just the GC / weakref-finalizer path into us), so 30 | ||
| # frames is plenty. | ||
| "".join(traceback.format_stack(limit=30)), | ||
| ) | ||
|
|
||
| LOG.info("Calico fairy-GC diagnostics installed") | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.