From 6992a84ec13460f0470d816968c0875b183b155a Mon Sep 17 00:00:00 2001 From: Nell Jerram Date: Wed, 17 Jun 2026 17:22:43 +0100 Subject: [PATCH 1/5] networking-calico: eliminate "SQL execution without transaction" warnings DevStack runs neutron-server with ``[DEFAULT] debug = True``, which enables an ORM-execute listener in ``neutron.objects.base`` that warns on every ``session.execute(...)`` where ``session.in_transaction()`` is False: ORM session: SQL execution without transaction in progress, traceback: ... It's a SQLAlchemy-2.0-compliance tripwire: it flags statements that are running in implicit-transaction mode (auto-begun and auto-discarded per statement) instead of the explicit reader/writer pattern. The warning is debug-only and doesn't affect correctness today, but it's real 2.0-readiness signal and very noisy in the journal during scale runs. After PR #12930 removed our outer ``CONTEXT_WRITER.using`` wrapper from the mech driver's postcommit body, every raw ``context.session.query(...)`` in ``endpoints.WorkloadEndpointSyncer.get_extra_port_information`` ran with no enginefacade-recognised transaction active, and so tripped the listener. The PR-#12930 lesson stands: we can't restore a single broad wrapper, because the body also calls ``@retry_if_session_inactive``- decorated upstream methods (``self.db.get_subnet`` via ``add_port_gateways``, ``self.db.get_security_groups`` via ``add_port_sg_names``) whose retry recovery for the ``_ensure_default_security_group`` IntegrityError race is exactly what gets disabled by an outer transaction. Restructure ``get_extra_port_information`` into three phases by transaction shape: * Phase A -- raw ``context.session.query(...)`` reads (fixed IPs, floating IPs, network, QoS rules) plus ``self.db._get_port_security_group_bindings`` (private upstream helper, no ``@retry_if_session_inactive``). These get a single ``CONTEXT_READER.using(context)`` block so ``session.in_transaction()`` is True at execute time. Safe to wrap because none of these calls goes through a retry-decorated upstream method that would have its retry disabled. All five reads are now inlined into the ``with`` block. The QoS rule queries use ``list(...)`` to force materialisation inside the reader; the other four are list-comprehensions that iterate eagerly before the ``with`` exits. * Phase B -- ``add_port_gateways`` and ``add_port_sg_names`` call into ``@retry_if_session_inactive``-decorated upstream methods and stay outside Phase A's reader for the same reason PR #12930 stripped the old wrapper: keep the inner retry decorator enabled. Each upstream call manages its own reader transaction internally, so the warning doesn't fire here either. * Phase C -- ``add_port_interface_name`` (pure compute) and ``add_port_project_data`` (Keystone-backed in-process cache) touch no DB; position relative to A/B is correctness-irrelevant. The QoS handling is also factored: ``add_port_qos`` becomes a pure static method ``build_qos_controls(bw_rules, pr_rules)`` that takes already-materialised rule lists, and the rule fetch lives in the caller -- inside Phase A for the postcommit path, from ``self._bulk`` for the resync path. This removes the previous bulk-vs-postcommit branch from inside the method body and makes the consumer testable in isolation. The four tiny one-call helpers (``get_security_groups_for_port``, ``get_fixed_ips_for_port``, ``get_floating_ips_for_port``, ``get_network_properties_for_port``) are removed entirely; their bodies are inlined. Misleading docstrings on ``add_port_gateways``, ``add_port_sg_names``, and the former ``add_port_qos`` -- left over from the pre-#12930 era when an outer ``_txn_from_context`` / ``CONTEXT_WRITER.using`` wrapped the entire postcommit body -- claimed *"This method assumes it's being called from within a database transaction and does not take out another one"*. That assumption was true then; it isn't now. Updated to describe the actual Phase A / Phase B placement. Tests: 185 pass (``make tox-caracal``). No mech-driver semantics change for either the postcommit path or the resync (bulk) path. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../plugins/ml2/drivers/calico/endpoints.py | 228 +++++++++--------- 1 file changed, 118 insertions(+), 110 deletions(-) diff --git a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py index 8db9bbd1cf0..8472f0e0608 100644 --- a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py +++ b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py @@ -23,6 +23,7 @@ from neutron.db.models.l3 import FloatingIP from neutron.db.qos import models as qos_models from neutron_lib import exceptions as n_exc +from neutron_lib.db import api as db_api from oslo_config import cfg @@ -563,56 +564,6 @@ def delete_live_migration(self, name, mod_revision=None): def add_port_interface_name(self, port, port_extra): port_extra.interface_name = "tap" + port["id"][:11] - def get_security_groups_for_port(self, context, port): - """Checks which security groups apply for a given port. - - Frustratingly, the port dict provided to us when we call get_port may - actually be out of date, and I don't know why. This change ensures that - we get the most recent information. - """ - filters = {"port_id": [port["id"]]} - bindings = self.db._get_port_security_group_bindings(context, filters=filters) - return [binding["security_group_id"] for binding in bindings] - - def get_fixed_ips_for_port(self, context, port): - """Obtains a complete list of fixed IPs for a port. - - Much like with security groups, for some insane reason we're given an - out of date port dictionary when we call get_port. This forces an - explicit query of the IPAllocation table to get the right data out of - Neutron. - """ - return [ - {"subnet_id": ip["subnet_id"], "ip_address": ip["ip_address"]} - for ip in context.session.query(models_v2.IPAllocation).filter_by( - port_id=port["id"] - ) - ] - - def get_floating_ips_for_port(self, context, port): - """Obtains a list of floating IPs for a port.""" - return [ - {"int_ip": ip["fixed_ip_address"], "ext_ip": ip["floating_ip_address"]} - for ip in context.session.query(FloatingIP).filter_by( - fixed_port_id=port["id"] - ) - ] - - def get_network_properties_for_port(self, context, port, port_extra): - network = ( - context.session.query(models_v2.Network) - .filter_by(id=port["network_id"]) - .first() - ) - - try: - port_extra.network_name = datamodel_v3.sanitize_label_name_value( - network["name"], - NETWORK_NAME_MAX_LENGTH, - ) - except Exception: - LOG.warning(f"Failed to find network name for port {port['id']}") - def get_extra_port_information(self, context, port): """get_extra_port_information @@ -623,16 +574,81 @@ def get_extra_port_information(self, context, port): if self._bulk is not None: return self._get_extra_port_information_from_bulk(context, port) port_extra = PortExtra() - port_extra.fixed_ips = self.get_fixed_ips_for_port(context, port) - port_extra.floating_ips = self.get_floating_ips_for_port(context, port) - port_extra.security_groups = self.get_security_groups_for_port(context, port) - self.get_network_properties_for_port(context, port, port_extra) + # Collect information that uses raw queries into the Neutron DB. These queries + # need ``session.in_transaction()`` to be True, or else SQLAlchemy drops huge + # WARNING tracebacks saying "ORM session: SQL execution without transaction in + # progress" warnings otherwise. We arrange for that by using the + # ``CONTEXT_READER`` wrapper. + with db_api.CONTEXT_READER.using(context): + # We may have an out of date or incomplete port dict at this point. + # Explicitly query the IPAllocation table to get latest fixed IP data. + port_extra.fixed_ips = [ + {"subnet_id": ip["subnet_id"], "ip_address": ip["ip_address"]} + for ip in context.session.query(models_v2.IPAllocation).filter_by( + port_id=port["id"] + ) + ] + + # Similarly for floating IPs. + port_extra.floating_ips = [ + {"int_ip": ip["fixed_ip_address"], "ext_ip": ip["floating_ip_address"]} + for ip in context.session.query(FloatingIP).filter_by( + fixed_port_id=port["id"] + ) + ] + + # And security groups. + port_extra.security_groups = [ + binding["security_group_id"] + for binding in self.db._get_port_security_group_bindings( + context, filters={"port_id": [port["id"]]} + ) + ] + + # Read the Network so we can get its name. + network = ( + context.session.query(models_v2.Network) + .filter_by(id=port["network_id"]) + .first() + ) + try: + port_extra.network_name = datamodel_v3.sanitize_label_name_value( + network["name"], + NETWORK_NAME_MAX_LENGTH, + ) + except Exception: + LOG.warning(f"Failed to find network name for port {port['id']}") + + # Read QoS rules. + qos_policy_id = port.get("qos_policy_id") or port.get( + "qos_network_policy_id" + ) + LOG.debug("QoS Policy ID = %r", qos_policy_id) + if qos_policy_id: + bw_rules = list( + context.session.query(qos_models.QosBandwidthLimitRule).filter_by( + qos_policy_id=qos_policy_id + ) + ) + pr_rules = list( + context.session.query(qos_models.QosPacketRateLimitRule).filter_by( + qos_policy_id=qos_policy_id + ) + ) + else: + bw_rules = [] + pr_rules = [] + + # Now processing that either MUST be outside of any transaction - because it + # will call @retry_if_session_inactive-decorated calls that only work correctly + # when not in an outer transaction - or that doesn't involve the DB at all and + # so doesn't care about transaction state. self.add_port_gateways(context, port_extra) self.add_port_interface_name(port, port_extra) self.add_port_project_data(port, context, port_extra) self.add_port_sg_names(context, port_extra) - self.add_port_qos(port, context, port_extra) + port_extra.qos = self.build_qos_controls(bw_rules, pr_rules) return port_extra @@ -696,18 +712,29 @@ def _get_extra_port_information_from_bulk(self, context, port): self.add_port_project_data(port, context, port_extra) # QoS — use bulk-prefetched rules. - self.add_port_qos(port, context, port_extra) + qos_policy_id = port.get("qos_policy_id") or port.get("qos_network_policy_id") + LOG.debug("QoS Policy ID = %r", qos_policy_id) + if qos_policy_id: + bw_rules = bulk["qos_bw_by_policy"].get(qos_policy_id, []) + pr_rules = bulk["qos_pr_by_policy"].get(qos_policy_id, []) + else: + bw_rules = [] + pr_rules = [] + + port_extra.qos = self.build_qos_controls(bw_rules, pr_rules) return port_extra def add_port_gateways(self, context, port_extra): """add_port_gateways - Determine the gateway IP addresses for a given port's IP addresses, and - adds them to the port dict. + Determine the gateway IP addresses for a given port's IP addresses, and adds + them to the port dict. - This method assumes it's being called from within a database - transaction and does not take out another one. + The ``self.db.get_subnet`` call is ``@retry_if_session_inactive`` + + ``@CONTEXT_READER``-decorated, so this method MUST run without any outer + transaction we own (otherwise the retry decorator would be disabled). Each call + opens and closes its own reader transaction internally. """ for ip in port_extra.fixed_ips: subnet = self.db.get_subnet(context, ip["subnet_id"]) @@ -718,14 +745,16 @@ def add_port_sg_names(self, context, port_extra): Determine and store the name of each security group that a port uses. - This method assumes it's being called from within a database - transaction and does not take out another one. + The ``self.db.get_security_groups`` call is ``@retry_if_session_inactive`` + + ``@CONTEXT_READER``-decorated, so this method MUST run without any outer + transaction we own. The retry decorator does the recovery for the + ``_ensure_default_security_group`` race that ``default_sg=True`` (below) is + meant to side-step. """ - # Oddly, get_security_groups normally tries to create the default SG - # for the current tenant, and that can hit a - # NeutronDbObjectDuplicateEntry exception - presumably if there's a - # race with multiple servers or threads trying to do this at the same - # time. Adding "default_sg=True" here suppresses that creation + # Oddly, get_security_groups normally tries to create the default SG for the + # current tenant, and that can hit a NeutronDbObjectDuplicateEntry exception - + # presumably if there's a race with multiple servers or threads trying to do + # this at the same time. Adding "default_sg=True" here suppresses that creation # attempt. filters = {"id": port_extra.security_groups} for sg in self.db.get_security_groups( @@ -736,14 +765,9 @@ def add_port_sg_names(self, context, port_extra): ) port_extra.security_group_names[sg["id"]] = sg_name - def add_port_qos(self, port, context, port_extra): - """add_port_qos - - Determine and store QoS parameters for a port. - - This method assumes it's being called from within a database - transaction and does not take out another one. - """ + @staticmethod + def build_qos_controls(bw_rules, pr_rules): + """Build QoSControls dict, from the given rules and config.""" qos = {} # Minima, maxima and defaults as specified in the WorkloadEndpoint API, @@ -767,41 +791,25 @@ def cap(setting, minmax): setting = max return setting - qos_policy_id = port.get("qos_policy_id") or port.get("qos_network_policy_id") - LOG.debug("QoS Policy ID = %r", qos_policy_id) - if qos_policy_id: - # Prefer bulk-prefetched rules when running inside resync; - # otherwise do per-port queries (e.g. on postcommit hooks). - if self._bulk is not None: - bw_rules = self._bulk["qos_bw_by_policy"].get(qos_policy_id, []) - pr_rules = self._bulk["qos_pr_by_policy"].get(qos_policy_id, []) - else: - bw_rules = context.session.query( - qos_models.QosBandwidthLimitRule - ).filter_by(qos_policy_id=qos_policy_id) - pr_rules = context.session.query( - qos_models.QosPacketRateLimitRule - ).filter_by(qos_policy_id=qos_policy_id) - - for r in bw_rules: - LOG.debug("BW rule = %r", r) - direction = r.get("direction", "egress") - if r["max_kbps"] != 0: - qos[direction + "Bandwidth"] = cap( - r["max_kbps"] * 1000, MINMAX_BANDWIDTH - ) - if r["max_burst_kbps"] != 0: - qos[direction + "Peakrate"] = cap( - r["max_burst_kbps"] * 1000, MINMAX_BW_PEAKRATE - ) + for r in bw_rules: + LOG.debug("BW rule = %r", r) + direction = r.get("direction", "egress") + if r["max_kbps"] != 0: + qos[direction + "Bandwidth"] = cap( + r["max_kbps"] * 1000, MINMAX_BANDWIDTH + ) + if r["max_burst_kbps"] != 0: + qos[direction + "Peakrate"] = cap( + r["max_burst_kbps"] * 1000, MINMAX_BW_PEAKRATE + ) - for r in pr_rules: - LOG.debug("PR rule = %r", r) - direction = r.get("direction", "egress") - if r["max_kpps"] != 0: - qos[direction + "PacketRate"] = cap( - r["max_kpps"] * 1000, MINMAX_PACKET_RATE - ) + for r in pr_rules: + LOG.debug("PR rule = %r", r) + direction = r.get("direction", "egress") + if r["max_kpps"] != 0: + qos[direction + "PacketRate"] = cap( + r["max_kpps"] * 1000, MINMAX_PACKET_RATE + ) if cfg.CONF.calico.max_ingress_connections_per_port != 0: qos["ingressMaxConnections"] = cap( @@ -852,7 +860,7 @@ def cap(setting, minmax): else: qos["egressPacketBurst"] = calico_config.DEFAULT_PR_BURST - port_extra.qos = qos + return qos def add_port_project_data(self, port, context, port_extra): """add_port_project_data From ad3e847eb3df2a70754cfdb297dc0aac4dea9286 Mon Sep 17 00:00:00 2001 From: Nell Jerram Date: Thu, 18 Jun 2026 11:34:25 +0100 Subject: [PATCH 2/5] Fix awkward wording in comment --- .../networking_calico/plugins/ml2/drivers/calico/endpoints.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py index 8472f0e0608..6fba93acae7 100644 --- a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py +++ b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py @@ -578,8 +578,7 @@ def get_extra_port_information(self, context, port): # Collect information that uses raw queries into the Neutron DB. These queries # need ``session.in_transaction()`` to be True, or else SQLAlchemy drops huge # WARNING tracebacks saying "ORM session: SQL execution without transaction in - # progress" warnings otherwise. We arrange for that by using the - # ``CONTEXT_READER`` wrapper. + # progress". We arrange for that by using the ``CONTEXT_READER`` wrapper. with db_api.CONTEXT_READER.using(context): # We may have an out of date or incomplete port dict at this point. # Explicitly query the IPAllocation table to get latest fixed IP data. From 4a5c8f367db07aedb6c59b5277500773413d80ce Mon Sep 17 00:00:00 2001 From: Nell Jerram Date: Thu, 18 Jun 2026 11:44:27 +0100 Subject: [PATCH 3/5] endpoints: build_qos_controls inside the reader, drop list() wrapping Move ``port_extra.qos = self.build_qos_controls(bw_rules, pr_rules)`` from after the ``CONTEXT_READER.using`` block to inside it. Building the QoSControls dict iterates each rule and reads its ``direction``, ``max_kbps``, ``max_burst_kbps`` / ``max_kpps`` columns; doing this after the reader exits relies on the rule ORM objects still being readable in detached state -- which holds today (the columns are eager-loaded and oslo.db's reader-mode default leaves them in place) but would tie our correctness to oslo.db's ``expire_on_commit`` / ``rollback_reader_sessions`` internals. Doing it inside the reader, while the rules are still attached, sidesteps that question. With ``build_qos_controls`` now inside the reader, the ``list(...)`` wrapping that previously materialised the queries is no longer needed -- the lazy ``Query`` objects are iterated by ``build_qos_controls`` while the transaction is still active, so their SQL fires safely. ``build_qos_controls`` becomes shape-symmetric across paths: the postcommit path passes lazy Query objects, the bulk path passes plain Python lists, and either iterates correctly. Tests: tox-caracal passes (185 tests). Co-Authored-By: Claude Opus 4.7 (1M context) --- .../plugins/ml2/drivers/calico/endpoints.py | 30 +++++++++++-------- 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py index 6fba93acae7..6bee17c4561 100644 --- a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py +++ b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py @@ -619,25 +619,32 @@ def get_extra_port_information(self, context, port): except Exception: LOG.warning(f"Failed to find network name for port {port['id']}") - # Read QoS rules. + # Read QoS rules. We build port_extra.qos here, inside the + # reader, so that the per-rule attribute accesses inside + # build_qos_controls happen while the rule ORM objects are + # still attached to the session. Calling build_qos_controls + # after the reader exited would work today (the columns we + # access are simple eager-loaded ones, and oslo.db's reader + # mode typically leaves detached attributes readable), but + # would tie our correctness to oslo.db's expire_on_commit / + # rollback_reader_sessions configuration. build_qos_controls + # is pure compute -- no extra SQL -- so calling it inside + # the reader is cheap and decouples us from those internals. qos_policy_id = port.get("qos_policy_id") or port.get( "qos_network_policy_id" ) LOG.debug("QoS Policy ID = %r", qos_policy_id) if qos_policy_id: - bw_rules = list( - context.session.query(qos_models.QosBandwidthLimitRule).filter_by( - qos_policy_id=qos_policy_id - ) - ) - pr_rules = list( - context.session.query(qos_models.QosPacketRateLimitRule).filter_by( - qos_policy_id=qos_policy_id - ) - ) + bw_rules = context.session.query( + qos_models.QosBandwidthLimitRule + ).filter_by(qos_policy_id=qos_policy_id) + pr_rules = context.session.query( + qos_models.QosPacketRateLimitRule + ).filter_by(qos_policy_id=qos_policy_id) else: bw_rules = [] pr_rules = [] + port_extra.qos = self.build_qos_controls(bw_rules, pr_rules) # Now processing that either MUST be outside of any transaction - because it # will call @retry_if_session_inactive-decorated calls that only work correctly @@ -647,7 +654,6 @@ def get_extra_port_information(self, context, port): self.add_port_interface_name(port, port_extra) self.add_port_project_data(port, context, port_extra) self.add_port_sg_names(context, port_extra) - port_extra.qos = self.build_qos_controls(bw_rules, pr_rules) return port_extra From 191e77e0278de9fb92fcd0888ab82bda4c5630f7 Mon Sep 17 00:00:00 2001 From: Nell Jerram Date: Thu, 18 Jun 2026 11:51:15 +0100 Subject: [PATCH 4/5] Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --- .../networking_calico/plugins/ml2/drivers/calico/endpoints.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py index 6bee17c4561..26ec36dd034 100644 --- a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py +++ b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py @@ -617,7 +617,7 @@ def get_extra_port_information(self, context, port): NETWORK_NAME_MAX_LENGTH, ) except Exception: - LOG.warning(f"Failed to find network name for port {port['id']}") + LOG.warning("Failed to find network name for port %s", port["id"]) # Read QoS rules. We build port_extra.qos here, inside the # reader, so that the per-rule attribute accesses inside From c088407f83d6347bcc0ff01181d0be7f5ab34fed Mon Sep 17 00:00:00 2001 From: Nell Jerram Date: Thu, 18 Jun 2026 11:55:32 +0100 Subject: [PATCH 5/5] Fix imprecise comment about build_qos_controls being pure compute --- .../plugins/ml2/drivers/calico/endpoints.py | 20 +++++++++---------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py index 26ec36dd034..a2bd1be259f 100644 --- a/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py +++ b/networking-calico/networking_calico/plugins/ml2/drivers/calico/endpoints.py @@ -619,17 +619,15 @@ def get_extra_port_information(self, context, port): except Exception: LOG.warning("Failed to find network name for port %s", port["id"]) - # Read QoS rules. We build port_extra.qos here, inside the - # reader, so that the per-rule attribute accesses inside - # build_qos_controls happen while the rule ORM objects are - # still attached to the session. Calling build_qos_controls - # after the reader exited would work today (the columns we - # access are simple eager-loaded ones, and oslo.db's reader - # mode typically leaves detached attributes readable), but - # would tie our correctness to oslo.db's expire_on_commit / - # rollback_reader_sessions configuration. build_qos_controls - # is pure compute -- no extra SQL -- so calling it inside - # the reader is cheap and decouples us from those internals. + # Read QoS rules. We build port_extra.qos here, inside the reader, so that + # the per-rule attribute accesses inside build_qos_controls happen while the + # rule ORM objects are still attached to the session. Calling + # build_qos_controls after the reader exited would work today (the columns + # we access are simple eager-loaded ones, and oslo.db's reader mode + # typically leaves detached attributes readable), but would tie our + # correctness to oslo.db's expire_on_commit / rollback_reader_sessions + # configuration. Calling build_qos_controls inside the reader is cheap and + # decouples us from those internals. qos_policy_id = port.get("qos_policy_id") or port.get( "qos_network_policy_id" )