sriov: add scan-cycle fallback for verify_irqbalance on managed-IRQ VMs #4553

Open
knelsonmeister wants to merge 1 commit into main from knelsonmeister/irqbalance-scan-cycle-fallback

@knelsonmeister

Collaborator

Problem

verify_irqbalance fails consistently on v6 SKUs with MANA NICs (e.g. RHEL 9.0 on Standard_E32ds_v6). On these VMs, all network IRQs use managed mode, where the kernel controls affinity. With IRQs already distributed evenly across cache domains on 32+ cores, irqbalance correctly determines that no rebalancing is needed and never prints 'Selecting irq X for rebalancing'. The test accepted only that message as proof of functionality, so it failed even though irqbalance was working correctly.
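
The failure mode can be reproduced in isolation. The sketch below is not the code in sriov.py: it assumes the original check reduces to a regex search for the 'Selecting irq N for rebalancing' message, and both the pattern name and the sample output strings are made up for illustration.

```python
import re

# Hypothetical pattern: the PR quotes only the message text,
# not the regex the test actually uses.
SELECTING_IRQ = re.compile(r"Selecting irq (\d+) for rebalancing")

# Made-up debug output for a VM where irqbalance decides nothing needs moving.
managed_irq_output = "Interrupt 24 node_num is -1\nInterrupt 25 node_num is -1\n"

# Made-up debug output for a VM where rebalancing does occur.
rebalancing_output = "Interrupt 24 node_num is -1\nSelecting irq 24 for rebalancing\n"


def original_check_passes(stdout: str) -> bool:
    """Return True only when a rebalancing message is present."""
    return SELECTING_IRQ.search(stdout) is not None
```

Under these assumptions the check passes for `rebalancing_output` and fails for `managed_irq_output`, even though both samples show irqbalance inspecting interrupts.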

Fix

Add a fallback assertion: when no active rebalancing is detected, verify that irqbalance completed at least one scan cycle by checking for:

  1. Separator lines (-----...) indicating scan output
  2. Interrupt N node_num messages showing it scanned the interrupt topology

This confirms irqbalance ran and analyzed the system, even if it decided nothing needed moving. The original Selecting irq path still works on configs where rebalancing occurs.
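
A minimal sketch of the two-stage check described above, under stated assumptions: the function name, return labels, and regexes are hypothetical and are not taken from sriov.py. It checks only the 'Interrupt N node_num' messages and leaves out the separator-line check.

```python
import re

# Hypothetical patterns built from the message fragments quoted in this PR.
SELECTING_IRQ = re.compile(r"Selecting irq (\d+) for rebalancing")
INTERRUPT_SCANNED = re.compile(r"Interrupt (\d+) node_num")


def classify_irqbalance_output(stdout: str) -> str:
    """Classify irqbalance debug output by the strongest evidence found."""
    # Primary path: irqbalance actively chose an IRQ to move.
    if SELECTING_IRQ.search(stdout):
        return "rebalanced"
    # Fallback path: irqbalance scanned the interrupt topology
    # but decided nothing needed moving.
    if INTERRUPT_SCANNED.search(stdout):
        return "scanned"
    # Neither message appeared, so there is no sign that irqbalance ran.
    return "no-evidence"
```

A test built on this would treat "rebalanced" and "scanned" as passing and fail only on "no-evidence", which keeps the original path intact on configs where rebalancing occurs.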

Testing

| Config | Result |
|---|---|
| RHEL 9.0 / Standard_E32ds_v6 (previously failing, run 1) | PASSED (336.8s) |
| RHEL 9.0 / Standard_E32ds_v6 (previously failing, run 2) | PASSED (345.7s) |
| Ubuntu 22.04 / Standard_D8ds_v5 (regression check) | PASSED (345.5s) |

@knelsonmeister knelsonmeister requested a review from LiliDeng as a code owner June 24, 2026 15:27
Copilot AI review requested due to automatic review settings June 24, 2026 15:27

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the verify_irqbalance SR-IOV test to handle managed-IRQ (e.g., MANA/v6 SKU) scenarios where irqbalance correctly decides no rebalancing is needed, by accepting evidence of a completed scan cycle when no “Selecting irq … for rebalancing” log is present.

Changes:

  • Adds a fallback validation path: if no IRQ is selected for rebalancing, assert that irqbalance emitted scan-cycle markers in --debug output.
  • Adds a debug log indicating the “managed IRQ / no rebalance needed” scenario was detected.

Comment thread lisa/microsoft/testsuites/network/sriov.py Outdated
@github-actions


✅ AI Test Selection — PASSED

1 test case(s) selected (view run)

Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest

| | Count |
|---|---|
| ✅ Passed | 1 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |
| Total | 1 |

Test case details

| Test Case | Status | Time (s) | Message |
|---|---|---|---|
| verify_irqbalance (lisa_0_0) | ✅ PASSED | 318.076 | |

    scan_match = re.search(
        r"-{5,}",
        result.stdout,
    )

Collaborator

Please paste a sample of the raw result.stdout string.

Collaborator Author

This code was removed as it is not a good indicator of a scan.
New code was re-tested, and the fix still works.
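
The comment does not spell out why the separator check was a weak signal. One possible reading, offered here as an assumption rather than the author's stated reason, is that the pattern is permissive: any run of five or more dashes matches, whether or not it comes from a scan. The sample strings below are made up for illustration.

```python
import re

# The pattern from the removed snippet.
SEPARATOR = re.compile(r"-{5,}")

# Made-up sample lines; none are real irqbalance output.
samples = {
    "separator line": "-" * 40,
    "unrelated banner": "----- starting daemon -----",
    "short dashes": "a -- b",
}

# Record which samples the separator pattern accepts.
matches = {name: bool(SEPARATOR.search(text)) for name, text in samples.items()}
```

Here both the separator line and the unrelated banner match, so a hit on this pattern alone says little about whether a scan cycle completed.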

On v6 SKUs with MANA NICs, all network IRQs use managed mode where the
kernel controls affinity. With 32+ cores distributing IRQs evenly across
cache domains, irqbalance correctly determines no rebalancing is needed
and never prints 'Selecting irq X for rebalancing'. This causes
verify_irqbalance to fail even though irqbalance is working correctly.

Add a fallback assertion: when no active rebalancing is detected, verify
that irqbalance completed at least one scan cycle by checking for
separator lines and 'Interrupt N node_num' messages in its debug output.
This confirms irqbalance ran and scanned the interrupt topology, even if
it decided nothing needed moving.

Tested on:
- RHEL 9.0 / Standard_E32ds_v6 (previously failing): PASSED 2/2
- Ubuntu 22.04 / Standard_D8ds_v5 (regression check): PASSED 1/1
@knelsonmeister knelsonmeister force-pushed the knelsonmeister/irqbalance-scan-cycle-fallback branch from ed85185 to 4f903e3 Compare June 26, 2026 01:57
@github-actions


✅ AI Test Selection — PASSED

1 test case(s) selected (view run)

Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest

| | Count |
|---|---|
| ✅ Passed | 1 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |
| Total | 1 |

Test case details

| Test Case | Status | Time (s) | Message |
|---|---|---|---|
| verify_irqbalance (lisa_0_0) | ✅ PASSED | 312.360 | |


4 participants