OpenVMM: Add support for Baremetal Guest and VFIO Passthrough. #4564
vyadavmsft wants to merge 10 commits into
Add the OpenVMM guest path for baremetal runs, including TAP networking, guest lifecycle wiring, and platform integration. Harden iDRAC recovery and transient power/job handling for baremetal hosts. Make OpenVMM source installs more reliable by serializing Rust toolchain setup, resolving Cargo paths, refreshing source checkouts, and extending the rustup install timeout.
Add OpenVMM guest runbook fields for host device pools and passthrough requests. Allocate devices from the existing Linux VFIO device-pool path, pass assigned PCI BDFs to OpenVMM with --vfio, and release reservations during guest cleanup.
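
The allocate-then-release lifecycle described here can be sketched as follows. This is a minimal illustration, not the PR's implementation: the `DevicePool` class, its method names, and the sample BDFs are all hypothetical, and the real code reuses the existing Linux VFIO device-pool path.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DevicePool:
    """Hypothetical in-memory pool of host PCI devices, keyed by BDF."""

    free: List[str]
    reserved: Dict[str, List[str]] = field(default_factory=dict)

    def allocate(self, guest: str, count: int) -> List[str]:
        # Fail early rather than handing out a partial reservation.
        if count > len(self.free):
            raise RuntimeError(f"pool has {len(self.free)} devices, need {count}")
        taken, self.free = self.free[:count], self.free[count:]
        self.reserved.setdefault(guest, []).extend(taken)
        return taken

    def release(self, guest: str) -> None:
        # Safe to call even if the guest never reserved anything.
        self.free.extend(self.reserved.pop(guest, []))


pool = DevicePool(free=["0000:3b:00.0", "0000:3b:00.1"])  # sample BDFs
bdfs = pool.allocate("guest0", 1)
try:
    pass  # the guest would be launched with the assigned BDFs here
finally:
    # Release during cleanup so a failed launch does not leak devices.
    pool.release("guest0")
```

Putting the release in a `finally` block mirrors the intent of releasing reservations during guest cleanup regardless of how the launch ended.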
Bind allocated OpenVMM passthrough PCI devices to vfio-pci before launch so the expected /dev/vfio/<iommu_group> node exists for --vfio. Track and restore the original host driver during cleanup, and keep passthrough validation node-aware for guest-enabled baremetal runs where the parent platform type is baremetal.
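
As a rough sketch of what binding and restoring involves, the helpers below build the shell commands for the standard Linux sysfs interface (`driver_override`, `unbind`, `drivers_probe`). The function names are hypothetical and the PR may issue these steps differently; only the sysfs paths are standard kernel interfaces.

```python
from pathlib import PurePosixPath
from typing import List

SYSFS_PCI = PurePosixPath("/sys/bus/pci")


def bind_commands(bdf: str) -> List[str]:
    """Commands (run as root) that hand a PCI device to vfio-pci."""
    dev = SYSFS_PCI / "devices" / bdf
    return [
        # Make vfio-pci the only driver allowed to claim this device.
        f"echo vfio-pci > {dev}/driver_override",
        # Detach the current host driver, if one is bound.
        f"if [ -e {dev}/driver ]; then echo {bdf} > {dev}/driver/unbind; fi",
        # Ask the kernel to probe again; vfio-pci now claims the device.
        f"echo {bdf} > {SYSFS_PCI}/drivers_probe",
    ]


def restore_commands(bdf: str, original_driver: str) -> List[str]:
    """Commands that give the device back to its original host driver."""
    dev = SYSFS_PCI / "devices" / bdf
    return [
        f"echo > {dev}/driver_override",  # clear the override
        f"echo {bdf} > {dev}/driver/unbind",
        f"echo {bdf} > {SYSFS_PCI}/drivers/{original_driver}/bind",
    ]


def vfio_node(iommu_group_link: str) -> str:
    """Map the target of <device>/iommu_group to its /dev/vfio node."""
    return f"/dev/vfio/{PurePosixPath(iommu_group_link).name}"
```

The original driver name has to be recorded before the first command runs, since the `driver` symlink points at vfio-pci afterwards.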
Create an OpenVMM PCIe root complex and generated root ports before attaching VFIO passthrough devices. This matches the upstream CLI contract for --vfio port=<name> and avoids launch failures where OpenVMM cannot resolve rp0 in the topology. Also type the PCI address helper structurally so it accepts both libvirt pool devices and OpenVMM context devices.
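
Structural typing of the PCI address helper could look like the sketch below, which uses `typing.Protocol`. The attribute names (`domain`, `bus`, `slot`, `function`) and both device classes are assumptions for illustration; the actual fields in LISA may differ.

```python
from dataclasses import dataclass
from typing import Protocol


class HasPciAddress(Protocol):
    """Anything exposing these four attributes is accepted (assumed names)."""

    domain: str
    bus: str
    slot: str
    function: str


def format_bdf(device: HasPciAddress) -> str:
    # Standard domain:bus:slot.function notation, e.g. 0000:3b:00.0
    return f"{device.domain}:{device.bus}:{device.slot}.{device.function}"


@dataclass
class PoolDevice:
    """Stand-in for a libvirt device-pool entry (hypothetical)."""

    domain: str
    bus: str
    slot: str
    function: str


@dataclass
class ContextDevice:
    """Stand-in for an OpenVMM context device (hypothetical)."""

    domain: str
    bus: str
    slot: str
    function: str
    original_driver: str = ""
```

Neither class inherits from `HasPciAddress`; a type checker accepts both because they have the required attributes, which is what lets one helper serve two unrelated device types.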
Create an explicit VMBus SCSI controller for OpenVMM launches and attach the OS disk and cloud-init DVD to stable LUNs. This avoids the deprecated implicit default SCSI path that left UEFI with an empty boot order in passthrough functional runs. The generated command now uses --vmbus-scsi id=lisa_scsi0 with disk LUN 0 and DVD LUNs starting at 1.
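
The LUN layout described above can be sketched as a small planning function. Only `--vmbus-scsi id=lisa_scsi0` and the LUN numbering come from the description; the function names and file names are hypothetical, and the flags that attach each disk to the controller are deliberately omitted because they are not shown here.

```python
from typing import Dict, List

SCSI_CONTROLLER_ID = "lisa_scsi0"  # controller id named in the description


def controller_args() -> List[str]:
    """Arguments that create the explicit VMBus SCSI controller."""
    return ["--vmbus-scsi", f"id={SCSI_CONTROLLER_ID}"]


def plan_luns(os_disk: str, dvds: List[str]) -> Dict[int, str]:
    """Assign the OS disk to LUN 0 and each DVD to LUN 1, 2, ..."""
    luns = {0: os_disk}
    for lun, dvd in enumerate(dvds, start=1):
        luns[lun] = dvd
    return luns
```

Keeping the OS disk at a fixed LUN is what makes the boot target stable across launches.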
@copilot fix all the check errors
Pull request overview
This PR extends the OpenVMM SUT orchestrator to better support baremetal-hosted OpenVMM guests, including VFIO device passthrough and an alternate SSH connectivity mode via host proxying. It also tightens related platform/runner behavior and adds coverage in selftests.
Changes:
- Add OpenVMM guest enhancements: host-proxy SSH mode (tap-only), VFIO passthrough allocation/binding/release, and updated disk attachment via a default SCSI controller.
- Improve robustness around baremetal orchestration: iDRAC job retry/access-denied recovery, Ready platform dirty-environment reuse behavior when guests are enabled, and runner platform-type selection when guest platforms are configured.
- Expand/adjust selftests for the new OpenVMM schema/behavior and runner/platform logic.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| selftests/test_platform.py | Updates platform selftest scaffolding and adds Ready+guest dirty-env reuse tests. |
| selftests/test_openvmm_schema.py | Adds schema tests for guest extra_args decoding and host-proxy connection mode behavior. |
| selftests/test_openvmm_node.py | Extends OpenVMM node tests for iptables behavior and host-proxy connection configuration. |
| selftests/test_openvmm_installer.py | Updates installer tests for refreshed source checkout and rustup locking behavior. |
| selftests/test_environment.py | Adds regression test ensuring guest cleanup uses a snapshot to avoid list mutation issues. |
| selftests/runners/test_lisa_runner.py | Adds runner test coverage for guest platform type requirement merging; minor slicing tweak. |
| lisa/util/shell.py | Adds support for rebuilding jump-box socks during SSH connect attempts; expands exception handling. |
| lisa/tools/openvmm.py | Adds default SCSI controller and updates disk/dvd attachment command construction. |
| lisa/tools/cargo.py | Adds rustup locking, longer timeouts, and improved cargo discovery/install flow. |
| lisa/sut_orchestrator/ready.py | Adjusts dirty env reuse behavior when guest_enabled is true. |
| lisa/sut_orchestrator/openvmm/schema.py | Adds connection modes, extra_args decoding, and device pool/passthrough schema fields. |
| lisa/sut_orchestrator/openvmm/node.py | Implements host-proxy connection mode, VFIO passthrough handling, and refactors forwarding rule management. |
| lisa/sut_orchestrator/openvmm/installer.py | Refreshes source checkout, resolves cargo path robustly, and wraps rust-src installs with rustup lock. |
| lisa/sut_orchestrator/openvmm/context.py | Adds passthrough context structs and locking for forwarding/device-pool operations. |
| lisa/sut_orchestrator/baremetal/cluster/idrac.py | Adds iDRAC job queue retrying and access-denied recovery for Redfish operations. |
| lisa/runners/lisa_runner.py | Chooses an effective platform_type for checks when guest platforms are enabled. |
| lisa/node.py | Fixes guest cleanup to avoid mutation during iteration; adds proxy jump boxes to SSH shell construction. |
| lisa/microsoft/testsuites/performance/networkperf_passthrough.py | Enables passthrough perf suite to support OPENVMM and pulls OpenVMM node context when needed. |
| lisa/microsoft/testsuites/device_passthrough/functional_tests.py | Extends passthrough functional suite to support OPENVMM and generalizes host device lookup logic. |
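
The guest-cleanup fix listed for `lisa/node.py` and `selftests/test_environment.py` addresses a common Python pitfall: removing items from a list while iterating over it skips elements. The sketch below reproduces the pattern with a hypothetical `Guest` class; it is not the LISA code.

```python
from typing import List


class Guest:
    """Hypothetical guest whose cleanup removes it from its parent's list."""

    def __init__(self, name: str, siblings: List["Guest"]) -> None:
        self.name = name
        self._siblings = siblings

    def cleanup(self) -> None:
        self._siblings.remove(self)


guests: List[Guest] = []
for name in ("g0", "g1", "g2"):
    guests.append(Guest(name, guests))

cleaned: List[str] = []
# Iterating over `guests` directly would skip "g1", because each removal
# shifts the remaining items left under the iterator. Iterating over a
# snapshot (a shallow copy) visits every guest exactly once.
for guest in list(guests):
    guest.cleanup()
    cleaned.append(guest.name)
```
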
Harden OpenVMM SSH forwarding cleanup by tracking only inserted iptables rules and serializing updates with a reentrant host lock. Constrain forwarded SSH DNAT traffic to enter from the host default route interface and leave through the OpenVMM tap bridge. Avoid importing libvirt at OpenVMM module import time so generic selftests and startup paths do not require the optional libvirt dependency. Surface passthrough driver restore failures during guest cleanup so hosts are not silently reused after VFIO cleanup errors. Update selftests for tracked forwarding cleanup, forwarded SSH rule direction, mock-platform connectivity checks, and runner message slice formatting.
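
One way to track only the rules that were actually inserted is sketched below. The class and its `run` callback are hypothetical; the `iptables` options used (`-t`, `-C` to check, `-I` to insert, `-D` to delete) are standard. The `run` callable is assumed to execute a command on the host and return its exit code.

```python
import threading
from typing import Callable, List, Sequence, Tuple

Rule = Tuple[str, ...]


class ForwardingRules:
    """Records inserted iptables rules so cleanup removes only those."""

    def __init__(self, run: Callable[[Sequence[str]], int]) -> None:
        self._run = run
        # Reentrant, so a method holding the lock can call another one.
        self._lock = threading.RLock()
        self._inserted: List[Rule] = []

    def ensure(self, table: str, chain: str, spec: Sequence[str]) -> None:
        with self._lock:
            base = ["iptables", "-t", table]
            if self._run([*base, "-C", chain, *spec]) == 0:
                return  # already present: not ours, so never delete it
            if self._run([*base, "-I", chain, *spec]) != 0:
                raise RuntimeError(f"failed to insert rule into {chain}")
            self._inserted.append((table, chain, *spec))

    def cleanup(self) -> None:
        with self._lock:
            # Remove in reverse insertion order.
            while self._inserted:
                table, chain, *spec = self._inserted.pop()
                self._run(["iptables", "-t", table, "-D", chain, *spec])
```

Rules that existed before the run are left in place, so cleanup cannot remove forwarding that another guest or the host administrator set up.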

| Result | Count |
|---|---|
| ✅ Passed | 1 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 2 |
| Total | 3 |
Test case details
| Test Case | Status | Time (s) | Message |
|---|---|---|---|
| perf_nested_hyperv_storage_singledisk (lisa_0_4) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_kvm_storage_singledisk (lisa_0_2) | ⏭️ SKIPPED | 72.533 | skipped: <lisa.operating_system.CBLMariner object at 0x7f2910606600> is not supported. Currently the test could be run o |
| perf_epoll (lisa_0_11) | ✅ PASSED | 203.910 | |
e2bb386 to 1cf3cc8
🤖 AI Test Selection: No test cases were selected for this PR.
Fixed the check errors in commit Updates include:
Validated with:

| Result | Count |
|---|---|
| ✅ Passed | 0 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |
| Total | 0 |
Test case details
| Test Case | Status | Time (s) | Message |
|---|---|---|---|

| Result | Count |
|---|---|
| ✅ Passed | 30 |
| ❌ Failed | 3 |
| ⏭️ Skipped | 5 |
| Total | 38 |
Test case details
| Test Case | Status | Time (s) | Message |
|---|---|---|---|
| perf_epoll (lisa_0_11) | ✅ PASSED | 208.108 | |
| perf_messaging (lisa_0_10) | ✅ PASSED | 2330.979 | |
| perf_resource_disk_1024k (lisa_0_43) | ✅ PASSED | 1587.376 | |
| perf_ultra_datadisks_1024k (lisa_0_36) | ✅ PASSED | 1050.385 | |
| perf_ultra_datadisks_4k (lisa_0_35) | ✅ PASSED | 4365.983 | |
| perf_resource_disk_4k (lisa_0_44) | ✅ PASSED | 4406.557 | |
| perf_premiumv2_datadisks_1024k (lisa_0_38) | ✅ PASSED | 1782.516 | |
| perf_storage_generic_fio_test (lisa_0_52) | ✅ PASSED | 4467.093 | |
| perf_premiumv2_datadisks_4k (lisa_0_37) | ✅ PASSED | 4365.209 | |
| perf_nested_hyperv_storage_singledisk (lisa_0_4) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_kvm_storage_singledisk (lisa_0_2) | ⏭️ SKIPPED | 68.030 | skipped: <lisa.operating_system.CBLMariner object at 0x7fc6be380c20> is not supported. Currently the test could be run o |
| perf_nested_kvm_storage_multidisk (lisa_0_3) | ❌ FAILED | 0.000 | deployment failed. LisaException: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without respo |
| perf_nested_hyperv_storage_multidisk (lisa_0_5) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_hyperv_ntttcp_different_l1_nat (lisa_0_8) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_kvm_netperf_pps_nat (lisa_0_9) | ⏭️ SKIPPED | 1135.116 | skipped: <lisa.operating_system.CBLMariner object at 0x7fc6bf356e70> is not supported. Currently the test could be run o |
| perf_tcp_latency_synthetic (lisa_0_12) | ✅ PASSED | 133.284 | |
| perf_tcp_ntttcp_128_connections_synthetic (lisa_0_18) | ✅ PASSED | 72.866 | |
| perf_tcp_single_pps_sriov (lisa_0_15) | ✅ PASSED | 248.630 | |
| perf_tcp_latency_sriov (lisa_0_13) | ✅ PASSED | 144.985 | |
| perf_tcp_max_pps_sriov (lisa_0_17) | ✅ PASSED | 151.191 | |
| perf_tcp_single_pps_synthetic (lisa_0_14) | ✅ PASSED | 205.014 | |
| perf_tcp_max_pps_synthetic (lisa_0_16) | ✅ PASSED | 150.458 | |
| perf_udp_1k_ntttcp_sriov (lisa_0_22) | ✅ PASSED | 501.057 | |
| perf_tcp_ntttcp_synthetic (lisa_0_19) | ✅ PASSED | 418.674 | |
| perf_tcp_iperf_sriov (lisa_0_24) | ✅ PASSED | 177.842 | |
| perf_udp_1k_ntttcp_synthetic (lisa_0_21) | ✅ PASSED | 289.009 | |
| perf_tcp_ntttcp_sriov (lisa_0_20) | ✅ PASSED | 1209.043 | |
| perf_sockperf_latency_tcp_sriov (lisa_0_27) | ✅ PASSED | 14.126 | |
| perf_sockperf_latency_udp_sriov (lisa_0_28) | ✅ PASSED | 7.490 | |
| perf_tcp_iperf_synthetic (lisa_0_23) | ✅ PASSED | 179.473 | |
| perf_sockperf_latency_tcp_sriov_busy_poll (lisa_0_31) | ✅ PASSED | 16.691 | |
| perf_sockperf_latency_udp_sriov_busy_poll (lisa_0_32) | ✅ PASSED | 16.725 | |
| perf_udp_iperf_sriov (lisa_0_26) | ❌ FAILED | 481.772 | failed. AssertionError: fail to find json format results |
| perf_sockperf_latency_udp_synthetic (lisa_0_29) | ✅ PASSED | 30.606 | |
| perf_sockperf_latency_tcp_synthetic_busy_poll (lisa_0_34) | ✅ PASSED | 13.121 | |
| perf_sockperf_latency_tcp_synthetic (lisa_0_30) | ✅ PASSED | 37.581 | |
| perf_sockperf_latency_udp_synthetic_busy_poll (lisa_0_33) | ✅ PASSED | 37.546 | |
| perf_udp_iperf_synthetic (lisa_0_25) | ❌ FAILED | 488.596 | failed. AssertionError: fail to find json format results |

| Result | Count |
|---|---|
| ✅ Passed | 30 |
| ❌ Failed | 2 |
| ⏭️ Skipped | 6 |
| Total | 38 |
Test case details
| Test Case | Status | Time (s) | Message |
|---|---|---|---|
| perf_resource_disk_1024k (lisa_0_43) | ✅ PASSED | 1615.510 | |
| perf_ultra_datadisks_1024k (lisa_0_36) | ✅ PASSED | 1058.356 | |
| perf_ultra_datadisks_4k (lisa_0_35) | ✅ PASSED | 4373.911 | |
| perf_resource_disk_4k (lisa_0_44) | ✅ PASSED | 4405.178 | |
| perf_premiumv2_datadisks_1024k (lisa_0_38) | ✅ PASSED | 1787.300 | |
| perf_storage_generic_fio_test (lisa_0_52) | ✅ PASSED | 4458.545 | |
| perf_premiumv2_datadisks_4k (lisa_0_37) | ✅ PASSED | 4372.785 | |
| perf_epoll (lisa_0_11) | ✅ PASSED | 220.949 | |
| perf_messaging (lisa_0_10) | ✅ PASSED | 2299.908 | |
| perf_nested_hyperv_storage_singledisk (lisa_0_4) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_kvm_storage_singledisk (lisa_0_2) | ⏭️ SKIPPED | 74.779 | skipped: <lisa.operating_system.CBLMariner object at 0x7f2d5b27b3e0> is not supported. Currently the test could be run o |
| perf_nested_hyperv_storage_multidisk (lisa_0_5) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_kvm_storage_multidisk (lisa_0_3) | ⏭️ SKIPPED | 61.782 | skipped: <lisa.operating_system.CBLMariner object at 0x7f2d68136d20> is not supported. Currently the test could be run o |
| perf_nested_hyperv_ntttcp_different_l1_nat (lisa_0_8) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_kvm_netperf_pps_nat (lisa_0_9) | ⏭️ SKIPPED | 1152.855 | skipped: <lisa.operating_system.CBLMariner object at 0x7f2d6824cef0> is not supported. Currently the test could be run o |
| perf_tcp_latency_sriov (lisa_0_13) | ✅ PASSED | 151.125 | |
| perf_tcp_latency_synthetic (lisa_0_12) | ✅ PASSED | 183.663 | |
| perf_tcp_ntttcp_128_connections_synthetic (lisa_0_18) | ✅ PASSED | 98.485 | |
| perf_tcp_single_pps_sriov (lisa_0_15) | ✅ PASSED | 214.183 | |
| perf_tcp_single_pps_synthetic (lisa_0_14) | ✅ PASSED | 265.895 | |
| perf_tcp_max_pps_sriov (lisa_0_17) | ✅ PASSED | 162.011 | |
| perf_tcp_max_pps_synthetic (lisa_0_16) | ✅ PASSED | 244.749 | |
| perf_tcp_ntttcp_synthetic (lisa_0_19) | ✅ PASSED | 751.841 | |
| perf_udp_1k_ntttcp_synthetic (lisa_0_21) | ✅ PASSED | 665.237 | |
| perf_tcp_iperf_synthetic (lisa_0_23) | ✅ PASSED | 184.429 | |
| perf_sockperf_latency_udp_synthetic (lisa_0_29) | ✅ PASSED | 14.777 | |
| perf_sockperf_latency_tcp_synthetic (lisa_0_30) | ✅ PASSED | 8.416 | |
| perf_sockperf_latency_udp_synthetic_busy_poll (lisa_0_33) | ✅ PASSED | 21.747 | |
| perf_sockperf_latency_tcp_synthetic_busy_poll (lisa_0_34) | ✅ PASSED | 21.046 | |
| perf_tcp_ntttcp_sriov (lisa_0_20) | ✅ PASSED | 1228.416 | |
| perf_udp_iperf_synthetic (lisa_0_25) | ❌ FAILED | 513.035 | failed. AssertionError: fail to find json format results |
| perf_tcp_iperf_sriov (lisa_0_24) | ✅ PASSED | 214.687 | |
| perf_sockperf_latency_udp_sriov (lisa_0_28) | ✅ PASSED | 15.870 | |
| perf_sockperf_latency_tcp_sriov_busy_poll (lisa_0_31) | ✅ PASSED | 13.945 | |
| perf_udp_1k_ntttcp_sriov (lisa_0_22) | ✅ PASSED | 312.808 | |
| perf_sockperf_latency_udp_sriov_busy_poll (lisa_0_32) | ✅ PASSED | 14.465 | |
| perf_sockperf_latency_tcp_sriov (lisa_0_27) | ✅ PASSED | 44.265 | |
| perf_udp_iperf_sriov (lisa_0_26) | ❌ FAILED | 512.203 | failed. AssertionError: fail to find json format results |
Restore the forwarded SSH filter rule so DNAT traffic to the guest is accepted from any non-OpenVMM bridge ingress interface. The cleanup refactor accidentally tied the SSH path to the default route interface, which breaks baremetal hosts where the incoming management path is not that interface. Keep cleanup precise by continuing to track the exact inserted rule, and update the OpenVMM node selftest to reject the default-interface-constrained rule shape.
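
The rule shape described here might be built as in the sketch below. This is an interpretation under stated assumptions: the function name, bridge name, and guest address are hypothetical, and the exact rule in the PR is not shown. The `! -i <interface>` negation is standard `iptables` syntax.

```python
from typing import List


def forwarded_ssh_accept_rule(bridge: str, guest_ip: str, port: int = 22) -> List[str]:
    """Rule spec accepting DNAT-ed SSH traffic on its way to the guest.

    Ingress is matched as "any interface except the bridge" instead of
    one named interface, so hosts whose management path is not the
    default-route interface still work.
    """
    return [
        "!", "-i", bridge,  # arrived on anything but the OpenVMM bridge
        "-o", bridge,       # leaving through the OpenVMM bridge
        "-p", "tcp",
        "-d", guest_ip,
        "--dport", str(port),
        "-j", "ACCEPT",
    ]


# Hypothetical bridge name and guest address, for illustration only.
rule = forwarded_ssh_accept_rule("ovmbr0", "192.168.100.2")
```

Because the spec is returned as a list, the same value can be used for both the insert and the later delete, which keeps cleanup tied to the exact rule that was added.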
1a66219 to df3678c

| Result | Count |
|---|---|
| ✅ Passed | 30 |
| ❌ Failed | 2 |
| ⏭️ Skipped | 6 |
| Total | 38 |
Test case details
| Test Case | Status | Time (s) | Message |
|---|---|---|---|
| perf_nested_hyperv_storage_singledisk (lisa_0_4) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_kvm_storage_singledisk (lisa_0_2) | ⏭️ SKIPPED | 61.549 | skipped: <lisa.operating_system.CBLMariner object at 0x7f729335dbb0> is not supported. Currently the test could be run o |
| perf_nested_hyperv_storage_multidisk (lisa_0_5) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_kvm_storage_multidisk (lisa_0_3) | ⏭️ SKIPPED | 46.422 | skipped: <lisa.operating_system.CBLMariner object at 0x7f7291dabf50> is not supported. Currently the test could be run o |
| perf_nested_hyperv_ntttcp_different_l1_nat (lisa_0_8) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Windows'>] but VM supports [<class 'lisa.oper |
| perf_nested_kvm_netperf_pps_nat (lisa_0_9) | ⏭️ SKIPPED | 1094.888 | skipped: <lisa.operating_system.CBLMariner object at 0x7f72926d2c00> is not supported. Currently the test could be run o |
| perf_epoll (lisa_0_11) | ✅ PASSED | 217.485 | |
| perf_messaging (lisa_0_10) | ✅ PASSED | 2292.000 | |
| perf_resource_disk_1024k (lisa_0_43) | ✅ PASSED | 1588.969 | |
| perf_ultra_datadisks_1024k (lisa_0_36) | ✅ PASSED | 1045.992 | |
| perf_ultra_datadisks_4k (lisa_0_35) | ✅ PASSED | 4361.326 | |
| perf_resource_disk_4k (lisa_0_44) | ✅ PASSED | 4398.060 | |
| perf_premiumv2_datadisks_1024k (lisa_0_38) | ✅ PASSED | 1779.442 | |
| perf_storage_generic_fio_test (lisa_0_52) | ✅ PASSED | 4482.781 | |
| perf_premiumv2_datadisks_4k (lisa_0_37) | ✅ PASSED | 4360.457 | |
| perf_tcp_latency_synthetic (lisa_0_12) | ✅ PASSED | 120.246 | |
| perf_tcp_latency_sriov (lisa_0_13) | ✅ PASSED | 130.196 | |
| perf_tcp_ntttcp_128_connections_synthetic (lisa_0_18) | ✅ PASSED | 54.775 | |
| perf_tcp_single_pps_sriov (lisa_0_15) | ✅ PASSED | 220.448 | |
| perf_tcp_max_pps_synthetic (lisa_0_16) | ✅ PASSED | 234.672 | |
| perf_tcp_single_pps_synthetic (lisa_0_14) | ✅ PASSED | 236.984 | |
| perf_tcp_max_pps_sriov (lisa_0_17) | ✅ PASSED | 156.229 | |
| perf_tcp_ntttcp_synthetic (lisa_0_19) | ✅ PASSED | 400.628 | |
| perf_udp_1k_ntttcp_synthetic (lisa_0_21) | ✅ PASSED | 378.811 | |
| perf_tcp_iperf_synthetic (lisa_0_23) | ✅ PASSED | 175.649 | |
| perf_sockperf_latency_udp_synthetic (lisa_0_29) | ✅ PASSED | 13.486 | |
| perf_sockperf_latency_tcp_synthetic (lisa_0_30) | ✅ PASSED | 6.534 | |
| perf_tcp_ntttcp_sriov (lisa_0_20) | ✅ PASSED | 525.258 | |
| perf_sockperf_latency_udp_synthetic_busy_poll (lisa_0_33) | ✅ PASSED | 14.099 | |
| perf_sockperf_latency_tcp_synthetic_busy_poll (lisa_0_34) | ✅ PASSED | 14.173 | |
| perf_udp_iperf_synthetic (lisa_0_25) | ❌ FAILED | 482.550 | failed. AssertionError: fail to find json format results |
| perf_tcp_iperf_sriov (lisa_0_24) | ✅ PASSED | 192.080 | |
| perf_sockperf_latency_tcp_sriov (lisa_0_27) | ✅ PASSED | 11.522 | |
| perf_sockperf_latency_udp_sriov (lisa_0_28) | ✅ PASSED | 6.491 | |
| perf_sockperf_latency_tcp_sriov_busy_poll (lisa_0_31) | ✅ PASSED | 10.069 | |
| perf_sockperf_latency_udp_sriov_busy_poll (lisa_0_32) | ✅ PASSED | 10.007 | |
| perf_udp_1k_ntttcp_sriov (lisa_0_22) | ✅ PASSED | 536.677 | |
| perf_udp_iperf_sriov (lisa_0_26) | ❌ FAILED | 598.762 | failed. AssertionError: fail to find json format results |
Description
Summary
Add OpenVMM baremetal guest support and VFIO passthrough support for LISA, including guest provisioning, networking, forwarded SSH, storage attachment, and passthrough cleanup.
Changes
Related Issue
Type of Change
Checklist
Test Validation
Key Test Cases:
Impacted LISA Features:
Tested Azure Marketplace Images:
Test Results