[AI Generated] BugFix: vm_resize security_profile filter and clearer skip#4572

Open
rabdulfaizy wants to merge 1 commit into main from bugfix/vm-resize-cvm-and-impossible-decrease_300626_112153

@rabdulfaizy

Collaborator

What

Two related fixes in the VM resize test path.

1. azure/features.py -- Resize._select_vm_size
When picking a candidate VM size to resize to, also filter out SKUs whose advertised security_profile SetSpace does not include the source VM's deployed security profile (Standard / SecureBoot / CVM).

  • New _get_actual_security_profile() reads the deployed VM's security_profile.security_type via the compute client and maps TrustedLaunch -> SecureBoot, ConfidentialVM -> CVM, else -> Standard.
  • New _compare_security_profile(candidate_size, actual_security_profile) and its use in _is_candidate_size_valid skip any candidate that cannot host the source VM's security profile.

Without this filter the random picker can choose a non-CVM SKU for a CVM-deployed VM and every Azure update call is rejected with PropertyChangeNotAllowed, exhausting all retries.
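
For illustration only, the mapping and compatibility check described above can be sketched in isolation. This is not the code from the PR: `map_security_type` and `is_profile_compatible` are hypothetical stand-ins for `_get_actual_security_profile()` and `_compare_security_profile()`, the enum below is a simplified local copy rather than the repo's `SecurityProfileType`, and a plain Python set stands in for the SKU's advertised SetSpace.

```python
from enum import Enum
from typing import Optional, Set


class SecurityProfileType(str, Enum):
    # Simplified local stand-in for the profile names used in the PR text.
    Standard = "Standard"
    SecureBoot = "SecureBoot"
    CVM = "CVM"


def map_security_type(security_type: Optional[str]) -> SecurityProfileType:
    """Map the deployed VM's security_type string to a profile name.

    Follows the mapping stated in the description:
    TrustedLaunch -> SecureBoot, ConfidentialVM -> CVM, anything else -> Standard.
    """
    if security_type == "TrustedLaunch":
        return SecurityProfileType.SecureBoot
    if security_type == "ConfidentialVM":
        return SecurityProfileType.CVM
    return SecurityProfileType.Standard


def is_profile_compatible(
    advertised: Optional[Set[SecurityProfileType]],
    actual: Optional[SecurityProfileType],
) -> bool:
    """Return True if a candidate SKU may host the source VM's profile.

    `advertised` is a hypothetical, flattened view of the candidate's SetSpace.
    When either side is unknown (None), the check is a no-op and passes.
    """
    if actual is None or advertised is None:
        return True
    return actual in advertised


# A CVM-deployed VM may move to a SKU that advertises CVM, but not to a
# SKU that only advertises Standard.
actual = map_security_type("ConfidentialVM")
print(is_profile_compatible({SecurityProfileType.CVM, SecurityProfileType.Standard}, actual))
print(is_profile_compatible({SecurityProfileType.Standard}, actual))
```

Treating an unknown profile on either side as compatible mirrors the backwards-compatibility note in this description and the early `return True` branches visible in the reviewed diff.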

2. microsoft/testsuites/core/vm_resize.py
When every retry hits retryable Azure errors (SkuNotAvailable, AllocationFailed, capacity, ...), raise SkippedException including the last Azure error instead of a bare assert that hid the cause.

  • Pre-declare expected_vm_capability, origin_vm_size, final_vm_size, last_error before the loop.
  • Capture last_error = str(e) in the retryable-error branch.
  • Replace the terminal assert with raise SkippedException(...) including last_error.
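
The three steps above can be sketched as a small standalone loop. This is a simplified illustration under assumptions, not the test suite's code: `SkippedException` is redefined locally so the block runs on its own, `resize_once` and `RETRYABLE_MARKERS` are hypothetical names, and matching retryable errors by substring is an assumption about how they are recognised.

```python
from typing import Callable, Tuple


class SkippedException(Exception):
    """Local stand-in so the sketch is self-contained."""


# Hypothetical markers for errors that justify trying another candidate size.
RETRYABLE_MARKERS: Tuple[str, ...] = ("SkuNotAvailable", "AllocationFailed")


def resize_with_retries(resize_once: Callable[[], str], max_retries: int = 3) -> str:
    """Call resize_once until it succeeds or retries are exhausted.

    Retryable errors are remembered in last_error; any other error propagates
    immediately. If every attempt hits a retryable error, the run is skipped
    with the last error attached instead of failing on a bare assert.
    """
    last_error = ""  # pre-declared so it is always bound after the loop
    for _ in range(max_retries):
        try:
            return resize_once()
        except Exception as e:
            if any(marker in str(e) for marker in RETRYABLE_MARKERS):
                last_error = str(e)
                continue
            raise
    raise SkippedException(
        f"no candidate size could be resized to after {max_retries} retries. "
        f"Last Azure error: {last_error}"
    )
```

Carrying `last_error` into the skip message is what lets a reader distinguish a capacity problem in the region from a defect in the resize path.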

Why

Before this change, vm_resize on VMs with SecureBoot/TrustedLaunch or CVM security profiles would routinely:

  1. Pick a candidate target size that fundamentally cannot host that security profile, and
  2. When every candidate failed for legitimate Azure capacity reasons, fail with a bare AssertionError that hid the real Azure error -- misclassifying an environment issue as a test bug.

How tested

Local runs of verify_vm_resize across all three security profiles on Azure:

| Security profile | Source SKU | Resized to | Result | Duration |
| --- | --- | --- | --- | --- |
| Standard | Standard_F1s | Standard_A1_v2_Gen2 | PASSED | 6m56s |
| CVM | Standard_DC2as_v5 | Standard_EC2es_v6 | PASSED | 3m17s |
| SecureBoot | Standard_D2ds_v5 | Standard_A2m_v2_Gen2 | PASSED | 5m41s |

Lint clean on changed files: black (23.1.x, pinned), flake8, pylint 10/10, mypy (no new errors introduced).

Notes

  • Opened as Draft for reviewer visibility while collecting a second-pass run in a different region.
  • Backwards compatible: actual_security_profile is Optional[SecurityProfileType]; when None the filter is a no-op.

…skip

Two related fixes in the VM resize test path:

1. azure Resize feature: when selecting a candidate VM size to resize to, also filter out SKUs whose advertised security_profile SetSpace does not include the source VM's deployed security profile (CVM / TrustedLaunch / Standard). Without this filter the random picker can choose a non-CVM SKU for a CVM-deployed VM and every Azure update call is rejected with PropertyChangeNotAllowed, exhausting all retries.

2. VmResize test suite: when the retry loop ends without a successful resize, raise SkippedException including the last Azure error instead of asserting 'fail to find proper vm size'. This is an environment limitation (no compatible candidate was picked across retries), not a defect in the system under test.
Copilot AI review requested due to automatic review settings July 1, 2026 04:38

Copilot AI left a comment

Pull request overview

This PR improves reliability and diagnosability of the Azure VM resize path by filtering resize targets based on the VM’s security profile and by surfacing the last retryable Azure error when resize retries are exhausted.

Changes:

  • Add security-profile compatibility filtering when selecting candidate Azure VM sizes for resize.
  • Improve vm_resize retry exhaustion behavior by skipping with the last observed Azure error instead of failing with a generic assertion.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| lisa/sut_orchestrator/azure/features.py | Adds security-profile-aware candidate VM size validation during resize selection. |
| lisa/microsoft/testsuites/core/vm_resize.py | Improves retry-loop failure reporting by raising SkippedException with the last Azure error. |
Comments suppressed due to low confidence (1)

lisa/microsoft/testsuites/core/vm_resize.py:160

  • The retry loop uses time.sleep(1), which is discouraged in this repo’s test code because it adds fixed delays and can cause flakiness. Prefer a bounded retry helper (e.g. the retry decorator / retry_without_exceptions) with a controlled delay/backoff strategy.
                    last_error = str(e)
                    retry = retry + 1
                else:
                    raise e
                time.sleep(1)
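
As a rough illustration of the bounded retry with backoff that this comment suggests, a minimal helper might look like the following. It is a generic sketch, not the repo's retry decorator or `retry_without_exceptions`: the name `call_with_backoff` and all of its parameters are hypothetical, and `sleep` is injectable only so the delays can be observed without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
    tries: int = 5,
    delay: float = 1.0,
    backoff: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func up to `tries` times, sleeping between failed attempts.

    The wait starts at `delay`, is multiplied by `backoff` after each failure,
    and is capped at `max_delay`. The last failure is re-raised unchanged.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    wait = delay
    for attempt in range(1, tries + 1):
        try:
            return func()
        except retryable:
            if attempt == tries:
                raise
            sleep(wait)
            wait = min(wait * backoff, max_delay)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Compared with a fixed `time.sleep(1)`, the attempt count and total wait are both bounded by explicit parameters, which keeps the retry policy visible at the call site.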

Comment on lines +2583 to +2596
if not actual_security_profile:
    return True
assert candidate_size.capability
assert candidate_size.capability.features
candidate_sp = next(
    (
        feature
        for feature in candidate_size.capability.features
        if feature.type == SecurityProfileSettings.type
    ),
    None,
)
if not isinstance(candidate_sp, SecurityProfileSettings):
    return True
Comment on lines +2550 to +2555
# Read the security profile the source VM was actually deployed with
# (a concrete value), as opposed to what the SKU advertises in its
# capability SetSpace. Resizing a CVM-deployed VM to a SKU whose
# advertised SetSpace includes CVM AND Standard is fine; resizing
# to a SKU that only advertises Standard will be rejected by Azure
# with PropertyChangeNotAllowed.
Comment on lines 158 to 159
else:
    raise e
@github-actions

github-actions Bot commented Jul 1, 2026

❌ AI Test Selection — FAILED

88 test case(s) selected (view run)

Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest

| Status | Count |
| --- | --- |
| ✅ Passed | 70 |
| ❌ Failed | 4 |
| ⏭️ Skipped | 14 |
| Total | 88 |

Test case details
| Test Case | Status | Time (s) | Message |
| --- | --- | --- | --- |
| smoke_test (lisa_0_1) | ✅ PASSED | 37.626 | |
| verify_deployment_provision_synthetic_nic (lisa_0_3) | ✅ PASSED | 32.422 | |
| verify_deployment_provision_standard_ssd_disk (lisa_0_4) | ✅ PASSED | 41.024 | |
| smoke_test_check_serial_console_pattern (lisa_0_2) | ✅ PASSED | 40.807 | |
| verify_deployment_provision_premium_disk (lisa_0_6) | ✅ PASSED | 45.061 | |
| verify_deployment_provision_premiumv2_disk (lisa_0_7) | ✅ PASSED | 41.712 | |
| verify_deployment_provision_ephemeral_managed_disk (lisa_0_5) | ✅ PASSED | 51.352 | |
| verify_deployment_provision_sriov (lisa_0_8) | ✅ PASSED | 47.604 | |
| verify_reboot_in_platform (lisa_0_9) | ✅ PASSED | 44.348 | |
| verify_deployment_provision_ultra_datadisk (lisa_0_10) | ✅ PASSED | 57.085 | |
| verify_deployment_provision_swiotlb_force (lisa_0_13) | ✅ PASSED | 71.780 | |
| verify_stop_start_in_platform (lisa_0_11) | ✅ PASSED | 200.950 | |
| stress_reboot (lisa_0_12) | ✅ PASSED | 576.380 | |
| verify_sched_core_basic (lisa_0_14) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.CBLMariner'>] but VM supports [<class 'lisa.o |
| verify_vmbus_devices_channels_bsd (lisa_0_18) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.BSD'>] but VM supports [<class 'lisa.operatin |
| verify_vmbus_devices_channels (lisa_0_19) | ✅ PASSED | 15.581 | |
| verify_vmbus_heartbeat_properties (lisa_0_20) | ✅ PASSED | 16.879 | |
| verify_network_manager_not_installed (lisa_0_41) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Fedora'>] but VM supports [<class 'lisa.opera |
| verify_network_file_configuration (lisa_0_42) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Fedora'>, <class 'lisa.operating_system.CBLMa |
| verify_ifcfg_eth0 (lisa_0_43) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Fedora'>] but VM supports [<class 'lisa.opera |
| verify_udev_rules_moved (lisa_0_44) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.CoreOs'>, <class 'lisa.operating_system.Fedor |
| verify_dhcp_file_configuration (lisa_0_45) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Suse'>, <class 'lisa.operating_system.CBLMari |
| verify_yum_conf (lisa_0_46) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Fedora'>] but VM supports [<class 'lisa.opera |
| verify_cloud_init_error_status (lisa_0_53) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.CBLMariner'>] but VM supports [<class 'lisa.o |
| verify_default_targetpw (lisa_0_39) | ✅ PASSED | 2.189 | |
| verify_grub (lisa_0_40) | ✅ PASSED | 1.359 | |
| verify_repository_installed (lisa_0_49) | ✅ PASSED | 8.323 | |
| verify_serial_console_is_enabled (lisa_0_50) | ✅ PASSED | 1.325 | |
| verify_no_pre_exist_users (lisa_0_55) | ✅ PASSED | 2.751 | |
| verify_resource_disk_file_system (lisa_0_57) | ✅ PASSED | 5.641 | |
| verify_waagent_version (lisa_0_58) | ✅ PASSED | 0.202 | |
| verify_openssl_version (lisa_0_60) | ✅ PASSED | 1.437 | |
| verify_azure_64bit_os (lisa_0_61) | ✅ PASSED | 1.335 | |
| verify_omi_version (lisa_0_62) | ✅ PASSED | 2.310 | |
| verify_no_swap_on_osdisk (lisa_0_63) | ✅ PASSED | 1.394 | |
| verify_essential_kernel_modules (lisa_0_64) | ✅ PASSED | 1.997 | |
| verify_hv_kvp_daemon_installed (lisa_0_48) | ✅ PASSED | 1.566 | |
| verify_client_active_interval (lisa_0_54) | ✅ PASSED | 1.581 | |
| verify_resource_disk_readme_file (lisa_0_56) | ✅ PASSED | 5.458 | |
| verify_os_update (lisa_0_47) | ✅ PASSED | 72.499 | |
| verify_boot_error_fail_warnings (lisa_0_52) | ❌ FAILED | 4.321 | failed. AssertionError: [unexpected error/failure/warnings shown up in bootup log of distro Ubuntu 22.4.0] Expected <['J |
| verify_python_version (lisa_0_59) | ✅ PASSED | 1.380 | |
| verify_bash_history_is_empty (lisa_0_51) | ✅ PASSED | 8.010 | |
| verify_boot_with_debug_kernel (lisa_0_87) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.Redhat'>, <class 'lisa.operating_system.CentO |
| verify_gdb (lisa_0_0) | ✅ PASSED | 63.399 | |
| verify_l3_cache (lisa_0_21) | ✅ PASSED | 1.595 | |
| verify_cpu_count (lisa_0_22) | ✅ PASSED | 0.202 | |
| verify_vmbus_interrupts (lisa_0_23) | ❌ FAILED | 3.736 | failed. AssertionError: [Hypervisor synthetic timer interrupt should be processed by all vCPU's] Expected to be |
| verify_dhcp_client_timeout (lisa_0_25) | ✅ PASSED | 1.760 | |
| verify_dns_name_resolution (lisa_0_26) | ✅ PASSED | 2.391 | |
| verify_dns_name_resolution_after_upgrade (lisa_0_27) | ✅ PASSED | 92.207 | |
| verify_vdso (lisa_0_15) | ✅ PASSED | 173.031 | |
| verify_floppy_module_is_blacklisted (lisa_0_16) | ✅ PASSED | 7.422 | |
| verify_serial_console (lisa_0_17) | ✅ PASSED | 47.315 | |
| verify_hyperv_modules (lisa_0_37) | ✅ PASSED | 5.649 | |
| verify_initrd_modules (lisa_0_36) | ✅ PASSED | 29.093 | |
| verify_lis_modules_version (lisa_0_35) | ⏭️ SKIPPED | 0.193 | skipped: Ubuntu not supported. This test case only supports Redhat distros. |
| verify_reload_hyperv_modules (lisa_0_38) | ✅ PASSED | 185.040 | |
| verify_enable_kprobe (lisa_0_67) | ✅ PASSED | 4.287 | |
| verify_kvp (lisa_0_24) | ✅ PASSED | 7.493 | |
| verify_hyperv_platform_id (lisa_0_68) | ✅ PASSED | 33.317 | |
| verify_resource_disk_mounted (lisa_0_70) | ✅ PASSED | 4.489 | |
| verify_swap (lisa_0_71) | ✅ PASSED | 2.381 | |
| verify_resource_disk_io (lisa_0_72) | ✅ PASSED | 8.010 | |
| verify_scsi_disk_controller_type (lisa_0_73) | ✅ PASSED | 0.303 | |
| verify_os_partition_identifier (lisa_0_75) | ✅ PASSED | 2.139 | |
| verify_disks_device_timeout_setting (lisa_0_69) | ✅ PASSED | 2.984 | |
| verify_hot_add_disk_serial_premium_ssd (lisa_0_78) | ✅ PASSED | 102.290 | |
| verify_hot_add_disk_serial (lisa_0_76) | ✅ PASSED | 183.260 | |
| verify_hot_add_disk_serial_standard_ssd (lisa_0_77) | ✅ PASSED | 191.479 | |
| verify_hot_add_disk_parallel_standard_ssd (lisa_0_80) | ✅ PASSED | 88.517 | |
| verify_hot_add_disk_parallel (lisa_0_79) | ✅ PASSED | 248.115 | |
| verify_hot_add_disk_serial_random_lun_standard_ssd (lisa_0_81) | ✅ PASSED | 192.489 | |
| verify_hot_add_disk_parallel_premium_ssd (lisa_0_83) | ✅ PASSED | 202.714 | |
| verify_hot_add_disk_serial_random_lun_premium_ssd (lisa_0_82) | ✅ PASSED | 220.859 | |
| verify_nfsv4_basic (lisa_0_84) | ✅ PASSED | 162.871 | |
| verify_cifs_basic (lisa_0_86) | ❌ FAILED | 26.848 | failed. ResourceNotFoundError: (ResourceNotFound) The Resource 'Microsoft.Storage/storageAccounts/lisafs5qeft3w60i' unde |
| verify_nvme_disk_controller_type (lisa_0_74) | ✅ PASSED | 8.092 | |
| verify_smb_linux (lisa_0_85) | ✅ PASSED | 73.669 | |
| verify_pmu_disabled_for_arm64 (lisa_0_33) | ⏭️ SKIPPED | 0.211 | skipped: This test case does not support CpuArchitecture.X64. This validation is only for ARM64. |
| verify_timedrift_corrected (lisa_0_34) | ✅ PASSED | 74.324 | |
| verify_timesync_ptp (lisa_0_28) | ✅ PASSED | 8.603 | |
| verify_timesync_unbind_clocksource (lisa_0_29) | ✅ PASSED | 50.822 | |
| verify_timesync_unbind_clockevent (lisa_0_30) | ❌ FAILED | 1.527 | failed. AssertionError: [Expected clockevent name is Hyper-V clockevent, but actual it is lapic.] Expected to be |
| verify_timesync_ntp (lisa_0_31) | ✅ PASSED | 48.221 | |
| verify_timesync_chrony (lisa_0_32) | ✅ PASSED | 34.850 | |
| verify_zram_crypto_zstd (lisa_0_65) | ⏭️ SKIPPED | 1.088 | before_case skipped: Unsupported system: 'Ubuntu 22.04.5 LTS'. zram compression test requires Azure Linux 3.0+. |
| verify_zram_crypto_lz4 (lisa_0_66) | ⏭️ SKIPPED | 0.201 | before_case skipped: Unsupported system: 'Ubuntu 22.04.5 LTS'. zram compression test requires Azure Linux 3.0+. |

@rabdulfaizy rabdulfaizy marked this pull request as ready for review July 1, 2026 05:32