CA-428620: fix occasional vm suspend error #7134
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -205,7 +205,7 @@ let check_op_for_feature ~__context ~vmr:_ ~vmmr ~vmgmr ~op ~ref ~strict = | |
| let some_err e = Some (e, [Ref.string_of ref]) in | ||
| let lack_feature feature = not (has_feature ~vmgmr ~feature) in | ||
| match op with | ||
| - | (`suspend | `checkpoint | `pool_migrate | `migrate_send) when is_live -> ( | ||
| + | (`pool_migrate | `migrate_send) when is_live -> ( | ||
|
Contributor
With this, migrations are still blocked in xapi, while suspend is allowed. There's already an |
||
| match get_feature ~vmgmr ~feature:"data-cant-suspend-reason" with | ||
| | Some reason -> | ||
| Some (Api_errors.vm_non_suspendable, [Ref.string_of ref; reason]) | ||
|
|
@@ -215,6 +215,18 @@ let check_op_for_feature ~__context ~vmr:_ ~vmmr ~vmgmr ~op ~ref ~strict = | |
| | None -> | ||
| None | ||
| ) | ||
| | (`suspend | `checkpoint) when is_live -> | ||
| (* Cannot gate suspend/checkpoint on the cached data-cant-suspend-reason: | ||
| that xenstore key is set by the QMP event thread whenever a QMP event | ||
| arrives and Query_migratable returns an error (e.g. NVMe has transient | ||
| in-flight I/O). By the time the actual suspend executes, the device may | ||
| be idle. xenopsd performs a fresh Query_migratable check via | ||
| assert_can_save before committing to the save sequence, which is the | ||
| authoritative check. *) | ||
|
Comment on lines +219 to +225
Member
It seems to me that originally the xapi uses the This change works only if the value in xenstore is cleaned up between the xapi gate and the xenopsd gate? If this is true, I would think a retry on the client side is fine. |
||
| if (not implicit_support) && strict && lack_feature "feature-suspend" then | ||
| some_err Api_errors.vm_lacks_feature | ||
| else | ||
| None | ||
| | _ when implicit_support -> | ||
| None | ||
| | `clean_shutdown | ||
|
|
||
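The split discussed in this hunk can be pictured with a small standalone model. This is only a sketch under stated assumptions, not xapi or xenopsd code: the types `op` and `guest` and the functions `early_gate`, `fresh_check` and `run` are hypothetical names invented for illustration. `cached_reason` stands in for the cached `data-cant-suspend-reason` value, and `fresh_check` stands in for a fresh query made just before the save starts.

```ocaml
(* Hypothetical model of the two gates; none of these names exist in xapi. *)
type op = Suspend | Checkpoint | Pool_migrate | Migrate_send

type guest = {
    cached_reason: string option
        (* stand-in for the cached data-cant-suspend-reason value *)
  ; mutable device_busy: bool  (* true while a device has in-flight I/O *)
}

(* Stand-in for the fresh check made immediately before saving. *)
let fresh_check g =
  if g.device_busy then Error "device has in-flight I/O" else Ok ()

(* Early gate: after the change, only migrations consult the cached value. *)
let early_gate g op =
  match op with
  | Pool_migrate | Migrate_send -> (
    match g.cached_reason with Some r -> Error r | None -> Ok ()
  )
  | Suspend | Checkpoint ->
      Ok ()

(* An operation proceeds only if both gates accept it. *)
let run g op =
  match early_gate g op with Error _ as e -> e | Ok () -> fresh_check g

let () =
  (* Stale cache, idle device: suspend passes, migration is still refused. *)
  let g = {cached_reason= Some "nvme busy"; device_busy= false} in
  assert (run g Suspend = Ok ()) ;
  assert (run g Pool_migrate = Error "nvme busy") ;
  (* A device that really is busy is still caught by the fresh check. *)
  g.device_busy <- true ;
  assert (run g Suspend = Error "device has in-flight I/O")
```

In this model a stale cached reason no longer blocks suspend, while migration remains gated on it, which is the asymmetry the first review comment points out.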
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -598,12 +598,16 @@ module VM = struct | |
|
|
||
| let wait_shutdown _ _vm _reason _timeout = true | ||
|
|
||
| let assert_can_save _vm = () | ||
|
|
||
| let save _ _cb vm flags data vgpu_data _pre_suspend_callback = | ||
| with_lock m (save_nolock vm flags data vgpu_data) | ||
|
|
||
| let restore _ _cb vm vbds vifs data vgpu_data extras = | ||
| with_lock m (restore_nolock vm vbds vifs data vgpu_data extras) | ||
|
|
||
| let resume _ _vm = () | ||
|
Contributor
Why do you add this? |
||
|
|
||
| let s3suspend _ _vm = () | ||
|
|
||
| let s3resume _ _vm = () | ||
|
|
||
I think I understand the split but don't we have the same problem of false negatives here?
Actually the migration will do the suspend also.
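For comparison, the client-side retry suggested earlier in the review could look roughly like the sketch below. It is an illustration only: `retry`, `attempt` and `is_transient` are hypothetical names, `attempt` stands in for whatever call the client uses to request the suspend, and the error string is a placeholder rather than a confirmed API value. A real client would probably also wait between attempts, which is omitted here to keep the example free of external libraries.

```ocaml
(* Hypothetical client-side helper: retry an operation a bounded number of
   times while it keeps failing with an error judged to be transient. *)
let rec retry ~attempts ~is_transient attempt =
  match attempt () with
  | Ok _ as ok ->
      ok
  | Error e when attempts > 1 && is_transient e ->
      retry ~attempts:(attempts - 1) ~is_transient attempt
  | Error _ as err ->
      err

let () =
  let calls = ref 0 in
  (* Simulated suspend request: refused twice, then accepted. *)
  let attempt () =
    incr calls ;
    if !calls < 3 then Error "VM_NON_SUSPENDABLE" else Ok ()
  in
  (* Placeholder predicate: treat only this error as worth retrying. *)
  let is_transient e = e = "VM_NON_SUSPENDABLE" in
  assert (retry ~attempts:5 ~is_transient attempt = Ok ()) ;
  assert (!calls = 3)
```

The trade-off is that every client has to know which errors are transient, whereas skipping the cached check on the server side leaves a single authoritative check.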