[FR-126] Rollback#274
Conversation
| } | ||
|
|
||
| if config.RollbackStrategy.MaxVersionAge != nil && time.Since(*targetVersionInfo.LastCurrentTime) > config.RollbackStrategy.MaxVersionAge.Duration { | ||
| l.Info("Skipping rollback: the version's last current time exceeds MaxVersionAge", |
There was a problem hiding this comment.
| l.Info("Skipping rollback: the version's last current time exceeds MaxVersionAge", | |
| l.Debug("Skipping rollback: the version's last current time exceeds MaxVersionAge", |
There was a problem hiding this comment.
The logger has no Debug level, only Info and Error. (If I remember correctly, it was related to K8s events?)
| } | ||
|
|
||
| allErrs = append(allErrs, validateRolloutStrategy(new.Spec.RolloutStrategy)...) | ||
| allErrs = append(allErrs, validateRollbackStrategy(*new.Spec.RollbackStrategy)...) |
There was a problem hiding this comment.
Could new.Spec.RollbackStrategy ever be nil? Should probably add a nil check here or update the type signature for validateRollbackStrategy instead of dereferencing.
There was a problem hiding this comment.
The rollback strategy is a pointer, because it's not mandatory in the spec. At the point of the validation it should not be nil, because of the default we set, but just in case, I added a nil check.
| if s.RollbackStrategy == nil { | ||
| s.RollbackStrategy = &RollbackStrategy{Strategy: RollbackAllAtOnce} | ||
| } else if s.RollbackStrategy.Strategy == "" { | ||
| s.RollbackStrategy.Strategy = RollbackAllAtOnce | ||
| } |
There was a problem hiding this comment.
Nit: this is entirely a matter of taste but I think two if statements is easier to read than the if/else flow (more clear that the strategy is always all at once if left unset).
| if s.RollbackStrategy == nil { | |
| s.RollbackStrategy = &RollbackStrategy{Strategy: RollbackAllAtOnce} | |
| } else if s.RollbackStrategy.Strategy == "" { | |
| s.RollbackStrategy.Strategy = RollbackAllAtOnce | |
| } | |
| if s.RollbackStrategy == nil { | |
| s.RollbackStrategy = &RollbackStrategy{} | |
| } | |
| if s.RollbackStrategy.Strategy == "" { | |
| s.RollbackStrategy.Strategy = RollbackAllAtOnce | |
| } |
There was a problem hiding this comment.
I've done the change, but I remember the times, where I had to optimize even the number of if calls... :P
| default: | ||
| strategy = temporaliov1alpha1.UpdateAllAtOnce |
There was a problem hiding this comment.
🤔 I think this ought to be an error condition; if the controller sees an unknown strategy it should fall back to manual mode and stop attempting updates to the default version.
There was a problem hiding this comment.
Before this function is used, there is a check to see if the rollout strategy is set to manual, and if it is, rollback is disabled. In any case, I updated the default to be manual.
| }, | ||
| } | ||
|
|
||
| testCases := []struct { |
There was a problem hiding this comment.
I think there should be a test cases for
1.RollbackStrategy == nil (even though the webhook should set a default, it's still possible to run the controller without the webhook enabled). When nil I would assert that the behavior matches the default rollback strategy configured by the webhook.
2. RollbackStrategy.MaxVersionAge == nil (assert default max version age applied, again to match the default webhook behavior)
3. RollbackStrategy.MaxVersionAge == 0 (assert false, even when the rest of the current status would normally have resulted in a rollback)
There was a problem hiding this comment.
I changed the Config to have rollback strategy at all times, so the first two are no longer needed and the third existed already.
| @@ -23,8 +24,9 @@ func TestTemporalWorkerDeployment_ValidateCreate(t *testing.T) { | |||
| tests := map[string]struct { | |||
| obj runtime.Object | |||
| errorMsg string | |||
| warnMsg string | |||
There was a problem hiding this comment.
Should there be a warning emitted for a combination of Manual rollout + Progressive rollback? That seems in the same vein as rollout steps being faster than rollback steps. Please add a test case for that if so.
There was a problem hiding this comment.
When rollback is Manual, rollback is disabled in the planner, so I don't think we need a warning.
371578c to
a5d1e33
Compare
a5d1e33 to
4b0bd82
Compare
jaypipes
left a comment
There was a problem hiding this comment.
@eniko-dif thanks very much for this! I think the UX can be simplified quite a bit by just making the rollback strategy match the rollout strategy. Also, I recommend removing the MaxVersionAge configuration option. Instead, simply roll back to the previous WDV's build ID. I really don't see a use case for configuring a rollback target version of anything other than the previous version...
| // not affected by the rollback. Only new workflow executions will be routed to the rollback | ||
| // target version. |
There was a problem hiding this comment.
| // not affected by the rollback. Only new workflow executions will be routed to the rollback | |
| // target version. | |
| // not affected by the rollback. Only new workflow executions that are not pinned | |
| // to a specific build ID will be routed to the rollback target version. |
| // not affected by the rollback. Only new workflow executions will be routed to the rollback | ||
| // target version. | ||
| // | ||
| // Rollback is suppressed when the rollout strategy is Manual. |
There was a problem hiding this comment.
I'm just curious, but why is rollback suppressed when the rollout stategy is Manual?
| // +optional | ||
| Steps []RolloutStep `json:"steps,omitempty"` | ||
|
|
||
| // MaxVersionAge limits which versions are eligible as rollback targets. |
There was a problem hiding this comment.
I have a suspicion this is overly complicated configuration for most users. Would a simpler "just roll back to the previous working (i.e. passes gates) version" behaviour be easier for users to grok? Where the controller simply would attempt rolling back to N-1, then N-2, etc... until landing on a working version?
| type RollbackStrategy struct { | ||
| // Strategy for rollback. Valid values are "AllAtOnce" or "Progressive". | ||
| // Defaults to "AllAtOnce" for fast recovery. | ||
| Strategy VersionRollbackStrategy `json:"strategy"` |
There was a problem hiding this comment.
Is there really a valid reason or use case for having a rollback strategy not match the rollout strategy? Why not just have the rollback strategy always be the same as rollout?
| func warnRollbackSlowerThanRollout(rollout RolloutStrategy, rollback RollbackStrategy) admission.Warnings { | ||
| switch rollout.Strategy { | ||
| case UpdateAllAtOnce: | ||
| if rollback.Strategy != RollbackAllAtOnce { | ||
| return admission.Warnings{"rollback strategy is slower than rollout: rollout is AllAtOnce, but rollback is Progressive — is that intended?"} | ||
| } | ||
| case UpdateProgressive: | ||
| if rollback.Strategy == RollbackProgressive { | ||
| var rolloutTotal, rollbackTotal time.Duration | ||
| for _, s := range rollout.Steps { | ||
| rolloutTotal += s.PauseDuration.Duration | ||
| } | ||
| for _, s := range rollback.Steps { | ||
| rollbackTotal += s.PauseDuration.Duration | ||
| } | ||
| if rollbackTotal > rolloutTotal { | ||
| return admission.Warnings{fmt.Sprintf( | ||
| "rollback strategy is slower than rollout: progressive rollback total duration (%s) exceeds progressive rollout total duration (%s) — is that intended?", | ||
| rollbackTotal, rolloutTotal, | ||
| )} | ||
| } | ||
| } | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
see my comment above... I suspect there's not really a valid use case for having a rollback strategy not match the rollout strategy... if we can just say the rollback == rollout, we can simplify things quite a bit and remove the need for these warnings.
What was changed
Add a separate rollback config to the deployment specification that can be used to have different rollback behavior from the rollout one. It supports AllAtOnce or Progressive and does not have a gate. Uses the LastCurrentTime of a deployment to decide if the target version supposed to be rolled out or rolled back.
Why?
Open feature issue: #126
Checklist
Closes [Feature Request] Rollback mode #126
How was this tested:
Unit and integration tests (ongoing actual test with local setup).
Yes, but needs to be decided where.