Make e2e ConfigureWithCleanup retry outlast a calico-apiserver restart#13044
Open
caseydavenport wants to merge 1 commit into
Open
Make e2e ConfigureWithCleanup retry outlast a calico-apiserver restart#13044caseydavenport wants to merge 1 commit into
caseydavenport wants to merge 1 commit into
Conversation
The retry uses a fixed 4s interval spanning ~2 minutes so a rolling calico-apiserver is ridden out. A capped exponential backoff stops early: wait.Backoff zeroes Steps once Duration*Factor exceeds Cap, so the prior budget gave up after ~13s.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
e2e specs that tune cluster config through ConfigureWithCleanup intermittently fail in a [BeforeEach] with "the server is currently unable to handle the request" when calico-apiserver is rolling. The helper already retries that 503, but the budget gave up after ~13s, well short of a real apiserver restart.
The retry budget had a bug: wait.Backoff zeroes its remaining Steps the moment Duration*Factor exceeds Cap, so the "8 steps / 10s cap" exponential backoff actually quit after ~7 attempts over ~13s. This switches to a fixed 4s interval spanning ~2 minutes so the full budget is spent, and adds a test that pins the budget so it can't silently shrink again.
This only covers the ConfigureWithCleanup path. The same apiserver-outage window also breaks specs that hit the v3 API via discovery/RESTMapper or via Voltron; hardening those is a follow-up.