OCPBUGS-90541: Don't set Progressing=False until all pods available#3034
OCPBUGS-90541: Don't set Progressing=False until all pods available#3034mdbooth wants to merge 1 commit into
Conversation
|
@mdbooth: This pull request references Jira Issue OCPBUGS-90541, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (core-networking-bot@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mdbooth The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughThe PR modifies ChangeshadState-aware rollout detection and tests
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (13 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/gomega@v1.39.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope ... [truncated 17357 characters] ... red in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/gengo/v2@v2.0.0-20251215205346-5ee0d033ba5b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kms@v0.35.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kube-aggregator@v0.35.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n" Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controller/statusmanager/pod_status.go`:
- Around line 172-188: The condition on line 172 requires ReadyReplicas > 0 to
treat a StatefulSet as having an active rollout, but when hadState is true
(indicating tracked state from a previous rollout) and ReadyReplicas is 0 while
UpdatedReplicas equals Replicas, the StatefulSet falls out of the
rollout-in-progress path prematurely, clearing tracked state before pods
recover. Modify the condition to also handle the zero-ready availability case by
checking if we have tracked state or an active rollout regardless of whether
ReadyReplicas is greater than zero, ensuring that pending pods waiting to become
ready are properly tracked as progressing rather than being reported as
non-progressing.
In `@pkg/controller/statusmanager/status_manager_test.go`:
- Around line 2388-2390: The test file hardcodes the release version annotation
as "v1.0.0" while the rollout logic reads the actual RELEASE_VERSION environment
variable. This mismatch causes test assertions to fail when the environment
variable differs from the hardcoded value. Replace all hardcoded "v1.0.0" values
in the Annotations map with the "release.openshift.io/version" key at all
locations (lines 2388-2390, 2562-2564, 2653-2655, 2766-2768, and 2882-2884) by
retrieving the actual RELEASE_VERSION environment variable value using os.Getenv
and using that value instead. This ensures the test annotations match what the
rollout logic actually reads from the environment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4d9ebdf1-4112-4d54-a0f5-168c3667049c
📒 Files selected for processing (2)
pkg/controller/statusmanager/pod_status.gopkg/controller/statusmanager/status_manager_test.go
|
/retest-required |
|
Pods are crashlooping due to: |
We don't want the ClusterOperator to report Progressing=True whenever a Node reboots. However, we also don't want to report Progressing=False during a CNI rollout until all pods are available. This change ensures both are covered.
d343fbc to
6812a1e
Compare
|
@mdbooth: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
We don't want the ClusterOperator to report Progressing=True whenever a Node reboots. However, we also don't want to report Progressing=False during a CNI rollout until all pods are available. This change ensures both are covered.
Related to openshift/origin#31320
Confirmed with a multi-pr test that it fixes the origin test.
Summary by CodeRabbit
Bug Fixes
ProgressingandDegradedbehavior when pods are temporarily unavailable.Tests
StatusManagerrestarts.