Skip to content

🐛 Fix machine loop block when backing node does not exist#1109

Open
viragvoros wants to merge 3 commits into
gardener:masterfrom
viragvoros:fix/preservation-missing-node-stall
Open

🐛 Fix machine loop block when backing node does not exist#1109
viragvoros wants to merge 3 commits into
gardener:masterfrom
viragvoros:fix/preservation-missing-node-stall

Conversation

@viragvoros

Copy link
Copy Markdown
Contributor

What this PR does / why we need it:
This PR fixes a bug in the machine preservation logic (manageMachinePreservation). Currently, the controller returns an error if the backing Node object does not exist yet in the cluster.
When a machine is newly created or replacing an old instance, it takes time for the node to register. Returning an error here completely stops the machine's main update loop. This prevents the machine from changing its status phase, which causes downstream components to miss other checks.
By ignoring the NotFound error, the controller can assume there are no node annotations yet and continue updating the machine normally.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
I added a new test case to the unit test table. It checks that when a node is missing from the cluster, the controller ignores the error and returns a normal LongRetry instead of failing.

Release note:

bugfix operator
Fixed a bug where a missing Node object stopped the machine controller loop from running, causing machine status phase updates to get stuck.

@viragvoros viragvoros requested a review from a team as a code owner June 23, 2026 12:32
@gardener-prow gardener-prow Bot added do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 23, 2026
@gardener-prow

gardener-prow Bot commented Jun 23, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gagan16k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow Bot added the cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. label Jun 23, 2026
@gagan16k

gagan16k commented Jun 24, 2026

Copy link
Copy Markdown
Member

Was this guard intended to be on uncordonNodeIfCordoned(), as getNodePreserveAnnotation()already has a check for IsNotFound errors in the function?

We are aware of that particular case, and are working on a fix for it. If this PR is intended to solve the same bug, we would prefer to close this in favor of the other one.
Refer to #1110 for tracking
Thanks!

@Kumm-Kai

Copy link
Copy Markdown
Contributor

@gagan16k

If this PR is intended to solve the same bug, we would prefer to close this in favor of the other one.

What/where is the "other one" PR?

@thiyyakat

Copy link
Copy Markdown
Member

@Kumm-Kai : The PR was raised only recently: #1111. @gagan16k was unable to provide a link to it earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants