need to be able to control Machine.spec.nodeDeletionTimeout #574

Open
tmmorin opened this issue Feb 18, 2025 · 1 comment
Labels
kind/feature: New feature or request
needs-priority: Indicates an issue or PR needs a priority assigning to it
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

tmmorin commented Feb 18, 2025

Describe the solution you'd like:

In some cases (see below) it is important to be able to set Machine.spec.nodeDeletionTimeout to zero. However, unlike KubeadmControlPlane, which offers KubeadmControlPlane.spec.machineTemplate.nodeDeletionTimeout, RKE2ControlPlane does not allow this value to be controlled.

Why do you want this feature:

A specific case where it is important to set Machine.spec.nodeDeletionTimeout to zero is the following:

  • consider a deployment where node names are reused during node rolling updates (e.g. Longhorn/baremetal deployments require this so that the node identity does not change during a node rolling update and on-disk data can be preserved/reused)
  • when a Node is deleted, this is followed by the deletion of the Machine and then by the recreation, on the same baremetal server, of a Node with the same name
  • with the default value of Machine.spec.nodeDeletionTimeout (10s), if the first attempts at deleting the Node fail for some reason (e.g. because the k8s API is flaky at that time), no further attempt is made to delete the Node object
  • if this happens, the new Node object that should have been created is not created; instead, the newly created RKE2 Machine reuses the pre-existing Node object, which causes many side effects because this object carries stale information (in particular Calico annotations from the old node)
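
To illustrate why a zero timeout matters here, below is a minimal sketch of the give-up decision as I understand it. It is a simplified model, not the actual Cluster API controller code: the function name and parameters are made up for illustration, and it assumes that a zero timeout means "keep retrying indefinitely" while a positive timeout is measured from the moment Machine deletion started.

```python
from datetime import datetime, timedelta, timezone


def node_deletion_timeout_exceeded(deletion_started, timeout, now):
    """Return True once the (modelled) controller stops retrying Node deletion.

    Hypothetical helper for illustration only; not a Cluster API function.
    """
    # Assumption: a zero (or negative) timeout means "retry indefinitely".
    if timeout <= timedelta(0):
        return False
    # Otherwise give up once the timeout has elapsed since deletion started.
    return now - deletion_started >= timeout


# The API is flaky for 30 seconds after Machine deletion starts.
start = datetime(2025, 2, 18, 12, 0, 0, tzinfo=timezone.utc)
api_recovers_at = start + timedelta(seconds=30)

# Default 10s: by the time the API recovers, the Node is left behind.
default_gives_up = node_deletion_timeout_exceeded(
    start, timedelta(seconds=10), api_recovers_at
)
# Zero: deletion is still retried once the API is reachable again.
zero_gives_up = node_deletion_timeout_exceeded(
    start, timedelta(0), api_recovers_at
)
print(default_gives_up, zero_gives_up)
```

With the 10s default, the stale Node object survives the flaky period and is then picked up by the new Machine; with zero, the deletion is eventually retried and succeeds.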

Details about the bug as we hit it in Sylva are here: https://gitlab.com/sylva-projects/sylva-core/-/issues/1431

Anything else you would like to add:

More generally, it would be useful for RKE2ControlPlane.spec.machineTemplate to have all the fields that KubeadmControlPlane.spec.machineTemplate has: nodeDeletionTimeout, but also nodeVolumeDetachTimeout.

@tmmorin tmmorin added kind/feature New feature or request needs-priority Indicates an issue or PR needs a priority assigning to it needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 18, 2025
tmmorin commented Feb 18, 2025

In the context of Sylva, as a workaround, we have a Kyverno policy in place that patches the Machine resources to fix this value. This is of course not a satisfactory solution.
