Add a webhook to prevent eviction of pods on kosmos NotReady nodes #343

Open
wangyizhi1 opened this issue Dec 22, 2023 · 0 comments · May be fixed by #342

What would you like to be added:
Add a webhook to prevent eviction of pods on kosmos NotReady nodes

Why is this needed:
When a kosmos node stays NotReady for more than 5 minutes, the node-controller in the controller-manager starts evicting its pods, which is equivalent to deleting them. This is not always appropriate: once the cluster reconnects, the evicted pods end up being restarted.

The NotReady state of a node is more likely due to a kosmos service outage or cross-cluster network issues rather than a physical node failure. Therefore, there is a need for a mechanism to prevent the node-controller from deleting pods.

Since deletion is irreversible, one proposed solution is to intercept pod deletion requests issued by system:serviceaccount:kube-system:node-controller. A deletion is intercepted only when certain conditions are met, such as utils.IsKosmosNode(node) && utils.IsNotReady(node) && v.needToPrevent(req.UserInfo.Username).
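
A minimal sketch of the decision logic described above, written with plain Go types so it stands alone. Everything here is illustrative and not the actual kosmos code: the Node struct is a simplified stand-in for the Kubernetes node object, the label key kosmos.io/node is an assumption, and the helper names only mirror the ones mentioned in this issue. A real webhook would read the operation and username from the admission request and look up the node that the pod is scheduled on.

```go
package main

// nodeControllerUser is the identity named in this issue whose pod
// deletions should be intercepted.
const nodeControllerUser = "system:serviceaccount:kube-system:node-controller"

// kosmosNodeLabel is an ASSUMED label key used to mark kosmos nodes.
const kosmosNodeLabel = "kosmos.io/node"

// Node is a hypothetical, simplified stand-in for a Kubernetes node.
type Node struct {
	Labels map[string]string
	Ready  bool // true when the node's Ready condition is "True"
}

// isKosmosNode reports whether the node carries the assumed kosmos label.
func isKosmosNode(n *Node) bool {
	return n.Labels[kosmosNodeLabel] == "true"
}

// isNotReady reports whether the node is currently NotReady.
func isNotReady(n *Node) bool {
	return !n.Ready
}

// needToPrevent reports whether requests from this user should be blocked.
func needToPrevent(username string) bool {
	return username == nodeControllerUser
}

// shouldDenyPodDelete returns true when a pod DELETE request should be
// rejected: the pod's node is a NotReady kosmos node and the request
// comes from the node-controller service account.
func shouldDenyPodDelete(operation, username string, node *Node) bool {
	if operation != "DELETE" || node == nil {
		return false
	}
	return isKosmosNode(node) && isNotReady(node) && needToPrevent(username)
}
```

With this sketch, a delete issued by any other user, or a delete for a pod on a Ready or non-kosmos node, is allowed through unchanged; only the node-controller's deletions on NotReady kosmos nodes are denied.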
