
Pod eviction will cause service interruption #1674

Open
andyblog opened this issue Sep 17, 2024 · 3 comments · May be fixed by #1685
Labels
kind/bug: Categorizes issue or PR as related to a bug.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@andyblog

andyblog commented Sep 17, 2024

Description

Observed Behavior:

When all replicas of a Deployment are on the same node (for example, a Deployment with 2 pods, both on that node), both pods are evicted when the node is terminated. From the time the 2 pods are evicted until their replacements are created and become ready on a new node, the Deployment has no pods to serve traffic.
The same problem occurs when a Deployment has only one replica.
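
For illustration, a minimal sketch of a Deployment that can end up in this state. The nodeSelector value is a hypothetical hostname used only to force both replicas onto the same node; any co-location of the replicas (or plain scheduling chance) reproduces the same situation:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo                    # hypothetical name, for illustration only
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      # Hypothetical node pin that puts both replicas on one node,
      # reproducing the "all replicas on the same node" case.
      nodeSelector:
        kubernetes.io/hostname: ip-10-0-0-1.example.internal
      containers:
        - name: app
          image: nginx          # placeholder image
```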

Expected Behavior:

During eviction, a check could be made here: if all replicas of the Deployment are on this node, or the Deployment has only one replica, restarting the Deployment would be more graceful than evicting its pods. A restart would first create a pod on the new node, wait for the new pod to become ready, and then terminate the old pod, which reduces service interruption time.
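
For comparison, this create-first, terminate-second ordering is what a Deployment rolling update already does when maxUnavailable is 0, and `kubectl rollout restart deployment/demo` replaces pods with that ordering. A sketch of such a strategy (the Deployment name and field values are only an example):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo                    # hypothetical name, matching the sketch above
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1               # bring up the replacement pod first
      maxUnavailable: 0         # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: nginx          # placeholder image
```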

Reproduction Steps (Please include YAML):

Versions:

  • Chart Version:
  • Kubernetes Version (kubectl version):

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
andyblog added the kind/bug label Sep 17, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the needs-triage label Sep 17, 2024
@njtran
Contributor

njtran commented Sep 17, 2024

A restart would first create a pod on the new node, wait for the new pod to become ready, and then terminate the old pod, which reduces service interruption time.

As I understand it, this is how it currently works. Can you share reproduction steps that show what you're describing?

@andyblog
Author

The current behavior is:

  1. A Spot or on-demand instance is terminated for some reason
  2. The node starts to be deleted and its finalizer logic runs
  3. The pods on that node are evicted (an eviction request is sketched below)

I think that when all replicas of a Deployment are on that node, restarting is more graceful than evicting, because the service is not interrupted during the restart.
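
For reference on step 3, node drain evicts each pod by posting an Eviction to that pod's eviction subresource; a minimal sketch of such a request, with placeholder pod and namespace names, assuming the termination flow uses the standard policy/v1 Eviction API:

```yaml
# Sketch only: the eviction request sent for each pod during drain.
# Pod and namespace names are placeholders.
apiVersion: policy/v1
kind: Eviction
metadata:
  name: demo-5d4b9c7f8-abcde    # hypothetical pod name
  namespace: default
```

Replacement pods are created by the Deployment controller only after the evicted pods are gone, so when every replica sits on the draining node there is a window with no ready pod, which is the gap described in this issue.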
