[EKS] [request]: Systemd upgrade to > v239 for enabling node graceful shutdown #2057
Comments
All these comments about "a known BUG in upstream Systemd" seem quite misleading. After spending some time trying to get this to work on Amazon Linux 2, I found the culprit that prevents systemd from doing anything:
So we just need a
I am not sure how well the graceful node shutdown feature works yet, but any issue there should be blamed on kubelet, not systemd.
@sayap I am having trouble getting inhibitors to work with Amazon Linux 2 and found this thread. I was going to try this suggestion, but this package is not installed on my EKS worker node (and my
What did you do to troubleshoot this on Amazon Linux 2, so that I might try similar steps?
I followed https://www.skouf.com/posts/enabling-graceful-node-shutdown-on-eks-in-kubernetes-1-21/ and added the following to the user data of the EC2 instance and was able to get graceful shutdown to work.
As @sayap has mentioned, it is probably not related to systemd. I am not enough of a Linux expert to tell which component is causing this, though.
I followed those steps as well, but graceful shutdown is not working for me, so I'm wondering what I'm missing. When I reboot the instance, the inhibitor does not block the shutdown, which proceeds right away.
@shivaprasad-balaji Indeed we need to follow https://www.skouf.com/posts/enabling-graceful-node-shutdown-on-eks-in-kubernetes-1-21/, to avoid triggering the if-block in https://github.com/kubernetes/kubernetes/blob/v1.23.17/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go#L182-L202, which would fail with error:
because
@aberle You are right, the EKS worker node doesn't come with acpid, so my finding above was just a red herring, but it probably explains why the k8s developer misattributed this to a systemd bug (kubernetes/kubernetes#107043 (comment)). Anyway, if it still doesn't work after following the blog post, can you check:
Note that the configured shutdown grace period is just the upper bound. As soon as kubelet has successfully terminated all the pods, it removes the inhibitor and allows the system to shut down.
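The interaction described above can be sketched roughly as follows. As I read the linked if-block, kubelet compares logind's `InhibitDelayMaxSec` with the configured `shutdownGracePeriod` and returns an error if it cannot raise the former, so setting the logind value up front (as the blog post's user data does) sidesteps that path. This is a minimal sketch, not the blog's exact script: the function name and the drop-in file name `50-max-delay.conf` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the blog post or from kubelet):
# write a logind drop-in that sets InhibitDelayMaxSec, so the value is
# already at least as large as kubelet's shutdownGracePeriod when
# kubelet starts.
write_inhibit_delay_dropin() {
    dropin_dir="$1"   # on a real node: /etc/systemd/logind.conf.d
    seconds="$2"      # should be >= shutdownGracePeriod, in seconds
    mkdir -p "$dropin_dir" || return 1
    # The file name is an assumption made for this sketch.
    printf '[Login]\nInhibitDelayMaxSec=%s\n' "$seconds" \
        > "$dropin_dir/50-max-delay.conf"
}
```

On a node this might be called from user data as `write_inhibit_delay_dropin /etc/systemd/logind.conf.d 60`, followed by `systemctl restart systemd-logind` and a kubelet restart. If the inhibitor was registered, `systemd-inhibit --list` should then show a delay lock held by kubelet.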
Just realized that the
We can replace the
echo "$(jq '.shutdownGracePeriod="60s" | .shutdownGracePeriodCriticalPods="20s"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
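A slightly more defensive variant of the `jq` one-liner above, as a sketch. If `jq` fails (for example on malformed JSON), the original overwrites the config with an empty line; this version only writes when `jq` succeeds. The function name is hypothetical, and `jq` is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: set both grace periods in a kubelet config file,
# leaving the file untouched if jq cannot parse or transform it.
set_shutdown_grace_periods() {
    config="$1"     # e.g. /etc/kubernetes/kubelet/kubelet-config.json
    total="$2"      # shutdownGracePeriod, e.g. 60s
    critical="$3"   # shutdownGracePeriodCriticalPods, e.g. 20s
    # Capture jq's output first; bail out before touching the file on error.
    updated="$(jq --arg total "$total" --arg critical "$critical" \
        '.shutdownGracePeriod = $total | .shutdownGracePeriodCriticalPods = $critical' \
        "$config")" || return 1
    printf '%s\n' "$updated" > "$config"
}
```

For example, `set_shutdown_grace_periods /etc/kubernetes/kubelet/kubelet-config.json 60s 20s`. With these values, the last 20s of the 60s total is reserved for critical pods, which leaves 40s for regular pods.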
@shivaprasad-balaji I wonder why we need graceful node shutdown at all for Karpenter. The Karpenter Termination Controller calls the Kubernetes Eviction API and properly drains the node before terminating it via the cloud provider. For "Instance Terminating Events" related to Spot instances, Karpenter needs to be integrated with the built-in Node Termination Handling, which ensures proper draining as well.
Out of all the options, I think only graceful node shutdown can gracefully terminate the normal pods first, and then gracefully terminate the daemonset pods.
@youwalther65: As I understand it, and from what I have observed, Karpenter drains all the non-daemonset pods on a node and then terminates the instance. However, the node terminates immediately after the cloud provider API is called; there is no inhibitor or wait that gives the daemonset pods time to shut down gracefully. Setting up Kubernetes graceful node shutdown adds an inhibitor to the node, so that the daemonset pods are given sufficient time to shut down gracefully.
@shivaprasad-balaji I agree now, I wasn't completely aware of that. But there are still other problems in the kubelet itself where even graceful node shutdown won't help, see "Graceful node shutdown doesn't wait for volume teardown" #115148.
I think this should be closed.
@mikestef9 @cartermckinnon I'm not sure closing this issue without a replacement (#1651 is currently tagged for MNGs) to track the availability of graceful node shutdown in EKS is in the interest of the community. Could we either get EKS documentation on where/when graceful shutdown is supported, reopen this issue (maybe updated), or open a new issue to track graceful node shutdown?
Tell us about your request
What do you want us to build?
Upgrade systemd version in AWS optimized EKS AMI to version greater than v239
Which service(s) is this request for?
This is for EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.
We were looking into configuring graceful shutdown for Kubernetes nodes. The feature is enabled by default from Kubernetes version 1.21 (https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/). Our clusters are running v1.24.
After enabling it and configuring ShutdownGracePeriod and ShutdownGracePeriodCriticalPods through the kubelet configuration options, we see that graceful shutdown is not working as expected. When Karpenter (which we use for cluster scaling) detects that a node is empty, it terminates the node, and the node is terminated immediately without any grace period.
We looked into the issue and found a few references which indicate there is an issue with the systemd version on the node. We use the AWS EKS optimized Linux AMI for the nodes, and we see that the systemd version is v219.
As per the links below, it seems this is fixed after v239 of systemd.
Are you currently working around this issue?
How are you currently solving this problem?
No workaround is known at this point.
Additional context
Anything else we should know?
Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)