bug(containerd): Failed to create containerd task: failed to create shim task: OCI runtime create failed #2126

Open
Sandeepsac opened this issue Jan 27, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@Sandeepsac

Hi

What happened:

  1. Creating a container task with containerd fails with the following error:
    Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: unable to freeze: unknown
    Additionally, the sshd service on the node restarts repeatedly. The logs show:
    SshdRepeatedRestart: Systemd unit "sshd.service" has restarted (NRestarts 526 -> 533)

Image

What you expected to happen:

1. The container should be created successfully, and the specified command should run without errors.
2. The sshd service on the node should remain active and stable, without repeated restarts.

How to reproduce it (as minimally and precisely as possible):

Environment:

  • AWS Region: us-west-1
  • Instance Type(s): m5.large
  • Cluster Kubernetes version: v1.31.4-eks-2d5f260
  • Node Kubernetes version: v1.31.3-eks-59bf375
  • AMI Version: 1.31.3-20250103
@Sandeepsac added the bug label Jan 27, 2025
@cartermckinnon
Member

@Sandeepsac how do you reproduce this issue?

@Sandeepsac
Author

@cartermckinnon, we are seeing this on random nodes, not on all nodes. For reference, here are some error events from an affected node:

Image

@venkatamutyala

venkatamutyala commented Feb 3, 2025

Getting this issue as well with v1.30.8-eks-aeac579 (AMI version 1.30.8-20250116):

3m16s       Warning   Failed                    pod/alpine                                               Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: unable to freeze: unknown
8m43s       Warning   FailedCreatePodSandBox    pod/debug-pod                                            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: unable to freeze: unknown
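
To find every occurrence of this failure across a cluster, one option is to list Warning events and keep only the lines that mention the freeze error. This is a minimal sketch: the function names `filter_freeze_errors` and `list_freeze_events` are hypothetical, and it assumes `kubectl` is already configured for the affected cluster.

```shell
# Hypothetical helper: keep only event lines that mention the freeze failure.
filter_freeze_errors() {
  grep -F "unable to freeze"
}

# Warning events from all namespaces, narrowed to the error seen in this issue.
list_freeze_events() {
  kubectl get events --all-namespaces --field-selector type=Warning |
    filter_freeze_errors
}
```

Calling `list_freeze_events` prints the matching events, if any; an empty result only means no such event is currently retained by the API server.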

@venkatamutyala

@cartermckinnon to reproduce, run a pod like this:

kubectl run -i --tty --rm debug-pod --image=alpine --restart=Never -- sh

@mselim00
Contributor

mselim00 commented Feb 12, 2025

@Sandeepsac / @venkatamutyala can you open an AWS support case for this if the impact is ongoing? I tried to reproduce it with the provided information but could not. You also initially mentioned it happened on random nodes so it would be helpful to get a broader sense of the environment this is being run in.

@venkatamutyala

@mselim00 we noticed this as well and aren't sure how it happened. I'll open a ticket and see what I can find. Maybe some upstream dependency was bad during our node provisioning?

@siolta

siolta commented Feb 19, 2025

@venkatamutyala Do you know if the nodes running this are using containerd v1.7.23? There was a bug with leaking TTYs that was fixed in v1.7.25.

containerd/containerd#11161
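
A rough way to answer the version question for a whole cluster: each node reports its runtime in `.status.nodeInfo.containerRuntimeVersion`, as a string such as `containerd://1.7.23`. The sketch below lists that field per node and flags anything older than v1.7.25, the release mentioned above. The function names `runtime_older_than` and `check_nodes` are hypothetical, and the comparison assumes a `sort` that supports `-V` (GNU coreutils) and plain `x.y.z` version strings.

```shell
# Hypothetical helper: succeeds if a runtime string such as
# "containerd://1.7.23" reports a version strictly older than $2.
runtime_older_than() {
  rt_version="${1#containerd://}"   # strip the scheme prefix
  rt_target="$2"
  [ "$rt_version" != "$rt_target" ] &&
    [ "$(printf '%s\n%s\n' "$rt_version" "$rt_target" | sort -V | head -n 1)" = "$rt_version" ]
}

# Print each node with its reported runtime version and mark nodes that
# are older than 1.7.25 (assumes kubectl points at the affected cluster).
check_nodes() {
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion |
  while read -r node_name node_runtime; do
    if runtime_older_than "$node_runtime" "1.7.25"; then
      echo "$node_name $node_runtime (older than 1.7.25)"
    else
      echo "$node_name $node_runtime"
    fi
  done
}
```

Running `check_nodes` prints one line per node. A flagged node is only a candidate: whether the TTY leak is actually behind the freeze error on that node still has to be confirmed separately.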
