Deployment Issue: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "spin" is configured #289
Hi @chokosabe, I'm not experienced with Rocky Linux, but I do have a working KinD cluster, and the one thing that stands out to me is the difference between our /etc/containerd/config.toml entries. For my cluster, I have:
And, like you, … Can you point me to the steps you followed to install SpinKube? Was it https://www.spinkube.dev/docs/install/installing-with-helm/? The guides on the SpinKube docs site use the node-installer image from spinkube/containerd-shim-spin, e.g.
Hi @vdice, thanks for the reply. I've logged into the boxes and made the changes you outlined above. I suspect Rocky Linux and containerd don't work well together straight out of the box, and that affected the containerd service. With the changes made, is there any way to test that the shim is callable? Ideally the install script would include this.
One test is to call it by its path when on the node:
So, after updating
Hi, yes, I can see it when called on the node as well (I copied the binary to /usr/bin, since all the other shims are there):
[root@staging-master-01 ~]# /usr/bin/containerd-shim-spin-v2 -v
I am still getting the error: "Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "spin" is configured". I'm going to try removing the RuntimeClass and spin-operator and reinstalling.
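The same check can be scripted so it runs identically on every node. The Python sketch below is a minimal illustration, not part of SpinKube: it assumes the shim is named containerd-shim-spin-v2, looks first at /opt/kwasm/bin/containerd-shim-spin-v2 (the path used in the rke2 config.toml quoted later in this thread) and at the /usr/bin copy mentioned above, then falls back to $PATH, and finally runs the binary with -v.

```python
import os
import shutil
import subprocess
from typing import Optional, Sequence

# Assumed locations: the kwasm node-installer path seen in the rke2 config.toml
# quoted in this thread, plus the /usr/bin copy mentioned above.
DEFAULT_CANDIDATES = (
    "/opt/kwasm/bin/containerd-shim-spin-v2",
    "/usr/bin/containerd-shim-spin-v2",
)
SHIM_NAME = "containerd-shim-spin-v2"


def locate_shim(candidates: Sequence[str] = DEFAULT_CANDIDATES,
                name: str = SHIM_NAME) -> Optional[str]:
    """Return the first executable shim: explicit paths first, then $PATH."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # shutil.which searches the $PATH of *this* process, which may differ
    # from the environment containerd itself was started with.
    return shutil.which(name)


def shim_version(path: str) -> str:
    """Run `<shim> -v` and return whatever it prints."""
    result = subprocess.run([path, "-v"], capture_output=True, text=True,
                            timeout=10, check=False)
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    shim = locate_shim()
    if shim is None:
        print(f"{SHIM_NAME} not found in the candidate paths or on $PATH")
    else:
        print(f"found {shim}: {shim_version(shim)}")
```

A shim that answers -v only proves the binary is executable. containerd still needs a runtime named spin in the config.toml it actually loaded, which is why this check can pass while the pod sandbox still fails.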
You could also use
Yep, still getting the error. I deleted the RuntimeClass and Spin Operator and then reinstalled: exactly the same error, i.e. no change. It'd be great to know at which point the error is generated, i.e. what the request to the node that produces the error looks like. For a test using ctr, is there a public Spin image we could pull down? I can try with standard Docker images, but those will clearly fail.
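Roughly, the request looks like this: kubelet passes the RuntimeClass's handler field to containerd's CRI plugin as the runtime handler for the new pod sandbox, and the CRI plugin looks that name up in the runtimes table of the config.toml it loaded; when the name is absent, it fails with no runtime for "spin" is configured. The Python sketch below cross-checks the two sides, assuming you feed it the output of kubectl get runtimeclass -o json and the set of runtime names configured on the node. The helper names and the wasmtime-spin-v2 RuntimeClass name in the example are illustrative assumptions.

```python
import json


def runtimeclass_handlers(kubectl_json: str) -> dict[str, str]:
    """Map RuntimeClass name -> handler from `kubectl get runtimeclass -o json` output."""
    doc = json.loads(kubectl_json)
    # A list response has "items"; `kubectl get runtimeclass NAME -o json` does not.
    items = doc.get("items", [doc])
    return {item["metadata"]["name"]: item["handler"] for item in items}


def missing_handlers(handlers: dict[str, str],
                     configured_runtimes: set[str]) -> dict[str, str]:
    """Return the RuntimeClasses whose handler has no matching containerd runtime entry."""
    return {name: handler for name, handler in handlers.items()
            if handler not in configured_runtimes}


if __name__ == "__main__":
    # Illustrative input: one RuntimeClass using the "spin" handler; the node only has runc.
    sample = '{"items": [{"metadata": {"name": "wasmtime-spin-v2"}, "handler": "spin"}]}'
    print(missing_handlers(runtimeclass_handlers(sample), {"runc"}))
    # prints {'wasmtime-spin-v2': 'spin'}
```

An empty result means every handler the cluster asks for exists in the runtime set you passed in; whether that set came from the config.toml containerd actually loaded is a separate question.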
Can you try the hello-world sample app? On my (kind) node, I did have to
The image pull and ctr run both worked fine:
It's as if the nodes are unreachable by whatever is generating the error, hence the need to recreate how the shim is called by (I guess) the RuntimeClass.
The executor in the scaffolded SpinApp is set to executor: containerd-shim-spin. I don't know if that might be having an effect.
That's fine/correct 👍 Are you able to tail the containerd logs when you create the SpinApp? Any errors or hints? Or is it not logging at all, i.e. not getting invoked?
Good idea! ctr isn't being called at all (the dead shim message is from me shutting down the hello-world app).
I think I might have found the culprit: the cluster is running rke2. Trying the hello-world example:
These are the logs, showing the same message we see:
I tried the deployment using the hello-world image. The same error is still coming through.
Nice find! Okay, so the other containerd process must be using a config.toml from a different location, right? It seems like the node-installer script already has logic for rke2 here and here. I wonder where it is breaking down...
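One way to confirm which config.toml each containerd process is really using is to read its command line. The Python sketch below is Linux-only (it scans /proc) and assumes that a containerd started without -c/--config falls back to /etc/containerd/config.toml, containerd's default; rke2's bundled containerd is expected to point at a config under /var/lib/rancher/rke2/agent/etc/containerd/ instead, but verify that on your own nodes. The function names are hypothetical.

```python
import os
from typing import Optional, Sequence

# containerd's built-in default when no -c/--config flag is passed.
DEFAULT_CONFIG = "/etc/containerd/config.toml"


def config_from_argv(argv: Sequence[str]) -> Optional[str]:
    """Return the config path for a containerd command line, or None if it is not containerd."""
    if not argv or os.path.basename(argv[0]) != "containerd":
        return None  # e.g. containerd-shim-* processes or anything else
    for i in range(1, len(argv)):
        arg = argv[i]
        if arg in ("-c", "--config") and i + 1 < len(argv):
            return argv[i + 1]
        for prefix in ("--config=", "-c="):
            if arg.startswith(prefix):
                return arg[len(prefix):]
    return DEFAULT_CONFIG


def running_containerd_configs(proc: str = "/proc") -> dict[int, str]:
    """Map each running containerd PID to the config path on its command line (Linux only)."""
    found: dict[int, str] = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # the process exited or its cmdline is not readable
        argv = [part.decode(errors="replace") for part in raw.split(b"\0") if part]
        config = config_from_argv(argv)
        if config is not None:
            found[int(entry)] = config
    return found


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, config in sorted(running_containerd_configs().items()):
            print(f"containerd pid {pid}: {config}")
```

Two PIDs reporting different paths would be consistent with the situation described above: the spin runtime written into one file while the kubelet talks to the other containerd.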
Hi @vdice, yeah, #289 was me testing the rke2 ctr binary. Locally it can run the hello-world example, same as the main containerd setup. I restarted the service and tried again; the results below are basically the same.
I think the key question is why a ctr call works locally but (assuming it's being called properly) it can't be called by the RuntimeClass.
@chokosabe just to triple-check,
Yes.
Going back, I think this might be the issue. The ctr binary managed by rke2 can pull the image down (with an error) but can't seem to run it. At first sight it looks like the image was corrupted while being pulled, but it's not. It might be this bug: when trying to run the (spin) image that's been pulled down, you get the same error: ctr: mismatched image rootfs and manifest layers
I just tried on a fresh Ubuntu server, and it seems to have worked for me. Here are the steps I took:
Here is my config.toml from the system:
I am keeping my machine around in case you want to double-check any config on this working instance.
I am also going to try with Rocky Linux next and will report back.
This is strange; it works for me on Rocky Linux too. Apologies in advance if I missed any of the config required to reproduce this issue.
Here is what my config.toml looks like:
[root@rocky-4gb-nbg1-2 /]# cat ./var/lib/rancher/rke2/agent/etc/containerd/config.toml
# File generated by rke2. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2
[plugins."io.containerd.internal.v1.opt"]
path = "/var/lib/rancher/rke2/agent/containerd"
[plugins."io.containerd.grpc.v1.cri"]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = true
enable_unprivileged_ports = true
enable_unprivileged_icmp = true
sandbox_image = "index.docker.io/rancher/mirrored-pause:3.6"
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs"
disable_snapshot_annotations = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/var/lib/rancher/rke2/agent/etc/containerd/certs.d"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
runtime_type = "/opt/kwasm/bin/containerd-shim-spin-v2"
Thanks, I'm upgrading rke2 on the cluster to try this. To be clear, in both cases you disabled the RKE2 agent?
Yes, because I was trying this in standalone mode. I'm also happy to jump on a call to compare notes if that is easier.
Could you check the containerd version on the system where this is not working? We need a relatively recent version of containerd for this to work, specifically:
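A small helper can compare the reported version against whatever minimum the shim requires. The exact minimum is not preserved in this thread, so the Python sketch below takes it as a parameter. It assumes the input is either containerd --version output or the CONTAINER-RUNTIME column of kubectl get nodes -o wide (for example containerd://1.7.11-k3s2, an illustrative value); the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# First "X.Y.Z" (optionally prefixed with "v") in the text, e.g. from
# "containerd github.com/containerd/containerd v1.7.11 <commit>" or "containerd://1.7.11-k3s2".
_VERSION_RE = re.compile(r"\bv?(\d+)\.(\d+)\.(\d+)")


def parse_containerd_version(text: str) -> Optional[Version]:
    """Extract (major, minor, patch) from containerd version text, or None."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)


def is_at_least(text: str, minimum: Version) -> bool:
    """True when the reported version is >= minimum; False if no version is found."""
    found = parse_containerd_version(text)
    return found is not None and found >= minimum
```

Tuple comparison orders versions component by component, so (1, 10, 0) correctly sorts above (1, 9, 5), which a plain string comparison would get wrong.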
Many, many thanks for all the help on this. I finally got this to a point where it's working. I can confirm the initial issues were down to using an older version of rke2 (v1.25), fixed by wiping everything and rebuilding with rke2 v1.28. There were a couple of hiccups around applying the kwasm Helm chart: I had to run it again after installing the Spin Operator Helm chart (but that is a separate issue). The only thing I'd add is that the different scripts we currently have to run could probably be condensed into two simple Helm charts. It might also help to add some debugging tips to the install instructions for when things don't work straight away. Again, many thanks.
Thank you for the feedback. That's good information that can help us improve the docs. Re: consolidation, we have a ticket in the docs repo tracking this: spinkube/documentation#122. If you have specific suggestions on how to troubleshoot this situation, I'll see if I can contribute updates to the QuickStart guide.
I'm trying to deploy a SpinApp to a k3s cluster running on Rocky Linux nodes.
I deployed this SpinApp:
I started getting this error:
I assumed it was related to this and applied the change:
deislabs/containerd-wasm-shims#165
But the issue persisted.
At this point I'm just trying to find ways to debug this, e.g. what should /etc/containerd/config.toml look like?
After applying the fix above, should I have reinstalled the runtimeclassManager or the Spin Operator?
Also, what check is being run that generates this error?
For reference, the entry in /etc/containerd/config.toml looks like this:
Failing all that, is there a safe distro for Kubernetes nodes that SpinKube works well with? I also noticed that the shim is installed to
which is not on the $PATH.