What happens
When I install Keel using the latest Helm chart, it is repeatedly killed by kubelet because it does not respond to liveness probes (and also fails readiness probes).
Pod events:
Type     Reason     Age                    From               Message
----     ------     ----                   ----               -------
Normal   Scheduled  5m57s                  default-scheduler  Successfully assigned kube-system/keel-bd5775d8b-kwjwd to blackbox
Normal   Pulled     5m54s                  kubelet            Successfully pulled image "keelhq/keel:0.20.0" in 1.777s (1.777s including waiting). Image size: 59399804 bytes.
Normal   Pulling    4m56s (x2 over 5m56s)  kubelet            Pulling image "keelhq/keel:0.20.0"
Normal   Created    4m55s (x2 over 5m54s)  kubelet            Created container keel
Normal   Pulled     4m55s                  kubelet            Successfully pulled image "keelhq/keel:0.20.0" in 1.741s (1.741s including waiting). Image size: 59399804 bytes.
Normal   Started    4m54s (x2 over 5m54s)  kubelet            Started container keel
Warning  Unhealthy  3m57s (x6 over 5m17s)  kubelet            Liveness probe failed: Get "http://10.42.3.186:9300/healthz": dial tcp 10.42.3.186:9300: connect: connection refused
Normal   Killing    3m57s (x2 over 4m57s)  kubelet            Container keel failed liveness probe, will be restarted
Warning  Unhealthy  47s (x19 over 5m17s)   kubelet            Readiness probe failed: Get "http://10.42.3.186:9300/healthz": dial tcp 10.42.3.186:9300: connect: connection refused
How to replicate
I ran the following:
helm repo add keel https://keel-hq.github.io/keel/
helm repo update
helm upgrade --install keel --namespace=kube-system keel/keel
Other notes
To check whether this could be caused by mystery broken networking on my node/cluster, I ran an nginx deployment with 1 replica and a similar liveness config as the Helm chart configures for Keel (probing /), and it started and ran successfully.
@DrJosh9000 I'm running the exact same versions here without issues.
Can you try to manually remove the liveness probe in the deployment so that the pod keeps running, and then port forward 9300 to locally test the probe?
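One way to carry out that suggestion is sketched below. The function names are made up for illustration, the namespace and deployment name are taken from the helm command in the report, and the JSON patch path assumes Keel is the first container in the pod spec. Treat it as a rough outline rather than the maintainers' exact procedure.

```shell
# Assumed defaults: namespace and release name from the original helm command.
NAMESPACE="${NAMESPACE:-kube-system}"
DEPLOYMENT="${DEPLOYMENT:-keel}"

# Drop the liveness probe from the first container (index 0 is an assumption)
# so kubelet stops restarting the pod while you investigate.
remove_liveness_probe() {
  kubectl -n "$NAMESPACE" patch deployment "$DEPLOYMENT" --type=json \
    -p='[{"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe"}]'
}

# Forward local port 9300 to the pod behind the deployment (blocks until Ctrl-C).
forward_probe_port() {
  kubectl -n "$NAMESPACE" port-forward "deployment/$DEPLOYMENT" 9300:9300
}

# Request the same path the probes use, printing the response headers too.
check_healthz() {
  curl -sS -i http://localhost:9300/healthz
}

# Usage (run the port-forward in one terminal, the curl in another):
#   remove_liveness_probe
#   forward_probe_port
#   check_healthz
```

Note that the next `helm upgrade` will put the probe back, since the patch only changes the live Deployment object.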
You should be getting something like:
Also try enabling the admin UI and see if you can connect to it (sample values.yaml for the helm chart):
I tried experimenting again. Enabling the admin UI didn't seem to be enough, but I noticed that it was hitting resource limits. Bumping resources let it start up quickly enough for the probe:
With these limits it used around 650m CPU while starting up, and nearly all 128Mi of the request. I started looking for the reason why, and that's when I noticed that the keel container image is only built for amd64. I'm on ARM64 with qemu-user-static + binfmt-support for transparent emulation, so it's probably too slow to start up under the default 100m CPU limit!
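For anyone hitting the same symptom, here is a rough sketch of how to check for an architecture mismatch and raise the limits. The function names are hypothetical. The `resources.limits.*` value keys are an assumption about the Keel chart's values layout, so check them against the chart before relying on them. The limit values are left to the caller because the ones used above are not shown in this thread.

```shell
# Print each node's name and CPU architecture (e.g. amd64 or arm64).
node_architectures() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'
}

# Show the architecture entries published for an image tag.
# Requires a docker CLI with the "manifest" subcommand and registry access.
image_platforms() {
  docker manifest inspect --verbose "$1" | grep '"architecture"'
}

# Re-run the install with higher limits. ASSUMPTION: the chart exposes
# resources.limits.cpu and resources.limits.memory as value keys.
raise_keel_resources() {
  helm upgrade --install keel keel/keel --namespace=kube-system --reuse-values \
    --set "resources.limits.cpu=$1" --set "resources.limits.memory=$2"
}

# Usage:
#   node_architectures
#   image_platforms keelhq/keel:0.20.0
#   raise_keel_resources <cpu-limit> <memory-limit>
```

If the node architecture does not appear among the image's platforms, the container is running under emulation (or would fail to start without it), and a tight CPU limit can make startup slower than the probe timings allow.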