Startup probe failed: HTTP probe failed with statuscode: 500 #291

Open
@robcharlwood


Hi

We are seeing a problem with rancher-webhook v0.3.5 (the latest version) when it runs alongside Rancher 2.7.6 (also the latest). In the Rancher HA cluster and in the imported K3s and GKE downstream clusters, the webhook pod reports a warning that its startup probe is failing with status code 500.

Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  15s               default-scheduler  Successfully assigned cattle-system/rancher-webhook-998454b77-nvch5 to <redacted>
  Normal   Pulled     14s               kubelet            Container image "rancher/rancher-webhook:v0.3.5" already present on machine
  Normal   Created    14s               kubelet            Created container rancher-webhook
  Normal   Started    14s               kubelet            Started container rancher-webhook
  Warning  Unhealthy  5s (x2 over 10s)  kubelet            Startup probe failed: HTTP probe failed with statuscode: 500

If the pod is left running long enough, it eventually starts reporting liveness probe failures as well:

Events:
  Type     Reason     Age                 From     Message
  ----     ------     ----                ----     -------
  Warning  Unhealthy  41m (x52 over 19h)  kubelet  Liveness probe failed: Get "https://XXX.XXX.XXX.XXX:9443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
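
To tell the two failure modes apart outside the kubelet (an HTTP 500 response versus no response at all), the same endpoint can be requested by hand. The following is only a minimal sketch, not part of rancher-webhook: the `probe` helper is a hypothetical name, and it assumes the endpoint is reachable from wherever the script runs. Certificate verification is disabled because the webhook certificate is issued for `rancher-webhook.cattle-system.svc` rather than the pod IP, and the kubelet likewise skips verification for HTTPS probes.

```python
import ssl
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Issue one GET and return (status, body).

    Returns (None, reason) when no HTTP response arrives at all,
    for example on a timeout or a refused connection.
    """
    # Skip certificate verification, as the kubelet does for HTTPS probes.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code, err.read().decode("utf-8", "replace")
    except OSError as err:
        # Covers URLError, timeouts and connection errors.
        return None, str(err)
```

Calling `probe("https://<pod-ip>:9443/healthz")` with the pod's IP substituted in would return a status of 500 for the startup probe case and `None` for the liveness timeout case; the response body, if there is one, may say more than the kubelet event does.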

These failures are only ever reported as warnings, and the pod itself never becomes unhealthy. The pod logs do not show anything useful either:

time="2023-09-13T10:22:52Z" level=info msg="Rancher-webhook version v0.3.5 (2e89c65) is starting"
time="2023-09-13T10:22:52Z" level=info msg="Active TLS secret cattle-system/cattle-webhook-tls (ver=5511970) (count 1): map[listener.cattle.io/cn-rancher-webhook.cattle-system.svc:rancher-webhook.cattle-system.svc listener.cattle.io/fingerprint:SHA1=XXXXXXXXXXXXXXXXXXXXXXXXXXXX]"
time="2023-09-13T10:22:52Z" level=info msg="Listening on :9443"
time="2023-09-13T10:22:52Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller"
time="2023-09-13T10:22:52Z" level=info msg="Starting management.cattle.io/v3, Kind=Cluster controller"
time="2023-09-13T10:22:52Z" level=info msg="Starting management.cattle.io/v3, Kind=ClusterRoleTemplateBinding controller"
time="2023-09-13T10:22:52Z" level=info msg="Starting management.cattle.io/v3, Kind=GlobalRole controller"
time="2023-09-13T10:22:52Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-09-13T10:22:52Z" level=info msg="Sleeping for 15 seconds then applying webhook config"
time="2023-09-13T10:22:52Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller"
time="2023-09-13T10:22:52Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller"
time="2023-09-13T10:22:52Z" level=info msg="Starting management.cattle.io/v3, Kind=PodSecurityAdmissionConfigurationTemplate controller"
time="2023-09-13T10:22:52Z" level=info msg="Starting provisioning.cattle.io/v1, Kind=Cluster controller"
time="2023-09-13T10:22:53Z" level=info msg="Starting management.cattle.io/v3, Kind=ProjectRoleTemplateBinding controller"
time="2023-09-13T10:22:53Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-09-13T10:22:53Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-09-13T10:22:53Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=Role controller"
time="2023-09-13T10:22:53Z" level=info msg="Starting management.cattle.io/v3, Kind=RoleTemplate controller"
time="2023-09-13T10:22:53Z" level=info msg="Updating TLS secret for cattle-system/cattle-webhook-tls (count: 1): map[listener.cattle.io/cn-rancher-webhook.cattle-system.svc:rancher-webhook.cattle-system.svc listener.cattle.io/fingerprint:SHA1=XXXXXXXXXXXXXXXXXXXXXXXXXXXX]"

This Rancher instance is deployed as follows:

  • Private GKE cluster running in Google Cloud with etcd encryption using a custom KMS key
  • Cluster is running Kubernetes 1.26.4-gke.500
  • We allow GKE control plane ingress to the webhook on port 9443 via TCP in our firewall rules as per the docs

Any help or advice on this issue would be appreciated.

Many thanks!
