Validating webhook times out #1054

Open
2 tasks done
twuyts opened this issue Aug 29, 2023 · 8 comments

Comments

@twuyts

twuyts commented Aug 29, 2023

Description

After upgrading from koperator v0.23.1 to v0.25.1, the validating webhook for kafkaclusters fails:

{"level":"info","ts":"2023-08-29T14:33:08.513Z","msg":"Internal error occurred: failed calling webhook \"kafkaclusters.kafka.banzaicloud.io\": failed to call webhook: Post \"https://kafka-operator-operator.kafka.svc:443/validate-kafka-banzaicloud-io-v1beta1-kafkacluster?timeout=10s\": context deadline exceeded","controller":"KafkaCluster","controllerGroup":"kafka.banzaicloud.io","controllerKind":"KafkaCluster","KafkaCluster":{"name":"tt","namespace":"kafka"},"namespace":"kafka","name":"tt","reconcileID":"e000ec40-3da0-4f98-b202-d62126d22a10"}

The same error is thrown when manually updating a kafkacluster resource.

Expected Behavior

The error should not occur.

Actual Behavior

A timeout error is thrown.

Affected Version

v0.25.1

Steps to Reproduce

  1. kubectl -n kafka apply -f config/samples/simplekafkacluster.yaml

Error from server (InternalError): error when creating "tmp/cluster.yaml": Internal error occurred: failed calling webhook "kafkaclusters.kafka.banzaicloud.io": failed to call webhook: Post "https://kafka-operator-operator.kafka.svc:443/validate-kafka-banzaicloud-io-v1beta1-kafkacluster?timeout=10s": context deadline exceeded

  • I've checked the webhooks config, and that looks fine:
❯ kubectl describe validatingwebhookconfigurations/kafka-operator-validating-webhook
Name:         kafka-operator-validating-webhook
Namespace:
Labels:       app.kubernetes.io/component=webhook
              app.kubernetes.io/instance=kafka-operator
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=kafka-operator
              app.kubernetes.io/version=v0.25.1
              helm.sh/chart=kafka-operator-0.25.1
              helm.toolkit.fluxcd.io/name=kafka-operator
              helm.toolkit.fluxcd.io/namespace=flux-system
Annotations:  meta.helm.sh/release-name: kafka-operator
              meta.helm.sh/release-namespace: kafka
API Version:  admissionregistration.k8s.io/v1
Kind:         ValidatingWebhookConfiguration
Metadata:
  Creation Timestamp:  2023-08-29T14:31:26Z
  Generation:          1
  Managed Fields:
    API Version:  admissionregistration.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:labels:
          .:
          f:app.kubernetes.io/component:
          f:app.kubernetes.io/instance:
          f:app.kubernetes.io/managed-by:
          f:app.kubernetes.io/name:
          f:app.kubernetes.io/version:
          f:helm.sh/chart:
          f:helm.toolkit.fluxcd.io/name:
          f:helm.toolkit.fluxcd.io/namespace:
      f:webhooks:
        .:
        k:{"name":"kafkaclusters.kafka.banzaicloud.io"}:
          .:
          f:admissionReviewVersions:
          f:clientConfig:
            .:
            f:caBundle:
            f:service:
              .:
              f:name:
              f:namespace:
              f:path:
              f:port:
          f:failurePolicy:
          f:matchPolicy:
          f:name:
          f:namespaceSelector:
          f:objectSelector:
          f:rules:
          f:sideEffects:
          f:timeoutSeconds:
        k:{"name":"kafkatopics.kafka.banzaicloud.io"}:
          .:
          f:admissionReviewVersions:
          f:clientConfig:
            .:
            f:caBundle:
            f:service:
              .:
              f:name:
              f:namespace:
              f:path:
              f:port:
          f:failurePolicy:
          f:matchPolicy:
          f:name:
          f:namespaceSelector:
          f:objectSelector:
          f:rules:
          f:sideEffects:
          f:timeoutSeconds:
    Manager:         helm-controller
    Operation:       Update
    Time:            2023-08-29T14:31:26Z
  Resource Version:  2278161
  UID:               ee7b82ba-9cb8-4ea6-b975-235be18958d6
Webhooks:
  Admission Review Versions:
    v1
  Client Config:
    Ca Bundle:  LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURJekNDQWd1Z0F3SUJBZ0lRZGQ3N0pmNFlPV3FiVEk2aEFITDQ5ekFOQmdrcWhraUc5dzBCQVFzRkFEQWMKTVJvd0dBWURWUVFERXhGcllXWnJZUzF2Y0dWeVlYUnZjaTFqWVRBZUZ3MHlNekE0TWpreE5ETXhNak5hRncwegpNekE0TWpZeE5ETXhNak5hTUJ3eEdqQVlCZ05WQkFNVEVXdGhabXRoTFc5d1pYSmhkRzl5TFdOaE1JSUJJakFOCkJna3Foa2lHOXcwQkFRRUZBQU9DQVE4QU1JSUJDZ0tDQVFFQTFEeDJiSzY4ZmR4Q0NLTFJRYnZaYWd4QmV1ZTQKNUhqSHdzMWRIWnp0bng1UEN2MlZRalhKNjJXU1hSbGxUdVEydWxwMDJaYTBIWFJaN3lUMWxPd3RueUVqUml4ZwpOcG9YelkzMzlCTnR6cTBTaHJueHFLWHVLbzh5Qk9rVU90MVdtbW1XUTExN1o3NzJyV1doc2NDSm1SNmE4U1FyCkpBd1NET21HUmhJTnVBZUpKakVzTTFKS3NWU3BjUTBjZkxvM0E4QjhFVURXQm5MNTlySXBKaUdGVnpYQ3gzNzUKeVh5ZUFMcnVhdkZERUE3RjRvcWl5dE5BWHRwQXI4ZzJDbTVLcDFxYUFNdDZMaElwYUVsek54eDlCL0g2QWYrcAo5OU12VTltVmNHY1B1RDljWlEyaGtCb2ZpVkdZZUZqNVJLU3Z5cmVkYzE3KzZRVXVkM3MwMWsrQXlRSURBUUFCCm8yRXdYekFPQmdOVkhROEJBZjhFQkFNQ0FxUXdIUVlEVlIwbEJCWXdGQVlJS3dZQkJRVUhBd0VHQ0NzR0FRVUYKQndNQ01BOEdBMVVkRXdFQi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZKVERHUmYzSVQvdDA2Z3BQTFhmMzNlQwpKVFQyTUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFDaWhYMzZrcjNrb01KVXIzaGhjUjZoaXM4VHlESTVNaGUxCjVjRnB3bHYzV2sybWxmYU9kWXRRbzV6Ykd2eDJoVjcyZFl1WlM0S3Y2anMzTHJLVnBKR3ZBN3BCTXpkeHlpYjgKcE8rWkRaU0NjSzF6OVpmWTN4TXI3SXZHUWZXdWZnS1FnSStlY1ZsUzBUL3EwbndqZ2tleUtCSXg0VWtCd3Izdgp2dHlzRHdPU2JkZjFXQWRVOVVQTEJnVlUwai9wU3k1aUpyQkhMWEppWkc0RllXRnNUZ080dWgwQ0VONDB0QlQ4CkVMSHZlSTZ6UGVoTW5mejljdVVWTzVrQ2tGZFdyQ25renQxSkc1NVJqK2ZkaW5oMHozaEJqQlJhRSt0MmpnaTMKcFRwOFdKOXZ2NFdiQSs2MlY2cUlKaUkrOGMwSCtUUjRYTkdyakZkaGIzRTFhRGNYekg5agotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    Service:
      Name:        kafka-operator-operator
      Namespace:   kafka
      Path:        /validate-kafka-banzaicloud-io-v1alpha1-kafkatopic
      Port:        443
  Failure Policy:  Fail
  Match Policy:    Equivalent
  Name:            kafkatopics.kafka.banzaicloud.io
  Namespace Selector:
  Object Selector:
  Rules:
    API Groups:
      kafka.banzaicloud.io
    API Versions:
      v1alpha1
    Operations:
      CREATE
      UPDATE
    Resources:
      kafkatopics
    Scope:          *
  Side Effects:     None
  Timeout Seconds:  10
  Admission Review Versions:
    v1
  Client Config:
    Ca Bundle:  LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURJekNDQWd1Z0F3SUJBZ0lRZGQ3N0pmNFlPV3FiVEk2aEFITDQ5ekFOQmdrcWhraUc5dzBCQVFzRkFEQWMKTVJvd0dBWURWUVFERXhGcllXWnJZUzF2Y0dWeVlYUnZjaTFqWVRBZUZ3MHlNekE0TWpreE5ETXhNak5hRncwegpNekE0TWpZeE5ETXhNak5hTUJ3eEdqQVlCZ05WQkFNVEVXdGhabXRoTFc5d1pYSmhkRzl5TFdOaE1JSUJJakFOCkJna3Foa2lHOXcwQkFRRUZBQU9DQVE4QU1JSUJDZ0tDQVFFQTFEeDJiSzY4ZmR4Q0NLTFJRYnZaYWd4QmV1ZTQKNUhqSHdzMWRIWnp0bng1UEN2MlZRalhKNjJXU1hSbGxUdVEydWxwMDJaYTBIWFJaN3lUMWxPd3RueUVqUml4ZwpOcG9YelkzMzlCTnR6cTBTaHJueHFLWHVLbzh5Qk9rVU90MVdtbW1XUTExN1o3NzJyV1doc2NDSm1SNmE4U1FyCkpBd1NET21HUmhJTnVBZUpKakVzTTFKS3NWU3BjUTBjZkxvM0E4QjhFVURXQm5MNTlySXBKaUdGVnpYQ3gzNzUKeVh5ZUFMcnVhdkZERUE3RjRvcWl5dE5BWHRwQXI4ZzJDbTVLcDFxYUFNdDZMaElwYUVsek54eDlCL0g2QWYrcAo5OU12VTltVmNHY1B1RDljWlEyaGtCb2ZpVkdZZUZqNVJLU3Z5cmVkYzE3KzZRVXVkM3MwMWsrQXlRSURBUUFCCm8yRXdYekFPQmdOVkhROEJBZjhFQkFNQ0FxUXdIUVlEVlIwbEJCWXdGQVlJS3dZQkJRVUhBd0VHQ0NzR0FRVUYKQndNQ01BOEdBMVVkRXdFQi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZKVERHUmYzSVQvdDA2Z3BQTFhmMzNlQwpKVFQyTUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFDaWhYMzZrcjNrb01KVXIzaGhjUjZoaXM4VHlESTVNaGUxCjVjRnB3bHYzV2sybWxmYU9kWXRRbzV6Ykd2eDJoVjcyZFl1WlM0S3Y2anMzTHJLVnBKR3ZBN3BCTXpkeHlpYjgKcE8rWkRaU0NjSzF6OVpmWTN4TXI3SXZHUWZXdWZnS1FnSStlY1ZsUzBUL3EwbndqZ2tleUtCSXg0VWtCd3Izdgp2dHlzRHdPU2JkZjFXQWRVOVVQTEJnVlUwai9wU3k1aUpyQkhMWEppWkc0RllXRnNUZ080dWgwQ0VONDB0QlQ4CkVMSHZlSTZ6UGVoTW5mejljdVVWTzVrQ2tGZFdyQ25renQxSkc1NVJqK2ZkaW5oMHozaEJqQlJhRSt0MmpnaTMKcFRwOFdKOXZ2NFdiQSs2MlY2cUlKaUkrOGMwSCtUUjRYTkdyakZkaGIzRTFhRGNYekg5agotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    Service:
      Name:        kafka-operator-operator
      Namespace:   kafka
      Path:        /validate-kafka-banzaicloud-io-v1beta1-kafkacluster
      Port:        443
  Failure Policy:  Fail
  Match Policy:    Equivalent
  Name:            kafkaclusters.kafka.banzaicloud.io
  Namespace Selector:
  Object Selector:
  Rules:
    API Groups:
      kafka.banzaicloud.io
    API Versions:
      v1beta1
    Operations:
      CREATE
      UPDATE
    Resources:
      kafkaclusters
    Scope:          *
  Side Effects:     None
  Timeout Seconds:  10
Events:             <none>
  • I tried to call the webhook from within a debug container attached to the koperator pod, using the CA shown above:
❯ kubectl debug -it --image=redhat/ubi8 --target manager kafka-operator-operator-584c6df4b7-hd7sl -n kafka -- bash
Targeting container "manager". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-9z59f.
If you don't see a command prompt, try pressing enter.
[root@kafka-operator-operator-584c6df4b7-hd7sl /]# curl --cacert /tmp/ca.crt https://kafka-operator-operator.kafka.svc:443/validate-kafka-banzaicloud-io-v1beta1-kafkacluster?timeout=10s
{"response":{"uid":"","allowed":false,"status":{"metadata":{},"message":"contentType=, expected application/json","code":400}}}
  • For the time being, I have disabled the webhook in the helm chart, so I am not blocked.
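The 400 response to the curl call above ("contentType=, expected application/json") suggests that the request reached the webhook server but carried no JSON body, so it confirms connectivity and TLS rather than validation itself. A minimal Python sketch of a fuller manual test follows: it decodes the base64 `Ca Bundle` value into PEM bytes and builds a POST request carrying an `AdmissionReview` with the expected content type. The helper names, the `/tmp/ca.crt` default, and the empty `spec` are assumptions for illustration; the request fields follow the general `admission.k8s.io/v1` schema and are not taken from koperator.

```python
import base64
import json
import ssl
import urllib.request
import uuid

# URL copied from the error message in this issue.
WEBHOOK_URL = (
    "https://kafka-operator-operator.kafka.svc:443"
    "/validate-kafka-banzaicloud-io-v1beta1-kafkacluster?timeout=10s"
)


def ca_bundle_to_pem(ca_bundle_b64: str) -> bytes:
    """Decode the base64 caBundle from the webhook configuration into PEM bytes."""
    return base64.b64decode(ca_bundle_b64)


def build_admission_review(name: str, namespace: str, spec: dict) -> dict:
    """Build a minimal AdmissionReview for a KafkaCluster CREATE (hypothetical helper)."""
    group, version = "kafka.banzaicloud.io", "v1beta1"
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {
            "uid": str(uuid.uuid4()),
            "kind": {"group": group, "version": version, "kind": "KafkaCluster"},
            "resource": {"group": group, "version": version, "resource": "kafkaclusters"},
            "name": name,
            "namespace": namespace,
            "operation": "CREATE",
            "object": {
                "apiVersion": f"{group}/{version}",
                "kind": "KafkaCluster",
                "metadata": {"name": name, "namespace": namespace},
                "spec": spec,  # assumption: fill in a real spec for a meaningful result
            },
        },
    }


def build_request(review: dict, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Wrap the review in a POST request with the JSON content type the webhook expects."""
    return urllib.request.Request(
        url,
        data=json.dumps(review).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_webhook(review: dict, ca_file: str = "/tmp/ca.crt", timeout: float = 10.0) -> dict:
    """Send the review, verifying the server certificate against the decoded CA file."""
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(build_request(review), timeout=timeout, context=context) as resp:
        return json.loads(resp.read())
```

From a pod inside the cluster, writing `ca_bundle_to_pem(...)` to `/tmp/ca.crt` and then calling `call_webhook(build_admission_review("tt", "kafka", {}))` should return the webhook's own `AdmissionReview` response. An empty spec will probably be rejected, but a rejection would still show that the validation path answers within the timeout.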

Checklist

@twuyts twuyts changed the title Admission webhook times out Validating webhook times out Aug 29, 2023
@panyuenlau
Member

panyuenlau commented Aug 29, 2023

@twuyts - The default webhook server port was changed from 443 to 9443 in the Koperator implementation by #912. To upgrade successfully to the latest version, the webhook port in validatingwebhookconfigurations/kafka-operator-validating-webhook should be updated accordingly.

@twuyts
Author

twuyts commented Aug 29, 2023

issue in the helm chart then?

@twuyts
Author

twuyts commented Aug 29, 2023

issue in the helm chart then?

No scratch that. I spoke too soon.

@panyuenlau
Member

panyuenlau commented Aug 29, 2023

@twuyts - The default webhook server port was changed from 443 to 9443 in the Koperator implementation by #912. To upgrade successfully to the latest version, the webhook port in validatingwebhookconfigurations/kafka-operator-validating-webhook should be updated accordingly.

My bad - I just took a closer look at the deployment manifests, and it looks like we only changed the target port of the webhook server. The validatingwebhookconfigurations still uses the Service port (443) that you linked to send requests to the webhook server, so the change in #912 shouldn't cause the issue.

@panyuenlau
Member

panyuenlau commented Aug 29, 2023

@twuyts Can you provide the steps that you took to perform the upgrade? I can try and see if I can reproduce the issue.

Edit: I suspected you might need to manually update the Service so that requests go to the corresponding named targetPort, webhook-server.
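To make the port discussion concrete: the webhook configuration addresses the Service port (443), and the Service forwards to a `targetPort` that may be either a number or the name of a container port such as `webhook-server`. The Python sketch below mimics that lookup on plain dictionaries. The 443 and 9443 values and the `webhook-server` name come from this thread; the function name, the dictionary layout, and the `metrics` port name are assumptions for illustration, not koperator or Kubernetes client code.

```python
def resolve_target_port(service_port: dict, container_ports: list) -> int:
    """Return the pod port that a Service port forwards to.

    targetPort may be an int or a string naming a container port;
    when it is omitted, it defaults to the Service port itself.
    """
    target = service_port.get("targetPort", service_port["port"])
    if isinstance(target, int):
        return target
    for port in container_ports:
        if port.get("name") == target:
            return port["containerPort"]
    raise LookupError(f"no container port named {target!r}")


# Service port as described in the thread: 443 forwarding to the named port.
service_port = {"name": "https", "port": 443, "targetPort": "webhook-server"}
container_ports = [
    {"name": "webhook-server", "containerPort": 9443},
    {"name": "metrics", "containerPort": 8080},  # hypothetical port name
]
print(resolve_target_port(service_port, container_ports))  # 9443
```

If the named port did not exist on the pod, the lookup would fail, which is the kind of mismatch that a stale Service or Deployment left over from an upgrade could produce.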

@twuyts
Author

twuyts commented Aug 31, 2023

The upgrade is managed through the helm-controller, part of Flux, the solution we use for continuous delivery.
Basically, what we did was update the Kubernetes manifest for the koperator CRDs and bump the version of the helm chart from v0.24.1 to v0.25.1 in our git repository. This is picked up automatically by the Flux helm-controller running on the k8s cluster, which then does the upgrade.
Unfortunately, I have no idea exactly how the helm-controller does that.

@vitalii-buchyn-exa

vitalii-buchyn-exa commented Oct 4, 2023

Seeing similar errors after upgrading to v0.25.1.

We use helm chart:

NAME                                             	CHART VERSION	APP VERSION
banzaicloud-stable/kafka-operator                	0.25.1       	v0.25.1

We have an istio-proxy sidecar in the operator pod.
istio-proxy version: banzaicloud istio-proxyv2:1.15.0

Operator logs have entries like:

{"level":"error","ts":"2023-10-04T10:06:44.405Z","msg":"Reconciler error","controller":"KafkaCluster","controllerGroup":"kafka.banzaicloud.io","controllerKind":"KafkaCluster","KafkaCluster":{"name":"sample-svc-kafka","namespace":"cloud"},"namespace":"cloud","name":"sample-svc-kafka","reconcileID":"a31ff20c-91bf-4e85-a71a-85a0d1d57917","error":"Internal error occurred: failed calling webhook \"kafkaclusters.kafka.banzaicloud.io\": failed to call webhook: Post \"https://kafka-operator-operator.kafka.svc:443/validate-kafka-banzaicloud-io-v1beta1-kafkacluster?timeout=10s\": context deadline exceeded","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/controller/controller.go:235"}

For us, using PERMISSIVE mode is not acceptable, only STRICT, but that doesn't seem to be the issue,
because a connection to the webhook from any other pod (with istio) is successful:

~ $ curl -k -XPOST -vvv https://kafka-operator-operator.kafka.svc:443/validate-kafka-banzaicloud-io-v1beta1-kafkacluster?timeout=10s
* processing: https://kafka-operator-operator.kafka.svc:443/validate-kafka-banzaicloud-io-v1beta1-kafkacluster?timeout=10s
*   Trying 10.132.41.140:443...
* Connected to kafka-operator-operator.kafka.svc (10.132.41.140) port 443
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=kafka-operator-operator.kafka.svc
*  start date: Aug 28 10:40:28 2023 GMT
*  expire date: Aug 27 10:40:28 2024 GMT
*  issuer: CN=kafka-operator-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* using HTTP/2
* h2 [:method: POST]
* h2 [:scheme: https]
* h2 [:authority: kafka-operator-operator.kafka.svc]
* h2 [:path: /validate-kafka-banzaicloud-io-v1beta1-kafkacluster?timeout=10s]
* h2 [user-agent: curl/8.2.1]
* h2 [accept: */*]
* Using Stream ID: 1
> POST /validate-kafka-banzaicloud-io-v1beta1-kafkacluster?timeout=10s HTTP/2
> Host: kafka-operator-operator.kafka.svc
> User-Agent: curl/8.2.1
> Accept: */*
>
< HTTP/2 200
< content-type: text/plain; charset=utf-8
< content-length: 128
< date: Wed, 04 Oct 2023 11:58:49 GMT
<
{"response":{"uid":"","allowed":false,"status":{"metadata":{},"message":"contentType=, expected application/json","code":400}}}
* Connection #0 to host kafka-operator-operator.kafka.svc left intact

So this doesn't seem to be an issue like istio/istio#39290.

@vitalii-buchyn-exa

The same reconcile error occurs with PERMISSIVE mode:

Pod: kafka-operator-548fbb9fd4-vgdbt
   Pod Revision: asm-managed
   Pod Ports: 15090 (istio-proxy), 8443 (kube-rbac-proxy), 9443 (manager), 8080 (manager), 9001 (manager)
--------------------
Service: kafka-operator-alertmanager
   Port: http-alerts 9001/HTTP targets pod port 9001
--------------------
Service: kafka-operator-authproxy
   Port: https 8443/HTTPS targets pod port 8443
--------------------
Service: kafka-operator-operator
   Port: https 443/HTTPS targets pod port 9443
--------------------
Effective PeerAuthentication:
   Workload mTLS mode: PERMISSIVE

I also tried excluding port 9443:

traffic.sidecar.istio.io/excludeOutboundPorts: "9443"
traffic.sidecar.istio.io/excludeInboundPorts: "9443"

No luck.
