
Conversation


@thandleman-r7 commented Sep 24, 2025

Why?

We are deploying the CloudZero Agent into multiple clusters that have an Istio service mesh deployed. We noticed that the backfill job was continuously throwing an error that the webhook API was not available:

[screenshot: backfill job logs showing repeated "webhook API not available" errors]

Upon further investigation, we noticed that Istio was throwing errors about traffic being redirected to the BlackHoleCluster, i.e., traffic was not being allowed to continue on to the Service. Below is the output we saw in the istio-proxy container on the backfill job:

{"x_b3_sampled":null,"user_agent":null,"upstream_cluster":"BlackHoleCluster;","method":null,"downstream_remote_address":"10.0.180.113:40216","bytes_sent":0,"request_duration":null,"response_flags":"UH","protocol":null,"authority":null,"path":null,"upstream_host":null,"requested_server_name":"cloudzero-agent-webhook-server-svc.platform-delivery.svc.cluster.local","bytes_received":0,"response_duration":null,"x_b3_parentspanid":null,"x_forwarded_for":null,"x_b3_traceid":null,"downstream_local_address":"172.20.70.217:443","response_tx_duration":null,"response_code":0,"request_id":null,"start_time":"2025-09-24T18:35:18.852Z","duration":0,"upstream_local_address":null,"connection_termination_details":null,"x_b3_spanid":null}

However, this was odd, as this is in-cluster traffic. We have seen similar errors with other third-party deployments. Looking closer, we noticed that the port name for the Agent Webhook Server Service was hardcoded to http:
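For illustration, the rendered Service looked roughly like the sketch below; the targetPort and other surrounding fields are assumptions, and only the hardcoded `name: http` is the point:

```yaml
# Illustrative sketch of the rendered webhook server Service (not the actual chart output).
apiVersion: v1
kind: Service
metadata:
  name: cloudzero-agent-webhook-server-svc
spec:
  ports:
    - name: http        # hardcoded in the chart; Istio infers plain HTTP from this name
      port: 443         # matches the :443 seen in the istio-proxy log above
      targetPort: 8443  # assumed; the actual container port may differ
      protocol: TCP
```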

Istio uses the name of a Service port to determine which protocol to use when handling the traffic: https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/#explicit-protocol-selection

However, since the backfill CronJob is attempting to establish a TLS connection with the Service, this hardcoded http value was causing failures. Simply editing the port name to https and triggering a new backfill job worked immediately.
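Concretely, the manual fix was just renaming the port, following Istio's `name: <protocol>[-<suffix>]` convention (a sketch, with the same assumed fields as above):

```yaml
spec:
  ports:
    - name: https       # Istio now treats traffic on this port as TLS and passes it through
      port: 443
      targetPort: 8443  # assumed, as above
      protocol: TCP
```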

We chose this route instead of disabling Istio injection on the job pod, as we would prefer to keep injection enabled.

What

This adds a simple change that allows the Service port to be renamed via .Values.insightsController.service.portName, while keeping the original hardcoded value as the default.
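A minimal sketch of what that template change could look like (the template file path and surrounding fields are assumptions, not the chart's actual layout):

```yaml
# templates/webhook-server-svc.yaml (hypothetical path; sketch only)
spec:
  ports:
    - name: {{ .Values.insightsController.service.portName | default "http" }}
      port: 443
      targetPort: 8443
      protocol: TCP
```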

How Tested

First, we confirmed the issue manually in our cluster: we edited the port name on the Service and restarted the job, and it immediately began working.

I further confirmed that the chart change works by creating a simple YAML overrides file that overrode the port name, running helm template -f overrides.yaml ..., and inspecting the Service in the resulting manifest.
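For reference, an overrides file along these lines would exercise the new value when rendered with helm template -f overrides.yaml ... (the value path comes from the description above; the https name is just one choice):

```yaml
# overrides.yaml (illustrative)
insightsController:
  service:
    portName: https
```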

@thandleman-r7 requested a review from a team as a code owner on September 24, 2025 20:39
@thandleman-r7 (Author)

I only just noticed this section in the docs: https://github.com/Cloudzero/cloudzero-agent/blob/develop/helm/docs/istio.md#additional-configuration-options

Seems related, but from what we can see, the backfill doesn't get tripped up once the name of the port is https.

@jake-cloudzero (Contributor) commented Sep 25, 2025

@thandleman-r7 Thank you so much for not only the in-depth explanation of the problem, but also for providing a solution. We all really appreciate this approach.

This is really helpful for us as we dive deeper into making our chart more compatible with Istio, as quite a few of our customers use it. I was unaware of Istio's semantics when selecting a protocol for a particular port, and in the documentation you provided, this section looks particularly interesting:

> This can be configured in two ways:
>
> 1. By the name of the port: `name: <protocol>[-<suffix>]`.
> 2. In Kubernetes 1.18+, by the `appProtocol` field: `appProtocol: <protocol>`.
>
> If both are defined, `appProtocol` takes precedence over the port name.

While changing the port name in this instance will most likely be fine, there are other places in which we rely on various port names being consistent, and we would want to avoid changing these unless completely necessary. Have you tried adding appProtocol: "https" to the service?
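For concreteness, a minimal sketch of that alternative on the Service port (surrounding fields are illustrative, matching the earlier sketches):

```yaml
spec:
  ports:
    - name: http          # existing name stays untouched
      appProtocol: https  # explicit protocol hint; takes precedence over the name (K8s 1.18+)
      port: 443
      targetPort: 8443
      protocol: TCP
```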

If that works, we would love to get this change in sooner rather than later, and we would probably explicitly define the protocol across all of our services.

@thandleman-r7 (Author)

I have not; I will test this out shortly.

@thandleman-r7 force-pushed the configure-webhhook-server-svc-port-name branch from 73b95f0 to d161d9a on September 25, 2025 22:04
@thandleman-r7 (Author)

Confirmed that setting appProtocol: https also seemed to work. I have pushed the change up.

@thandleman-r7 changed the title from "Allow configuration of the port name for the webhook server Service." to "Explicitly set appProtocol for the webhook server service" on Sep 25, 2025
@thandleman-r7 (Author)

@jake-cloudzero Are you waiting for anything on my side?

@jake-cloudzero (Contributor)

@thandleman-r7 Looks good from our side. We have some CI issues we need to work out before our checks will run for forks; we are working to get those done. In the meantime, we are going to get this through today: #485

@jake-cloudzero (Contributor)

Update: this was merged into 1.2.8.
