Explicitly set appProtocol for the webhook server service #465
Why?
We are deploying the CloudZero Agent into multiple clusters that have an Istio service mesh deployed. We noticed that the `backfill` job was continuously throwing an error that the webhook API was not available. Upon further investigation, we noticed that the `istio-proxy` container on the `backfill` job was logging errors about traffic being redirected to the `BlackholeCluster`, i.e. traffic was not being allowed to continue on to the Service. This was odd, as this is in-cluster traffic; we have seen a similar error with other third-party deployments. Looking closer, we noticed that the name of the port for the Agent Webhook Server Service was hardcoded to `http`:

cloudzero-agent/helm/templates/webhook-service.yaml, line 14 in 010f5ce
Istio will attempt to use the name of the port for a service to determine which protocol to use to handle the traffic: https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/#explicit-protocol-selection
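For reference, Istio's explicit protocol selection reads the `<protocol>[-<suffix>]` prefix of the port name, and on Kubernetes 1.18+ the `appProtocol` field takes precedence over the name when it is set. A sketch of a Service port that Istio would treat as HTTPS (the port number and service name here are illustrative, not taken from the chart):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: webhook-server        # illustrative name
spec:
  ports:
    - name: https             # Istio parses the "<protocol>[-<suffix>]" prefix
      appProtocol: https      # optional; overrides the name-based detection when set
      port: 443               # illustrative port
```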
However, since the `backfill` cronjob is attempting to establish a TLS connection with the Service, this hardcoded `http` value causes failures. Simply editing the name of the port to `https` and triggering a new `backfill` job worked immediately.

We chose to go this route instead of disabling Istio injection on the job pod; we would prefer to keep injection enabled.
What
This adds a simple change that allows the port name of the Service to be overridden, while keeping the original hardcoded value as the default, via `.Values.insightsController.service.portName`.
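A minimal sketch of what the change looks like; the exact template layout, port numbers, and target port in the chart may differ:

```yaml
# values.yaml (sketch; the default preserves the previous hardcoded value)
insightsController:
  service:
    portName: http

# helm/templates/webhook-service.yaml (relevant port entry; other fields illustrative)
ports:
  - name: {{ .Values.insightsController.service.portName }}
    port: 443
```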
How Tested
First, we confirmed the issue manually in our cluster: we edited the port name on the Service and restarted the job, and it immediately began working.

We then confirmed the chart change works by creating a simple YAML overrides file that overrode the port name, running `helm template -f overrides.yaml ...`, and inspecting the Service in the resulting manifest.
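For that check, the overrides file might look like the following (the filename comes from the command above; the `https` value matches the fix described earlier):

```yaml
# overrides.yaml
insightsController:
  service:
    portName: https
```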