Revision reconciler always updates the Kubernetes Deployment #13204
Comments
For reference, here's the output of one of those logs after a reconcile
Not sure the best way to do so, but this seems like something we should fix if possible. /triage accepted
Hi @psschwei, as I said in the description, I would be happy to help with the implementation. But, as I also mentioned, I don't know what the preferred design here would be. If I had enough self-confidence, I would have opened a PR and not an issue. So, what is your expectation after the above action?
/assign psschwei
@SaschaSchwarze0 @psschwei Hi, maybe something similar to this could help here. I haven't checked the details yet to see whether defaulting first via K8s, then by Knative, and then passing the dep obj covers all cases, but it could be a start.
@skonto this idea sounds interesting. But I don't know how this can work. Knative is a Kubernetes client and only has the client code available. The registry that the Scheme uses to do defaulting does not contain a function for Deployments, which makes sense. Even if it would work and we imported this function somehow, we would then import the defaulting code of one Kubernetes version. I don't know how much this has changed between Kube versions, but we might not be fully happy with that behavior.
I opened a pull request with a different solution where a hash of the client-side-built Deployment spec is stored as a Deployment annotation, so that a comparison can be performed later. This approach even works with mutating webhooks modifying the Deployment in an arbitrary way, in addition to the "known updates" that the Kubernetes defaulting does.
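For illustration, a minimal sketch of that hash-annotation idea (the annotation key and helper names are assumptions, not the code from the linked pull request):

```go
package example

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"

	appsv1 "k8s.io/api/apps/v1"
)

// Hypothetical annotation key; the real PR may use a different one.
const specHashAnnotation = "serving.knative.dev/deployment-spec-hash"

// hashDeploymentSpec hashes the client-side built spec so it can be compared
// on later reconciles without diffing against the (defaulted) live object.
func hashDeploymentSpec(spec appsv1.DeploymentSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// needsUpdate compares the hash stored on the live Deployment with the hash
// of the freshly built spec; server-side defaulting and mutating webhooks do
// not disturb the stored annotation value.
func needsUpdate(have *appsv1.Deployment, wantSpec appsv1.DeploymentSpec) (bool, string, error) {
	wantHash, err := hashDeploymentSpec(wantSpec)
	if err != nil {
		return false, "", err
	}
	return have.Annotations[specHashAnnotation] != wantHash, wantHash, nil
}
```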
I started to look at server-side apply and I believe it could solve the excess updates described in this issue as well as allow two controllers to coordinate reconciling a single resource (tracked here: knative/pkg#2128). The 1.9 release is next week, but I think this is worth digging into for Knative 1.10 (something that I can do).
@dprotaso for me to understand ... can you explain to me how server-side apply helps to determine on the client side whether an update call to the server should be made or not? Just to make sure it is clear: I named the issue "Revision reconciler always updates the Kubernetes Deployment". What I really meant is that the reconciler always triggers an update call to the server without really updating the object. And the call itself is the problem, as it puts unnecessary load on the API server and consumes Knative's "quota" (QPS).
Server-side apply makes use of managed fields. Thus we would skip the SSA call if we know that all the fields the controller manages are set to their desired values.
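As a rough sketch of what a server-side apply call from the controller could look like (using client-go's apply patch; the field manager name and the object contents are illustrative assumptions):

```go
package example

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// applyDeployment sends a server-side apply patch containing only the fields
// this controller cares about; the API server merges it and records ownership
// in managedFields under the given field manager.
func applyDeployment(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	// Partial object: only the fields the controller wants to own.
	patch := map[string]interface{}{
		"apiVersion": "apps/v1",
		"kind":       "Deployment",
		"metadata":   map[string]interface{}{"name": name},
		"spec": map[string]interface{}{
			"replicas": 1,
		},
	}
	data, err := json.Marshal(patch)
	if err != nil {
		return err
	}
	force := true
	_, err = client.AppsV1().Deployments(ns).Patch(ctx, name, types.ApplyPatchType, data,
		metav1.PatchOptions{FieldManager: "revision-controller", Force: &force})
	return err
}
```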
Okay, I don't have too much practical experience with server-side apply. Anyway, what about these scenarios:

(1) You do not always know what "all the fields the controller manages" are, I think. Let me try to explain: the Knative controller creates a deployment. For simplification we assume that it sets the fields a, b, and c. You can determine from the managed fields that it set these fields. Now the next Knative version comes out and sets fields a, b, c, and d. The comparison logic will now compare the old and new objects and determine that d is different. Given that d, based on the managed fields, has not previously been set by the Knative controller, you don't update the deployment as long as a, b, and c are unchanged. But in this case, the hash of the first object version that you calculate locally is different from the hash of the new version. My proposal would call the update, which is correct.

(2) Even fields that you manage can be changed without you changing them, I think, with mutating admission in the picture. The Knative controller creates a deployment. It sets fields a, b, and c. A mutating admission webhook mutates field b. When the Knative controller reconciles the object again, it will determine that b has changed and that it owns this field. It will run an update, and the same mutating admission webhook will change it back, making it a costly no-op. Basically any reconciliation of the revision triggers an update call. But in this case, the hash of the first object version that you calculate locally will be the same on the next reconciliations. My proposal would not call the update, which is correct.
@SaschaSchwarze0 @dprotaso @psschwei I did more digging into this. This has been discussed here: kubernetes-sigs/kubebuilder#592. I would vote for c) if possible and then check a) (hashing seems to be an already-used approach) or b). I am not sure about depending on some external project like in d).
@skonto I do NOT consider (c) an option. Server-side apply always goes to the API server. Even if the call is cheap, it is still an API call. And with a (reasonable) QPS like 50 but 4000 revisions in the system, you don't do anything other than no-op updates on deployments for 80 seconds. EDIT: in case you're not aware, we have been running successfully with option (a) for half a year now.
Regarding the load, it depends on the use case. However, I thought the idea was to only get a reconciliation trigger when a field we manage is touched. So defaults for fields we don't set will not trigger anything.
Sounds interesting. I personally did not spend time figuring out if and how the amount of revision reconciles can be reduced. It's mainly these groups:
So yeah, if there are means to reduce the amount of revision reconciles, that would also be cool.
I think SSA is the right approach - you can perform diffs locally, only on the fields the Knative controller manages, so that would prevent the excess API calls you are worried about. The other benefit is that it should reduce update conflicts between the controller and the autoscaler, because they care about different properties of the deployment. I will focus on this for v1.11. /assign @dprotaso
I was hunting down a performance issue that we are seeing when Knative is reconciling all revisions, for example due to the controller being restarted or global syncs.
The problem happens in `checkAndUpdateDeployment`. The code there uses `MakeDeployment`, like in the create case, to set up the deployment. It will then check whether there are any changes compared to the previous version by comparing the freshly built object with the existing one (sketched below). This check will never return true. The reasons are the following:

- Kubernetes defaulting fills in fields that Knative leaves empty, for example the `Protocol` field in ports which gets set to `TCP`, the TimeoutSeconds/PeriodSeconds/FailureThreshold in the queue proxy probe, maxSurge in the rolling update strategy which gets set to 25, the RevisionHistoryLimit which becomes 10, the APIVersion in an env.valueFrom.FieldRef which gets set to `v1`, the restart policy which gets set to `Always`, ...
- Mutating webhooks add things on top, for example the labels `service.istio.io/canonical-name` and `service.istio.io/canonical-revision` in the pod template.
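For illustration, here is a minimal sketch of that rebuild-and-compare pattern (this is not the actual Knative code; `buildDesiredDeployment` is a hypothetical stand-in for `MakeDeployment`):

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/equality"
)

// buildDesiredDeployment stands in for MakeDeployment: it rebuilds the
// Deployment from scratch, without any server-side defaults applied.
func buildDesiredDeployment() *appsv1.Deployment {
	return &appsv1.Deployment{ /* client-side built spec */ }
}

// checkAndUpdate mirrors the rebuild-and-compare pattern: because the live
// object carries defaulted fields (Protocol, RevisionHistoryLimit, ...) and
// webhook-added labels, the semantic equality check practically never passes,
// so an Update call is issued on every reconcile.
func checkAndUpdate(have *appsv1.Deployment, update func(*appsv1.Deployment) error) error {
	want := buildDesiredDeployment()
	if equality.Semantic.DeepEqual(have.Spec, want.Spec) {
		return nil // nothing to do; in practice this branch is never taken
	}
	have.Spec = want.Spec
	return update(have)
}
```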
Why is this a problem? Assume there are just 500 revisions in the system. If every revision reconcile causes one Update call, then - with a QPS of 40 (default 5 multiplied by a controller count of 8) - these update calls alone will need 12.5 seconds. If you rather have 3000 revisions, we're talking about 1m15s. And it's not just about the time; it's also the unnecessary hammering on the API server.
How can this be fixed? I currently have two ideas:
The first idea is to pass the previous version along into `MakeDeployment` and to follow two strategies there:

a) Set defaults where applicable. If you define a TCP probe, just specify the protocol explicitly. Kubernetes is unlikely to change its default, but specifying the explicit value also does not hurt.

b) Use the previous version to set fields that Knative does not care about but Kubernetes could change, such as the maxSurge or the revision history limit (a sketch of this follows after the disadvantages below).
A sample of that can be found in SaschaSchwarze0@e3fe7f5. There is also code added in `checkAndUpdateDeployment` that dumps the differences, which one can apply separately to see the problem. This part is also explained below in the reproduction steps.

But this approach has two disadvantages:
a) It will require regular verification and extension, assuming Kubernetes eventually adds new fields with defaulting.

b) It will be a large effort to capture every possible field that others could mutate, and it will never be able to handle situations where others mutate changes on top of what Knative sets.
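To make strategy b) more concrete, here is a minimal sketch of carrying over server-defaulted fields from the live Deployment (the field selection is an illustrative assumption, not the code from the linked commit):

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
)

// carryOverDefaultedFields copies fields that Knative does not care about but
// that the API server defaults (so they only exist on the live object) from
// the previous Deployment into the freshly built one, so that a later
// comparison does not report spurious differences. The set of fields handled
// here is illustrative, not exhaustive.
func carryOverDefaultedFields(want, have *appsv1.Deployment) {
	if want.Spec.RevisionHistoryLimit == nil {
		want.Spec.RevisionHistoryLimit = have.Spec.RevisionHistoryLimit
	}
	if want.Spec.ProgressDeadlineSeconds == nil {
		want.Spec.ProgressDeadlineSeconds = have.Spec.ProgressDeadlineSeconds
	}
	if want.Spec.Strategy.Type == "" {
		// Covers the defaulted RollingUpdate strategy incl. maxSurge/maxUnavailable.
		want.Spec.Strategy = have.Spec.Strategy
	}
}
```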
The second idea is to pass a target object into `MakeDeployment`. This would be an empty object for the create case and the current version for an update. All the existing functions that build the deployment details, the pod spec, and its container details would need a rewrite to either populate an empty object or update an existing one.

This will imo be cleaner, but its complexity and the amount of changes grew beyond what I wanted to try out in code I am not familiar with, so I have no spike code to show here.
This will also not solve situations where webhooks mutate things that Knative sets.
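As a rough illustration of that second idea, the builder could mutate a caller-provided target instead of returning a fresh object (the signature and field handling below are hypothetical, not the existing `MakeDeployment` API):

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// Revision is a placeholder for the Knative Revision type.
type Revision struct {
	Name      string
	Namespace string
}

// applyDeploymentSpec mutates the given target in place: for the create case
// the caller passes an empty &appsv1.Deployment{}, for the update case the
// current live object. Only fields Knative cares about are touched, so
// server-side defaults and webhook additions on the live object survive.
func applyDeploymentSpec(target *appsv1.Deployment, rev *Revision) {
	target.Name = rev.Name + "-deployment"
	target.Namespace = rev.Namespace
	if len(target.Spec.Template.Spec.Containers) == 0 {
		target.Spec.Template.Spec.Containers = []corev1.Container{{Name: "user-container"}}
	}
	// ... set image, env, probes, etc. on the existing entries instead of
	// rebuilding the whole slice from scratch.
}
```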
I am happy to discuss this. You can reach me in Knative slack (@sascha). Also happy to help with the implementation.
--
One more thing I'd like to mention: creating an object from scratch and then applying it to the cluster for both the create and the update case could be a general pattern in the code base. I only focused on how the revision controller configures the deployment. I would not be surprised if that pattern (which I consider broken because of the explanations above) is used elsewhere. Whether that is a pain or not depends on how many objects of that kind are present and how often all objects get reconciled.
What version of Knative?
Current
Expected Behavior
Knative only updates Kubernetes deployments when there is a need to do this.
Actual Behavior
Every reconcile of a revision triggers a deployment update.
Steps to Reproduce the Problem
Add the following code before https://github.com/knative/serving/blob/v0.33.0/pkg/reconciler/revision/cruds.go#L93:
Then leave Knative running while you have revisions in the system. You can observe the unnecessary updates in the container logs of the controller.
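As a hedged illustration (not the author's original snippet), a diff-logging helper placed there could look roughly like this, assuming knative.dev/pkg's `kmp.SafeDiff`:

```go
package example

import (
	"go.uber.org/zap"
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/equality"
	"knative.dev/pkg/kmp"
)

// logDeploymentDiff makes the spurious updates visible: whenever the freshly
// built spec differs from the live one, the difference is written to the
// controller log before the Update call happens.
func logDeploymentDiff(logger *zap.SugaredLogger, have, want *appsv1.Deployment) {
	if equality.Semantic.DeepEqual(have.Spec, want.Spec) {
		return
	}
	if diff, err := kmp.SafeDiff(have.Spec, want.Spec); err == nil {
		logger.Infow("Deployment spec differs", "diff", diff)
	}
}
```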