Revision reconciler always updates the Kubernetes Deployment #13204
Comments
For reference, here's the output of one of those logs after a reconcile
Not sure the best way to do so, but this seems like something we should fix if possible. /triage accepted
Hi @psschwei, as I said in the description, I would be happy to help with the implementation. But, as I also mentioned, I don't know what the preferred design here would be. If I had enough self-confidence, I would have opened a PR and not an issue. So, what is your expectation after the above action?
/assign psschwei
@SaschaSchwarze0 @psschwei Hi, maybe something similar to this could help here. I haven't checked the details yet to see whether defaulting first via K8s, then by Knative, and then passing the dep obj covers all cases, but it could be a start.
@skonto this idea sounds interesting. But I don't know how this can work. Knative is a Kubernetes client and only has the client code available. The registry that the Scheme uses to do defaulting does not contain a function for Deployments, which makes sense. Even if it would work and we imported this function somehow, we would then import the defaulting code of one Kubernetes version. I don't know how much this has changed between Kube versions, but we might not be fully happy with that behavior.
I opened a pull request with a different solution where a hash of the client-side-built Deployment spec is stored as a Deployment annotation, so that a comparison can be performed later. This approach even works with mutating webhooks modifying the Deployment in an arbitrary way, in addition to the "known updates" that the Kubernetes defaulting does.
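For illustration, a minimal sketch of that hash-annotation idea (the annotation key and helper names are assumptions, not the code from the linked pull request):

```go
package example

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"

	appsv1 "k8s.io/api/apps/v1"
)

// Hypothetical annotation key; the real PR may use a different one.
const specHashAnnotation = "serving.knative.dev/deployment-spec-hash"

// hashDeploymentSpec hashes the client-side built spec so it can be compared
// on later reconciles without diffing against the (defaulted) live object.
func hashDeploymentSpec(spec appsv1.DeploymentSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// needsUpdate compares the hash stored on the live Deployment with the hash
// of the freshly built spec; server-side defaulting and mutating webhooks do
// not disturb the stored annotation value.
func needsUpdate(have *appsv1.Deployment, wantSpec appsv1.DeploymentSpec) (bool, string, error) {
	wantHash, err := hashDeploymentSpec(wantSpec)
	if err != nil {
		return false, "", err
	}
	return have.Annotations[specHashAnnotation] != wantHash, wantHash, nil
}
```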
I started to look at server-side apply and I believe it could solve the excess updates described in this issue as well as allow two controllers to coordinate reconciling a single resource (tracked here: knative/pkg#2128). The 1.9 release is next week, but I think this is worth digging into for Knative 1.10 (something that I can do).
@dprotaso for me to understand ... can you explain to me how server-side apply helps to determine on the client side whether an update call to the server should be made or not? Just to make sure it is clear: I named the issue "Revision reconciler always updates the Kubernetes Deployment". What I really meant is that the reconciler always triggers an update call to the server without really updating the object. And the call itself is the problem, as it puts unnecessary load on the API server and consumes Knative's "quota" (QPS).
Server-side apply makes use of managed fields. Thus we would skip the SSA call if we know that all the fields the controller manages are set to their desired values.
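As a rough sketch of what a server-side apply call from the controller could look like (using client-go's apply patch; the field manager name and the object contents are illustrative assumptions):

```go
package example

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// applyDeployment sends a server-side apply patch containing only the fields
// this controller cares about; the API server merges it and records ownership
// in managedFields under the given field manager.
func applyDeployment(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	// Partial object: only the fields the controller wants to own.
	patch := map[string]interface{}{
		"apiVersion": "apps/v1",
		"kind":       "Deployment",
		"metadata":   map[string]interface{}{"name": name},
		"spec": map[string]interface{}{
			"replicas": 1,
		},
	}
	data, err := json.Marshal(patch)
	if err != nil {
		return err
	}
	force := true
	_, err = client.AppsV1().Deployments(ns).Patch(ctx, name, types.ApplyPatchType, data,
		metav1.PatchOptions{FieldManager: "revision-controller", Force: &force})
	return err
}
```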
Okay, I don't have too much practical experience with server-side apply. Anyway, what about these scenarios:

(1) You do not always know what "all the fields the controller manages" are, I think. Let me try to explain: the Knative controller creates a deployment. For simplification we assume that it sets the fields a, b, and c. You can determine from the managed fields that it set these fields. Now the next Knative version comes out and sets fields a, b, c, and d. The comparison logic will now compare the old and new objects and determine that d is different. Given that d, based on the managed fields, has not previously been set by the Knative controller, you don't update the deployment as long as a, b, and c are unchanged. But in this case, the hash of the first object version that you calculate locally is different from the hash of the new version. My proposal would call the update, which is correct.

(2) Even fields that you manage can be changed without you changing them, I think, with mutating admission in the picture. The Knative controller creates a deployment. It sets fields a, b, and c. A mutating admission webhook mutates field b. When the Knative controller reconciles the object again, it will determine that b has changed and that it owns this field. It will run an update, and the same mutating admission webhook will change it back, making it a costly no-op. Basically any reconciliation of the revision triggers an update call. But in this case, the hash of the first object version that you calculate locally will be the same on the next reconciliations. My proposal would not call the update, which is correct.
@SaschaSchwarze0 @dprotaso @psschwei I did more digging into this. This has been discussed here: kubernetes-sigs/kubebuilder#592. I would vote for c) if possible and then check a) (hashing seems to be an already-used approach) or b). I am not sure about depending on some external project like in d).
@skonto I do NOT consider (c) an option. Server-side apply always goes to the API server. Even if the call is cheap, it is still an API call. And with a (reasonable) QPS like 50 but 4000 revisions in the system, you don't do anything other than no-op updates on deployments for 80 seconds. EDIT: in case you're not aware, we have been running successfully with option (a) for half a year now.
Regarding the load, it depends on the use case. However, I thought the idea was to only get a reconciliation trigger when a field we manage is touched. So defaults for fields we don't set will not trigger anything.
Sounds interesting. I personally did not spend time figuring out if and how the amount of revision reconciles can be reduced. It's mainly these groups:
So yeah, if there are means to reduce the amount of revision reconciles, that would also be cool.
I think SSA is the right approach - you can perform diffs locally, only on the fields the Knative controller manages, so that would prevent the excess API calls you are worried about. The other benefit is that it should reduce update conflicts between the controller and the autoscaler, because they care about different properties of the deployment. I will focus on this for v1.11. /assign @dprotaso
I was hunting down a performance issue that we are seeing when Knative is reconciling all revisions, for example due to the controller being restarted or global syncs.
The problem happens in `checkAndUpdateDeployment`. The code there uses `MakeDeployment`, like in the create case, to set up the deployment. It will then check whether there are any changes compared to the previous version by comparing the freshly built object with the existing one (sketched below). This check will never return true. The reasons are the following:

- Kubernetes defaulting fills in fields that Knative leaves empty, for example the `Protocol` field in ports which gets set to `TCP`, the TimeoutSeconds/PeriodSeconds/FailureThreshold in the queue proxy probe, maxSurge in the rolling update strategy which gets set to 25, the RevisionHistoryLimit which becomes 10, the APIVersion in an env.valueFrom.FieldRef which gets set to `v1`, the restart policy which gets set to `Always`, ...
- Mutating webhooks add things on top, for example the labels `service.istio.io/canonical-name` and `service.istio.io/canonical-revision` in the pod template.
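For illustration, here is a minimal sketch of that rebuild-and-compare pattern (this is not the actual Knative code; `buildDesiredDeployment` is a hypothetical stand-in for `MakeDeployment`):

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/equality"
)

// buildDesiredDeployment stands in for MakeDeployment: it rebuilds the
// Deployment from scratch, without any server-side defaults applied.
func buildDesiredDeployment() *appsv1.Deployment {
	return &appsv1.Deployment{ /* client-side built spec */ }
}

// checkAndUpdate mirrors the rebuild-and-compare pattern: because the live
// object carries defaulted fields (Protocol, RevisionHistoryLimit, ...) and
// webhook-added labels, the semantic equality check practically never passes,
// so an Update call is issued on every reconcile.
func checkAndUpdate(have *appsv1.Deployment, update func(*appsv1.Deployment) error) error {
	want := buildDesiredDeployment()
	if equality.Semantic.DeepEqual(have.Spec, want.Spec) {
		return nil // nothing to do; in practice this branch is never taken
	}
	have.Spec = want.Spec
	return update(have)
}
```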
Why is this a problem? Assume there are just 500 revisions in the system. If every revision reconcile causes one Update call, then - with a QPS of 40 (default 5 multiplied by a controller count of 8) - these update calls alone will need 12.5 seconds. If you rather have 3000 revisions, we're talking about 1m15s. And it's not just about the time; it's also the unnecessary hammering on the API server.
How can this be fixed? I currently have two ideas:
The first idea is to pass the previous version along into `MakeDeployment` and to follow two strategies there:

a) Set defaults where applicable. If you define a TCP probe, just specify the protocol explicitly. Kubernetes is unlikely to change its default, but specifying the explicit value also does not hurt.

b) Use the previous version to set fields that Knative does not care about but Kubernetes could change, such as the maxSurge or the revision history limit (a sketch of this follows after the disadvantages below).
A sample of that can be found in SaschaSchwarze0@e3fe7f5. There is also code added in `checkAndUpdateDeployment` that dumps the differences, which one can apply separately to see the problem. This part is also explained below in the reproduction steps.

But this approach has two disadvantages:
a) It will require regular verification and extension, assuming Kubernetes eventually adds new fields with defaulting.

b) It will be a large effort to capture every possible field that others could mutate, and it will never be able to handle situations where others mutate changes on top of what Knative sets.
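To make strategy b) more concrete, here is a minimal sketch of carrying over server-defaulted fields from the live Deployment (the field selection is an illustrative assumption, not the code from the linked commit):

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
)

// carryOverDefaultedFields copies fields that Knative does not care about but
// that the API server defaults (so they only exist on the live object) from
// the previous Deployment into the freshly built one, so that a later
// comparison does not report spurious differences. The set of fields handled
// here is illustrative, not exhaustive.
func carryOverDefaultedFields(want, have *appsv1.Deployment) {
	if want.Spec.RevisionHistoryLimit == nil {
		want.Spec.RevisionHistoryLimit = have.Spec.RevisionHistoryLimit
	}
	if want.Spec.ProgressDeadlineSeconds == nil {
		want.Spec.ProgressDeadlineSeconds = have.Spec.ProgressDeadlineSeconds
	}
	if want.Spec.Strategy.Type == "" {
		// Covers the defaulted RollingUpdate strategy incl. maxSurge/maxUnavailable.
		want.Spec.Strategy = have.Spec.Strategy
	}
}
```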
The second idea is to pass a target object into `MakeDeployment`. This would be an empty object for the create case and the current version for an update. All the existing functions that build the deployment details, the pod spec, and its container details would need a rewrite to either populate an empty object or update an existing one.

This will imo be cleaner, but its complexity and the amount of changes grew beyond what I wanted to try out in code I am not familiar with, so I have no spike code to show here.
This will also not solve situations where webhooks mutate things that Knative sets.
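As a rough illustration of that second idea, the builder could mutate a caller-provided target instead of returning a fresh object (the signature and field handling below are hypothetical, not the existing `MakeDeployment` API):

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// Revision is a placeholder for the Knative Revision type.
type Revision struct {
	Name      string
	Namespace string
}

// applyDeploymentSpec mutates the given target in place: for the create case
// the caller passes an empty &appsv1.Deployment{}, for the update case the
// current live object. Only fields Knative cares about are touched, so
// server-side defaults and webhook additions on the live object survive.
func applyDeploymentSpec(target *appsv1.Deployment, rev *Revision) {
	target.Name = rev.Name + "-deployment"
	target.Namespace = rev.Namespace
	if len(target.Spec.Template.Spec.Containers) == 0 {
		target.Spec.Template.Spec.Containers = []corev1.Container{{Name: "user-container"}}
	}
	// ... set image, env, probes, etc. on the existing entries instead of
	// rebuilding the whole slice from scratch.
}
```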
I am happy to discuss this. You can reach me in Knative slack (@sascha). Also happy to help with the implementation.
--
One more thing I'd like to mention: creating an object from scratch and then applying it to the cluster for both the create and the update case could be a general pattern in the code base. I only focused on how the revision controller configures the deployment. I would not be surprised if that pattern (which I consider broken because of the explanations above) is used elsewhere. Whether that is a pain or not depends on how many objects of that kind are present and how often all objects get reconciled.
What version of Knative?
Current
Expected Behavior
Knative only updates Kubernetes deployments when there is a need to do this.
Actual Behavior
Every reconcile of a revision triggers a deployment update.
Steps to Reproduce the Problem
Add the following code before https://github.com/knative/serving/blob/v0.33.0/pkg/reconciler/revision/cruds.go#L93:
Then leave Knative running while you have revisions in the system. You can observe the unnecessary updates in the container logs of the controller.
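As a hedged illustration (not the author's original snippet), a diff-logging helper placed there could look roughly like this, assuming knative.dev/pkg's `kmp.SafeDiff`:

```go
package example

import (
	"go.uber.org/zap"
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/equality"
	"knative.dev/pkg/kmp"
)

// logDeploymentDiff makes the spurious updates visible: whenever the freshly
// built spec differs from the live one, the difference is written to the
// controller log before the Update call happens.
func logDeploymentDiff(logger *zap.SugaredLogger, have, want *appsv1.Deployment) {
	if equality.Semantic.DeepEqual(have.Spec, want.Spec) {
		return
	}
	if diff, err := kmp.SafeDiff(have.Spec, want.Spec); err == nil {
		logger.Infow("Deployment spec differs", "diff", diff)
	}
}
```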