
Frequent Operator reconciliations due to Horizontal Pod Autoscaler dependent resource status updates #2663

Open
Veronica-SP opened this issue Jan 9, 2025 · 5 comments

@Veronica-SP

Hello,

We're deploying an Operator; our Reconciler uses standalone dependent resources and implements the EventSourceInitializer interface.

One of the dependent resources is a Horizontal Pod Autoscaler which monitors a StatefulSet. We recently observed that the currentMetrics property of the status field of the HPA gets updated frequently (a couple of times a minute). These events trigger reconciliations in our Operator, which are redundant since the status field is not part of the desired state specification.

Could you tell us whether there is a standard solution to this issue or suggest an approach for avoiding these frequent reconciliations?

Environment
Java Operator SDK version 4.4.4
Java version 17
Kubernetes version 1.30.5

Best regards,
Veronika

csviri commented Jan 9, 2025

Hi @Veronica-SP ,

you can use filters for that: when registering the informer, you can filter out the events that would otherwise trigger a reconciliation.
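
For illustration, here is a minimal sketch of such a filter registered on the informer event source. It assumes the JOSDK 4.x InformerConfiguration / InformerEventSource API, and MyCustomResource is just a placeholder for the operator's primary resource:

import java.util.Map;

import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscaler;
import io.javaoperatorsdk.operator.api.config.informer.InformerConfiguration;
import io.javaoperatorsdk.operator.api.reconciler.EventSourceContext;
import io.javaoperatorsdk.operator.api.reconciler.EventSourceInitializer;
import io.javaoperatorsdk.operator.processing.event.source.EventSource;
import io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource;

// In practice this would be the Reconciler class itself, which already implements EventSourceInitializer.
public class HpaEventSourceInitializer implements EventSourceInitializer<MyCustomResource> {

  @Override
  public Map<String, EventSource> prepareEventSources(EventSourceContext<MyCustomResource> context) {
    var hpaConfig = InformerConfiguration.from(HorizontalPodAutoscaler.class, context)
        // Only propagate update events when the spec actually changed, i.e. drop the
        // frequent status-only updates written by the HPA controller.
        .withOnUpdateFilter((newHpa, oldHpa) -> !newHpa.getSpec().equals(oldHpa.getSpec()))
        .build();
    // By default, events are mapped back to the primary resource via owner references.
    return EventSourceInitializer.nameEventSources(new InformerEventSource<>(hpaConfig, context));
  }
}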

csviri commented Jan 14, 2025

@Veronica-SP did you manage to solve this problem with filters? Please let us know if you still have issues; I'm closing this now.

csviri closed this as completed Jan 14, 2025
@ivanchuchulski

Hello @csviri, hope you're doing well,

We've tried your suggestion and implemented an update filter for the dependent resource in question. However, we've struggled a bit to come up with the logic in the accept method.

Currently we're testing the following implementation:

  @Override
  public boolean accept(HorizontalPodAutoscaler newResource, HorizontalPodAutoscaler oldResource) {
    return !newResource.getSpec().equals(oldResource.getSpec());
  }

Initially, we tried to make the logic accept events (i.e. the method returns true) when the status properties are equal, since we're not interested in changes to the status:

return newResource.getStatus().equals(oldResource.getStatus());

However, with that approach we're concerned that update events which contain both a difference in the status and a difference in the spec would be filtered out.

Do you know whether the K8s API server could "merge" such diffs in resource updates? I'd presume it can't guarantee "atomic" updates to the major properties of a resource (metadata, status, spec). Also, are you aware of any clever logic for this in the Fabric8 informers or in the Operator framework?
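
One way to express "drop the event only when nothing but the status changed" would be to compare copies of the two resources with the volatile fields cleared. Just a sketch; treating status, resourceVersion and managedFields as the only noisy fields is an assumption:

import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscaler;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerBuilder;
import io.javaoperatorsdk.operator.processing.event.source.filter.OnUpdateFilter;

public class IgnoreStatusOnlyChangesFilter implements OnUpdateFilter<HorizontalPodAutoscaler> {

  @Override
  public boolean accept(HorizontalPodAutoscaler newResource, HorizontalPodAutoscaler oldResource) {
    // Accept the update unless the only difference is in the status / volatile metadata.
    return !stripVolatileFields(newResource).equals(stripVolatileFields(oldResource));
  }

  private static HorizontalPodAutoscaler stripVolatileFields(HorizontalPodAutoscaler hpa) {
    // Work on a deep copy so the informer's cached object is not mutated.
    HorizontalPodAutoscaler copy = new HorizontalPodAutoscalerBuilder(hpa).build();
    copy.setStatus(null);
    copy.getMetadata().setResourceVersion(null);
    copy.getMetadata().setManagedFields(null);
    return copy;
  }
}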

Regards,
Ivan

csviri reopened this Jan 20, 2025
csviri commented Jan 20, 2025

Hi,

@Override
public boolean accept(HorizontalPodAutoscaler newResource, HorizontalPodAutoscaler oldResource) {
  return !newResource.getSpec().equals(oldResource.getSpec());
}

as an alternative to this, you could just check whether metadata.generation is equal; the generation increases only when the spec changes.
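
A sketch of that generation-based check, written generically; note that it only helps for resource kinds where the API server actually populates and bumps metadata.generation on spec changes:

import io.fabric8.kubernetes.api.model.HasMetadata;
import io.javaoperatorsdk.operator.processing.event.source.filter.OnUpdateFilter;

public class GenerationChangedFilter<R extends HasMetadata> implements OnUpdateFilter<R> {

  @Override
  public boolean accept(R newResource, R oldResource) {
    Long newGeneration = newResource.getMetadata().getGeneration();
    Long oldGeneration = oldResource.getMetadata().getGeneration();
    if (newGeneration == null || oldGeneration == null) {
      // Generation is not populated for this resource kind; fall back to accepting the event.
      return true;
    }
    return !newGeneration.equals(oldGeneration);
  }
}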

Unfortunately, I'm not aware of any additional generic support for this on the Kubernetes side. There is a matcher in JOSDK that matches resources based on Server Side Apply meta attributes:

https://github.com/operator-framework/java-operator-sdk/blob/82c8b93a59305f49b5ea52eb9c31d61fb95f664e/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/dependent/kubernetes/SSABasedGenericKubernetesResourceMatcher.java

But it serves a slightly different purpose: it checks whether the actual resource matches the desired state, not the previous state. Theoretically, there could be an approach that checks in the filter whether some of the desired properties changed. That would be more efficient, because it would skip the whole reconciliation, not just the resource reconciliation (see the Dependent Resources feature in the docs).
(This would be worth investigating on our side too, cc @metacosm @xstefank.)

However, if I understand correctly, your problem is about frequent status changes, and handling those is probably always specific to the resource; you might want to recognize the patterns that cause the frequent changes and maybe only accept an event once the state/status of the resource has stabilized.
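
For the HPA case described above, one such pattern could be to ignore only the noisy status.currentMetrics values while still reacting to other status changes (e.g. desiredReplicas or condition transitions). A sketch, assuming currentMetrics is the only frequently changing status field:

import java.util.Objects;

import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscaler;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerStatus;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerStatusBuilder;
import io.javaoperatorsdk.operator.processing.event.source.filter.OnUpdateFilter;

public class IgnoreCurrentMetricsFilter implements OnUpdateFilter<HorizontalPodAutoscaler> {

  @Override
  public boolean accept(HorizontalPodAutoscaler newResource, HorizontalPodAutoscaler oldResource) {
    // Spec changes always trigger a reconciliation.
    if (!newResource.getSpec().equals(oldResource.getSpec())) {
      return true;
    }
    // Status changes trigger one only if something other than currentMetrics changed,
    // e.g. desiredReplicas or a condition transition.
    return !Objects.equals(
        withoutCurrentMetrics(newResource.getStatus()),
        withoutCurrentMetrics(oldResource.getStatus()));
  }

  private static HorizontalPodAutoscalerStatus withoutCurrentMetrics(HorizontalPodAutoscalerStatus status) {
    if (status == null) {
      return null;
    }
    HorizontalPodAutoscalerStatus copy = new HorizontalPodAutoscalerStatusBuilder(status).build();
    copy.setCurrentMetrics(null);
    return copy;
  }
}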

Another general construct you might be interested in is conditions in the status: https://maelvls.dev/kubernetes-conditions/

@ivanchuchulski

Hello,

Unfortunately, there is no generation property in the metadata of the HorizontalPodAutoscaler resource. It seems that this property is populated for CustomResources and perhaps for most workload-type resources like Deployment and StatefulSet.

Regarding the matcher: we're using SSA on the dependent resources, and when such a reconciliation is triggered, the matching finds no difference once the fields are pruned, so we don't actually initiate an update.

Looking at changes to the object with kubectl get --watch and comparing the diffs, the only fields that frequently change are resourceVersion and status.currentMetrics.resource.current.averageValue (for the memory metric). Here are some excerpts:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  resourceVersion: "178777238"
...
status:
  currentMetrics:
  - resource:
      current:
        averageUtilization: 87
        averageValue: "341368832"
      name: memory
    type: Resource

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  resourceVersion: "178777358"
...
status:
  currentMetrics:
  - resource:
      current:
        averageUtilization: 87
        averageValue: "341458944"
      name: memory
    type: Resource

Another slightly strange thing is that for some reason, in some diffs, a suffix is added to the averageValue field, like so:

averageValue: "341458944"
---
averageValue: 341378389333m

averageValue: 341378389333m
---
averageValue: 341251413333m

We're going to try to monitor and adjust the resource utilization, since we've defined a target average utilization of 80 and the current metric is a bit above that. However, I don't know if that would produce a different behavior.

I think we can close the issue for now, as we've mitigated the reconciliations with the update filter approach that you suggested. Thank you very much for your help and attention!

Regards,
Ivan
