Allow formula based dynamic maxReplicas #5938

Open
lkishalmi opened this issue Jul 2, 2024 · 10 comments
Labels
feature-request, needs-discussion

Comments

@lkishalmi
Contributor

Proposal

There are certain resource saturation situations that should restrict scaling out.

While these situations can sometimes be incorporated into a formula in scalingModifiers, that could result in a fairly complex formula, especially when multiple triggers are already in place.

I'd suggest that the maxReplicaCount of the ScaledObject could be either an int or a string; if it does not parse as an integer, it would be evaluated as a formula. The trigger definitions could be used as a source for the calculation.
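A minimal sketch of what that could look like, assuming the hypothetical string form; the formula syntax, trigger names, queries, and thresholds below are purely illustrative and do not exist in KEDA today:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 1
  # Today this field must be an integer; the proposal is that a string value
  # would instead be evaluated as a formula over the named triggers (hypothetical syntax).
  maxReplicaCount: "db_saturation > 0.8 ? 10 : 50"
  triggers:
    - type: redis
      name: jobs_queue
      metadata:
        address: redis.default.svc:6379
        listName: jobs
        listLength: "100"
    - type: prometheus
      name: db_saturation
      metadata:
        serverAddress: http://prometheus:9090
        query: max(mysql_saturation_ratio)
        threshold: "0.8"
```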

Use-Case

We have several worker processes that read/write data between multiple datastores (Redis, MySQL).

We are processing queues and would like to process them fast enough. However, if all workers are busy everywhere at once, some of the systems can get saturated and eventually break.

At the moment we have empirically set maximum values that try to protect our backend systems from saturation. Still, saturation happens from time to time.

We have good metrics/indicators to detect saturation and an alerting system in place. When an alert fires, we manually set the maximum number of replicas. Unfortunately, the time between detecting saturation and scaling down is crucial: we need to act within 2-5 minutes, and sometimes we cannot act that fast.

It would be good to automate these evaluations and actions. We already have our scalers in KEDA, which controls our HPAs.

The saturation metrics/indicators could be collected with existing KEDA scalers (triggers).

Is this a feature you are interested in implementing yourself?

No

Anything else?

No response

@lkishalmi added the feature-request and needs-discussion labels on Jul 2, 2024
@JorTurFer
Member

Hello,
Currently, you can do it using scalingModifiers. If you include something like min(your_max_value, max(scalers)), you can cap the composite metric with whatever formula you want, which effectively limits the maximum.
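For reference, a rough sketch of that workaround with today's scalingModifiers; the trigger names, metadata values, and target are illustrative, and each trigger needs a name so the formula can reference it:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
spec:
  scaleTargetRef:
    name: worker
  maxReplicaCount: 50
  advanced:
    scalingModifiers:
      # Cap the composite metric at 10; with target "1" this effectively
      # limits the desired replica count to about 10.
      formula: "min(10, max(jobs_queue, backend_saturation))"
      target: "1"
  triggers:
    - type: redis
      name: jobs_queue
      metadata:
        address: redis.default.svc:6379
        listName: jobs
        listLength: "100"
    - type: prometheus
      name: backend_saturation
      metadata:
        serverAddress: http://prometheus:9090
        query: max(mysql_saturation_ratio)
        threshold: "0.8"
```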

@lkishalmi
Contributor Author

Yes. Though unfortunately scalingModifiers do not work with traditional cpu or memory triggers.

@JorTurFer
Member

Interesting point!
Let's see what other @kedacore/keda-contributors think.

@lkishalmi
Contributor Author

Also, AFAIK changing the max value is a stronger rule than a utilization-based metric. Usually when we detect some kind of saturation in the system, we do not really care about scale up/down policies, stabilization windows, etc. The autoscaler should simply drop replicas that are over the new limit.

@joebowbeer
Contributor

Reaching maxReplicaCount is often treated as an alarm-worthy event.

Dynamically changing maxReplicaCount in order to scale out would make the alarm worthless.

@lkishalmi
Contributor Author

Well, those who have workloads where reaching the maxReplicaCount could be alarming probably should not use a dynamically changing maxReplicaCount.

We have a bunch of queue-processing workloads working against 2-3 centralized backend systems. It's really hard to determine the correct max replicas per deployment across the whole system. We are working with guesstimates, though we have good indicators of when some central systems start to become saturated. Dropping some less important workload would serve us well.

@JorTurFer
Member

@zroubalik @tomkerkhove ?

@tomkerkhove
Member

> Reaching maxReplicaCount is often treated as an alarm-worthy event.
>
> Dynamically changing maxReplicaCount in order to scale out would make the alarm worthless.

If you choose to dynamically define them, then I think it's safe to say you opt in to the behavior and should be OK.

@joebowbeer
Contributor

joebowbeer commented Sep 3, 2024

I suspect that HPA is not designed to be used in this way.

What I do see is that a lot of code is bypassed when current replicas exceed max replicas. That code implements the customizable behaviors, including the default behavior, so bypassing it may result in erratic scaling:

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#default-behavior

https://github.com/kubernetes/kubernetes/blob/HEAD/pkg/controller/podautoscaler/horizontal.go#L822

@lkishalmi
Contributor Author

Well, the code bypass would be kind of the desired behavior. I do not think anyone would miss scale-down behavior modifiers in those cases. Our workload has the following use cases:

  1. We have a weekly archival process that affects the overall throughput of our system, so during those times we could cap other interfering workflows far lower than on other days. That would be a cron-based trigger (see the sketch after this list).
  2. We are measuring the saturation of our core shared resources. If we had a trigger on resource saturation, the desired action would be to drop non-critical workloads.
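To make these concrete, a purely hypothetical sketch of how the proposed formula-based maxReplicaCount might combine a cron trigger (the weekly archival window) with a saturation indicator; none of this formula syntax exists in KEDA today, and the names, schedule, queries, and thresholds are made up for illustration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    name: queue-worker
  # Hypothetical: cap non-critical workers at 5 replicas while the weekly
  # archival window is active or the shared backends are saturated.
  maxReplicaCount: "archival_window > 0 || backend_saturation > 0.8 ? 5 : 40"
  triggers:
    - type: cron
      name: archival_window
      metadata:
        timezone: Etc/UTC
        start: "0 2 * * 6"
        end: "0 8 * * 6"
        desiredReplicas: "1"
    - type: prometheus
      name: backend_saturation
      metadata:
        serverAddress: http://prometheus:9090
        query: max(mysql_saturation_ratio)
        threshold: "0.8"
```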
