Smarter pod placement strategy for statefulsets #1654
Labels
kind/feature
needs-triage
Description
What problem are you trying to solve?
Initial condition: a statefulset with `updateStrategy: {type: OnDelete}`, in particular a SolrCloud cluster.
Failure scenario:
When the operator launches the statefulset, all statefulset pods are created simultaneously and each one is assigned to an effectively random AZ. Let's say we launch 9 replicas and, in an unlucky but highly probable case, the first 3 pods land in AZ A, the next 3 pods land in AZ B and the last 3 pods land in AZ C.
Now imagine that the customer decides to scale the statefulset down by 3 pods. Scale-down happens in a fixed order, so the 3 pods with the highest ordinals are removed (remember, this is a statefulset). You are left with 3 pods in AZ A, 3 pods in AZ B and 0 pods in AZ C. Everything still works at this point, except that the cluster is now unbalanced and no longer AZ-redundant. Next, the customer changes the pod spec, which triggers pod restarts. At this point 4 out of 6 pods will violate the topology spread constraint because there are no pods in AZ C, and those pods will be stuck in the Pending state until their corresponding EBS volumes are deleted (which can create some scary situations).
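The arithmetic of this scenario can be checked with a small Go sketch. It assumes the unlucky placement described above (ordinals 0-2 in AZ A, 3-5 in AZ B, 6-8 in AZ C), models a scale-down as removing the highest ordinals, and reports the skew as the largest per-zone pod count minus the smallest. That is a simplification of what a zone-level topology spread constraint compares against `maxSkew`. The helpers `zoneCounts` and `skew` are hypothetical and not part of Karpenter or Kubernetes.

```go
package main

import "fmt"

// zoneCounts counts pods per zone among the ordinals that survive a
// scale-down to `replicas` (a StatefulSet removes the highest ordinals first).
// Hypothetical helper for illustration only.
func zoneCounts(placement []string, zones []string, replicas int) map[string]int {
	counts := make(map[string]int, len(zones))
	for _, z := range zones {
		counts[z] = 0 // an empty zone still counts toward the minimum
	}
	for ordinal := 0; ordinal < replicas && ordinal < len(placement); ordinal++ {
		counts[placement[ordinal]]++
	}
	return counts
}

// skew returns max(count) - min(count) across zones: a simplified version of
// the skew a zone topology spread constraint checks (hypothetical helper).
func skew(counts map[string]int) int {
	first := true
	lo, hi := 0, 0
	for _, c := range counts {
		if first {
			lo, hi = c, c
			first = false
			continue
		}
		if c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	return hi - lo
}

func main() {
	zones := []string{"A", "B", "C"}
	// Unlucky placement from the scenario: ordinals 0-2 in A, 3-5 in B, 6-8 in C.
	placement := []string{"A", "A", "A", "B", "B", "B", "C", "C", "C"}

	for _, replicas := range []int{9, 6} {
		counts := zoneCounts(placement, zones, replicas)
		fmt.Printf("replicas=%d counts=%v skew=%d\n", replicas, counts, skew(counts))
	}
}
```

Running it prints `replicas=9 counts=map[A:3 B:3 C:3] skew=0` followed by `replicas=6 counts=map[A:3 B:3 C:0] skew=3`. The cluster is balanced before the scale-down and leaves AZ C empty afterwards.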
Possible solution to the problem:
Karpenter needs to be statefulset-aware: for each statefulset, it should evaluate pod constraints (or schedule pods) in increasing order of pod ordinal rather than at random, so that the constraints remain satisfied after a statefulset scale-down.
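A minimal sketch of why ordinal-ordered placement helps, assuming zones are handed out strictly in increasing ordinal order by simple round-robin. This is a hypothetical policy chosen to illustrate the request, not how Karpenter or kube-scheduler behaves today. `placeByOrdinal` and `prefixSkew` are made-up names. Ordinal `i` goes to zone `i mod k`, so the pods that survive any scale-down (ordinals `0..n-1`) differ by at most one pod per zone.

```go
package main

import "fmt"

// placeByOrdinal assigns each StatefulSet ordinal to a zone in round-robin
// order (hypothetical policy): ordinal i goes to zones[i % len(zones)].
// Assumes len(zones) > 0.
func placeByOrdinal(zones []string, replicas int) []string {
	placement := make([]string, replicas)
	for ordinal := 0; ordinal < replicas; ordinal++ {
		placement[ordinal] = zones[ordinal%len(zones)]
	}
	return placement
}

// prefixSkew returns max-min pods per zone over ordinals 0..n-1, i.e. the
// pods that remain after scaling the StatefulSet down to n replicas.
func prefixSkew(placement []string, zones []string, n int) int {
	counts := make(map[string]int, len(zones))
	for _, z := range zones {
		counts[z] = 0 // empty zones count toward the minimum
	}
	for _, z := range placement[:n] {
		counts[z]++
	}
	lo, hi := n, 0 // every count lies in [0, n]
	for _, c := range counts {
		if c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	return hi - lo
}

func main() {
	zones := []string{"A", "B", "C"}
	placement := placeByOrdinal(zones, 9)
	fmt.Println(placement)

	// Every possible scale-down target keeps the zones within one pod of each other.
	for n := 9; n >= 1; n-- {
		fmt.Printf("replicas=%d skew=%d\n", n, prefixSkew(placement, zones, n))
	}
}
```

Round-robin is only one way to keep every ordinal prefix balanced. A real implementation would also have to account for zones that currently lack capacity and for existing volumes: EBS volumes are zonal, so a pod whose volume already exists cannot simply move to another AZ.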
How important is this feature to you?
This prevents us from scaling down Solr clusters, which is a pretty big deal.