Pod Autoscaler for tidb (SQL Processing) component in TiDB Cluster #5529

Open

varun-mishra opened this issue Jan 22, 2024 · 2 comments

@varun-mishra

Question

We are exploring the Horizontal Pod Autoscaler (HPA) for the TiDB component. We tested it by deploying a small TiDB cluster together with an HPA using a basic configuration.

The HPA was able to scale the TiDB component up and down. However, whenever the replica count did not match the value in the TidbCluster (TC) spec, the cluster went into a not-ready state and the tidb phase changed to Scale. This can hinder maintenance operations handled by the operator.
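For reference, here is a minimal sketch of the kind of HPA we used. The cluster name `basic` and the namespace are placeholders; the StatefulSet name follows the operator's `<cluster>-tidb` convention:

```yaml
# Minimal HPA sketch for the tidb StatefulSet (names and thresholds are placeholders).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tidb-hpa
  namespace: tidb-cluster
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: basic-tidb          # <cluster>-tidb StatefulSet managed by TiDB Operator
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```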

Regarding the cases discussed above, we have a few questions:

  1. What was the outcome of the HPA evaluation done by the PingCAP team? (As mentioned here, it did not work: doc)
  2. Do you have such a feature on your roadmap?
  3. If the TC status is not-ready and the tidb phase is Scale, how exactly does that affect the TiDB cluster?
  4. Can we maintain a healthy state based on scale actions performed by the HPA, or would that be an anti-pattern for the operator framework?
@csuzhangxc
Member

For a database, scaling a replica out or in causes data rebalancing, which may take a long time and may cause performance issues. The scaling operation also closes some connections or forces them to reconnect to another replica, so clients may need retry logic.

We tried to implement HPA before, but it didn't work well, and we don't have such a feature on the roadmap now.

Updates to the replicas of the StatefulSet may be overwritten by the TiDB Operator later.
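For example, the operator treats `spec.tidb.replicas` in the TidbCluster as the desired state and reconciles the StatefulSet back to it (the cluster name below is illustrative):

```yaml
# The operator reconciles the tidb StatefulSet to this value,
# so replicas changed directly on the StatefulSet can be reverted.
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic                 # example cluster name
spec:
  tidb:
    replicas: 3               # desired tidb replicas; the source of truth
```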

@varun-mishra
Author

Thanks for the response, @csuzhangxc.
The TiDB component is deployed as a StatefulSet in Kubernetes but is stateless, so scaling it in or out does not cause any data rebalancing.
The retry logic on the client side can be handled by the clients.

> Updates to the replicas of the StatefulSet may be overwritten by the TiDB Operator later.
I ran the experiment with the HPA's max-replica count equal to the tidb replica count in the cluster spec; with this configuration, the operator did not override the StatefulSet. I observed it for more than 10 hours.
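Roughly, the configuration that stayed stable pins the HPA's `maxReplicas` to the same value as `spec.tidb.replicas` in the TC spec (the numbers below are only an example):

```yaml
# HPA bounds from the experiment; maxReplicas matches spec.tidb.replicas (3 here).
spec:
  minReplicas: 1
  maxReplicas: 3
```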

If available, can you share the HPA/VPA experiment results from the PingCAP team?

Autoscaling can help TiDB users manage burst loads and lean periods. Please consider this a requirement; if any changes are needed, I can contribute.
Let me know your thoughts.
