Druid components autoscaling best practices #40
Configuring autoscaling for Apache Druid Historical nodes is more intricate than for MiddleManager nodes. While it may seem straightforward to scale up when disk usage exceeds 80%, scaling Historical nodes also requires careful consideration of CPU and memory usage. As a result, autoscaling Historicals involves a significant amount of complex logic, which makes this an interesting project to work on.
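To make the multi-metric point concrete, here is a minimal sketch of a scale-up check that combines disk, CPU, and memory pressure instead of disk usage alone. The thresholds and the function name are illustrative assumptions, not part of Druid or the druid-operator.

```python
# Hypothetical sketch: scale-up decision for Historical nodes.
# Disk usage alone (the "80%" rule) is one trigger, but CPU and memory
# pressure should also be able to trigger a scale-up on their own.
# All threshold values below are assumed defaults, not operator config.

def should_scale_up(disk_used_frac: float,
                    cpu_used_frac: float,
                    mem_used_frac: float,
                    disk_threshold: float = 0.80,
                    cpu_threshold: float = 0.85,
                    mem_threshold: float = 0.85) -> bool:
    """Return True if any single resource is under pressure."""
    return (disk_used_frac > disk_threshold
            or cpu_used_frac > cpu_threshold
            or mem_used_frac > mem_threshold)

# Disk pressure alone is enough to trigger a scale-up:
print(should_scale_up(0.82, 0.40, 0.50))
```

In practice these fractions would come from sustained (windowed) metrics rather than point-in-time samples, to avoid flapping.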
Scaling down components like MiddleManagers/Indexers and Historicals would need both custom logic in the operator and longer wait times, depending on how they are taken out of service without data loss or service impact. If someone has started work on autoscaling Druid components, or is yet to start, could they loop me in as well, please?
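The "longer wait times" point can be sketched as a drain-before-terminate loop. This is a hypothetical illustration only: the `coordinator` object and its `disable_server` / `segment_count` methods are placeholders for whatever client a real operator would build around Druid's Coordinator API, and the timings are assumptions.

```python
# Hypothetical sketch: safely draining a Historical before removing it.
# The coordinator client and its methods are placeholders, not a real Druid API.
import time

def drain_historical(node: str, coordinator,
                     poll_seconds: float = 30,
                     timeout_seconds: float = 3600) -> bool:
    # Stop new segment assignments to this node (placeholder call).
    coordinator.disable_server(node)
    waited = 0.0
    # Wait until the node no longer serves any segments.
    while coordinator.segment_count(node) > 0:
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"{node} still holds segments after {timeout_seconds}s")
        time.sleep(poll_seconds)
        waited += poll_seconds
    # Only now is it safe to terminate the pod without data loss.
    return True
```

The key design point is that scale-in is a long-running, stateful operation (potentially hours for large nodes), which is why it does not fit a naive HPA-style replica count adjustment.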
@saithal-confluent you can join the #druid-operator channel in the Kubernetes Slack. We are also planning to start a working group on druid-operator.
Anyone who wants to help: I would love to hear your thoughts on autoscaling Druid Historical nodes. To start the conversation, I'll add some initial considerations and rules:
**Storage concerns**

When you have a large volume of data to accommodate in your Apache Druid cluster, you have two main options: adding more Historical nodes, or resizing the volumes of the existing Historical nodes.

**Option 1: size up the disks.** The downside is that if another Historical is needed later, it will add a lot of storage to the cluster, and we cannot size the disks back down. Also, very large volumes may affect query performance.

**Option 2: scale out with the same storage size.** The downside is that this can lead to many Historical nodes, which can affect the performance of other components.

**When to scale out? (requires one condition)**
- Performance degradation

**When to scale in? (requires all conditions)**
- Data can be stored in fewer nodes (?)
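The asymmetry in the rules above (scale out on any trigger, scale in only when every safety condition holds) can be sketched as follows. The condition lists and return values are illustrative assumptions, not operator configuration.

```python
# Hypothetical sketch of the asymmetric scaling rules:
# scale OUT when ANY trigger condition holds (e.g. performance degradation),
# scale IN only when ALL safety conditions hold (e.g. data fits fewer nodes).

def scale_decision(out_conditions: list, in_conditions: list) -> str:
    if any(out_conditions):
        return "scale-out"
    if in_conditions and all(in_conditions):
        return "scale-in"
    return "hold"

# Performance degradation alone is enough to scale out:
print(scale_decision([True, False], [False]))
# Scale in only once every safety condition is satisfied:
print(scale_decision([False, False], [True, True]))
```

This any/all asymmetry biases the system toward availability: it reacts quickly to pressure but removes capacity only when it is provably safe.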
@cyril-corbon WDYT about this?
Started in this Slack thread.
We need an answer for how to scale each component of Druid.
MiddleManagers are on the way to becoming dynamically provisioned, which will solve this for them.
The biggest problem is autoscaling Historicals, where we should also take storage into account in our calculations.
Should the operator handle that? Or should we have a smart third-party autoscaler (like KEDA)?