
Druid components autoscaling best practices #40

Open · itamar-marom opened this issue Apr 18, 2023 · 5 comments
Labels: documentation (Improvements or additions to documentation), question (Further information is requested)

@itamar-marom (Collaborator) commented Apr 18, 2023

Started in this Slack thread.

We need an answer for how to scale each component of Druid.
Middle Managers are on the way to becoming dynamically provisioned, which will solve this for them.
The biggest problem is autoscaling Historicals, where storage must also be taken into account.
Should the operator handle that? Should we have a smart third-party autoscaler (like KEDA)?

@sbashar commented Apr 19, 2023

Apache Druid Historical nodes are more intricate to configure than Middle Manager nodes. While it may seem straightforward to establish a rule such as scaling up when disk usage exceeds 80%, scaling Historical nodes also requires careful consideration of CPU and memory usage. As a result, scaling Historicals involves a significant amount of complex logic. This could be an interesting project to work on.
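To make that concrete, here is a minimal sketch of the kind of multi-signal check described above: disk usage alone is not enough, so the decision combines disk, CPU, and memory utilization. The metric names and thresholds are illustrative assumptions, not part of druid-operator.

```python
from dataclasses import dataclass

@dataclass
class HistoricalMetrics:
    disk_used_fraction: float   # segment-cache volume usage, 0.0 - 1.0
    cpu_used_fraction: float    # container CPU usage vs. limit, 0.0 - 1.0
    mem_used_fraction: float    # container memory usage vs. limit, 0.0 - 1.0

def needs_scale_up(m: HistoricalMetrics,
                   disk_threshold: float = 0.80,
                   cpu_threshold: float = 0.85,
                   mem_threshold: float = 0.85) -> bool:
    """Scale up when any one resource is close to saturation."""
    return (m.disk_used_fraction > disk_threshold
            or m.cpu_used_fraction > cpu_threshold
            or m.mem_used_fraction > mem_threshold)

# Example: disk is fine but CPU is near its limit, so we still scale up.
print(needs_scale_up(HistoricalMetrics(0.55, 0.92, 0.60)))  # True
```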

@saithal-confluent (Contributor)

Scaling down components like MiddleManagers/Indexers and Historicals would need both custom logic in the operator and longer wait times, depending on how they are taken out of service without data loss or service impact.
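As a rough illustration of the "longer wait times" part for MiddleManagers: disable the worker so it stops receiving new tasks, wait for its running tasks to drain, and only then remove the pod. The endpoint paths follow the Druid MiddleManager worker API (`/druid/worker/v1/...`) but should be verified against the Druid version in use; the URL, port, and timeouts below are placeholder assumptions.

```python
import time
import requests

def drain_middle_manager(worker_url: str,
                         poll_seconds: int = 30,
                         timeout_seconds: int = 3600) -> bool:
    """Disable a MiddleManager and wait until it has no running tasks."""
    # Stop new task assignments to this worker.
    requests.post(f"{worker_url}/druid/worker/v1/disable", timeout=10).raise_for_status()

    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        tasks = requests.get(f"{worker_url}/druid/worker/v1/tasks", timeout=10).json()
        if not tasks:
            return True           # no running tasks; safe to remove the pod
        time.sleep(poll_seconds)  # tasks still running; keep waiting
    return False                  # drain did not finish in time; do not delete the pod

if __name__ == "__main__":
    drained = drain_middle_manager("http://druid-middlemanager-2.druid.svc:8091")
    print("safe to scale in" if drained else "drain timed out")
```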

If someone has started work on autoscaling of Druid components, or is about to start, could they loop me in as well, please?

@AdheipSingh (Contributor)

@saithal-confluent you can join the #druid-operator channel in the Kubernetes Slack.

Also, we are planning to start a working group on druid-operator.

@itamar-marom (Collaborator, Author)

Anyone who wants to help: I would love to hear your thoughts on autoscaling Druid Historical nodes. To start the conversation, I'll add some initial considerations.

Rules:

  • Scaling will be done 1 by 1

Storage Concerns

When you have a large volume of data to accommodate in your Apache Druid cluster, you have two main options: adding more Historical nodes or resizing the volumes of the existing Historical nodes.

Option 1: size up the disks of the existing Historicals.

The downside is that if another Historical is later needed, it will add a lot of storage to the cluster, and we cannot size the disks back down. Also, very large volumes may hurt query performance.

Option 2: scale out with the same storage size per node.

The downside is that it can lead to many Historical nodes, which can affect the performance of other components.
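For option 2, the target replica count can be estimated from the total segment volume the Historicals must serve and how much each node may cache. `druid.server.maxSize` is the Druid property bounding segment bytes per Historical; the fill target, replication factor, and sizes below are illustrative assumptions.

```python
import math

def required_historicals(total_segment_bytes: int,
                         per_node_max_size_bytes: int,   # druid.server.maxSize
                         target_fill: float = 0.80,      # headroom, mirrors the 80% rule
                         replication_factor: int = 2) -> int:
    """How many Historicals keep the segment cache below the target fill level."""
    usable_per_node = per_node_max_size_bytes * target_fill
    return math.ceil(total_segment_bytes * replication_factor / usable_per_node)

# Example: 8 TiB of unique segments, 2x replication, 1.5 TiB cache per node.
tib = 1024 ** 4
print(required_historicals(8 * tib, int(1.5 * tib)))  # 14 nodes at 80% target fill
```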

When to scale out? - requires one condition

  • Performance degradation:
    • CPU throttling
    • RAM
    • Disk IO
    • Latency
  • No other component needs to scale out (?)

When to scale in? - requires all conditions

  • Data can be stored in fewer nodes (?)
  • Utilization is low
  • Other components won’t be affected
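A rough sketch tying these conditions together: scale out when any performance signal degrades, scale in only when every condition holds, and always move one replica at a time (the "1 by 1" rule above). The signal names are assumptions about what the operator or an external autoscaler (e.g. KEDA) could collect; nothing here is an existing druid-operator API.

```python
from dataclasses import dataclass

@dataclass
class HistoricalSignals:
    cpu_throttled: bool           # CPU throttling observed
    ram_pressure: bool            # memory close to limits
    disk_io_saturated: bool       # disk IO wait is high
    query_latency_degraded: bool  # latency SLO breached
    data_fits_on_fewer_nodes: bool
    utilization_low: bool
    other_components_unaffected: bool

def desired_replicas(current: int, s: HistoricalSignals,
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Return the next replica count, changing by at most one node."""
    scale_out = (s.cpu_throttled or s.ram_pressure
                 or s.disk_io_saturated or s.query_latency_degraded)
    scale_in = (s.data_fits_on_fewer_nodes
                and s.utilization_low
                and s.other_components_unaffected)
    if scale_out:
        return min(current + 1, max_replicas)   # scale out one node at a time
    if scale_in:
        return max(current - 1, min_replicas)   # scale in one node at a time
    return current

# Example: latency degraded -> grow from 5 to 6 replicas.
print(desired_replicas(5, HistoricalSignals(False, False, False, True,
                                            False, False, True)))  # 6
```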

@itamar-marom (Collaborator, Author)

@cyril-corbon WDYT about this?

itamar-marom added the documentation and question labels on Aug 20, 2023
4 participants