
Restructure auto scaling docs #5656

Open
Blargian wants to merge 11 commits into ClickHouse:main from Blargian:autoscaling-docs

Conversation

@Blargian
Member

@Blargian Blargian commented Mar 4, 2026

Summary

Restructure autoscaling docs

Checklist

@vercel

vercel bot commented Mar 4, 2026

@Blargian is attempting to deploy a commit to the ClickHouse Team on Vercel.

A member of the Team first needs to authorize it.

@vercel

vercel bot commented Mar 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| clickhouse-docs | Ready | Preview | Mar 26, 2026 11:51am |

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| clickhouse-docs-ko | Ignored | Preview | Mar 26, 2026 11:51am |


@Blargian
Member Author

Blargian commented Mar 6, 2026

@aashishkohli
Contributor

Thanks @Blargian

Some changes for the vertical autoscaling page.
We should reword the top section as:

Scale and Enterprise tier services support autoscaling based on CPU and memory usage. Service usage is constantly monitored over a lookback window to make scaling decisions. If the usage rises above or falls below certain thresholds, the service is scaled appropriately to match the demand.

CPU-based Scaling

CPU scaling is based on target tracking, which calculates the exact CPU allocation needed to keep utilization at a target level. A scaling action is triggered only if current CPU utilization falls outside a defined band:

| Parameter | Value | Meaning |
| --- | --- | --- |
| Target utilization | 53% | The utilization level ClickHouse aims to maintain |
| High watermark | 75% | Triggers scale-up when CPU exceeds this threshold |
| Low watermark | 37.5% | Triggers scale-down when CPU falls below this threshold |

The recommender evaluates CPU utilization based on historical usage and determines a recommended CPU size using this formula:

`recommended_cpu = max_cpu_usage / target_utilization`

If the CPU utilization is between 37.5%–75% of allocated capacity, no scaling action is taken. Outside that band, the recommender computes the exact size needed to land back at 53% utilization, and the service is scaled accordingly.

Example

A service allocated 4 vCPU experiences a spike to 3.8 vCPU usage (~95% utilization), crossing the 75% high watermark. The recommender calculates 3.8 / 0.53 ≈ 7.2 vCPU and rounds up to the next available size (8 vCPU). Once load subsides and usage drops below 37.5% of the new allocation (3 vCPU for an 8 vCPU service), the recommender scales back down proportionally.
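The band check and formula above can be sketched in Python. The thresholds come from the table; the size ladder is a hypothetical example, since actual available sizes depend on the service configuration:

```python
# Thresholds from the table above; the size ladder is a hypothetical example.
TARGET_UTILIZATION = 0.53
HIGH_WATERMARK = 0.75
LOW_WATERMARK = 0.375
AVAILABLE_VCPU_SIZES = [1, 2, 4, 8, 16, 32]

def recommend_cpu(allocated_vcpu: float, max_cpu_usage: float) -> float:
    """Return a recommended vCPU size, or the current allocation if
    utilization stays inside the 37.5%-75% band."""
    utilization = max_cpu_usage / allocated_vcpu
    if LOW_WATERMARK <= utilization <= HIGH_WATERMARK:
        return allocated_vcpu  # inside the band: no scaling action
    needed = max_cpu_usage / TARGET_UTILIZATION
    # Round up to the next available size.
    return next(s for s in AVAILABLE_VCPU_SIZES if s >= needed)

# Worked example from above: 4 vCPU allocated, spike to 3.8 vCPU (~95%).
print(recommend_cpu(4, 3.8))  # -> 8 (3.8 / 0.53 is about 7.2, rounded up)
```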

Memory-based Scaling

Memory-based auto-scaling scales the cluster to 125% of the maximum memory usage, or up to 150% if OOM (out of memory) errors are encountered.

Scaling Decision

The larger of the CPU or memory recommendation is picked, and CPU and memory allocated to the service are scaled in lockstep increments of 1 CPU and 4 GiB memory.
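Under the same assumptions, the memory rule and the lockstep decision can be sketched as follows. The 125%/150% factors and the 1 CPU : 4 GiB ratio come from the text above; rounding up to whole lockstep units is an assumption for illustration:

```python
import math

def recommend_memory_gib(max_memory_gib: float, oom_seen: bool) -> float:
    # 125% of peak usage, or 150% if OOM errors were encountered.
    return max_memory_gib * (1.5 if oom_seen else 1.25)

def scaling_decision(cpu_rec_vcpu: float, mem_rec_gib: float) -> tuple:
    """Pick the larger of the CPU and memory recommendations, expressed in
    lockstep units of 1 vCPU : 4 GiB (rounding up is an assumption here)."""
    units = max(math.ceil(cpu_rec_vcpu), math.ceil(mem_rec_gib / 4))
    return units, units * 4  # (vCPU, GiB)

# e.g. a 7.2 vCPU CPU recommendation vs. a 20 GiB memory recommendation:
print(scaling_decision(7.2, 20.0))  # -> (8, 32): CPU is the binding constraint
```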

@aashishkohli
Contributor

The note at the bottom of the vertical autoscaling page can be updated as:

For Enterprise tier services, standard 1:4 profiles will support vertical autoscaling. Custom profiles don’t support vertical autoscaling or manual vertical scaling. However, these services can be scaled vertically by contacting support.

@aashishkohli
Contributor

For the Scaling Recommendations page let's use this:


ClickHouse Cloud automatically adjusts CPU and memory resources for each service based on real-time usage — ensuring stable performance while minimizing resource wastage. To balance responsiveness with stability, we utilize a two-window recommender system that monitors utilization over both a short 3-hour window and a longer 30-hour window. This allows us to react quickly to changes and also make decisions based on longer-term trends.

When usage increases, the system references the long window so it can scale up in a single, decisive step to the highest observed load within the past 30 hours. This approach minimizes repeated scale events. Conversely, when traffic declines, the short window guides a quick scale-down within about three hours, conserving resources.

By integrating these two perspectives, the recommender intelligently balances responsiveness with stability.

Benefits

  • Cost optimization: Right-size your services to avoid paying for unused resources while maintaining performance.
  • Proactive scaling: Get ahead of potential performance issues before they impact your workloads.
  • Balanced approach: The 2-window design prevents over-provisioning from transient spikes while still ensuring adequate headroom for real demand.

@aashishkohli
Contributor

Otherwise the changes look good. We will also need to add a page explaining "Scheduled Scaling", a feature currently in preview that will start rolling out more broadly in 2-3 weeks.

Can we mark that section as a "Preview" feature, if that is possible? Suggested page content is below.


Scheduled Scaling

ClickHouse Cloud services automatically scale based on CPU and memory utilization, but many workloads follow predictable patterns — daily ingestion spikes, batch jobs that run overnight, or traffic that drops sharply on weekends. For these use cases, Scheduled Scaling lets you define exactly when your service should scale up or down, independent of real-time metrics.

With Scheduled Scaling, you configure a set of time-based rules directly in the ClickHouse Cloud console. Each rule specifies a time, a recurrence (daily, weekly, or custom), and the target size — either the number of replicas (horizontal) or the memory tier (vertical). At the scheduled time, ClickHouse Cloud automatically applies the change, so your service is sized appropriately before demand arrives rather than reacting after the fact.

This is distinct from metric-based autoscaling, which responds dynamically to CPU and memory pressure. Scheduled Scaling is deterministic: you know exactly when the scaling will happen and to what size. The two approaches are complementary — a service can have a baseline scaling schedule and still benefit from autoscaling within that window if workloads fluctuate unexpectedly.

Scheduled Scaling is currently available in Private Preview. To enable it for your organization, contact the ClickHouse support team.

Setting up a scaling schedule

To configure a schedule, navigate to your service in the ClickHouse Cloud console and open its settings. From there, select Schedule Override and add a new rule.

The Scaling Schedules interface in the ClickHouse Cloud console, showing time-based scaling rules.

Each rule requires:

  • Time: When the scaling action should occur (in your local timezone)
  • Recurrence: How often the rule repeats (e.g. every weekday, every Sunday)
  • Target size: The number of replicas or memory allocation to scale to

Multiple rules can be combined to form a full weekly schedule. For example, you might scale out to 5 replicas every weekday at 6 AM and scale back to 2 replicas at 8 PM.
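A weekly schedule like the example above could be modeled as in the sketch below. The rule fields and function names are hypothetical, not the console's actual configuration format:

```python
import datetime
from dataclasses import dataclass

@dataclass
class ScalingRule:
    at: datetime.time   # local time the action fires (hypothetical model)
    days: set           # weekday numbers, Monday = 0
    replicas: int       # target replica count

WEEKDAYS = {0, 1, 2, 3, 4}
schedule = [
    ScalingRule(datetime.time(6, 0), WEEKDAYS, replicas=5),   # scale out at 6 AM
    ScalingRule(datetime.time(20, 0), WEEKDAYS, replicas=2),  # scale in at 8 PM
]

def active_target(now: datetime.datetime, rules, default: int) -> int:
    """Replica count implied by the latest rule that has fired today."""
    fired = [r for r in rules if now.weekday() in r.days and r.at <= now.time()]
    if not fired:
        return default
    return max(fired, key=lambda r: r.at).replicas

# 2024-01-01 was a Monday: 10 AM falls between the two rules.
print(active_target(datetime.datetime(2024, 1, 1, 10, 0), schedule, default=2))  # -> 5
```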

Use cases

Batch and ETL workloads: Scale up before a nightly ingest job runs and scale back down once it completes, avoiding over-provisioning during idle daytime hours.

Predictable traffic patterns: Services with consistent peak hours (e.g. business-hours query traffic) can be pre-scaled to handle load before it arrives, rather than waiting for autoscaling to react.

Weekend scale-down: Reduce replica count or memory tier over weekends when demand is lower, then restore capacity before the Monday morning surge.

Cost control: For teams managing ClickHouse Cloud spend, scheduled scale-downs during known low-utilization periods can meaningfully reduce resource consumption without any manual intervention.

:::note
A scheduled scaling action and a concurrent autoscaling recommendation may interact — the schedule takes precedence at its trigger time.
:::

@aashishkohli
Contributor


@Blargian Blargian marked this pull request as ready for review March 19, 2026 12:40
@Blargian Blargian requested review from a team as code owners March 19, 2026 12:40