
Commit 010f5ce

authored
CP-32994: update sizing guide for recent agent versions (#460)
The sizing guide was still using the top-level `server.resources` instead of `components.agent.resources`. I also replaced the images for the equations with inline LaTeX, which GitHub renders automatically using MathJax, and cleaned up the headings.
1 parent 3da0053 commit 010f5ce
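If you have an existing override file written against the old layout, the key move described in the commit message can be sketched as a small transformation on the parsed values. This is a minimal, hypothetical helper (`migrate_agent_resources` is not part of the CloudZero Helm chart). It assumes the YAML has already been loaded into nested Python dicts (for example with PyYAML's `yaml.safe_load`) and that only the `resources` block changed location. This commit does not say whether the chart still honors any other keys under `server`.

```python
import copy


def migrate_agent_resources(values: dict) -> dict:
    """Hypothetical helper: move server.resources to components.agent.resources.

    Returns a new dict and leaves the input untouched. Only the `resources`
    block is moved; any other keys under `server` stay where they are.
    """
    migrated = copy.deepcopy(values)
    server = migrated.get("server")
    if not isinstance(server, dict) or "resources" not in server:
        return migrated  # nothing on the old path, nothing to do

    components = migrated.setdefault("components", {})
    agent = components.setdefault("agent", {})
    if "resources" in agent:
        # The new path is already set; do not overwrite it with the legacy block.
        return migrated

    agent["resources"] = server.pop("resources")
    if not server:
        del migrated["server"]  # drop the now-empty top-level key
    return migrated


legacy = {
    "server": {
        "resources": {
            "requests": {"memory": "512Mi", "cpu": "250m"},
            "limits": {"memory": "2048Mi"},
        }
    }
}
print(migrate_agent_resources(legacy))
```

The helper refuses to overwrite an existing `components.agent.resources` block. A file that sets both paths is left unchanged for manual review rather than silently merged.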


3 files changed (+26 / -24 lines)

Binary file not shown (-26 KB).
Binary file not shown (-25.4 KB).

helm/docs/sizing-guide.md

Lines changed: 26 additions & 24 deletions
@@ -1,60 +1,62 @@
-### Prometheus Agent Deployment Sizing Guide
+# Prometheus Agent Deployment Sizing Guide
 
-#### Introduction
+## Introduction
 
 This guide outlines the recommended settings for deploying the Prometheus agent using the CloudZero Helm chart. It includes instructions on how to configure memory and CPU resources based on the size of your cluster.
 
-#### Baseline Memory Requirements
+## Baseline Memory Requirements
 
 - **Base Memory:** 512Mi
 - **Base Memory Limit:** 1024Mi
 - **Additional Memory:** 0.75Gi per 100 nodes in the cluster
 
-#### Sizing Calculation
+## Sizing Calculation
 
 It is recommended to consider the shape and size of your prometheus cluster when setting resource memory limits for the prometheus agent. To calculate the memory requirements for your cluster, one can use the following formula:
 
-![sizing formula](./assets/sizing-formula.png)
+$$Total Memory = Base Memory + \frac{Number of Nodes}{100} \times 0.75Gi$$
 
 > This guide uses a basic formula based on number of nodes in the cluster. Please note, your mileage may vary if you have:
 >
 > - Very large machines, with a large number of pods
 > - High churn pods or jobs. Each pod started triggers allocation of a memory for that pods metrics cache in the agent's memory. If the pod restarts, a new cache is created for the new pod instance. [More details on the cache can be found in the prometheus documentation.](https://prometheus.io/docs/practices/remote_write/). This cache is maintained for 2 hours to handle failure recovery of remote writes.
 
-#### Sample values-override.yml Configuration
+## Sample values-override.yml Configuration
 
-Create a `values-override.yml` file or edit the default `value.yml` file with the following content to configure the resource limits and requests for your Prometheus agent deployment. Replace `<CALCULATED_MEMORY_LIMIT>` with the actual number of nodes in your cluster:
+Create a `values-override.yml` file or edit the default `value.yml` file with the following content to configure the resource limits and requests for your Prometheus agent deployment. Replace `<CALCULATED_MEMORY_LIMIT>` with the actual value calculated for your cluster:
 
 ```yaml
-server:
-  resources:
-    requests:
-      memory: 512Mi
-      cpu: 250m
-    limits:
-      memory: "<CALCULATED_MEMORY_LIMIT>"
+components:
+  agent:
+    resources:
+      requests:
+        memory: 512Mi
+        cpu: 250m
+      limits:
+        memory: "<CALCULATED_MEMORY_LIMIT>"
 ```
 
 When using Helm, you can provide specific values in a separate `values-override.yml` file to override the defaults specified in the original `values.yml`. This approach allows you to override only the necessary values rather than providing the entire block.
 
-#### Example Configuration for 200 Nodes
+## Example Configuration for 200 Nodes
 
 Calculate the memory limit based on the number of nodes, for example 200 nodes, the configuration would be:
 
-![Example](./assets/sizing-formula-eg.png)
+$$Total Memory = 512Mi + \frac{200}{100} \times 768Mi = 512Mi + 1536Mi = 2048Mi$$
 
 Example `values-override.yml`:
 
 ```yaml
-server:
-  resources:
-    requests:
-      memory: 512Mi
-      cpu: 250m
-    limits:
-      memory: 2048Mi
+components:
+  agent:
+    resources:
+      requests:
+        memory: 512Mi
+        cpu: 250m
+      limits:
+        memory: 2048Mi
 ```
 
-This file only includes the overrides for the server resources size limit.
+This file only includes the overrides for the agent resources size limit.
 
 By following these instructions, you can ensure your Prometheus agent is properly sized to handle your cluster's load, preventing potential memory issues and ensuring smooth operation.
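The sizing formula and the 200-node example in the updated guide can be checked with a short script. This is a sketch, not part of the chart, and the function names are hypothetical. It converts 0.75Gi to 768Mi and rounds any fractional result up to a whole Mi. The rounding rule is an assumption, because the guide does not say how to round for node counts that are not multiples of 100. The rendered snippet mirrors the guide's `components.agent.resources` example, with the same 512Mi memory request and 250m CPU request.

```python
BASE_MEMORY_MI = 512        # "Base Memory" from the guide
PER_100_NODES_MI = 768      # "Additional Memory": 0.75Gi = 0.75 * 1024Mi


def agent_memory_limit_mi(num_nodes: int) -> int:
    """Total Memory = Base Memory + (num_nodes / 100) * 0.75Gi, in Mi.

    Uses integer arithmetic and rounds up (an assumption, not from the guide)
    so a fractional result never yields a limit below the formula's value.
    """
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    # Ceiling division: -(-a // b) == ceil(a / b) for positive b.
    return BASE_MEMORY_MI + -(-(num_nodes * PER_100_NODES_MI) // 100)


def render_values_override(num_nodes: int) -> str:
    """Render a values-override.yml body using the components.agent.resources path."""
    limit = agent_memory_limit_mi(num_nodes)
    return (
        "components:\n"
        "  agent:\n"
        "    resources:\n"
        "      requests:\n"
        "        memory: 512Mi\n"
        "        cpu: 250m\n"
        "      limits:\n"
        f"        memory: {limit}Mi\n"
    )


if __name__ == "__main__":
    print(render_values_override(200))
```

For 200 nodes this yields `memory: 2048Mi`, matching the worked example in the guide. The guide's caveats about very large machines and high pod churn still apply, so treat the computed value as a starting point rather than a guarantee.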
