# Prometheus Agent Deployment Sizing Guide

## Introduction

This guide outlines the recommended settings for deploying the Prometheus agent using the CloudZero Helm chart. It includes instructions on how to configure memory and CPU resources based on the size of your cluster.

## Baseline Memory Requirements

- **Base Memory:** 512Mi
- **Base Memory Limit:** 1024Mi
- **Additional Memory:** 0.75Gi per 100 nodes in the cluster

## Sizing Calculation

Consider the shape and size of your cluster when setting the memory limit for the Prometheus agent. To calculate the memory requirement for your cluster, use the following formula:

$$\text{Total Memory} = \text{Base Memory} + \frac{\text{Number of Nodes}}{100} \times 0.75\,\text{Gi}$$
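The formula above can be sketched in Python to compute a limit for any node count. This is a minimal illustration and not part of the CloudZero chart. `agent_memory_limit_mi` is a hypothetical helper. It works in Mi, treating 0.75Gi as 768Mi, and rounds up to a whole Mi so the result can be written as a Kubernetes quantity.

```python
BASE_MEMORY_MI = 512        # Base Memory from this guide
PER_100_NODES_MI = 768      # 0.75Gi per 100 nodes, expressed in Mi (0.75 * 1024)


def agent_memory_limit_mi(node_count: int) -> int:
    """Hypothetical helper: suggested agent memory limit in Mi.

    Implements Total Memory = Base Memory + (nodes / 100) * 0.75Gi,
    using integer arithmetic and rounding the per-node part up to a whole Mi.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    # Ceiling division: (a + 99) // 100 rounds a / 100 up without floats.
    additional_mi = (node_count * PER_100_NODES_MI + 99) // 100
    return BASE_MEMORY_MI + additional_mi


if __name__ == "__main__":
    for nodes in (50, 100, 200, 500):
        print(f"{nodes:>4} nodes -> {agent_memory_limit_mi(nodes)}Mi")
```

For 200 nodes the sketch returns 2048, which matches the 200-node example in this guide. For clusters with fewer than 67 nodes, the formula gives less than the 1024Mi base memory limit listed under the baseline requirements.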

> This guide uses a basic formula based on the number of nodes in the cluster. Your results may vary if you have:
>
> - Very large machines running a large number of pods
> - High-churn pods or jobs. Each pod that starts causes the agent to allocate memory for that pod's metrics cache. If the pod restarts, a new cache is created for the new pod instance. This cache is kept for 2 hours so that failed remote writes can be recovered. [More details on the cache can be found in the Prometheus documentation.](https://prometheus.io/docs/practices/remote_write/)
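The sketch below shows why churn matters. It estimates how many per-pod caches the agent could be holding at once, using only the behavior described in the note above: every pod start gets its own cache, and each cache is kept for 2 hours. This is a rough counting model and not a memory estimate, because the cost of each cache depends on how many series the pod exposes. `estimated_live_caches` and its parameters are hypothetical.

```python
import math

RETENTION_HOURS = 2  # cache retention window described in the note above


def estimated_live_caches(long_lived_pods: int, pod_starts_per_hour: float) -> int:
    """Hypothetical, rough count of per-pod caches held by the agent at once.

    long_lived_pods: pods that have been running longer than the retention window.
    pod_starts_per_hour: new pods, restarts, and job runs started per hour.
    Assumes every start inside the window keeps its own cache for
    RETENTION_HOURS, even after that pod instance has exited.
    """
    starts_in_window = math.ceil(pod_starts_per_hour * RETENTION_HOURS)
    return long_lived_pods + starts_in_window


if __name__ == "__main__":
    # Same long-lived workload, with and without frequent short-lived jobs.
    print(estimated_live_caches(long_lived_pods=1000, pod_starts_per_hour=0))
    print(estimated_live_caches(long_lived_pods=1000, pod_starts_per_hour=600))
```

With the same 1,000 long-lived pods, a workload that starts 600 pods per hour more than doubles the cache count in this model (2,200 versus 1,000). In clusters like this, the node-based formula may underestimate memory.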

## Sample values-override.yml Configuration

Create a `values-override.yml` file, or edit the default `values.yml` file, with the following content to configure the resource requests and limits for your Prometheus agent deployment. Replace `<CALCULATED_MEMORY_LIMIT>` with the value calculated for your cluster using the formula above:

```yaml
components:
  agent:
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: "<CALCULATED_MEMORY_LIMIT>"
```
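If you generate the override file from a script, filling in the placeholder can be sketched as shown below. This is a stdlib-only Python sketch. `render_values_override` is a hypothetical helper, and the template mirrors the sample above. It uses the guide's formula with the per-node part rounded up to a whole Mi.

```python
# Template mirroring the sample values-override.yml; the placeholder is filled in below.
TEMPLATE = """\
components:
  agent:
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: "<CALCULATED_MEMORY_LIMIT>"
"""


def render_values_override(node_count: int) -> str:
    """Hypothetical helper: return values-override.yml text for `node_count` nodes.

    Limit = 512Mi + (nodes / 100) * 768Mi, with the per-node part rounded up to whole Mi.
    """
    limit_mi = 512 + (node_count * 768 + 99) // 100
    return TEMPLATE.replace("<CALCULATED_MEMORY_LIMIT>", f"{limit_mi}Mi")


if __name__ == "__main__":
    # Print rather than write a file, so the output can be reviewed before use.
    print(render_values_override(200))
```

Printing instead of writing the file keeps the sketch side-effect free. Redirect the output to `values-override.yml` once the values look right.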

When using Helm, you can provide specific values in a separate `values-override.yml` file (passed with `-f values-override.yml`) to override the defaults specified in the chart's original `values.yml`. With this approach, you override only the values you need to change rather than providing the entire block.
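The idea of overriding only the necessary values can be illustrated with a simplified model of how an override file is layered on top of chart defaults: nested maps are merged key by key, and any other value replaces the default outright. This Python sketch is not Helm's implementation. It ignores details such as setting a key to `null` to remove a default. The defaults shown are assumed from this guide's baseline figures, and the chart's real `values.yml` may differ.

```python
import copy
from typing import Any, Dict


def merge_values(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified model of layering an override file on top of chart defaults.

    Nested maps are merged key by key; any other value in `overrides`
    (string, number, list) replaces the default outright. Defaults are not mutated.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Assumed defaults based on this guide's baseline figures (not the chart's actual file).
defaults = {
    "components": {
        "agent": {
            "resources": {
                "requests": {"memory": "512Mi", "cpu": "250m"},
                "limits": {"memory": "1024Mi"},
            }
        }
    }
}

# An override that changes only the memory limit.
overrides = {"components": {"agent": {"resources": {"limits": {"memory": "2048Mi"}}}}}

result = merge_values(defaults, overrides)
print(result["components"]["agent"]["resources"])
# {'requests': {'memory': '512Mi', 'cpu': '250m'}, 'limits': {'memory': '2048Mi'}}
```

The requests survive untouched even though the override file never mentions them. This is why the override file can stay small.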

## Example Configuration for 200 Nodes

For example, for a cluster with 200 nodes (using 0.75Gi = 768Mi), the memory limit is:

$$\text{Total Memory} = 512\,\text{Mi} + \frac{200}{100} \times 768\,\text{Mi} = 512\,\text{Mi} + 1536\,\text{Mi} = 2048\,\text{Mi}$$

Example `values-override.yml`:

```yaml
components:
  agent:
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 2048Mi
```

This file overrides only the agent's resource requests and limits. All other settings keep the chart's default values.
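Before applying an override, you can sanity-check the values. The sketch below parses the binary memory suffixes used in this guide (Ki, Mi, Gi, and Ti are powers of 1024, so 0.75Gi equals 768Mi and 2048Mi equals 2Gi). It also confirms that the memory request does not exceed the limit, which Kubernetes requires for a valid container spec. The helper names are hypothetical. The parser covers only these suffixes and not the full Kubernetes quantity grammar, such as decimal `M` and `G`.

```python
import re
from fractions import Fraction

# Binary (power-of-two) suffixes used in this guide.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
_QUANTITY = re.compile(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti)")


def quantity_to_bytes(quantity: str) -> int:
    """Hypothetical helper: convert '512Mi' or '0.75Gi' to bytes, exactly."""
    match = _QUANTITY.fullmatch(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    # Fraction avoids float rounding for values such as 0.75Gi.
    return int(Fraction(number) * _BINARY_SUFFIXES[suffix])


def check_agent_memory(request: str, limit: str) -> None:
    """Hypothetical helper: raise if the memory request exceeds the limit."""
    if quantity_to_bytes(request) > quantity_to_bytes(limit):
        raise ValueError(f"memory request {request} exceeds limit {limit}")


if __name__ == "__main__":
    check_agent_memory("512Mi", "2048Mi")  # values from the 200-node example
    print(quantity_to_bytes("2048Mi") == quantity_to_bytes("2Gi"))
```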

By following these instructions, you can size your Prometheus agent to handle your cluster's load, reducing the risk of out-of-memory errors and keeping it running smoothly.