feat: hpa with cpu + mem util scaling options #628

Open · wants to merge 1 commit into base: master

Conversation

@burnjake commented Mar 7, 2024

  • CHANGELOG.md updated - n/a?
  • Rebased/mergeable
  • Tests pass (see comment below)
  • Sign CLA (if not already signed)

We would like to scale the number of replicas based on usage, which is currently a slight pain: if we roll our own HPA resource against this chart, we have to set the deployment.spec.replicas field to null ourselves. There's also a pre-existing issue: #624.
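
For context, the change needs the Deployment to drop its replicas field when autoscaling is enabled (compare the two rendered Deployments below). A minimal sketch of that guard in templates/deployment.yaml; the replicaCount value name is an assumption, not necessarily what this chart uses:

# templates/deployment.yaml (sketch, not the exact chart source)
spec:
  {{- if not .Values.autoscaling.enabled }}
  # Only pin a replica count when the HPA is not managing it
  replicas: {{ .Values.replicaCount }}
  {{- end }}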

$ helm version
version.BuildInfo{Version:"v3.14.2", GitCommit:"c309b6f0ff63856811846ce18f3bdc93d2b4d54b", GitTreeState:"clean", GoVersion:"go1.22.0"}

Setting autoscaling.enabled: true templates the following Deployment and HPA resources:

$ cat values.yaml | grep autoscaling -A10
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80
  behavior: {}

$ helm template ./ -s templates/deployment.yaml
---
# Source: telegraf/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: release-name-telegraf
  labels:
    helm.sh/chart: telegraf-1.8.43
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: telegraf
    app.kubernetes.io/instance: release-name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: telegraf
      app.kubernetes.io/instance: release-name
  template:
    metadata:
      labels:
        app.kubernetes.io/name: telegraf
        app.kubernetes.io/instance: release-name
      annotations:
        checksum/config: 11e7bc3db613c177911535018f65051a22f67ef0cf419dc2f19448d2a629282f
    spec:
      serviceAccountName: release-name-telegraf
      containers:
      - name: telegraf
        image: "docker.io/library/telegraf:1.29-alpine"
        imagePullPolicy: "IfNotPresent"
        resources:
          {}
        env:
        - name: HOSTNAME
          value: telegraf-polling-service
        volumeMounts:
        - name: config
          mountPath: /etc/telegraf
      volumes:
      - name: config
        configMap:
          name: release-name-telegraf

$ helm template ./ -s templates/horizontalpodautoscaler.yaml
---
# Source: telegraf/templates/horizontalpodautoscaler.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: release-name-telegraf
  labels:
    helm.sh/chart: telegraf-1.8.43
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: telegraf
    app.kubernetes.io/instance: release-name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: release-name-telegraf
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
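
The metrics list above can be produced by guarding each resource metric on its values key. A minimal sketch of what such a block could look like in templates/horizontalpodautoscaler.yaml (not necessarily the exact template in this PR):

  metrics:
  {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    # Memory-based scaling, driven by the values key shown above
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
  {{- end }}
  {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    # CPU-based scaling
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
  {{- end }}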

Setting autoscaling.enabled: false templates the following Deployment resource:

$ cat values.yaml | grep autoscaling -A10
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80
  behavior: {}

$ helm template ./ -s templates/deployment.yaml
---
# Source: telegraf/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: release-name-telegraf
  labels:
    helm.sh/chart: telegraf-1.8.43
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: telegraf
    app.kubernetes.io/instance: release-name
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: telegraf
      app.kubernetes.io/instance: release-name
  template:
    metadata:
      labels:
        app.kubernetes.io/name: telegraf
        app.kubernetes.io/instance: release-name
      annotations:
        checksum/config: 11e7bc3db613c177911535018f65051a22f67ef0cf419dc2f19448d2a629282f
    spec:
      serviceAccountName: release-name-telegraf
      containers:
      - name: telegraf
        image: "docker.io/library/telegraf:1.29-alpine"
        imagePullPolicy: "IfNotPresent"
        resources:
          {}
        env:
        - name: HOSTNAME
          value: telegraf-polling-service
        volumeMounts:
        - name: config
          mountPath: /etc/telegraf
      volumes:
      - name: config
        configMap:
          name: release-name-telegraf

$ helm template ./ -s templates/horizontalpodautoscaler.yaml
Error: could not find template templates/horizontalpodautoscaler.yaml in chart
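
This error is what helm template -s prints when the selected file exists but renders no manifest, which is consistent with the whole HPA template being wrapped in an enabled guard. A minimal sketch of that structure (the telegraf.fullname helper name is an assumption):

# templates/horizontalpodautoscaler.yaml (sketch of the overall structure)
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "telegraf.fullname" . }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "telegraf.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
{{- end }}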

An example with autoscaling.behavior set:

$ cat values.yaml | grep autoscaling -A20
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80
  behavior:
    scaleDown:
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
      - type: Percent
        value: 10
        periodSeconds: 60

$ helm template ./ -s templates/horizontalpodautoscaler.yaml
---
# Source: telegraf/templates/horizontalpodautoscaler.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: release-name-telegraf
  labels:
    helm.sh/chart: telegraf-1.8.43
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: telegraf
    app.kubernetes.io/instance: release-name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: release-name-telegraf
  minReplicas: 1
  maxReplicas: 5
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 60
        type: Pods
        value: 4
      - periodSeconds: 60
        type: Percent
        value: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
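
Note that the behavior block is rendered with the keys of each policy sorted alphabetically (periodSeconds before type and value); the content itself is unchanged from values.yaml, and a toYaml passthrough would produce exactly that ordering. A minimal sketch of such a passthrough (not necessarily the exact template in this PR):

  {{- with .Values.autoscaling.behavior }}
  # Pass the user-supplied behavior block through verbatim
  behavior:
    {{- toYaml . | nindent 4 }}
  {{- end }}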

Review comment from a contributor on this hunk in templates/horizontalpodautoscaler.yaml:

        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}

Ok, so I understand that the autoscaler will launch additional telegraf pods if you get above a certain memory and CPU usage, but what ensures that the first pod's usage goes down? Is there a load balancer or some other proxy in front that would round-robin the traffic?

Trying to understand the full use case and how a user would take advantage of this without needing to make modifications to their config. Thanks!

@burnjake (Author) replied Apr 5, 2024

Hi! Apologies, I've been away for a few days. Our use case is to utilise the opentelemetry input, aggregate with basicstats, and output with prometheus_client. We have a traffic pattern where the number of connections varies quite a lot within the day, so varying our replica count is prudent.

As the opentelemetry input expects connections via gRPC, we can't depend on normal load balancing via a k8s Service; instead we rely on an external LB, plumbed into the cluster's ingress, which discovers the new replicas and spreads the traffic across them (by updating its connection pool, I think). In short, we don't need extra configuration within telegraf for this to work, but our use case is indeed very specific!
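
For illustration only, here is roughly what that pipeline could look like as chart values; the config key layout is an assumption about this chart's values schema, and the plugin options shown are just common defaults rather than anything taken from the PR:

config:
  inputs:
    # Receive OTLP metrics over gRPC
    - opentelemetry:
        service_address: "0.0.0.0:4317"
  aggregators:
    # Aggregate incoming metrics with basic statistics
    - basicstats:
        period: "30s"
        stats: ["mean", "min", "max", "count"]
  outputs:
    # Expose aggregated metrics for Prometheus to scrape
    - prometheus_client:
        listen: ":9273"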
