Repository with scripts, slides and guidance for Prometheus ContribFest in KubeCon Paris 2024
Slides: https://docs.google.com/presentation/d/1ERc2DJZBIp6UcL_vtAQocBjbiSxgMw009fzZBsUa3j0/edit
NOTE: If you have any problem with any scenario, check the reference configurations we prepared for Prometheus Operator and GMP Operator (but don't cheat!) (:
You'll need `go`, `docker`, `kind` and `kubectl` installed. Once you have those, simply run:

`make cluster-create`

This will create a 3-node workshop cluster called `kubecon2024-prometheus` and connect `kubectl` to that cluster.
This will also apply the initial scenario (`kubectl apply -f scenarios/0_initial`):

- Metric source pods (avalanche) running in the `default` namespace (10 replicas).
- 2 Prometheus instances (sharded with hashmod, without operator) in the `monitoring` namespace scraping the metric source pods.
- A metric backend pod (Prometheus that receives remote-write and exposes a UI) running in the `remote` namespace.
  - NOTE: The remote-write endpoint will be available in the cluster under the `http://metric-backend.remote.svc:9090/api/v1/write` URL.
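If you want to double-check that the cluster itself came up before moving on, the standard `kind`/`kubectl` views apply (nothing here is specific to this repo):

```bash
# The workshop cluster should be listed and have 3 nodes.
kind get clusters
kubectl get nodes

# Initial scenario pods: avalanche in default, two Prometheus shards in
# monitoring, and the metric backend in remote.
kubectl get pods -A
```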
You can verify the Prometheus receiver is running and has the metric source metrics:

`kubectl -n remote port-forward svc/metric-backend 9090`

Confirm the Prometheus UI is accessible in your web browser at http://localhost:9090.
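For context, the two non-operator Prometheus shards split the scrape targets between themselves with the `hashmod` relabel action. Here is a minimal sketch of such a scrape config (the exact config shipped in `scenarios/0_initial` may differ in job names and discovery settings):

```yaml
scrape_configs:
- job_name: metric-source
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names: [default]
  relabel_configs:
  # Hash every discovered target address into one of 2 buckets...
  - source_labels: [__address__]
    modulus: 2               # total number of shards
    target_label: __tmp_hash
    action: hashmod
  # ...and keep only the bucket that belongs to this shard.
  - source_labels: [__tmp_hash]
    regex: "0"               # "0" on the first replica, "1" on the second
    action: keep
```

New targets are picked up automatically, but adding or removing a shard means editing `modulus` (and every shard's `regex`) by hand and rolling all replicas, which is exactly the manual work the operator stages below remove.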
## Stress Scenario

Here we simulate running more applications, which means more metrics need to be collected in the cluster. A 10-to-15 replica increase won't break collection or OOM Prometheus, but imagine a load that no longer fits in the 2 Prometheus replicas you have. With the initial collector setup, you would need to manually change the configuration whenever more applications are scheduled to the cluster.
- Verify: First, let's make sure you have 10 replicas visible in the remote backend UI:
  `kubectl -n remote port-forward svc/metric-backend 9090`
  Query for e.g. `sum(up) by (instance, pod, operator)` on http://localhost:9090.
- Scale: Scale the replicas to 15, e.g. `kubectl scale deployment/metric-source --replicas=15`
- Verify: Forward traffic to the remote backend again:
  `kubectl -n remote port-forward svc/metric-backend 9090`
  Query for e.g. `sum(up) by (instance, pod, operator)` on http://localhost:9090.
## Prometheus Operator

Before you start (especially if you ran the GMP Operator stage already):

- (opt) Ensure there is no `monitoring` namespace: `kubectl delete namespace monitoring`
- (opt) Ensure there are no `gmp-system` and `gmp-public` namespaces: `kubectl delete namespace gmp-system` and `kubectl delete namespace gmp-public`
- Scale back (if you need to) to 10 replicas: `kubectl scale deployment/metric-source --replicas=10`
From a high level, to run Prometheus Operator in auto-scaling hashmod mode you need a few things:

- You need the Prometheus Operator bundle (which includes CRDs, RBAC, Service Accounts and the operator). Normally you would go to the https://prometheus-operator.dev/docs/user-guides/getting-started/ website and follow the first step. However, we provide one for you in this repo, which includes an additional component called KEDA for horizontal pod autoscaling. It also sets up Prometheus Operator in the `prometheus-op-system` namespace.
  `kubectl apply --server-side -f scenarios/prometheus-operator/requirements/bundle.yaml`
- Create and apply a `PrometheusAgent` Custom Resource with remote-write configuration. Remember about the podMonitorSelector options!
- Create and apply a `PodMonitor` Custom Resource to get the Prometheus managed by Prometheus Operator to scrape the `metric-source` pods in the `default` namespace.
- Autoscaling configuration, i.e. a `ScaledObject` Custom Resource from KEDA, e.g. based on the number of targets. (A sketch of these three Custom Resources follows this list.)
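To save some typing, here is a minimal sketch of the three Custom Resources. Names, labels (`app: metric-source`), the port name (`metrics`), the service account and the KEDA query/threshold are assumptions for illustration and will need adjusting to the actual bundle and avalanche pods; the `ScaledObject` additionally assumes the `PrometheusAgent` CRD exposes a scale subresource wired to `spec.shards`, so that scaling it adds hashmod shards.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: PrometheusAgent
metadata:
  name: agent
  namespace: prometheus-op-system       # assumption: pick a namespace covered by the bundle's RBAC
spec:
  serviceAccountName: prometheus-agent  # assumption: use whatever service account the bundle creates
  shards: 1                             # starting shard count; autoscaling changes this
  podMonitorSelector: {}                # empty selector = pick up all PodMonitors...
  podMonitorNamespaceSelector: {}       # ...from all namespaces
  remoteWrite:
  - url: http://metric-backend.remote.svc:9090/api/v1/write
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: metric-source
  namespace: default
spec:
  selector:
    matchLabels:
      app: metric-source                # assumption: match the avalanche pods' real labels
  podMetricsEndpoints:
  - port: metrics                       # assumption: the avalanche metrics port name
    interval: 30s
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-agent-shards
  namespace: prometheus-op-system       # KEDA requires this to live next to the scaled resource
spec:
  scaleTargetRef:
    apiVersion: monitoring.coreos.com/v1alpha1
    kind: PrometheusAgent
    name: agent
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://metric-backend.remote.svc:9090
      query: count(up)                  # illustrative "number of targets" signal
      threshold: "10"                   # illustrative: roughly one shard per 10 targets
```

Note that an empty `podMonitorSelector: {}` selects everything; if you narrow it with `matchLabels`, remember to put matching labels on the `PodMonitor` itself.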
Once that is done and working, you should see avalanche metrics from Prometheus Operator collected by the remote backend:

`kubectl -n remote port-forward svc/metric-backend 9090`

Confirm the Prometheus UI is accessible in your web browser at http://localhost:9090.

Do the Stress Scenario to check if it auto-scales!
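A couple of generic commands that help while watching the autoscaling during the Stress Scenario (KEDA manages a regular HPA under the hood, so the usual `kubectl` views apply; adjust the namespace to wherever your `PrometheusAgent` lives):

```bash
# KEDA's view of the scaler and the HPA it created for it.
kubectl -n prometheus-op-system get scaledobject,hpa

# Watch Prometheus agent pods appear as shards are added.
kubectl -n prometheus-op-system get pods -w
```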
## GMP Operator

Before you start (if you ran the Prometheus Operator stage already):

- (opt) Ensure there is no `prometheus-op-system` namespace: `kubectl delete namespace prometheus-op-system`
- Scale back (if you need to) to 10 replicas: `kubectl scale deployment/metric-source --replicas=10`
GMP operator allows you to globally monitor and alert on your workloads using Prometheus, all without the hassle of manually managing and operating Prometheus instances. GMP operator automatically scales to handle your data.
From a high level, to run the GMP operator you need a few things:
- Install the GMP Custom Resource Definitions:
  `kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/main/manifests/setup.yaml`
- Install the GMP operator:
  `kubectl apply -f scenarios/gmp-operator/.reference/operator.yaml`
  For this ContribFest we're using the latest image of gmp-operator, which is not released yet. Usually you would apply https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/main/manifests/operator.yaml
  Confirm the pods have `Running` status: `kubectl get pods -n gmp-system`
- On your own, try creating an `OperatorConfig` Custom Resource that configures remote write. Make sure it lands in the `gmp-public` namespace (required).
- Create a `PodMonitoring` Custom Resource that scrapes avalanche. (A sketch of these two Custom Resources follows this list.)
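A minimal sketch of the two resources, under some assumptions: GMP expects a single `OperatorConfig` named `config` in `gmp-public`; the way to point it at a plain remote-write endpoint depends on the gmp-operator version, so treat the `exports` list below as an assumption tied to the unreleased image mentioned above and double-check it against the prometheus-engine docs/CRD; the avalanche label and port name are also illustrative.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  name: config                 # GMP reads exactly one OperatorConfig named "config"...
  namespace: gmp-public        # ...in the gmp-public namespace
exports:                       # assumption: version-dependent field for extra remote-write targets
- url: http://metric-backend.remote.svc:9090/api/v1/write
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: metric-source
  namespace: default
spec:
  selector:
    matchLabels:
      app: metric-source       # assumption: match the avalanche pods' real labels
  endpoints:
  - port: metrics              # assumption: the avalanche metrics port name
    interval: 30s
```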
Once that is done and working, you should see avalanche metrics from the GMP operator collected by the remote backend:

`kubectl -n remote port-forward svc/metric-backend 9090`

Query for e.g. `sum(up) by (instance, pod, operator)` on http://localhost:9090.

You should see all avalanche metrics, collected by 3 Prometheus collectors.
We don't need to stress this one, as we can't automatically add/remove nodes on kind, but the GMP operator would ensure Prometheus collection scales with the number of nodes.
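If you want to see that per-node scaling, managed collection runs its collectors as a DaemonSet in the `gmp-system` namespace (exact resource names may differ between versions):

```bash
# One collector pod per node, scheduled by the DaemonSet.
kubectl -n gmp-system get daemonset
kubectl -n gmp-system get pods -o wide
```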