Demonstrates monitoring and alerting for Kafka, Postgres and a Spring Boot application using Prometheus, Grafana and Alertmanager. The Spring Boot application generates Kafka events and performs Postgres operations to demonstrate metrics capture.
This repo accompanies the following series of articles on monitoring and alerting with Prometheus, Grafana, and Alertmanager:
- Prometheus, Grafana & Alertmanager - Part 1: Introduction: setting up a monitoring and alerting stack consisting of Prometheus, Grafana and Alertmanager.
- Prometheus, Grafana & Alertmanager - Part 2: Monitoring Demo: demonstrating monitoring Kafka, Postgres, and a Spring Boot application using Prometheus and Grafana.
- Prometheus, Grafana & Alertmanager - Part 3: Alerting Demo: demonstrating managing and raising alerts using Prometheus Alertmanager.
The Spring Boot application provides a REST endpoint that, when called, triggers the application to produce events. The application then consumes these events and writes corresponding records to the database.
Build the Spring Boot application Docker image:
mvn clean install
docker build -t monitoring-demo-app .
This image is used when the Docker containers are started in the next step.
Start the Spring Boot application, Kafka, Zookeeper, Kafka exporter, Postgres, Postgres exporter, Prometheus, Grafana and Alertmanager:
docker-compose up -d
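For context, a representative excerpt of what the docker-compose.yml might contain is sketched below. Only two services are shown; the images and mounted paths are assumptions, though the ports match the URLs used later in this README.

```yaml
# Illustrative excerpt only - the repo's docker-compose.yml defines the full stack.
services:
  prometheus:
    image: prom/prometheus    # assumed image; port matches http://localhost:9090/
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml   # assumed config mount
  grafana:
    image: grafana/grafana    # assumed image; port matches http://localhost:3000/
    ports:
      - "3000:3000"
```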
The exporters expose metrics from Kafka and Postgres respectively for consumption by Prometheus. The Spring Boot application's metrics are exposed via Spring Boot Actuator and Micrometer, and are likewise scraped by Prometheus. The metrics are visualised in Grafana dashboards. Prometheus raises alerts based on the configured rules, and Alertmanager manages these alerts, optionally notifying a Slack channel.
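To illustrate how Prometheus discovers these sources, here is a minimal sketch of a scrape configuration. The job names and target hostnames are assumptions (only postgres-exporter is confirmed as a container name later in this README), and the ports are the kafka-exporter and postgres-exporter defaults; the repo's actual Prometheus config may differ.

```yaml
# Sketch of a Prometheus scrape config for this stack (assumed job names and hosts).
scrape_configs:
  - job_name: kafka-exporter
    static_configs:
      - targets: ["kafka-exporter:9308"]       # 9308 is the kafka-exporter default port
  - job_name: postgres-exporter
    static_configs:
      - targets: ["postgres-exporter:9187"]    # 9187 is the postgres-exporter default port
  - job_name: spring-boot-app
    metrics_path: /actuator/prometheus         # exposed by Spring Boot Actuator + Micrometer
    static_configs:
      - targets: ["monitoring-demo-app:9001"]  # assumed service name; port from this README
```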
View the scrape targets in Prometheus:
- Navigate to http://localhost:9090/
- Select the Status dropdown and then Targets
Connect Grafana to the Prometheus data source (a provisioning-file alternative is sketched after these steps):
- Navigate to http://localhost:3000/
- Login with the default credentials (admin / admin).
- Go to Connections / Data sources / Add data source - select Prometheus.
- Enter the Prometheus URL: http://prometheus:9090 as the Connection URL.
- Click 'Save & test'.
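Alternatively, Grafana can provision the data source automatically from a YAML file mounted under /etc/grafana/provisioning/datasources/. This repo's setup uses the manual steps above; the sketch below is for reference only.

```yaml
# Sketch of Grafana data source provisioning equivalent to the manual steps above.
apiVersion: 1
datasources:
  - name: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # same Connection URL as entered manually
    isDefault: true
```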
Import example Kafka dashboard:
- Navigate to https://grafana.com/grafana/dashboards/7589-kafka-exporter-overview/
- Download JSON. (Also available at ./dashboards/kafka_7589_rev5.json).
- Under 'Dashboards' select 'Create Dashboard' / 'Import dashboard'.
- Select 'prometheus' as the datasource and import.
Import example Postgres dashboard:
- Navigate to https://grafana.com/grafana/dashboards/9628-postgresql-database/
- Download JSON. (Also available at ./dashboards/postgres_9628_rev7.json)
- Import as above.
Import example Alertmanager dashboard:
- Navigate to https://grafana.com/grafana/dashboards/9578-alertmanager/
- Download JSON. (Also available at ./dashboards/alertmanager_9578_rev4.json)
- Import as above.
Import example Spring Boot application dashboard:
- Navigate to https://grafana.com/grafana/dashboards/19004-spring-boot-statistics/
- Download JSON. (Also available at ./dashboards/springboot_19004_rev1.json)
- Import as above.
Hit the REST endpoint on the Spring Boot application to generate events, specifying the period to send events for, and the delay in milliseconds between each send:
curl -v -d '{"periodToSendSeconds":60, "delayMilliseconds":100}' -H "Content-Type: application/json" -X POST http://localhost:9001/v1/trigger
Application logs show:
INFO d.s.TriggerService - Sending events for 60 seconds
INFO d.s.TriggerService - Total events sent: 224
Confirm that each event sent has resulted in a corresponding item being persisted:
curl http://localhost:9001/v1/items/count
Example response:
{"count":224}
Note that the producer linger.ms is configured as 3 milliseconds, so applying a delay shorter than this will result in batches of events being produced.
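For reference, a minimal sketch of how this could be set via Spring Kafka's standard configuration properties; the repo may configure the producer elsewhere, e.g. directly in Java code.

```yaml
# Sketch: setting linger.ms through Spring Boot configuration (assumed location).
spring:
  kafka:
    producer:
      properties:
        "[linger.ms]": 3   # batch events for up to 3 ms before sending
```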
There are two example alerts configured in the config/rules.yml file (sketched below):
- A health check alert that fires when a monitored instance has been down for over 10 seconds.
- A Kafka alert that fires when consumer lag on the Kafka topics exceeds 100 for over 10 seconds.
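A minimal sketch of what these two rules could look like follows; the alert names, expressions and labels are assumptions (kafka_consumergroup_lag is the consumer lag metric exposed by kafka-exporter), so the actual config/rules.yml may differ.

```yaml
# Sketch of Prometheus alerting rules matching the two alerts described above.
groups:
  - name: example-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0          # the target failed its last scrape
        for: 10s               # must stay down for over 10 seconds before firing
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
      - alert: KafkaConsumerLag
        expr: sum(kafka_consumergroup_lag) by (topic) > 100   # lag per topic
        for: 10s
        annotations:
          summary: "Consumer lag on topic {{ $labels.topic }} exceeds 100"
```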
To trigger the health check alert, stop the Postgres exporter container instance and wait 10 seconds:
docker stop postgres-exporter
Once the alert has been raised the container can be restarted with:
docker start postgres-exporter
To trigger the Kafka alert, send in a large volume of events to the demo-topic (which the Spring Boot application consumes from) and observe the consumer lagging:
curl -v -d '{"periodToSendSeconds":5, "delayMilliseconds":0}' -H "Content-Type: application/json" -X POST http://localhost:9001/v1/trigger
The Spring Boot application logs each event as it is sent and consumed, so the time taken to consume the backlog of events can be verified there.
Navigate to Alertmanager at http://localhost:9093/ to view the alerts.
Alternatively view the alerts firing in the Alertmanager Grafana dashboard (as configured above).
Configure a Slack webhook endpoint for the workspace that will be called by Alertmanager with the alert notification:
https://api.slack.com/messaging/webhooks
Configure the Slack integration in config/alertmanager.yml: uncomment the configuration for the slack_configs section, and enter the correct webhook URL for the api_url parameter.
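A minimal sketch of the relevant section, assuming a single catch-all receiver; the webhook URL is a placeholder:

```yaml
# Sketch of the Alertmanager Slack receiver (placeholder webhook URL).
route:
  receiver: slack-notifications
receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ   # your Slack webhook URL
        channel: '#demo-alerts'
        send_resolved: true   # also notify when the alert resolves
```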
When an alert is fired, a notification will appear in the configured Slack channel, in this case #demo-alerts.
The Spring Boot application runs on port 9001. If the application does not stop cleanly, you may get an error like the following on the next run:
listen EADDRINUSE: address already in use :::9001
Find the process listening on the port and kill it:
lsof -i -P | grep 9001
kill -9 <processId>
Bring down the Docker containers with:
docker-compose down
Alternatively, force remove all containers with the following command:
docker rm -f $(docker ps -aq)
If Docker issues occur it can be useful to clean up the Docker environment:
docker system prune
docker volume prune