Dockerized cron job to back up ClickHouse databases on a single host or on a cluster with shards and replicas. Based on the Alpine docker image and the clickhouse-backup tool, along with its ability to work as a REST API service. Logrotate is included to manage the log files produced by the backup agent. For issues or suggestions, use the Issues tab or open a new PR. FYI: if you're looking for a PostgreSQL backup agent, it can be found here.
The architecture diagram is defined as Diagram as Code in Python and described in this blog post
The agent does the following:
- creates scheduled FULL or DIFF backups (POST to /backup/create)
- checks "create backup" action status before every upload (GET to /backup/status)
- uploads each backup to remote storage (POST to /backup/upload/)
- checks and waits until upload operation finishes (GET to /backup/actions)
- manages log file with API responses and errors
- generates customized output to standard container logs
- if a backup is not uploaded to remote storage, it's marked as failed and will not be used as the last backup for subsequent DIFF backups
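The create, check, and upload steps above come down to a poll-until-finished loop wrapped around the REST calls. Below is a minimal sketch of that loop; the function name wait_until_done is illustrative (not the agent's actual function), and the curl command in the comment assumes a default host, port and credentials:

```shell
#!/bin/sh
# Sketch of the agent's poll-until-finished step: repeatedly run a status
# probe and return 0 once the expected status string appears, or 1 after
# max_tries attempts. In the real script, the probe would be a REST call
# such as: curl -s -u rlAPIuser:password http://localhost:7171/backup/status
# (host, port and credentials here are assumptions).
wait_until_done() {
    probe_cmd=$1          # shell command that prints the current status
    want=$2               # substring that means the operation has finished
    max_tries=${3:-30}    # give up after this many probes
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if eval "$probe_cmd" | grep -q "$want"; then
            return 0
        fi
        tries=$((tries + 1))
        sleep 1
    done
    return 1
}
```

The same loop serves both checkpoints: after POST /backup/create it probes /backup/status, and after POST /backup/upload/ it probes /backup/actions until the upload is reported as done.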
Important: according to the clickhouse-backup official FAQ, "incremental backup calculate increment only during execute upload or create_remote command or similar REST API request". In other words, DIFF and FULL local backups are effectively the same (clickhouse-backup list local); clickhouse-backup creates local backups first and only then uploads them to remote storage.
If you list remote backups (clickhouse-backup list remote), you will notice the distinction between the two backup types. This is why the agent only issues a warning when you attempt to create a DIFF backup for the first time without having any prior FULL backup.
Default settings:
- DIFF backups: every hour on Sunday through Friday, and every hour from 00:00 through 20:00 on Saturday
- FULL backups: every Saturday at 8:30 PM
- Logs are rotated and compressed weekly, with 14 rotations kept before removal
- Clickhouse-backup API basic authentication is enabled (rlAPIuser)
- Clickhouse server authentication is enabled (rlbackup)
- Remote storage is FTP with authentication enabled
- Backups to keep local: 6
- Backups to keep remote: 336
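In standard cron syntax, the default schedule above could look like the following sketch (the script path and its arguments are illustrative assumptions, not the repository's exact cronfile):

```cron
# DIFF backups: every hour, Sunday through Friday (0 = Sunday)
0 * * * 0-5 /usr/local/bin/clickhouse-backup.sh diff
# DIFF backups: every hour from 00:00 through 20:00 on Saturday
0 0-20 * * 6 /usr/local/bin/clickhouse-backup.sh diff
# FULL backup: every Saturday at 8:30 PM
30 20 * * 6 /usr/local/bin/clickhouse-backup.sh full
```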
- docker-compose.yml - describes environment to test the agent locally
There are the following services:
- clickhouse server (clickhouse-server:23.8-alpine)
- clickhouse-backup (altinity/clickhouse-backup:2.4.0)
- our clickhouse-backup-agent (ch-backup-agent)
- ftpd_server (stilliard/pure-ftpd)
- ./clickhouse/clickhouse-backup-config.yml - clickhouse-backup config file
- ./agent/Dockerfile - backup agent's docker image
- ./agent/ch-backup-logrotate.conf - logrotate config file
- ./agent/clickhouse-backup.sh - script to define backup and upload steps
- ./agent/cronfile - cron job backup and logrotate tasks
- .github/workflows/docker-image.yml - a simple GitHub Action to build the agent's docker image on every Dockerfile change
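The weekly rotation defaults translate into a logrotate stanza along these lines (a sketch only; the actual ./agent/ch-backup-logrotate.conf may differ in details):

```
/var/log/clickhouse-backup/*.log {
    weekly
    rotate 14
    compress
    missingok
    notifempty
}
```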
- as a resource for learning docker, docker compose, bash, cron and logrotate
- as a source of the script, cron job tasks or docker files. Just grab them and you're set
- as a sample of pairing clickhouse-backup and clickhouse server
- check out logrotate and cron settings in the agent folder
- verify the Dockerfile in the agent folder (if docker is being used)
- adjust clickhouse backup settings if necessary (./clickhouse/clickhouse-backup-config.yml) Change credentials, clickhouse host and remote storage at least
- the clickhouse-backup API container or standalone service should have access to the /var/clickhouse/ folders to create backups successfully. For a container, see docker-compose.yml. If your clickhouse-backup API runs as a Linux service, run it on the first replica of each shard, and then update the cronfile accordingly
- copy cron and script files to a remote host, and then make a test run
- if you use Docker, check the 'docker-compose.yml' file and remove any services you don't need (such as clickhouse and ftp). Afterward, run docker-compose up -d --build to start the containers
- use docker logs or docker compose logs to check service logs. Log files are also located under the /var/log/clickhouse-backup/ folder
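For the "adjust clickhouse backup settings" step, the sections of clickhouse-backup-config.yml you will most likely touch look roughly like this (a sketch: the hostnames, ports and passwords below are placeholders, not the repository's values):

```yaml
general:
  remote_storage: ftp          # matches the default remote storage above
  backups_to_keep_local: 6
  backups_to_keep_remote: 336
clickhouse:
  host: clickhouse             # placeholder: your ClickHouse server host
  port: 9000
  username: rlbackup
  password: "change-me"        # placeholder
ftp:
  address: ftpd_server:21      # placeholder FTP endpoint
  username: ftpuser            # placeholder
  password: "change-me"        # placeholder
api:
  listen: "0.0.0.0:7171"       # clickhouse-backup REST API endpoint
  username: rlAPIuser
  password: "change-me"        # placeholder
```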
More info and tricks at the blog post
Output with error, warning and info messages:
Log file:
Diagram as code: