Fix that pgbackrest sometimes stops operating in prod cluster (at least add alerts!) #331

Open
Venryx opened this issue Jun 25, 2024 · 1 comment

Comments

@Venryx
Collaborator

Venryx commented Jun 25, 2024

No description provided.

@Venryx Venryx converted this from a draft issue Jun 25, 2024
@Venryx Venryx moved this from 🆕 New to 🔖 Short-term (Venryx) in Public task list Jun 25, 2024
@Venryx
Collaborator Author

Venryx commented Jul 25, 2024

Update

After restoring the database, the pgbackrest backups started working again (first new backup on June 26th). The restore went: open a terminal in the stuck db pod -> scp its contents to another server -> launch the same version of postgres against the pgdata directory from the scp transfer -> pg_dump from that temporary instance -> clear the PVC in the prod cluster and import from the dump.

On July 25th, though, the database pod's PVC hit 100% storage usage again, and the issue recurred. I checked the pgbackrest backups at that point; the last successful one was from July 20th.
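
A gap like this one (last good backup on July 20th, noticed on July 25th) could be caught with a staleness check. Below is a minimal sketch, assuming the text passed in is the output of `pgbackrest info --output=json` and that it is a list of stanzas whose `backup` entries carry `timestamp.stop` as Unix seconds. The function names and the 2-day threshold are hypothetical and not part of the existing setup.

```python
import json
from datetime import datetime, timedelta, timezone


def latest_backup_stop(info_json):
    """Return the newest backup stop time across all stanzas, or None.

    Assumes `info_json` is the text printed by `pgbackrest info --output=json`.
    """
    stanzas = json.loads(info_json)
    stops = [
        backup["timestamp"]["stop"]
        for stanza in stanzas
        for backup in stanza.get("backup", [])
    ]
    if not stops:
        return None
    return datetime.fromtimestamp(max(stops), tz=timezone.utc)


def backups_are_stale(info_json, now, max_age=timedelta(days=2)):
    """True if there is no backup at all, or the newest one is older than max_age.

    The 2-day default is a placeholder; it should be set from the real schedule.
    """
    latest = latest_backup_stop(info_json)
    return latest is None or now - latest > max_age
```

With the dates from this incident, a newest backup of July 20th checked on July 25th is 5 days old, so `backups_are_stale` would report it as stale under the 2-day placeholder threshold.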

In summary: the pgbackrest config might actually be fine, but something causes the backups to start failing at some point, and there is no alerting in place when that happens! This could be detected by checking the "Conditions" column of the Kubernetes Jobs in the postgres-operator namespace.
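
The Job-conditions check could be sketched roughly as follows. It assumes `kubectl` is available and that backup failures surface as a `Failed` condition with status `True` on the Jobs in that namespace; the function names are hypothetical, and how the result gets turned into an alert is left open.

```python
import json
import subprocess


def failed_jobs(job_list):
    """Return the names of Jobs that carry a Failed=True condition.

    `job_list` is the parsed output of `kubectl get jobs -o json`.
    """
    failed = []
    for job in job_list.get("items", []):
        # Jobs that are still running may have no conditions yet.
        conditions = job.get("status", {}).get("conditions") or []
        if any(
            c.get("type") == "Failed" and c.get("status") == "True"
            for c in conditions
        ):
            failed.append(job["metadata"]["name"])
    return failed


def fetch_jobs(namespace="postgres-operator"):
    """Fetch all Jobs in the namespace via kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "jobs", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Running `failed_jobs(fetch_jobs())` on a schedule and alerting on a non-empty result would be one way to wire this up. Failed Jobs can be cleaned up over time, so this may miss older failures.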

@Venryx Venryx changed the title from "Fix that pgbackrest appears to not be operating in the prod cluster atm" to "Fix that pgbackrest sometimes stops operating in prod cluster (at least add alerts!)" Jul 25, 2024