Allow one-off PSQL errors during checks, check for split-brain (#26)
We found that the parameters for checking which node was master were too
strict. A single transient PSQL error (such as a connection reset) would
promote the replica to master, leaving a dual-master (split-brain)
configuration.
We have changed this so that three consecutive errors are required before
the replica becomes master in scenarios where the master is running but
not accepting PSQL commands.
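The promotion rule can be sketched as a counter of consecutive failed checks
that any successful check resets. This is a minimal, hypothetical illustration,
not the release's actual monitor code: the class name, the `record` method and
the boolean check results are assumptions; only the threshold of three comes
from this commit.

```python
# Hypothetical sketch: promote the replica only after N consecutive failed
# master checks. Names are illustrative, not taken from the actual release.
FAILURE_THRESHOLD = 3  # from this commit: three consecutive errors


class MasterWatcher:
    def __init__(self, threshold: int = FAILURE_THRESHOLD) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, check_ok: bool) -> bool:
        """Record one PSQL check result; return True if the replica should promote."""
        if check_ok:
            # A single successful check means the earlier error was transient.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


watcher = MasterWatcher()
# A one-off error followed by a success does not trigger promotion;
# only the third error in a row does.
decisions = [watcher.record(ok) for ok in [False, True, False, False, False]]
print(decisions)  # [False, False, False, False, True]
```

Resetting on success is what makes a lone connection reset harmless: the
counter only reaches the threshold when the master keeps failing checks.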
We have also added a check for split-brain configurations by
piggy-backing on the existing status checks to detect when both nodes are
master. If they are, both nodes immediately shut down their postgres,
haproxy, and monitor processes. This sets the VM to "failure" status in
BOSH, which should be easy to spot for anyone running a monitoring
solution (e.g. Prometheus). To recover from this failure mode, follow the
step-by-step instructions in README.md.
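The split-brain check can be sketched as a comparison of both nodes' roles
followed by a shutdown of the three processes. This is a hypothetical
illustration: on a real node the role might come from PostgreSQL's
`SELECT pg_is_in_recovery()` (false on a primary) and stopping might go through
BOSH's process manager (monit), but how the release actually does either is an
assumption. Here the `stop` callback is injected so the logic runs standalone.

```python
from typing import Callable, List

# Processes the commit says are shut down on split-brain.
SERVICES: List[str] = ["postgres", "haproxy", "monitor"]


def is_split_brain(local_is_master: bool, peer_is_master: bool) -> bool:
    """Both nodes claiming the master role is a split-brain configuration."""
    return local_is_master and peer_is_master


def check_and_halt(
    local_is_master: bool,
    peer_is_master: bool,
    stop: Callable[[str], None],  # hypothetical hook that stops one process
) -> bool:
    """Stop every service if split-brain is detected; return True if halted."""
    if is_split_brain(local_is_master, peer_is_master):
        for service in SERVICES:
            stop(service)
        return True
    return False


stopped: List[str] = []
halted = check_and_halt(True, True, stopped.append)
print(halted, stopped)  # True ['postgres', 'haproxy', 'monitor']
```

Halting rather than demoting one side avoids having to guess which node holds
the correct data; that decision is left to the manual recovery in README.md.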