
Commit 7d1dd2f

proplexjhunt authored and committed
High Availability Support in Active/Passive Mode (#22)
This BOSH release now supports Highly-Available PostgreSQL, in an active/passive setup, with automatic failover behind a single VRRP-managed virtual IP address.

On bootstrap, if there is no data directory, the postgres job will revert to a normal, index-based setup. The first node will assume the role of the master, and the second will become a replica.

Once the data directory has been populated, future restarts of the postgres job will attempt to contact the other node to see if it is a master. If the other node responds, and reports itself as a master, the local node will attempt a `pg_basebackup` from the master and assume the role of a replica. If the other node doesn't respond, or reports itself as a replica, the local node will keep trying, for up to `postgres.replication.grace` seconds, at which point it will assume the mantle of leadership and become the master node, using its current data directory as the canonical truth.

Each node then starts up a `monitor` process; this process is responsible for ultimately promoting a local replica to be a master, in the event that the real master goes offline. It works like this:

1. Busy-loop (via 1-second sleeps) until the local postgres instance is available on its configured port. This prevents monitor from trying to restart postgres while it is running a replica `pg_basebackup`.
2. Busy-loop (again via 1-second sleeps) for as long as the local postgres is a master.
3. Busy-loop (again via 1-second sleeps), checking the master status of the other postgres node, until it detects that either the master node has gone away (via a connection timeout), or the master node has somehow become a replica.
4. Promote the local postgres node to a master.

This has been tested by hand in vSphere with CF's ccdb, uaa, and diegodb using postgres as its storage medium. We observed an outage of less than 5 seconds during master failover.
1 parent ed3d107 commit 7d1dd2f

File tree

45 files changed: +3397 −493 lines changed

README.md

Lines changed: 86 additions & 24 deletions
@@ -50,27 +50,89 @@ templates/make_manifest openstack-nova my-networking.yml
 bosh -n deploy
 ```
-
-### Development
-
-As a developer of this release, create new releases and upload them:
-
-```
-bosh create release --force && bosh -n upload release
-```
-
-### Final releases
-
-To share final releases:
-
-```
-bosh create release --final
-```
-
-By default the version number will be bumped to the next major number. You can specify alternate versions:
-
-```
-bosh create release --final --version 2.1
-```
-
-After the first release you need to contact [Dmitriy Kalinin](mailto://[email protected]) to request your project is added to https://bosh.io/releases (as mentioned in README above).
## High Availability

HA is implemented with automatic failover; enable it by setting
`postgres.replication.enabled` to `true`.
On bootstrap, if there is no data directory, the postgres job will revert to a normal, index-based setup. The first node will assume the role of the master, and the second will become a replica.

Once the data directory has been populated, future restarts of the postgres job will attempt to contact the other node to see if it is a master. If the other node responds, and reports itself as a master, the local node will attempt a `pg_basebackup` from the master and assume the role of a replica.
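
For illustration, re-seeding a replica this way boils down to something like the following (a sketch; the release's actual control script, user names, and paths are not shown in this commit):

```
# Hypothetical sketch: rebuild the local data directory from the master.
# MASTER_IP, REPL_USER, and DATA_DIR are illustrative placeholders.
rm -rf "$DATA_DIR"
# -X stream pulls WAL during the copy; -R writes a recovery.conf that
# points back at the master, so this node boots as a streaming replica.
pg_basebackup -h "$MASTER_IP" -p 5432 -U "$REPL_USER" -D "$DATA_DIR" -X stream -R
```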
If the other node doesn't respond, or reports itself as a replica, the local node will keep trying, for up to `postgres.replication.grace` seconds, at which point it will assume the mantle of leadership and become the master node, using its current data directory as the canonical truth.
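
That boot-time decision, sketched in shell (hypothetical; `OTHER_NODE` and the helper scripts are stand-ins, not the release's real names):

```
# Hypothetical bootstrap loop: wait up to postgres.replication.grace
# seconds for the other node to answer as a master.
grace=15
for (( i = 0; i < grace; i++ )); do
  if [[ "$(psql -h "$OTHER_NODE" -U postgres -AtX \
           -c 'SELECT pg_is_in_recovery();' 2>/dev/null)" == "f" ]]; then
    exec ./become_replica   # hypothetical helper: pg_basebackup + start
  fi
  sleep 1
done
exec ./become_master        # hypothetical helper: start with local data
```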
Each node then starts up a `monitor` process; this process is responsible for ultimately promoting a local replica to be a master, in the event that the real master goes offline. It works like this (a sketch of the loop follows the list):

1. Busy-loop (via 1-second sleeps) until the local postgres instance is available on its configured port. This prevents monitor from trying to restart postgres while it is running a replica `pg_basebackup`.

2. Busy-loop (again via 1-second sleeps) for as long as the local postgres is a master.

3. Busy-loop (again via 1-second sleeps), checking the master status of the other postgres node, until it detects that either the master node has gone away (via a connection timeout), or the master node has somehow become a replica.

4. Promote the local postgres node to a master.
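
A minimal sketch of that loop in shell (hypothetical; the actual `monitor` script shipped by this release may differ in names and checks):

```
#!/bin/bash
# Hypothetical sketch of the monitor loop; OTHER_NODE, PORT, and DATA_DIR
# are illustrative placeholders, not the release's actual variables.

# Cap each remote check, mirroring postgres.replication.connect_timeout.
export PGCONNECT_TIMEOUT=5

is_master() {
  # pg_is_in_recovery() returns "f" on a master, "t" on a replica.
  [[ "$(psql -h "$1" -p "$PORT" -U postgres -AtX \
        -c 'SELECT pg_is_in_recovery();' 2>/dev/null)" == "f" ]]
}

# 1. Wait for local postgres to come up (it may be mid-pg_basebackup).
until psql -h 127.0.0.1 -p "$PORT" -U postgres -c 'SELECT 1;' >/dev/null 2>&1; do
  sleep 1
done

# 2. Nothing to do while we are already the master.
while is_master 127.0.0.1; do sleep 1; done

# 3. Watch the other node until it times out or answers as a replica.
while is_master "$OTHER_NODE"; do sleep 1; done

# 4. Promote the local replica to master.
pg_ctl promote -D "$DATA_DIR"
```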
Intelligent routing can be done by colocating the `haproxy` and `keepalived` jobs on the instance groups with `postgres`. HAProxy is configured with an external check that will only treat the master postgres node as healthy. This ensures that either load balancer node will only ever route to the write master.
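
Such an external check could look roughly like this (a hypothetical sketch, wired into the backend via HAProxy's `option external-check` and `external-check command` directives; the release's actual script may differ):

```
#!/bin/bash
# Hypothetical external check: HAProxy exports the server under test as
# HAPROXY_SERVER_ADDR / HAPROXY_SERVER_PORT; exit 0 marks it healthy.
# Only a node answering pg_is_in_recovery() with "f" (a master) passes.
[[ "$(psql -h "$HAPROXY_SERVER_ADDR" -p "$HAPROXY_SERVER_PORT" \
      -U postgres -AtX -c 'SELECT pg_is_in_recovery();' 2>/dev/null)" == "f" ]]
```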
The `keepalived` job trades a VRRP VIP between the `haproxy` instances. This ensures that the cluster can be accessed over a single, fixed IP address. Each keepalived process watches its own haproxy process; if it notices haproxy is down, it will terminate, allowing the VIP to move to the other node, which is assumed to be healthy.
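
That watch-and-terminate behavior amounts to something like this (hypothetical sketch; the pidfile path is a placeholder):

```
# Hypothetical watcher: once the local haproxy disappears, kill keepalived
# so it stops advertising VRRP and the VIP fails over to the other node.
while pidof haproxy >/dev/null 2>&1; do
  sleep 1
done
kill "$(cat /var/vcap/sys/run/keepalived/keepalived.pid)"
```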
It is possible to "instance-up" a single-node postgres deployment to an HA cluster by adding the `vip` job and changing postgres `instances` to 2. More information about this can be found in `manifests/ha.yml`.
For backup purposes, a route is exposed through haproxy which routes directly to the read-only replica for backup jobs. By default it is port `7432`, but it is also configurable via `vip.readonly_port`.
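
For example, a backup job could target the VIP on that port (illustrative address and database name):

```
# Hypothetical usage: dump a database from the read-only replica via the VIP.
pg_dump -h 10.0.0.250 -p 7432 -U postgres -Fc ccdb > ccdb.dump
```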
Here's a diagram:

![High Availability Diagram](docs/ha.png)
The following parameters affect high availability (an illustrative manifest snippet follows the list):

- `postgres.replication.enabled` - Enables replication, which is necessary for HA. Defaults to `false`.

- `postgres.replication.grace` - How many seconds to wait for the other node to report itself as a master, during boot. Defaults to `15`.

- `postgres.replication.connect_timeout` - How many seconds to allow a `psql` health check to attempt to connect to the other node before considering it a failure. The lower this value, the faster your cluster will fail over, but the higher the risk of accidental failover and split-brain. Defaults to `5`.

- `vip.readonly_port` - Which port to use to access the read-only node of the cluster. Defaults to `7542`.

- `vip.vip` - Which IP to use as a VIP that is traded between the two nodes.
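
As an illustration, these properties sit in the deployment manifest roughly as follows (a hypothetical excerpt; see `manifests/ha.yml` for the release's actual layout):

```
# Hypothetical manifest excerpt; IPs and values are placeholders.
instance_groups:
- name: postgres
  instances: 2
  jobs:
  - name: postgres
    properties:
      postgres:
        replication:
          enabled: true
          grace: 15
          connect_timeout: 5
  - name: vip
    properties:
      vip:
        vip: 10.0.0.250
        readonly_port: 7432
```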

ci/release_notes.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# New Features

* Added option to deploy Postgres as a two-node HA cluster with HAProxy and a
  VRRP VIP. It features auto-failover and auto-recovery. Uses streaming
  replication via WAL.

More information about HA postgres can be found in the `README`, and an example
manifest has been provided under `manifests/ha.yml`.

config/blobs.yml

Lines changed: 18 additions & 7 deletions
@@ -1,13 +1,24 @@
----
+haproxy/haproxy-1.8.10.tar.gz:
+  size: 2058928
+  object_id: 5fce0e5c-c863-4bdd-6859-485f7ffcfc7f
+  sha: f2fd8671c8c40aa85f9d13e9b6c0aebc22a71d33
+haproxy/pcre2-10.31.tar.gz:
+  size: 2130574
+  object_id: 81491cd7-e484-460a-7f08-9dc953b79614
+  sha: 7a77476a908c16cb26ad8a26363b67f00c8303bd
+haproxy/socat-1.7.3.1.tar.gz:
+  size: 606049
+  object_id: abe408af-41c4-49a2-7209-62d64c8efd3f
+  sha: a6f1d8ab3e85f565dbe172f33a9be6708dd52ffb
+keepalived/keepalived-1.2.24.tar.gz:
+  size: 601873
+  object_id: 09205505-e300-42b0-6513-24d4c4388960
+  sha: a69b2c40627a9bd69698a57e81a8c8c97826025d
 pgrt/pgrt:
+  size: 7391896
   object_id: b7968265-d373-404d-b6e0-12caa15f9f98
   sha: 2f6c7cc5a7b89712be68f2663fdf3fa9f997a2fa
-  size: 7391896
 postgres/postgresql-9.5.1.tar.bz2:
+  size: 18441638
   object_id: c3acc49c-a9ec-49a1-ae82-e97a00672695
   sha: 905bc31bc4d500e9498340407740a61835a2022e
-  size: 18441638
-pgpool2/pgpool-II-3.5.4.tar.gz:
-  object_id: c0821167-44de-470e-ad15-48309d6dd44a
-  sha: 4ea15dc8bb740baf720b18f182b400ea60b1ae45
-  size: 2237911

docs/ha.png

19.4 KB

jobs/pgpool/monit

Lines changed: 0 additions & 11 deletions
This file was deleted.

jobs/pgpool/spec

Lines changed: 0 additions & 51 deletions
This file was deleted.

jobs/pgpool/templates/bin/ctl

Lines changed: 0 additions & 38 deletions
This file was deleted.

jobs/pgpool/templates/bin/monit_debugger

Lines changed: 0 additions & 13 deletions
This file was deleted.

jobs/pgpool/templates/bin/watcher

Lines changed: 0 additions & 42 deletions
This file was deleted.

jobs/pgpool/templates/config/.gitkeep

Whitespace-only changes.
