
Commit 636f93d

Increase monit timeout on Postgres job start
We've discovered a bug in the following scenario: using HA Postgres, one node is down and the other is attempting to start. The starting node takes about 30-40 seconds to bootstrap while it waits for the other node to potentially come online. By default, the `monit start` timeout is 30 seconds, so in some cases monit kills Postgres before it finishes starting, and the job gets stuck in a start/kill loop. This commit fixes that by extending the start timeout of both the `postgres` and `monitor` processes to 60 seconds.
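The timing argument above can be checked with a small model. This is a hedged sketch, not monit's actual implementation. It assumes that each start attempt bootstraps from scratch and that monit abandons (kills) any attempt that outlasts the start timeout before retrying. The function name `first_successful_attempt` and the sample durations are hypothetical, chosen from the 30-40 second range described above.

```python
from typing import Optional, Sequence

# Default monit start timeout, as stated in the commit message.
DEFAULT_MONIT_START_TIMEOUT_S = 30


def first_successful_attempt(
    bootstrap_durations_s: Sequence[float],
    start_timeout_s: float = DEFAULT_MONIT_START_TIMEOUT_S,
) -> Optional[int]:
    """Return the 1-based start attempt that finishes within the timeout, or None.

    Simplified model (assumption): every attempt restarts the bootstrap from zero,
    and an attempt that takes longer than the timeout is killed and retried.
    """
    for attempt, duration in enumerate(bootstrap_durations_s, start=1):
        if duration <= start_timeout_s:
            return attempt
    return None


# Hypothetical bootstrap times within the 30-40 s range from the commit message.
observed = [35, 38, 32, 40]
print(first_successful_attempt(observed))      # default 30 s timeout: never succeeds
print(first_successful_attempt(observed, 60))  # 60 s timeout from this commit
```

With a 30 second timeout, no attempt in this sample finishes, which is the start/kill loop. With 60 seconds, the first attempt completes.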
1 parent aab76e0 commit 636f93d


2 files changed (+7, -2 lines)


ci/release_notes.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Improvements
+
+Increase `monit start` timeout of the Postgres job to 60 seconds (previously 30
+seconds). This fixes a bug where the Postgres job would be prematurely killed by
+monit during boot.

jobs/postgres/monit

Lines changed: 2 additions & 2 deletions
@@ -1,13 +1,13 @@
 check process postgres
 with pidfile /var/vcap/sys/run/postgres/postgres.pid
-start program "/var/vcap/jobs/postgres/bin/ctl start"
+start program "/var/vcap/jobs/postgres/bin/ctl start" with timeout 60 seconds
 stop program "/var/vcap/jobs/postgres/bin/ctl stop"
 group vcap
 
 <% if p('postgres.replication.enabled') %>
 check process monitor
 with pidfile /var/vcap/sys/run/postgres/monitor.pid
-start program "/var/vcap/jobs/postgres/bin/monitor start"
+start program "/var/vcap/jobs/postgres/bin/monitor start" with timeout 60 seconds
 stop program "/var/vcap/jobs/postgres/bin/monitor stop"
 group vcap
 <% end %>
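To confirm that every process in a rendered monit file gets the longer timeout, one option is a quick scan of its `start program` lines. The following is a minimal sketch under stated assumptions. The helper name `start_timeouts` is hypothetical. The regex only recognizes the `with timeout N seconds` form used in this diff, plus `second`. Lines without the clause are reported with the 30 second default mentioned in the commit message.

```python
import re
from typing import List, Tuple

# Default start timeout when no "with timeout" clause is present (per the commit message).
DEFAULT_START_TIMEOUT_S = 30

# Matches lines such as:
#   start program "/var/vcap/jobs/postgres/bin/ctl start" with timeout 60 seconds
START_RE = re.compile(
    r'start program\s+"(?P<cmd>[^"]+)"(?:\s+with timeout\s+(?P<secs>\d+)\s+seconds?)?'
)


def start_timeouts(monit_text: str) -> List[Tuple[str, int]]:
    """Return (start command, effective timeout in seconds) for each start program line."""
    results = []
    for line in monit_text.splitlines():
        m = START_RE.search(line)
        if m:
            secs = m.group("secs")
            results.append((m.group("cmd"), int(secs) if secs else DEFAULT_START_TIMEOUT_S))
    return results


config = """check process postgres
with pidfile /var/vcap/sys/run/postgres/postgres.pid
start program "/var/vcap/jobs/postgres/bin/ctl start" with timeout 60 seconds
stop program "/var/vcap/jobs/postgres/bin/ctl stop"
group vcap
"""
for cmd, secs in start_timeouts(config):
    print(f"{cmd}: {secs}s")
```

Run against the pre-change file, the same scan would report 30 seconds for both `ctl start` and `monitor start`. This makes the regression easy to spot in CI.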
