In this post we will have a look at switchover and failover of a Patroni cluster. As well as a look at the maintenance mode Patroni offers, which gives the opportunity to prevent from an automatic failover.
Switchover
There are two possibilities to run a switchover, either in scheduled mode or immediately.
1. Scheduled Switchover
postgres@patroni1:/home/postgres/ [PG1] patronictl switchover Master [patroni1]: Candidate ['patroni2', 'patroni3'] []: patroni2 When should the switchover take place (e.g. 2019-10-08T11:31 ) [now]: 2019-10-08T10:32 Current cluster topology +---------+----------+----------------+--------+---------+----+-----------+ | Cluster | Member | Host | Role | State | TL | Lag in MB | +---------+----------+----------------+--------+---------+----+-----------+ | PG1 | patroni1 | 192.168.22.111 | Leader | running | 2 | 0.0 | | PG1 | patroni2 | 192.168.22.112 | | running | 2 | 0.0 | | PG1 | patroni3 | 192.168.22.113 | | running | 2 | 0.0 | +---------+----------+----------------+--------+---------+----+-----------+ Are you sure you want to schedule switchover of cluster PG1 at 2019-10-08T10:32:00+02:00, demoting current master patroni1? [y/N]: y 2019-10-08 10:31:14.89236 Switchover scheduled +---------+----------+----------------+--------+---------+----+-----------+ | Cluster | Member | Host | Role | State | TL | Lag in MB | +---------+----------+----------------+--------+---------+----+-----------+ | PG1 | patroni1 | 192.168.22.111 | Leader | running | 2 | 0.0 | | PG1 | patroni2 | 192.168.22.112 | | running | 2 | 0.0 | | PG1 | patroni3 | 192.168.22.113 | | running | 2 | 0.0 | +---------+----------+----------------+--------+---------+----+-----------+ Switchover scheduled at: 2019-10-08T10:32:00+02:00 from: patroni1 to: patroni2 postgres@patroni1:/home/postgres/ [PG1]
That’s it. At the given time, the switchover will take place. All you see in the logfile is an entry like this
Oct 8 10:32:00 patroni1 patroni: 2019-10-08 10:32:00,006 INFO: Manual scheduled failover at 2019-10-08T10:32:00+02:00 Oct 8 10:32:00 patroni1 patroni: 2019-10-08 10:32:00,016 INFO: Got response from patroni2 http://192.168.22.112:8008/patroni: {"database_system_identifier": "6745341072751547355", "postmaster_start_time": "2019-10-08 10:09:40.217 CEST", "timeline": 2, "cluster_unlocked": false, "patroni": {"scope": "PG1", "version": "1.6.0"}, "state": "running", "role": "replica", "xlog": {"received_location": 83886560, "replayed_timestamp": null, "paused": false, "replayed_location": 83886560}, "server_version": 110005} Oct 8 10:32:00 patroni1 patroni: 2019-10-08 10:32:00,113 INFO: manual failover: demoting myself Oct 8 10:32:01 patroni1 patroni: 2019-10-08 10:32:01,256 INFO: Leader key released Oct 8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,271 INFO: Local timeline=2 lsn=0/6000028 Oct 8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,279 INFO: master_timeline=3 Oct 8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,281 INFO: master: history=1#0110/5000098#011no recovery target specified Oct 8 10:32:03 patroni1 patroni: 2#0110/6000098#011no recovery target specified Oct 8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,282 INFO: closed patroni connection to the postgresql cluster Oct 8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,312 INFO: postmaster pid=11537 Oct 8 10:32:03 patroni1 patroni: 192.168.22.111:5432 - no response Oct 8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.325 CEST - 1 - 11537 - - @ - 0LOG: listening on IPv4 address "192.168.22.111", port 5432 Oct 8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.328 CEST - 2 - 11537 - - @ - 0LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" Oct 8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.339 CEST - 3 - 11537 - - @ - 0LOG: redirecting log output to logging collector process Oct 8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.339 CEST - 4 - 11537 - - @ - 0HINT: Future log output will appear in directory "pg_log". Oct 8 10:32:04 patroni1 patroni: 192.168.22.111:5432 - accepting connections Oct 8 10:32:04 patroni1 patroni: 192.168.22.111:5432 - accepting connections Oct 8 10:32:04 patroni1 patroni: 2019-10-08 10:32:04,895 INFO: Lock owner: patroni2; I am patroni1 Oct 8 10:32:04 patroni1 patroni: 2019-10-08 10:32:04,895 INFO: does not have lock Oct 8 10:32:04 patroni1 patroni: 2019-10-08 10:32:04,896 INFO: establishing a new patroni connection to the postgres cluster
2. Immediate switchover
Here you start the same way as for planned switchover, but the switchover will take place immediatelly.
postgres@patroni1:/home/postgres/ [PG1] patronictl list +---------+----------+----------------+--------+---------+----+-----------+ | Cluster | Member | Host | Role | State | TL | Lag in MB | +---------+----------+----------------+--------+---------+----+-----------+ | PG1 | patroni1 | 192.168.22.111 | | running | 1 | 0.0 | | PG1 | patroni2 | 192.168.22.112 | Leader | running | 1 | 0.0 | | PG1 | patroni3 | 192.168.22.113 | | running | 1 | 0.0 | +---------+----------+----------------+--------+---------+----+-----------+ postgres@patroni1:/home/postgres/ [PG1] patronictl switchover Master [patroni2]: Candidate ['patroni1', 'patroni3'] []: patroni1 When should the switchover take place (e.g. 2019-10-08T11:09 ) [now]: Current cluster topology +---------+----------+----------------+--------+---------+----+-----------+ | Cluster | Member | Host | Role | State | TL | Lag in MB | +---------+----------+----------------+--------+---------+----+-----------+ | PG1 | patroni1 | 192.168.22.111 | | running | 1 | 0.0 | | PG1 | patroni2 | 192.168.22.112 | Leader | running | 1 | 0.0 | | PG1 | patroni3 | 192.168.22.113 | | running | 1 | 0.0 | +---------+----------+----------------+--------+---------+----+-----------+ Are you sure you want to switchover cluster PG1, demoting current master patroni2? [y/N]: y 2019-10-08 10:09:38.88046 Successfully switched over to "patroni1" +---------+----------+----------------+--------+---------+----+-----------+ | Cluster | Member | Host | Role | State | TL | Lag in MB | +---------+----------+----------------+--------+---------+----+-----------+ | PG1 | patroni1 | 192.168.22.111 | Leader | running | 1 | | | PG1 | patroni2 | 192.168.22.112 | | stopped | | unknown | | PG1 | patroni3 | 192.168.22.113 | | running | 1 | 0.0 | +---------+----------+----------------+--------+---------+----+-----------+ postgres@patroni1:/home/postgres/ [PG1] patronictl list +---------+----------+----------------+--------+---------+----+-----------+ | Cluster | Member | Host | Role | State | TL | Lag in MB | +---------+----------+----------------+--------+---------+----+-----------+ | PG1 | patroni1 | 192.168.22.111 | Leader | running | 2 | 0.0 | | PG1 | patroni2 | 192.168.22.112 | | running | 2 | 0.0 | | PG1 | patroni3 | 192.168.22.113 | | running | 2 | 0.0 | +---------+----------+----------------+--------+---------+----+-----------+ postgres@patroni1:/home/postgres/ [PG1]
Failover
In difference to the switchover, the failover is executed automatically, when the Leader node is getting unavailable for unplanned reason.
You can only adjust some database parameter to affect the failover.
The parameters for failover arre also managed using patronictl. But they are not in the parameter section, they are above. so let’s say, we adjust one parameter and add one paramter to not use the default anymore.
postgres@patroni1:/u01/app/postgres/local/dmk/etc/ [PG1] patronictl edit-config postgres@patroni1:/u01/app/postgres/local/dmk/etc/ [PG1] patronictl edit-config --- +++ @@ -1,5 +1,6 @@ -loop_wait: 7 +loop_wait: 10 maximum_lag_on_failover: 1048576 +master_start_timeout: 240 postgresql: parameters: archive_command: /bin/true Apply these changes? [y/N]: y Configuration changed
Afterwards there is no need to restart the database. Changes take affect immediately. So the failover can be configured according to every special need. A list of all possible parameter changes can be found here .
Maintenance mode
In some cases it is necessary to do maintenance on a single node and you do not want Patroni to manage the cluster. This can be needed for e.g. release updates.
When Patroni paused, it won’t change the state of PostgeSQL. For example it will not try to start the cluster when it is stopped.
So let’s do an example. We will pause the cluster, stop the replica, upgrade from 9.6.8 to 9.6.13 and afterwards start the replica again. In case we do not pause the replica, the database will be started automatically by Patroni.
postgres@patroni1:/home/postgres/ [PG1] patronictl pause Success: cluster management is paused You have new mail in /var/spool/mail/opendb postgres@patroni1:/home/postgres/ [PG1] patronictl list +---------+----------+----------------+--------+---------+----+-----------+ | Cluster | Member | Host | Role | State | TL | Lag in MB | +---------+----------+----------------+--------+---------+----+-----------+ | PG1 | patroni1 | 192.168.22.111 | Leader | running | 2 | 0.0 | | PG1 | patroni2 | 192.168.22.112 | | running | 2 | 0.0 | | PG1 | patroni3 | 192.168.22.113 | | running | 2 | 0.0 | +---------+----------+----------------+--------+---------+----+-----------+ Maintenance mode: on
On the replica
postgres@patroni2:/home/postgres/ [PG1] pg_ctl stop -D /u02/pgdata/96/PG1/ -m fast postgres@patroni2:/home/postgres/ [PG1] export PATH= /u01/app/postgres/product/PG96/db_13/bin:$PATH postgres@patroni2:/home/postgres/ [PG1] export PORT=5432 postgres@patroni2:/home/postgres/ [PG1] which pg_ctl /u01/app/opendb/product/PG96/db_13/bin/pg_ctl postgres@patroni2:/home/postgres/ [PG1] pg_ctl -D /u02/pgdata/96/PG1 start server starting postgres@patroni2:/home/postgres/ [PG1] 2019-10-08 17:25:28.358 CEST - 1 - 23192 - - @ - 0LOG: redirecting log output to logging collector process 2019-10-08 17:25:28.358 CEST - 2 - 23192 - - @ - 0HINT: Future log output will appear in directory "pg_log". postgres@patroni2:/home/postgres/ [PG1] psql -c "select version()" postgres version ------------------------------------------------------------------------------------------------------------------------------ PostgreSQL 9.6.13 dbi services build on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit (1 row) postgres@patroni2:/home/postgres/ [PG1] patronictl resume Success: cluster management is resumed postgres@patroni2:/home/postgres/ [PG1] patronictl list +---------+----------+----------------+--------+---------+----+-----------+ | Cluster | Member | Host | Role | State | TL | Lag in MB | +---------+----------+----------------+--------+---------+----+-----------+ | PG1 | patroni1 | 192.168.22.111 | Leader | running | 5 | 0.0 | | PG1 | patroni2 | 192.168.22.112 | | running | 5 | 0.0 | | PG1 | patroni3 | 192.168.22.113 | | running | 5 | 0.0 | +---------+----------+----------------+--------+---------+----+-----------+
You can do this on the other nodes as well.
Conclusion
Switchover is quite easy and for all the test I did so far it was really reliable. As well as the failover, here you just have to think about adjusting the parameters to your needs. Not in every case it is the best solution to wait 5 min for a failover.