In my last blog we had a look on how to bootstrap a complete Patroni Cluster from pgBackRest. But there is also the possibility to recreate only one Patroni node using pgBackRest. In this blog we will have a look on all the things we need to change to recreate a node successfully from backup.
First of all, let’s assume we have a three node Patroni Setup using etcd as key value store and pgBackRest is used to backup the database.
postgres@hero2:/home/postgres/ [PG1] patronictl list +--------+----------------+---------+---------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | + Cluster: PG1 (7166642010148587394) ---------+----+-----------+ | hero1 | 192.168.22.251 | Leader | running | 1 | | | hero2 | 192.168.22.252 | Replica | running | 1 | 0 | | hero3 | 192.168.22.253 | Replica | running | 1 | 0 | +--------+----------------+---------+---------+----+-----------+ postgres@hero2:/home/postgres/ [PG1] etcdctl member list 1c83efe87807cd7b, started, hero2, http://192.168.22.252:2380, http://192.168.22.252:2379, false b059fedab560f470, started, hero3, http://192.168.22.253:2380, http://192.168.22.253:2379, false ecee86d6079e2735, started, hero1, http://192.168.22.251:2380, http://192.168.22.251:2379, false
To start we check if there is a backup available in our pgBackRest repository.
postgres@backup_hero:/home/postgres/ [pg14] pgbackrest info stanza: hero2hero status: ok cipher: none db (current) wal archive min/max (14): 000000010000000000000001/00000001000000000000000C full backup: 20221116-165815F timestamp start/stop: 2022-11-16 16:58:15 / 2022-11-16 16:59:04 wal start/stop: 00000001000000000000000A / 00000001000000000000000C database size: 105.0MB, database backup size: 105.0MB repo1: backup set size: 30.8MB, backup size: 30.8MB
So as everything looks healthy we can start to change the configuration of the Patroni cluster. We have to make this change in two steps.
In the first step edit the configuration using edit-conf. Add the recovery_conf section between use_slots and retry_timeout. Safe your changes.
postgres@hero2:/home/postgres/ [PG1] patronictl edit-config use_slots: true recovery_conf: restore_command: pgbackrest --stanza=hero2hero archive-get %f %p restore_target_timeline: latest retry_timeout: 10 ttl: 30
As the next step, change the patroni.yml. We add the create_replica_methods to the file. As we define pgbackrest and basebackup, it will try to recreate the replica from pgBackRest first and if this is not possible it will recreate from basebackup of the master.
This change should be done at least on the node you want to recreate using pgBackRest but it’s better to do it on all three nodes to make sure that you are prepared in any case. Be careful with this change and make sure you add enough spaces at the beginning of the line. Add the section create_replica_methods below the parameters section.
postgresql: .. .. parameters: unix_socket_directories: '/tmp' create_replica_methods: - pgbackrest - basebackup pgbackrest: command: pgbackrest --stanza=hero2hero restore keep_data: True no_params: True basebackup: checkpoint: 'fast'
Once all the changes are done, you need to reload patroni
sudo systemctl reload patroni
Now we can destroy one replica and recreate it.
postgres@hero2:/home/postgres/ [pg14] sudo systemctl stop patroni postgres@hero2:/home/postgres/ [pg14] rm -rf /u02/pgdata/14/PG1/* postgres@hero2:/home/postgres/ [pg14] ls -al /u02/pgdata/14/PG1/ total 0 drwxr-x---. 2 postgres postgres 6 Nov 16 17:16 . drwxr-x---. 3 postgres postgres 17 Sep 5 10:20 .. postgres@hero2:/home/postgres/ [pg14] sudo systemctl start patroni
This does not really show as if the replica is really recreated from pgBackRest, but if we check the logfile, we get a self explaining message, that pgBackRest was used (see last line)
postgres@hero2:/home/postgres/ [pg14] sudo journalctl -u patroni -f -- Logs begin at Fri 2022-09-16 08:50:46 CEST. -- Nov 16 17:16:37 hero2 systemd: Starting dbi services patroni service... Nov 16 17:16:37 hero2 systemd: Started dbi services patroni service. Nov 16 17:16:38 hero2 patroni: 2022-11-16 17:16:38,085 INFO: Selected new etcd server http://192.168.22.253:2379 Nov 16 17:16:38 hero2 patroni: 2022-11-16 17:16:38,094 INFO: No PostgreSQL configuration items changed, nothing to reload. Nov 16 17:16:38 hero2 patroni: 2022-11-16 17:16:38,116 INFO: Lock owner: hero1; I am hero2 Nov 16 17:16:38 hero2 patroni: 2022-11-16 17:16:38,120 INFO: trying to bootstrap from leader 'hero1' Nov 16 17:16:42 hero2 patroni: 2022-11-16 17:16:42,532 INFO: replica has been created using pgbackrest .. .. ..
The implementation of that behavior is quite easy and fast to setup. As already mentioned, keep in mind to put enough spaces in the patroni.yml otherwise the replica won’t be created using pgBackRest. So in case you still see this entry in your logfile
INFO: replica has been created using basebackup INFO: bootstrapped from leader 'hero1'
Have a look at your patroni.yml once again!