During the preparation of my presentation for pgconf.eu I ran into one big issue: I had to stop my cluster to add a new node. That was not the way I wanted to achieve this. I want a high-availability solution that can be scaled up without any outage. Thanks to a little hint during pgconf.eu I was able to find a solution. In this post I will show how to scale up manually, without using a playbook.
Starting position
We start with a three-node Patroni cluster, which can be created using this blog post.
Now we want to add a fourth node to the existing etcd and Patroni cluster. In case you also need a playbook to install the fourth node, check out my GitHub repository.
Scale up the etcd cluster
This step is only needed if you want to scale up your etcd cluster as well. To scale up a Patroni cluster it is not necessary to scale up the etcd cluster; you can, of course, scale up Patroni without adding more etcd cluster members. But maybe someone also needs to scale up their etcd cluster and is searching for a solution. If not, just jump to the next step.
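For background: etcd needs a majority (quorum) of members to agree on every write, which is why adding an even member does not buy you additional failure tolerance. A quick calculation:

quorum(n) = floor(n/2) + 1
quorum(3) = 2   -> a 3-node cluster survives 1 failed member
quorum(4) = 3   -> a 4-node cluster still survives only 1 failed member

Keep that in mind when deciding whether to grow etcd together with Patroni.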
Make sure the etcd and Patroni services are not started on the fourth node.
postgres@patroni4:/home/postgres/ [PG1] systemctl status etcd
● etcd.service - dbi services etcd service
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: inactive (dead)
postgres@patroni4:/home/postgres/ [PG1] systemctl status patroni
● patroni.service - dbi services patroni service
   Loaded: loaded (/etc/systemd/system/patroni.service; enabled; vendor preset: disabled)
   Active: inactive (dead)
postgres@patroni4:/home/postgres/ [PG1]
Make the following adjustments in the etcd.conf of the fourth node. Compared to the existing nodes, the important points are initial-cluster-state: 'existing' and the initial-cluster list, which now contains all four members.
postgres@patroni4:/home/postgres/ [PG1] cat /u01/app/postgres/local/dmk/etc/etcd.conf
name: patroni4
data-dir: /u02/pgdata/etcd
initial-advertise-peer-urls: http://192.168.22.114:2380
listen-peer-urls: http://192.168.22.114:2380
listen-client-urls: http://192.168.22.114:2379,http://localhost:2379
advertise-client-urls: http://192.168.22.114:2379
initial-cluster-state: 'existing'
initial-cluster: patroni1=http://192.168.22.111:2380,patroni2=http://192.168.22.112:2380,patroni3=http://192.168.22.113:2380,patroni4=http://192.168.22.114:2380
Next, add the new etcd member to the existing etcd cluster. You can execute this on any existing member of the cluster.
postgres@patroni1:/home/postgres/ [PG1] etcdctl member add patroni4 http://192.168.22.114:2380
Added member named patroni4 with ID dd9fab8349b3cfc to cluster

ETCD_NAME="patroni4"
ETCD_INITIAL_CLUSTER="patroni4=http://192.168.22.114:2380,patroni1=http://192.168.22.111:2380,patroni2=http://192.168.22.112:2380,patroni3=http://192.168.22.113:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
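Just as a side note: the variables printed by etcdctl member add are meant for starting the new member via environment variables instead of a config file (etcd ignores environment variables once a config file is given, so this is an alternative, not a complement). A minimal sketch, assuming you start etcd directly rather than through the systemd unit:

# Sketch only: start the new member using environment variables instead of etcd.conf
export ETCD_NAME="patroni4"
export ETCD_INITIAL_CLUSTER="patroni4=http://192.168.22.114:2380,patroni1=http://192.168.22.111:2380,patroni2=http://192.168.22.112:2380,patroni3=http://192.168.22.113:2380"
export ETCD_INITIAL_CLUSTER_STATE="existing"
etcd --initial-advertise-peer-urls http://192.168.22.114:2380 \
     --listen-peer-urls http://192.168.22.114:2380 \
     --listen-client-urls http://192.168.22.114:2379,http://localhost:2379 \
     --advertise-client-urls http://192.168.22.114:2379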
Now you can start the etcd service on the fourth node.
postgres@patroni4:/home/postgres/ [PG1] sudo systemctl start etcd
postgres@patroni4:/home/postgres/ [PG1] systemctl status etcd
● etcd.service - dbi services etcd service
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-10-17 16:39:16 CEST; 9s ago
 Main PID: 8239 (etcd)
   CGroup: /system.slice/etcd.service
           └─8239 /u01/app/postgres/local/dmk/bin/etcd --config-file /u01/app/postgres/local/dmk/etc/etcd.conf
postgres@patroni4:/home/postgres/ [PG1]
After a short check we can see that node 4 has been added to the existing cluster.
postgres@patroni4:/home/postgres/ [PG1] etcdctl member list
 dd9fab8349b3cfc: name=patroni4 peerURLs=http://192.168.22.114:2380 clientURLs=http://192.168.22.114:2379 isLeader=false
16e1dca5ee237693: name=patroni1 peerURLs=http://192.168.22.111:2380 clientURLs=http://192.168.22.111:2379 isLeader=false
28a43bb36c801ed4: name=patroni2 peerURLs=http://192.168.22.112:2380 clientURLs=http://192.168.22.112:2379 isLeader=false
5ba7b55764fad76e: name=patroni3 peerURLs=http://192.168.22.113:2380 clientURLs=http://192.168.22.113:2379 isLeader=true
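Optionally you can also verify the overall cluster health. With the etcdctl v2 API used above the command is cluster-health (with the v3 API the equivalent would be etcdctl endpoint health):

postgres@patroni4:/home/postgres/ [PG1] etcdctl cluster-health

All four members should be reported as healthy before you continue.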
Scale up Patroni
Scaling up the Patroni cluster is also really easy.
Adjust the hosts entry in the patroni.yml on the new node so that it contains all four etcd members.
postgres@patroni4:/home/postgres/ [PG1] cat /u01/app/postgres/local/dmk/etc/patroni.yml | grep hosts
  hosts: 192.168.22.111:2379,192.168.22.112:2379,192.168.22.113:2379,192.168.22.114:2379
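For context, this hosts line lives in the etcd section of patroni.yml. A minimal sketch of that section with all four members (Patroni also accepts a YAML list here instead of a comma-separated string):

# Sketch of the relevant patroni.yml section, assuming the defaults from this setup
etcd:
  hosts: 192.168.22.111:2379,192.168.22.112:2379,192.168.22.113:2379,192.168.22.114:2379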
Afterwards, start the Patroni service.
postgres@patroni4:/home/postgres/ [PG1] sudo systemctl start patroni
postgres@patroni4:/home/postgres/ [PG1] systemctl status patroni
● patroni.service - dbi services patroni service
   Loaded: loaded (/etc/systemd/system/patroni.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-10-17 17:03:19 CEST; 5s ago
  Process: 8476 ExecStartPre=/usr/bin/sudo /bin/chown postgres /dev/watchdog (code=exited, status=0/SUCCESS)
  Process: 8468 ExecStartPre=/usr/bin/sudo /sbin/modprobe softdog (code=exited, status=0/SUCCESS)
 Main PID: 8482 (patroni)
   CGroup: /system.slice/patroni.service
           ├─8482 /usr/bin/python2 /u01/app/postgres/local/dmk/bin/patroni /u01/app/postgres/local/dmk/etc/patroni.yml
           ├─8500 /u01/app/postgres/product/11/db_5/bin/postgres -D /u02/pgdata/11/PG1/ --config-file=/u02/pgdata/11/PG1/postgresql.conf --listen_addresses=192.168.22.114 --max_worker_processes=8 --max_locks_per_transact...
           ├─8502 postgres: PG1: logger
           ├─8503 postgres: PG1: startup   waiting for 000000020000000000000006
           ├─8504 postgres: PG1: checkpointer
           ├─8505 postgres: PG1: background writer
           ├─8506 postgres: PG1: stats collector
           ├─8507 postgres: PG1: walreceiver
           └─8513 postgres: PG1: postgres postgres 192.168.22.114(48882) idle
To be sure everything runs correctly, check the status of the Patroni cluster.
postgres@patroni4:/home/postgres/ [PG1] patronictl list
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 |        | running |  2 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  2 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 | Leader | running |  2 |       0.0 |
|   PG1   | patroni4 | 192.168.22.114 |        | running |  2 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
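If you want to double-check on the PostgreSQL level as well, you can query pg_stat_replication on the leader (patroni3 in this example); the new node should show up with its IP address as a streaming standby:

postgres@patroni3:/home/postgres/ [PG1] psql -c "select client_addr, state, sync_state from pg_stat_replication;"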
Conclusion
Using the playbooks had one shortcoming: the hosts entry in patroni.yml only points to localhost. When starting the fourth node, Patroni does not look for all the other hosts, it only checks its own availability. This works fine for an initial cluster, but not when you want to extend an existing one.
And always keep in mind: an etcd cluster needs an odd number of members, so don't just add a fourth etcd node and leave it at that.
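If the fourth etcd member was only added temporarily and you want to get back to an odd number of members, it can be removed again using its ID from etcdctl member list. A sketch, reusing the ID from above:

postgres@patroni1:/home/postgres/ [PG1] etcdctl member remove dd9fab8349b3cfc
postgres@patroni4:/home/postgres/ [PG1] sudo systemctl stop etcd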