Last week the new Patroni 2.0 release was published, which brings many new features. But one made me really curious: Patroni on pure Raft. At the moment it is only available as BETA, but I had to test it. It is now possible to run Patroni without third-party dependencies, so no Etcd, Consul or Zookeeper is needed anymore. A great improvement!
In this blog we will have a look at the setup and try a failover as well.

Starting position

With Patroni on Raft it is possible to run a two-node Patroni cluster as well, but I decided to set up a three-node cluster.
So what you need to prepare:

  • Three identical VMs with CentOS 8 installed
  • All with Postgres 13 and its dependencies installed from source. I chose Postgres 13 because support for it was also newly added in Patroni 2.0.
  • I also created the /etc/hostname entries and exchanged the ssh keys between the three servers.
  • Our DMK is installed on these servers as well.
  • Firewall and SELinux are disabled in this example (a short sketch of one way to do this follows this list). If you want to run the setup with both enabled, you need to configure the firewall and SELinux first.
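
For reference, a minimal sketch of one way to disable both on CentOS 8 (run as root on all three nodes):

[root@partoni1 ~]$ systemctl disable --now firewalld          # stop firewalld and keep it disabled after reboot
[root@partoni1 ~]$ setenforce 0                               # switch SELinux to permissive for the running system
[root@partoni1 ~]$ sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # disable SELinux permanently (effective after the next reboot)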

Setup Patroni

Let’s start with the installation of Patroni. The following steps need to be performed on all three servers. With the new release, the installation is also less complicated than it used to be.

[root@partoni1 ~]$ yum install python3-psycopg2
[root@partoni1 ~]$ su - postgres
postgres@partoni1:/home/postgres/ [pg130] pip3 install patroni[raft] --user
Collecting patroni[raft]
  Using cached https://files.pythonhosted.org/packages/7c/d3/21a189f5f33ef6ce4ff9433c74aa30b70fc3aaf7fecde97c2979a3abdd06/patroni-2.0.0-py3-none-any.whl
Requirement already satisfied: cdiff in /usr/local/lib/python3.6/site-packages (from patroni[raft])
Requirement already satisfied: urllib3[secure]!=1.21,>=1.19.1 in /usr/local/lib/python3.6/site-packages (from patroni[raft])
Requirement already satisfied: PyYAML in /usr/local/lib64/python3.6/site-packages (from patroni[raft])
Requirement already satisfied: six>=1.7 in /usr/lib/python3.6/site-packages (from patroni[raft])
Requirement already satisfied: prettytable>=0.7 in /usr/local/lib/python3.6/site-packages (from patroni[raft])
Requirement already satisfied: python-dateutil in /usr/lib/python3.6/site-packages (from patroni[raft])
Requirement already satisfied: click>=4.1 in /usr/local/lib/python3.6/site-packages (from patroni[raft])
Requirement already satisfied: psutil>=2.0.0 in /usr/local/lib64/python3.6/site-packages (from patroni[raft])
Requirement already satisfied: pysyncobj>=0.3.5; extra == "raft" in /usr/local/lib/python3.6/site-packages (from patroni[raft])
Requirement already satisfied: idna>=2.0.0; extra == "secure" in /usr/lib/python3.6/site-packages (from urllib3[secure]!=1.21,>=1.19.1->patroni[raft])
Requirement already satisfied: pyOpenSSL>=0.14; extra == "secure" in /usr/lib/python3.6/site-packages (from urllib3[secure]!=1.21,>=1.19.1->patroni[raft])
Requirement already satisfied: certifi; extra == "secure" in /usr/local/lib/python3.6/site-packages (from urllib3[secure]!=1.21,>=1.19.1->patroni[raft])
Requirement already satisfied: cryptography>=1.3.4; extra == "secure" in /usr/lib64/python3.6/site-packages (from urllib3[secure]!=1.21,>=1.19.1->patroni[raft])
Requirement already satisfied: asn1crypto>=0.21.0 in /usr/lib/python3.6/site-packages (from cryptography>=1.3.4; extra == "secure"->urllib3[secure]!=1.21,>=1.19.1->patroni[raft])
Requirement already satisfied: cffi!=1.11.3,>=1.7 in /usr/lib64/python3.6/site-packages (from cryptography>=1.3.4; extra == "secure"->urllib3[secure]!=1.21,>=1.19.1->patroni[raft])
Requirement already satisfied: pycparser in /usr/lib/python3.6/site-packages (from cffi!=1.11.3,>=1.7->cryptography>=1.3.4; extra == "secure"->urllib3[secure]!=1.21,>=1.19.1->patroni[raft])
Installing collected packages: patroni
Successfully installed patroni-2.0.0
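
To quickly verify the installation for the postgres user, you can ask Patroni for its version; it should report the 2.0.0 release that pip just installed:

postgres@partoni1:/home/postgres/ [pg130] patroni --version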

Configuration

As the installation was successful, we can go on with the configuration of Patroni. As I am using our DMK, the patroni.yml file is stored in the DMK home, but you can of course store it somewhere else. You only have to adjust some values like the IP addresses and the name on every server.
The most important section in here is the raft section:
– data_dir: directory for storing the Raft log and snapshots. It is an optional parameter.
– self_addr: the address this node listens on for Raft connections. This needs to be set, otherwise the node will not become part of the consensus.
– partner_addrs: the list of the other Patroni nodes in the cluster.

postgres@partoni1:/u01/app/postgres/local/dmk/etc/ [pg130] cat patroni.yml
scope: PG1
#namespace: /service/
name: patroni1

restapi:
  listen: 192.168.22.201:8008
  connect_address: 192.168.22.201:8008
#  certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#  keyfile: /etc/ssl/private/ssl-cert-snakeoil.key
#  authentication:
#    username: username
#    password: password

# ctl:
#   insecure: false # Allow connections to SSL sites without certs
#   certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#   cacert: /etc/ssl/certs/ssl-cacert-snakeoil.pem

raft:
  data_dir: /u02/pgdata/raft
  self_addr: 192.168.22.201:5010
  partner_addrs: ['192.168.22.202:5010','192.168.22.203:5010']

bootstrap:
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        hot_standby: "on"
        wal_keep_segments: 8
        max_replication_slots: 10
        wal_log_hints: "on"
        listen_addresses: '*'
        port: 5432
        logging_collector: 'on'
        log_truncate_on_rotation: 'on'
        log_filename: 'postgresql-%a.log'
        log_rotation_age: '1440'
        log_line_prefix: '%m - %l - %p - %h - %u@%d - %x'
        log_directory: 'pg_log'
        log_min_messages: 'WARNING'
        log_autovacuum_min_duration: '60s'
        log_min_error_statement: 'NOTICE'
        log_min_duration_statement: '30s'
        log_checkpoints: 'on'
        log_statement: 'ddl'
        log_lock_waits: 'on'
        log_temp_files: '0'
        log_timezone: 'Europe/Zurich'
        log_connections: 'on'
        log_disconnections: 'on'
        log_duration: 'on'
        client_min_messages: 'WARNING'
        wal_level: 'replica'
        hot_standby_feedback: 'on'
        shared_buffers: '128MB'
        work_mem: '8MB'
        effective_cache_size: '512MB'
        maintenance_work_mem: '64MB'
        wal_compression: 'off'
        max_wal_senders: '20'
        shared_preload_libraries: 'pg_stat_statements'
        autovacuum_max_workers: '6'
        autovacuum_vacuum_scale_factor: '0.1'
        autovacuum_vacuum_threshold: '50'
        archive_mode: 'on'
        archive_command: '/bin/true'
#      recovery_conf:
#        restore_command: cp ../wal_archive/%f %p

  # some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 192.168.22.0/24 md5
  - host all all 192.168.22.0/24 md5
#  - hostssl all all 0.0.0.0/0 md5

  # Additional script to be launched after initial cluster creation (will be passed the connection URL as parameter)
# post_init: /usr/local/bin/setup_cluster.sh

  # Some additional users which need to be created after initializing the new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
    replicator:
      password: postgres
      options:
        - superuser

postgresql:
  listen: 192.168.22.201:5432
  connect_address: 192.168.22.201:5432
  data_dir: /u02/pgdata/13/PG1
  bin_dir: /u01/app/postgres/product/13/db_0/bin
#  config_dir:
  pgpass: /u01/app/postgres/local/dmk/etc/pgpass0
  authentication:
    replication:
      username: replicator
      password: *********
    superuser:
      username: postgres
      password: *********
  parameters:
    unix_socket_directories: '/tmp'

watchdog:
  mode: automatic # Allowed values: off, automatic, required
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
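
For completeness: patroni2 and patroni3 use the same patroni.yml, only the node-specific values change. A sketch of the entries that differ on patroni2 (everything else stays identical):

name: patroni2

restapi:
  listen: 192.168.22.202:8008
  connect_address: 192.168.22.202:8008

raft:
  data_dir: /u02/pgdata/raft
  self_addr: 192.168.22.202:5010
  partner_addrs: ['192.168.22.201:5010','192.168.22.203:5010']

postgresql:
  listen: 192.168.22.202:5432
  connect_address: 192.168.22.202:5432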

Service

To start Patroni automatically after a reboot, let’s create a systemd service.

# systemd integration for patroni
# Put this file under /etc/systemd/system/patroni.service
#     then: systemctl daemon-reload
#     then: systemctl list-unit-files | grep patroni
#     then: systemctl enable patroni.service
#

[Unit]
Description=dbi services patroni service
After=etcd.service syslog.target network.target

[Service]
User=postgres
Group=postgres
Type=simple
ExecStartPre=-/usr/bin/sudo /sbin/modprobe softdog
ExecStartPre=-/usr/bin/sudo /bin/chown postgres /dev/watchdog
ExecStart=/home/postgres/.local/bin/patroni /u01/app/postgres/local/dmk/etc/patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process
Restart=no
TimeoutSec=30

[Install]
WantedBy=multi-user.target
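
A small note on the watchdog part: the unit loads the softdog kernel module and changes the owner of /dev/watchdog on every service start via ExecStartPre. If you prefer to do this at boot time instead, a possible sketch (assuming the standard modules-load and udev mechanisms of CentOS 8) looks like this:

[root@partoni1 ~]$ echo softdog > /etc/modules-load.d/softdog.conf
[root@partoni1 ~]$ echo 'KERNEL=="watchdog", OWNER="postgres", GROUP="postgres"' > /etc/udev/rules.d/61-watchdog.rules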

Once everything is created, and before starting Patroni, it is possible to validate the Patroni configuration file. And this is the point where it gets a bit funny at the moment.

postgres@partoni1:/u02/pgdata/raft/ [PG1] patroni --validate-config /u01/app/postgres/local/dmk/etc/patroni.yml
restapi.listen 192.168.22.201:8008 didn't pass validation: 'Port 8008 is already in use.'
Traceback (most recent call last):
  File "/u01/app/postgres/local/dmk/bin/patroni", line 11, in 
    sys.exit(main())
  File "/home/postgres/.local/lib/python3.6/site-packages/patroni/__init__.py", line 170, in main
    return patroni_main()
  File "/home/postgres/.local/lib/python3.6/site-packages/patroni/__init__.py", line 138, in patroni_main
    abstract_main(Patroni, schema)
  File "/home/postgres/.local/lib/python3.6/site-packages/patroni/daemon.py", line 88, in abstract_main
    Config(args.configfile, validator=validator)
  File "/home/postgres/.local/lib/python3.6/site-packages/patroni/config.py", line 102, in __init__
    error = validator(self._local_configuration)
  File "/home/postgres/.local/lib/python3.6/site-packages/patroni/validator.py", line 177, in __call__
    for i in self.validate(data):
  File "/home/postgres/.local/lib/python3.6/site-packages/patroni/validator.py", line 209, in validate
    for i in self.iter():
  File "/home/postgres/.local/lib/python3.6/site-packages/patroni/validator.py", line 217, in iter
    for i in self.iter_dict():
  File "/home/postgres/.local/lib/python3.6/site-packages/patroni/validator.py", line 244, in iter_dict
    validator = self.validator[key]._schema[d]
KeyError: 'raft'

I tried several versions of the Raft configuration, but every time I got an error. I also tried to set the Raft parameters at system level and commented the Raft block out in the configuration file. But then I got the following output.

postgres@partoni1:/u02/pgdata/raft/ [PG1] patroni --validate-config /u01/app/postgres/local/dmk/etc/patroni.yml
consul  is not defined.
etcd  is not defined.
etcd3  is not defined.
exhibitor  is not defined.
kubernetes  is not defined.
raft  is not defined.
zookeeper  is not defined.
postgresql.authentication.rewind  is not defined.
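
As a side note, the Raft settings can also be provided through environment variables instead of the yaml block; these are the PATRONI_RAFT_* variables from the Patroni documentation. For patroni1 this would look roughly like this (a sketch, with the quoting of the list taken from the documentation):

export PATRONI_RAFT_DATA_DIR=/u02/pgdata/raft
export PATRONI_RAFT_SELF_ADDR=192.168.22.201:5010
export PATRONI_RAFT_PARTNER_ADDRS="'192.168.22.202:5010','192.168.22.203:5010'"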

So it seems like something is not working correctly when validating the configuration file. In the end I was not sure what else to test, so I just tried to start the Patroni service, in full exploratory spirit.

postgres@partoni1:/u02/pgdata/raft/ [PG1] sudo systemctl start patroni
postgres@partoni1:/u02/pgdata/raft/ [PG1] sudo systemctl status patroni
● patroni.service - dbi services patroni service
   Loaded: loaded (/etc/systemd/system/patroni.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-09-10 11:36:40 CEST; 3s ago
  Process: 5232 ExecStartPre=/usr/bin/sudo /bin/chown postgres /dev/watchdog (code=exited, status=0/SUCCESS)
  Process: 5229 ExecStartPre=/usr/bin/sudo /sbin/modprobe softdog (code=exited, status=0/SUCCESS)
 Main PID: 5236 (patroni)
    Tasks: 2 (limit: 11480)
   Memory: 19.7M
   CGroup: /system.slice/patroni.service
           └─5236 /usr/bin/python3.6 /u01/app/postgres/local/dmk/bin/patroni /u01/app/postgres/local/dmk/etc/patroni.yml

Sep 10 11:36:40 partoni1 systemd[1]: Starting dbi services patroni service...
Sep 10 11:36:40 partoni1 sudo[5229]: postgres : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/modprobe softdog
Sep 10 11:36:40 partoni1 sudo[5232]: postgres : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/chown postgres /dev/watchdog
Sep 10 11:36:40 partoni1 systemd[1]: Started dbi services patroni service.

And… it starts without any errors.
But that did not fully convince me, so let’s check the Raft setup. Patroni (through its pysyncobj dependency) delivers a simple command, syncobj_admin, to check the status of the Raft setup.

postgres@partoni1:/u02/pgdata/raft/ [PG1] syncobj_admin -conn 192.168.22.201:5010 -status
commit_idx: 62850
enabled_code_version: 0
last_applied: 62850
leader: 192.168.22.203:5010
leader_commit_idx: 62850
log_len: 31
match_idx_count: 0
next_node_idx_count: 0
partner_node_status_server_192.168.22.202:5010: 2
partner_node_status_server_192.168.22.203:5010: 2
partner_nodes_count: 2
raft_term: 30
readonly_nodes_count: 0
revision: deprecated
self: 192.168.22.201:5010
self_code_version: 0
state: 0
uptime: 379
version: 0.3.6
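
The same check should also work against the other two members by pointing syncobj_admin at their Raft ports; all three nodes are expected to report the same leader:

postgres@partoni1:/u02/pgdata/raft/ [PG1] syncobj_admin -conn 192.168.22.202:5010 -status
postgres@partoni1:/u02/pgdata/raft/ [PG1] syncobj_admin -conn 192.168.22.203:5010 -status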

And of course we can still check the cluster status itself.

postgres@partoni1:/u02/pgdata/raft/ [PG1] patronictl list
+ Cluster: PG1 (6870147915530980670) -+---------+----+-----------+
| Member   | Host           | Role    | State   | TL | Lag in MB |
+----------+----------------+---------+---------+----+-----------+
| patroni1 | 192.168.22.201 | Replica | running |  6 |         0 |
| patroni2 | 192.168.22.202 | Leader  | running |  6 |           |
| patroni3 | 192.168.22.203 | Replica | running |  6 |         0 |
+----------+----------------+---------+---------+----+-----------+

This looks good as well. So even though the configuration validation gives us an error, the cluster is running smoothly.

Failover

So let’s see what happens if we restart the Leader node. Within a short time, the Leader role moves to another node:

postgres@partoni3:/home/postgres/ [PG1] patronictl list
+ Cluster: PG1 (6870147915530980670) -+---------+----+-----------+
| Member   | Host           | Role    | State   | TL | Lag in MB |
+----------+----------------+---------+---------+----+-----------+
| patroni1 | 192.168.22.201 | Leader  | running |  7 |           |
| patroni3 | 192.168.22.203 | Replica | running |  7 |         0 |
+----------+----------------+---------+---------+----+-----------+

As soon as patroni2 is back on the network, it rejoins the cluster without issues as a Replica:

postgres@partoni3:/home/postgres/ [PG1] patronictl list
+ Cluster: PG1 (6870147915530980670) -+---------+----+-----------+
| Member   | Host           | Role    | State   | TL | Lag in MB |
+----------+----------------+---------+---------+----+-----------+
| patroni1 | 192.168.22.201 | Leader  | running |  7 |           |
| patroni2 | 192.168.22.202 | Replica | running |  7 |         0 |
| patroni3 | 192.168.22.203 | Replica | running |  7 |         0 |
+----------+----------------+---------+---------+----+-----------+

Conclusion

The configuration check was introduced as a first draft in version 1.6.5, so it seems there is still some room for improvement here. I am still not sure where my mistake is, or whether there really is one. I also tested with my patroni.yml which is configured for etcd. Maybe it is just because of the brand-new Raft support.

For me, the Raft status overview is a bit cryptic. Of course, it shows the most important information, like leader, partner_node_status_server and partner_nodes_count. But compared to etcd, where I just get a simple “cluster is healthy”, it takes a bit of time to get used to. Besides that, it needs some time to recognize the unavailability of one node.

Using Patroni with pure Raft works fine, and the setup is easier than the etcd setup, where you can run into member mismatches. Especially when you do not want to install an additional tool on your servers, it can be a really good option. The documentation of Patroni is still a bit minimalistic regarding the configuration, but with some patience you will find whatever you are searching for.