This blog refers to an older version of EDB’s Postgres on Kubernetes offering that is no longer available.

In the last three posts we deployed an EDB database container and two pgpool instances, scaled that up to include a read-only replica, and finally customized the PostgreSQL instance with ConfigMaps. In this post we will look at how EDB Failover Manager (EFM) is configured inside the database containers.

These are the pods currently running in my environment, two pgpool containers and two PostgreSQL containers:

dwe@dwe:~$ oc get pods
NAME                 READY     STATUS    RESTARTS   AGE
edb-as10-0-1-gk8dt   1/1       Running   1          9d
edb-as10-0-1-n5z4w   1/1       Running   0          3m
edb-pgpool-1-h5psk   1/1       Running   1          9d
edb-pgpool-1-tq95s   1/1       Running   1          9d

The first one (edb-as10-0-1-gk8dt) is the primary instance, and EFM should be running there as well:

dwe@dwe:~$ oc rsh edb-as10-0-1-gk8dt
sh-4.2$ psql -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery 
-------------------
 f
(1 row)

sh-4.2$ ps -ef | grep efm
edbuser    202     1  0 08:45 ?        00:00:04 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64/jre/bin/java -cp /usr/edb/efm-3.0/lib/EFM-3.0.0.jar -Xmx32m com.enterprisedb.efm.main.ServiceCommand __int_start /etc/edb/efm-3.0/edb.properties
sh-4.2$ 

Looking at the configuration there are some interesting points:

sh-4.2$ cat /etc/edb/efm-3.0/edb.properties | egrep -v "^$|^#"
db.user=edbuser
db.password.encrypted=ca78865e0f85d15edc6c51b2e5c0a58f
db.port=5444
db.database=edb
db.service.owner=edbuser
db.service.name=
db.bin=/usr/edb/as10/bin
db.recovery.conf.dir=/edbvolume/edb/edb-as10-0-1-gk8dt/pgdata
jdbc.sslmode=disable
[email protected]
script.notification=
bind.address=172.17.0.6:5430
admin.port=5431
is.witness=false
local.period=10
local.timeout=60
local.timeout.final=10
remote.timeout=10
node.timeout=10
stop.isolated.master=false
pingServerIp=8.8.8.8
pingServerCommand=/bin/ping -q -c3 -w5
auto.allow.hosts=false
db.reuse.connection.count=0
auto.failover=true
auto.reconfigure=true
promotable=true
minimum.standbys=0
recovery.check.period=2
auto.resume.period=0
virtualIp=
virtualIp.interface=
virtualIp.prefix=
script.fence=
script.post.promotion=/var/efm/post_promotion_steps.sh %f
script.resumed=
script.db.failure=/var/efm/stopEFM
script.master.isolated=
script.remote.pre.promotion=
script.remote.post.promotion=
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=
sudo.command=sudo
sudo.user.command=sudo -u %u
log.dir=/var/log/efm-3.0
jgroups.loglevel=
efm.loglevel=
jvm.options=-Xmx32m
kubernetes.port.range=1
kubernetes.namespace=myproject
kubernetes.pod.labels=cluster=edb
kubernetes.master.host=172.30.0.1
kubernetes.master.httpsPort=443
create.database.master=/var/lib/edb/bin/createmasterdb.sh
create.database.standby=/var/lib/edb/bin/createstandbydb.sh
kubernetes.is.init.master=true
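
If the sample template an EFM installation normally drops into the same directory is also present in the container (efm.properties.in; that is an assumption here, the image might not ship it), a quick diff makes it easy to spot what was added or changed for the container deployment:

sh-4.2$ egrep -v "^$|^#" /etc/edb/efm-3.0/efm.properties.in > /tmp/default.properties
sh-4.2$ egrep -v "^$|^#" /etc/edb/efm-3.0/edb.properties > /tmp/container.properties
sh-4.2$ diff /tmp/default.properties /tmp/container.properties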

The last 8 lines of the configuration are not there when you do a manual EFM installation, so this is specific to the container deployment. Apparently it is EFM itself that creates the master and the replica instance(s), via the two create.database scripts. The rest is more or less the default setup. The cluster status should therefore be fine:

sh-4.2$ /usr/edb/efm-3.0/bin/efm cluster-status edb
Cluster Status: edb
VIP: 

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Master      172.17.0.6           UP     UP        
	Standby     172.17.0.8           UP     UP        

Allowed node host list:
	172.17.0.6

Membership coordinator: 172.17.0.6

Standby priority host list:
	172.17.0.8

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      172.17.0.6           0/5000140        
	Standby     172.17.0.8           0/5000140        

	Standby database(s) in sync with master. It is safe to promote.

We should be able to do a switchover:

sh-4.2$ /usr/edb/efm-3.0/bin/efm promote edb -switchover    
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original master. Run the 'cluster-status' command for information about the new cluster state.
sh-4.2$ /usr/edb/efm-3.0/bin/efm cluster-status edb
Cluster Status: edb
VIP: 

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Standby     172.17.0.6           UP     UP        
	Master      172.17.0.8           UP     UP        

Allowed node host list:
	172.17.0.6

Membership coordinator: 172.17.0.6

Standby priority host list:
	172.17.0.6

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      172.17.0.8           0/60001A8        
	Standby     172.17.0.6           0/60001A8        

	Standby database(s) in sync with master. It is safe to promote.

It seems to have worked, so the instances should have switched roles and the instance we are connected to must now be in recovery:

sh-4.2$ psql -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery 
-------------------
 t
(1 row)
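
To double check from the other side you can run the same query in the second pod (assuming edb-as10-0-1-n5z4w is the pod behind 172.17.0.8, which you can confirm with "oc get pods -o wide"); there pg_is_in_recovery should return false:

dwe@dwe:~$ oc rsh edb-as10-0-1-n5z4w
sh-4.2$ psql -c "select pg_is_in_recovery()" postgres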

Fine, this works as expected. So much for a first look at EFM inside the containers. It is not the same setup you’ll find when you install EFM on your own, and EFM is doing more here than it usually does. A lot of the work happens in the scripts EDB ships with the container:

sh-4.2$ ls -la /var/lib/edb/bin/
total 72
drwxrwx---  2 enterprisedb root  4096 May 11 20:40 .
drwxrwx--- 24 enterprisedb root  4096 May 28 18:03 ..
-rwxrwx---  1 enterprisedb root  1907 Feb 17 17:14 cleanup.sh
-rwxrwx---  1 enterprisedb root  4219 May 10 22:11 createmasterdb.sh
-rwxrwx---  1 enterprisedb root  2582 May 11 03:30 createstandbydb.sh
-rwxrwx---  1 enterprisedb root  1491 May 10 22:12 dbcommon.sh
-rwxrwx---  1 enterprisedb root 10187 May 10 22:28 dbfunctions.sh
-rwxrwx---  1 enterprisedb root   621 May 10 22:15 dbsettings.sh
-rwxrwx---  1 enterprisedb root  5778 Apr 26 22:55 helperfunctions.sh
-rwxrwx---  1 enterprisedb root    33 Feb 18 03:43 killPgAgent
-rwxrwx---  1 enterprisedb root  5431 May 10 22:29 launcher.sh
-rwxrwx---  1 enterprisedb root   179 May 10 22:12 startPgAgent
-rwxrwx---  1 enterprisedb root   504 May 11 12:32 startPgPool

These scripts are referenced in the EFM configuration and contain all the logic for initializing the cluster, starting it up, stopping and restarting it, setting up replication and so on. To understand what is really going on you need to read through the scripts, which is out of scope of this post.
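
For reference, the entries that wire the create.database scripts (plus two more helpers under /var/efm) into EFM are easy to pull out of the properties file; the lines below are simply the non-empty script and create.database entries from the configuration shown earlier:

sh-4.2$ egrep "^script|^create.database" /etc/edb/efm-3.0/edb.properties | egrep -v "=$"
script.post.promotion=/var/efm/post_promotion_steps.sh %f
script.db.failure=/var/efm/stopEFM
create.database.master=/var/lib/edb/bin/createmasterdb.sh
create.database.standby=/var/lib/edb/bin/createstandbydb.sh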