This blog refers to an older version of EDB’s Postgres on Kubernetes offering that is no longer available.
In the last three posts we deployed an EDB database container and two pgpool instances, scaled that up to include a read-only replica, and finally customized the PostgreSQL instance with ConfigMaps. In this post we will look at how the EDB Failover Manager (EFM) is configured in the database containers.
These are the pods currently running in my environment, two pgpool containers and two PostgreSQL containers:
dwe@dwe:~$ oc get pods
NAME                 READY     STATUS    RESTARTS   AGE
edb-as10-0-1-gk8dt   1/1       Running   1          9d
edb-as10-0-1-n5z4w   1/1       Running   0          3m
edb-pgpool-1-h5psk   1/1       Running   1          9d
edb-pgpool-1-tq95s   1/1       Running   1          9d
The first one (edb-as10-0-1-gk8dt) is the primary instance and EFM should be running there as well:
dwe@dwe:~$ oc rsh edb-as10-0-1-gk8dt
sh-4.2$ psql -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery 
-------------------
 f
(1 row)

sh-4.2$ ps -ef | grep efm
edbuser    202      1  0 08:45 ?        00:00:04 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-7.b10.el7.x86_64/jre/bin/java -cp /usr/edb/efm-3.0/lib/EFM-3.0.0.jar -Xmx32m com.enterprisedb.efm.main.ServiceCommand __int_start /etc/edb/efm-3.0/edb.properties
sh-4.2$
Looking at the configuration there are some interesting points:
sh-4.2$ cat /etc/edb/efm-3.0/edb.properties | egrep -v "^$|^#"
db.user=edbuser
db.password.encrypted=ca78865e0f85d15edc6c51b2e5c0a58f
db.port=5444
db.database=edb
db.service.owner=edbuser
db.service.name=
db.bin=/usr/edb/as10/bin
db.recovery.conf.dir=/edbvolume/edb/edb-as10-0-1-gk8dt/pgdata
jdbc.sslmode=disable
user.email=[email protected]
script.notification=
bind.address=172.17.0.6:5430
admin.port=5431
is.witness=false
local.period=10
local.timeout=60
local.timeout.final=10
remote.timeout=10
node.timeout=10
stop.isolated.master=false
pingServerIp=8.8.8.8
pingServerCommand=/bin/ping -q -c3 -w5
auto.allow.hosts=false
db.reuse.connection.count=0
auto.failover=true
auto.reconfigure=true
promotable=true
minimum.standbys=0
recovery.check.period=2
auto.resume.period=0
virtualIp=
virtualIp.interface=
virtualIp.prefix=
script.fence=
script.post.promotion=/var/efm/post_promotion_steps.sh %f
script.resumed=
script.db.failure=/var/efm/stopEFM
script.master.isolated=
script.remote.pre.promotion=
script.remote.post.promotion=
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=
sudo.command=sudo
sudo.user.command=sudo -u %u
log.dir=/var/log/efm-3.0
jgroups.loglevel=
efm.loglevel=
jvm.options=-Xmx32m
kubernetes.port.range=1
kubernetes.namespace=myproject
kubernetes.pod.labels=cluster=edb
kubernetes.master.host=172.30.0.1
kubernetes.master.httpsPort=443
create.database.master=/var/lib/edb/bin/createmasterdb.sh
create.database.standby=/var/lib/edb/bin/createstandbydb.sh
kubernetes.is.init.master=true
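One of these properties immediately stands out: kubernetes.pod.labels=cluster=edb. Together with kubernetes.master.host and kubernetes.master.httpsPort this suggests that EFM discovers the members of the cluster by querying the Kubernetes API for pods carrying that label (I have not verified this in the code, so take it as an assumption). You can use the same selector from the client to see which pods it would match; in my setup this should return the two database pods:

dwe@dwe:~$ oc get pods -l cluster=edb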
More generally, the eight kubernetes.* and create.database.* properties at the end are not there when you do a manual EFM installation, so this is something specific to the container deployment. Apparently it is EFM itself that creates the master and the replica instance(s). The rest is more or less the default setup. The cluster status should be fine then:
sh-4.2$ /usr/edb/efm-3.0/bin/efm cluster-status edb
Cluster Status: edb
VIP:

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Master      172.17.0.6           UP     UP
	Standby     172.17.0.8           UP     UP

Allowed node host list:
	172.17.0.6

Membership coordinator: 172.17.0.6

Standby priority host list:
	172.17.0.8

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      172.17.0.6           0/5000140
	Standby     172.17.0.8           0/5000140

	Standby database(s) in sync with master. It is safe to promote.
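As we are still connected to the master pod, the EFM view can be cross-checked against what PostgreSQL itself knows about its standbys (a quick sanity check; the column names below are the PostgreSQL 10 ones):

sh-4.2$ psql -c "select client_addr,state,sent_lsn,replay_lsn from pg_stat_replication" postgres

This should return exactly one row for the standby at 172.17.0.8 in state "streaming", with the LSNs matching the XLog locations reported by cluster-status above.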
We should be able to do a switchover:
sh-4.2$ /usr/edb/efm-3.0/bin/efm promote edb -switchover
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original master. Run the 'cluster-status' command for information about the new cluster state.
sh-4.2$ /usr/edb/efm-3.0/bin/efm cluster-status edb
Cluster Status: edb
VIP:

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Standby     172.17.0.6           UP     UP
	Master      172.17.0.8           UP     UP

Allowed node host list:
	172.17.0.6

Membership coordinator: 172.17.0.6

Standby priority host list:
	172.17.0.6

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      172.17.0.8           0/60001A8
	Standby     172.17.0.6           0/60001A8

	Standby database(s) in sync with master. It is safe to promote.
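Note the "will reconfigure original master" part of the output: for PostgreSQL 10 this means the old primary gets a recovery.conf and continues as a standby. The db.recovery.conf.dir property from above tells us where that file should show up, so this is easy to verify from inside the old master pod (I'll skip the output; I would expect the usual standby_mode and primary_conninfo entries):

sh-4.2$ cat /edbvolume/edb/edb-as10-0-1-gk8dt/pgdata/recovery.conf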
It seems to have worked, so the instances should have switched roles and the current instance must be in recovery:
sh-4.2$ psql -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery 
-------------------
 t
(1 row)
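Instead of rsh-ing into every pod, the same check can be scripted from the client with oc exec (a small sketch; it assumes psql is in the PATH inside the containers, which we already saw is the case):

dwe@dwe:~$ for p in $(oc get pods -o name | grep edb-as10); do echo -n "${p#*/}: "; oc exec "${p#*/}" -- psql -At -c "select pg_is_in_recovery()" postgres; done

The pod answering f is the current master, which after the switchover should be edb-as10-0-1-n5z4w.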
Fine, this works as expected. So much for a first look at EFM inside the containers: it is not the same setup you’ll find when you install EFM on your own, and EFM is doing more here than it usually does. A lot of the work happens in the scripts provided by EDB:
sh-4.2$ ls -la /var/lib/edb/bin/
total 72
drwxrwx---  2 enterprisedb root  4096 May 11 20:40 .
drwxrwx--- 24 enterprisedb root  4096 May 28 18:03 ..
-rwxrwx---  1 enterprisedb root  1907 Feb 17 17:14 cleanup.sh
-rwxrwx---  1 enterprisedb root  4219 May 10 22:11 createmasterdb.sh
-rwxrwx---  1 enterprisedb root  2582 May 11 03:30 createstandbydb.sh
-rwxrwx---  1 enterprisedb root  1491 May 10 22:12 dbcommon.sh
-rwxrwx---  1 enterprisedb root 10187 May 10 22:28 dbfunctions.sh
-rwxrwx---  1 enterprisedb root   621 May 10 22:15 dbsettings.sh
-rwxrwx---  1 enterprisedb root  5778 Apr 26 22:55 helperfunctions.sh
-rwxrwx---  1 enterprisedb root    33 Feb 18 03:43 killPgAgent
-rwxrwx---  1 enterprisedb root  5431 May 10 22:29 launcher.sh
-rwxrwx---  1 enterprisedb root   179 May 10 22:12 startPgAgent
-rwxrwx---  1 enterprisedb root   504 May 11 12:32 startPgPool
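The hooks that EFM is actually configured to call can be pulled out of the properties file with a quick grep (the output is simply the non-empty script.* and create.database.* entries we already saw): two of them point into the directory listed above, the other two into /var/efm. The remaining files in the directory are presumably helpers used by these entry points.

sh-4.2$ egrep "^script\.|^create\.database" /etc/edb/efm-3.0/edb.properties | egrep -v "=$"
script.post.promotion=/var/efm/post_promotion_steps.sh %f
script.db.failure=/var/efm/stopEFM
create.database.master=/var/lib/edb/bin/createmasterdb.sh
create.database.standby=/var/lib/edb/bin/createstandbydb.sh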
Together, these scripts contain all the logic for initializing the cluster, starting it up, stopping and restarting it, setting up replication and so on. To understand what is really going on, one needs to read through the scripts themselves (which is out of scope for this post).