A few days ago EnterpriseDB released a new version of its EDB Failover Manager, which brings one feature that sounds really great: “Controlled switchover and switchback for easier maintenance and disaster recovery tests”. This is exactly what you want when you are used to operating Oracle Data Guard: switching back and forth as you like without caring much about the old master. The old master is simply converted to a standby which follows the new master automatically. This post is about upgrading EFM from version 2.0 to 2.1.
As I still have the environment available which was used for describing the maintenance scenarios with EDB Failover Manager (Maintenance scenarios with EDB Failover Manager (1) – Standby node, Maintenance scenarios with EDB Failover Manager (2) – Primary node and Maintenance scenarios with EDB Failover Manager (3) – Witness node), I will use the same environment to upgrade to the new release. Let's start …
This is the current status of my failover cluster:
```
[root@edbbart ~]$ /usr/edb-efm/bin/efm cluster-status efm
Cluster Status: efm
Automatic failover is disabled.

    Agent Type  Address              Agent  DB       Info
    --------------------------------------------------------------
    Master      192.168.22.245       UP     UP
    Witness     192.168.22.244       UP     N/A
    Standby     192.168.22.243       UP     UP

Allowed node host list:
    192.168.22.244 192.168.22.245 192.168.22.243

Standby priority host list:
    192.168.22.243

Promote Status:

    DB Type     Address              XLog Loc         Info
    --------------------------------------------------------------
    Master      192.168.22.245       0/3B01C5E0
    Standby     192.168.22.243       0/3B01C5E0

    Standby database(s) in sync with master. It is safe to promote.
[root@edbbart ~]$
```
Obviously you have to download the new version to begin the upgrade. Once the rpm is available on all nodes, simply install it on each of them:
```
[root@edbppas tmp]$ yum localinstall efm21-2.1.0-1.rhel7.x86_64.rpm
```
EFM 2.1 comes with a utility command that helps in upgrading a cluster. You should invoke it on each node:
```
[root@edbbart tmp]$ /usr/efm-2.1/bin/efm upgrade-conf efm
Processing efm.properties file.
Setting new property node.timeout to 40 (sec) based on existing timeout 5000 (ms) and max tries 8.
Processing efm.nodes file.

Upgrade of files is finished. Please ensure that the new file permissions match those of
the template files before starting EFM.
The db.service.name property should be set before starting a non-witness agent.
```
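The reported node.timeout of 40 seconds looks like the product of the old settings (5000 ms per try, 8 tries). The following is only a sketch of that arithmetic, assuming the tool simply multiplies the two values; the function name is hypothetical and not part of EFM:

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a node.timeout value (seconds) from the
# old per-try timeout (milliseconds) and the maximum number of tries.
# Assumption: the upgrade tool multiplies the two old values.
node_timeout_sec() {
    local timeout_ms=$1 max_tries=$2
    echo $(( timeout_ms * max_tries / 1000 ))
}

# Values taken from the upgrade-conf output above.
node_timeout_sec 5000 8   # prints 40
```

This can be handy for sanity-checking the generated value on clusters where the old timeout settings differ from mine.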
This created new configuration files in the new directory under /etc, which was created when the new version was installed:
```
[root@edbbart tmp]$ ls /etc/efm-2.1
efm.nodes  efm.nodes.in  efm.properties  efm.properties.in
```
All the values from the old EFM cluster should be there in the new configuration files:
```
[root@edbbart efm-2.1]$ pwd
/etc/efm-2.1
[root@edbbart efm-2.1]$ cat efm.properties | grep daniel
user.email=daniel.westermann...
```
Before going further check the new configuration parameters for EFM 2.1, which are:
```
auto.allow.hosts
auto.resume.period
db.service.name
jvm.options
minimum.standbys
node.timeout
promotable
recovery.check.period
script.notification
script.resumed
```
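To see quickly which of these parameters are present in an upgraded file, something like the following could be used. It is a minimal sketch: the function name is made up, and it assumes the usual `name=value` layout of efm.properties with `#` for comments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every EFM 2.1 parameter from the list above
# that does not appear as a "name=" entry in the given properties file.
missing_efm_properties() {
    local file=$1 prop
    for prop in auto.allow.hosts auto.resume.period db.service.name \
                jvm.options minimum.standbys node.timeout promotable \
                recovery.check.period script.notification script.resumed; do
        # Take the text before the first "=" on each line and look for an
        # exact, whole-line match; commented-out entries do not match.
        cut -d= -f1 "$file" | grep -qxF "$prop" || echo "$prop"
    done
}

# Example call (path as used in this post):
# missing_efm_properties /etc/efm-2.1/efm.properties
```

An empty result means all ten parameters are defined, although not necessarily set to a value.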
I’ll leave everything as it was before for now. Notice that a new service got created:
```
[root@edbppas efm-2.1]$ systemctl list-unit-files | grep efm
efm-2.0.service                             enabled
efm-2.1.service                             disabled
```
Let's try to shut down the old service on all nodes and then start the new one. Step 1 (on all nodes):
```
[root@edbppas efm-2.1]$ systemctl stop efm-2.0.service
[root@edbppas efm-2.1]$ systemctl disable efm-2.0.service
rm '/etc/systemd/system/multi-user.target.wants/efm-2.0.service'
```
Then enable the new service:
```
[root@edbppas efm-2.1]$ systemctl enable efm-2.1.service
ln -s '/usr/lib/systemd/system/efm-2.1.service' '/etc/systemd/system/multi-user.target.wants/efm-2.1.service'
[root@edbppas efm-2.1]$ systemctl list-unit-files | grep efm
efm-2.0.service                             disabled
efm-2.1.service                             enabled
```
Make sure your efm.nodes file contains all the nodes which make up the cluster, in my case:
```
[root@edbppas efm-2.1]$ cat efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.22.243:9998 192.168.22.244:9998 192.168.22.245:9998
```
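If you want to script this check rather than eyeball the file, a small helper along these lines might do. It is only a sketch: the function name is hypothetical, and it relies on the format described in the file's own header (whitespace-separated address:port entries, `#` for comments).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each expected node address that is NOT
# listed in the given efm.nodes file.
# Usage: missing_nodes <file> <address> [<address> ...]
missing_nodes() {
    local file=$1; shift
    local listed node
    # Drop comment lines, put one entry per line, strip the ":port" suffix.
    listed=$(grep -v '^[[:space:]]*#' "$file" | tr -s '[:space:]' '\n' | sed 's/:[0-9]*$//')
    for node in "$@"; do
        printf '%s\n' "$listed" | grep -qxF "$node" || echo "$node"
    done
}

# Example call for the cluster in this post:
# missing_nodes /etc/efm-2.1/efm.nodes 192.168.22.243 192.168.22.244 192.168.22.245
```

No output means every expected address is present.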
Let's try to start the new service on the witness node first:
```
[root@edbbart efm-2.1]$ systemctl start efm-2.1.service
[root@edbbart efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

    Agent Type  Address              Agent  DB       Info
    --------------------------------------------------------------
    Witness     192.168.22.244       UP     N/A

Allowed node host list:
    192.168.22.244

Membership coordinator: 192.168.22.244

Standby priority host list:
    (List is empty.)

Promote Status:

Did not find XLog location for any nodes.
```
Looks good. Are we really running the new version?
```
[root@edbbart efm-2.1]$ /usr/edb-efm/bin/efm -v
Failover Manager, version 2.1.0
```
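For an automated check across nodes, the version string can be pulled out of that output. This is a small sketch that assumes the exact wording shown above; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the output of "efm -v" on stdin and print
# only the version number. Assumes the "Failover Manager, version X.Y.Z"
# wording shown above; prints nothing if the line does not match.
efm_version_from() {
    sed -n 's/^Failover Manager, version \([0-9][0-9.]*\).*$/\1/p'
}

# Example call:
# /usr/edb-efm/bin/efm -v | efm_version_from
```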
Looks fine as well. Time to add the other nodes:
```
[root@edbbart efm-2.1]$ /usr/edb-efm/bin/efm add-node efm 192.168.22.243
add-node signal sent to local agent.
[root@edbbart efm-2.1]$ /usr/edb-efm/bin/efm add-node efm 192.168.22.245
add-node signal sent to local agent.
[root@edbbart efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

    Agent Type  Address              Agent  DB       Info
    --------------------------------------------------------------
    Witness     192.168.22.244       UP     N/A

Allowed node host list:
    192.168.22.244 192.168.22.243

Membership coordinator: 192.168.22.244

Standby priority host list:
    (List is empty.)

Promote Status:

Did not find XLog location for any nodes.
```
Proceed on the master:
```
[root@ppasstandby efm-2.1]$ systemctl start efm-2.1.service
[root@ppasstandby efm-2.1]$ systemctl status efm-2.1.service
efm-2.1.service - EnterpriseDB Failover Manager 2.1
   Loaded: loaded (/usr/lib/systemd/system/efm-2.1.service; enabled)
   Active: active (running) since Thu 2016-09-08 12:04:11 CEST; 25s ago
  Process: 4020 ExecStart=/bin/bash -c /usr/efm-2.1/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
 Main PID: 4075 (java)
   CGroup: /system.slice/efm-2.1.service
           └─4075 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64/jre/bin/java -cp /usr/e...

Sep 08 12:04:07 ppasstandby systemd[1]: Starting EnterpriseDB Failover Manager 2.1...
Sep 08 12:04:08 ppasstandby sudo[4087]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-...efm
Sep 08 12:04:08 ppasstandby sudo[4098]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-...efm
Sep 08 12:04:08 ppasstandby sudo[4114]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAND=/usr/...efm
Sep 08 12:04:08 ppasstandby sudo[4125]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAND=/usr/...efm
Sep 08 12:04:10 ppasstandby sudo[4165]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-...9998
Sep 08 12:04:10 ppasstandby sudo[4176]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-...4075
Sep 08 12:04:11 ppasstandby systemd[1]: Started EnterpriseDB Failover Manager 2.1.
Hint: Some lines were ellipsized, use -l to show in full.
```
And then continue on the standby:
```
[root@edbppas efm-2.1]$ systemctl start efm-2.1.service
[root@edbppas efm-2.1]$ systemctl status efm-2.1.service
efm-2.1.service - EnterpriseDB Failover Manager 2.1
   Loaded: loaded (/usr/lib/systemd/system/efm-2.1.service; enabled)
   Active: active (running) since Thu 2016-09-08 12:05:28 CEST; 3s ago
  Process: 3820 ExecStart=/bin/bash -c /usr/efm-2.1/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
 Main PID: 3875 (java)
   CGroup: /system.slice/efm-2.1.service
           └─3875 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64/jre/bin/jav...

Sep 08 12:05:24 edbppas systemd[1]: Starting EnterpriseDB Failover Manager 2.1...
Sep 08 12:05:25 edbppas sudo[3887]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...efm
Sep 08 12:05:25 edbppas sudo[3898]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...efm
Sep 08 12:05:25 edbppas sudo[3914]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAN...efm
Sep 08 12:05:25 edbppas sudo[3925]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAN...efm
Sep 08 12:05:25 edbppas sudo[3945]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAN...efm
Sep 08 12:05:28 edbppas sudo[3981]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...998
Sep 08 12:05:28 edbppas sudo[3994]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...875
Sep 08 12:05:28 edbppas systemd[1]: Started EnterpriseDB Failover Manager 2.1.
Hint: Some lines were ellipsized, use -l to show in full.
```
What is the cluster status now?
```
[root@edbbart efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

    Agent Type  Address              Agent  DB       Info
    --------------------------------------------------------------
    Master      192.168.22.245       UP     UP
    Witness     192.168.22.244       UP     N/A
    Standby     192.168.22.243       UP     UP

Allowed node host list:
    192.168.22.244 192.168.22.243 192.168.22.245

Membership coordinator: 192.168.22.244

Standby priority host list:
    192.168.22.243

Promote Status:

    DB Type     Address              XLog Loc         Info
    --------------------------------------------------------------
    Master      192.168.22.245       0/3B01C7A0
    Standby     192.168.22.243       0/3B01C7A0

    Standby database(s) in sync with master. It is safe to promote.
```
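EFM already tells you whether it is safe to promote, but if you want to compare the XLog locations yourself in a script, the Promote Status section can be parsed. The following is a rough sketch, not an official interface: the function name is hypothetical, and it assumes the column layout shown above with the XLog location as a single token such as 0/3B01C7A0.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "efm cluster-status" output on stdin and
# return 0 only if every Standby row in the "Promote Status" section
# shows the same XLog location as the Master row.
standbys_in_sync() {
    awk '
        /^Promote Status:/ { in_promote = 1; next }
        # Only consider rows whose third field looks like an XLog location,
        # so the closing "Standby database(s) in sync ..." line is ignored.
        in_promote && $3 ~ /^[0-9A-Fa-f]+\/[0-9A-Fa-f]+$/ {
            if ($1 == "Master")  master = $3
            if ($1 == "Standby") standby[++n] = $3
        }
        END {
            if (master == "" || n == 0) exit 1
            for (i = 1; i <= n; i++) if (standby[i] != master) exit 1
            exit 0
        }
    '
}

# Example call:
# /usr/edb-efm/bin/efm cluster-status efm | standbys_in_sync && echo "in sync"
```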
Cool. Back in operation on the new release. Quite easy.
PS: Remember to re-point your symlinks in /etc and /usr if you created symlinks for ease of use.
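Re-pointing such a link could look like the sketch below. The function name is hypothetical, and the link names in the example are assumptions: /usr/edb-efm is the path used in the commands in this post, while /etc/edb-efm is only a guess at a matching name under /etc.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make an existing symlink point at a new target.
# Usage: repoint_link <new-target> <link>
repoint_link() {
    local target=$1 link=$2
    # -s: symbolic link, -f: replace the existing link,
    # -n: treat an existing link to a directory as the link itself,
    #     so the new link is not created inside the old directory.
    ln -sfn "$target" "$link"
}

# Example calls (link names are assumptions, adjust to your setup):
# repoint_link /usr/efm-2.1 /usr/edb-efm
# repoint_link /etc/efm-2.1 /etc/edb-efm
```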