By Mouhamadou Diaw
In a previous blog I talked about FSFO callout scripts, a new broker feature in Oracle 21c.
This feature lets you execute tasks before and after a fast-start failover. By default, the automatic failover will not happen if the pre-script fails. That may not be what we want: sometimes the automatic failover should continue even if the pre-tasks did not execute successfully.
Oracle provides a parameter for this: FastStartFailoverActionOnPreCalloutFailure. It accepts two values:
STOP: the FSFO will not happen if no .suc file is found (the pre-tasks failed)
CONTINUE: the FSFO will continue even if the pre-tasks fail
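To see which of the two behaviors a given callout configuration will produce, the parameter can be read straight from fsfocallout.ora. The following is only a minimal sketch, assuming the file uses plain `name=value` lines; the function name `precallout_failure_action` is hypothetical, and the fallback to STOP is an assumption based on the default behavior described above.

```shell
#!/bin/bash
# Hypothetical helper: print the pre-callout failure action configured
# in a given fsfocallout.ora file.
precallout_failure_action() {
    local cfg="$1" value
    # Skip comment lines and keep the last assignment of the parameter.
    value=$(grep -v '^#' "$cfg" 2>/dev/null \
        | grep '^FastStartFailoverActionOnPreCalloutFailure=' \
        | tail -n 1 | cut -d= -f2 | tr -d '[:space:]')
    # Assumption: when the parameter is not set, behave like STOP.
    echo "${value:-STOP}"
}
```

It could be called with the path of the callout configuration file, for example `precallout_failure_action fsfocallout.ora` from within the callout directory.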
In this blog I run some tests with this parameter and show the results. Below is the configuration I used, the same as in my previous blog:
    DGMGRL> show configuration

    Configuration - db21

      Protection Mode: MaxPerformance
      Members:
      DB21_SITE1 - Primary database
        DB21_SITE2 - (*) Physical standby database

    Fast-Start Failover: Enabled in Potential Data Loss Mode

    Configuration Status:
    SUCCESS   (status updated 17 seconds ago)

    DGMGRL>
FastStartFailoverActionOnPreCalloutFailure=STOP
The first tests are done with the parameter set to STOP. Below are my callout scripts.
fsfocallout.ora script with the value=STOP
    oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfocallout.ora | grep -v ^#
    FastStartFailoverPreCallout=fsfo_precallout
    FastStartFailoverPreCalloutTimeout=25
    FastStartFailoverPreCalloutSucFileName=fsfo_precallout.suc
    FastStartFailoverPreCalloutErrorFileName=fsfo_precallout.err
    FastStartFailoverActionOnPreCalloutFailure=STOP
    FastStartFailoverPostCallout=fsfo_postcallout
    oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]
fsfo_precallout script with errors inside
    oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfo_precallout
    #! /bin/bash
    if [ 1 -lt 100 ]
    then
      touch /temp/test
      echo "starting fun observer" > /temp/test
      echo "starting fun observer" > /temp/test
      touch /u01/app/oracle/aadmin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc
    else
      touch /u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err
    fi
    oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]
As you can see, I deliberately made some mistakes in the script (/temp instead of /tmp, aadmin instead of admin) so that the pre-tasks do not finish successfully and no .suc file is created.
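For comparison, a pre-callout script that does report its outcome could look roughly like the sketch below. This is not the script used in the tests, only an illustration under assumptions: `CALLOUT_DIR`, `pre_tasks` and `run_precallout` are hypothetical names, the marker file names are the ones configured in fsfocallout.ora, and removing old marker files first is a defensive choice of mine rather than a documented requirement.

```shell
#!/bin/bash
# Sketch of a pre-callout script that leaves a .suc or .err marker file.
# CALLOUT_DIR, pre_tasks and run_precallout are illustrative names.
CALLOUT_DIR="${CALLOUT_DIR:-/u01/app/oracle/admin/prod20/broker_files/config_db21/callout}"

pre_tasks() {
    # Placeholder for the real pre-failover work.
    echo "starting pre-failover tasks" > /tmp/fsfo_precallout.log
}

run_precallout() {
    # Defensive: drop markers from a previous run so a stale file
    # is not mistaken for the result of this run.
    rm -f "$CALLOUT_DIR/fsfo_precallout.suc" "$CALLOUT_DIR/fsfo_precallout.err"
    if pre_tasks; then
        touch "$CALLOUT_DIR/fsfo_precallout.suc"
    else
        touch "$CALLOUT_DIR/fsfo_precallout.err"
        return 1
    fi
}
```

In a real script the last line would simply call `run_precallout`.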
Now let’s simulate a failover to validate the expected behavior:
    SQL> select db_unique_name,open_mode from v$database;

    DB_UNIQUE_NAME                 OPEN_MODE
    ------------------------------ --------------------
    db21_site1                     READ WRITE

    SQL> shut abort
    ORACLE instance shut down.
    SQL>
After the shutdown abort of the primary, the observer logfile shows that the automatic failover did not happen, because FastStartFailoverActionOnPreCalloutFailure is set to STOP:
    [W000 2022-04-13T11:07:07.786+02:00] Fast-Start Failover is not enabled or can't be checked. Retry after 15 seconds.
    [W000 2022-04-13T11:07:22.792+02:00] Standby database has changed to DB21_SITE2.
    [W000 2022-04-13T11:07:22.794+02:00] Try to connect to the primary.
    [W000 2022-04-13T11:07:22.794+02:00] Try to connect to the primary DB21_SITE1.
    [W000 2022-04-13T11:07:24.028+02:00] Connection to the primary restored!
    [W000 2022-04-13T11:07:24.034+02:00] The standby DB21_SITE2 is ready to be a FSFO target
    [W000 2022-04-13T11:07:26.036+02:00] Disconnecting from database DB21_SITE1.
    [W000 2022-04-13T11:24:02.493+02:00] Primary database cannot be reached.
    [W000 2022-04-13T11:24:02.494+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 15 seconds
    [W000 2022-04-13T11:24:03.496+02:00] Try to connect to the primary.
    [W000 2022-04-13T11:24:05.797+02:00] Primary database cannot be reached.
    [W000 2022-04-13T11:24:06.799+02:00] Try to connect to the primary.
    [W000 2022-04-13T11:24:19.665+02:00] Primary database cannot be reached.
    [W000 2022-04-13T11:24:19.665+02:00] Fast-Start Failover threshold has expired.
    [W000 2022-04-13T11:24:19.666+02:00] Succeeded to parse FSFO callout config file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfocallout.ora'
    [W000 2022-04-13T11:24:19.666+02:00] Try to connect to the standby.
    [W000 2022-04-13T11:24:19.666+02:00] Check if the standby is ready for failover.
    [W000 2022-04-13T11:24:19.685+02:00] Doing pre-FSFO callout.
    [W000 2022-04-13T11:24:23.746+02:00] Failed to ping the primary.
    [W000 2022-04-13T11:24:29.821+02:00] Failed to ping the primary.
    [W000 2022-04-13T11:24:36.020+02:00] Failed to ping the primary.
    [W000 2022-04-13T11:24:41.040+02:00] Failed to ping the primary.
    [W000 2022-04-13T11:24:41.040+02:00] Failed to detect the pre-FSFO callout suc file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc', or error file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err', after 25 seconds passed.
    [W000 2022-04-13T11:24:41.040+02:00] Will not continue Fast-Start Failover since pre-FSFO callout failure action is STOP
    [W000 2022-04-13T11:24:41.040+02:00] Returning to primary ping state.
    [W000 2022-04-13T11:24:41.040+02:00] Try to connect to the primary.
    [W000 2022-04-13T11:24:43.274+02:00] Primary database cannot be reached.
FastStartFailoverActionOnPreCalloutFailure=CONTINUE
If for any reason I want the FSFO to happen even if the pre-tasks fail, I have to explicitly set the value to CONTINUE.
Let’s run the same test with the parameter set to CONTINUE:
    oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfocallout.ora | grep -v ^#
    FastStartFailoverPreCallout=fsfo_precallout
    FastStartFailoverPreCalloutTimeout=25
    FastStartFailoverPreCalloutSucFileName=fsfo_precallout.suc
    FastStartFailoverPreCalloutErrorFileName=fsfo_precallout.err
    FastStartFailoverActionOnPreCalloutFailure=CONTINUE
    FastStartFailoverPostCallout=fsfo_postcallout
    oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]
In the observer logfile, we can see that, as expected, the automatic failover now happens because the parameter is set to CONTINUE:
    [W000 2022-04-13T11:36:15.409+02:00] Primary database cannot be reached.
    [W000 2022-04-13T11:36:15.410+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 15 seconds
    [W000 2022-04-13T11:36:16.410+02:00] Try to connect to the primary.
    [W000 2022-04-13T11:36:18.949+02:00] Primary database cannot be reached.
    [W000 2022-04-13T11:36:19.950+02:00] Try to connect to the primary.
    [W000 2022-04-13T11:36:30.063+02:00] Primary database cannot be reached.
    [W000 2022-04-13T11:36:30.063+02:00] Fast-Start Failover threshold has expired.
    [W000 2022-04-13T11:36:30.072+02:00] Succeeded to parse FSFO callout config file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfocallout.ora'
    [W000 2022-04-13T11:36:30.072+02:00] Try to connect to the standby.
    [W000 2022-04-13T11:36:30.072+02:00] Check if the standby is ready for failover.
    [W000 2022-04-13T11:36:30.087+02:00] Doing pre-FSFO callout.
    [W000 2022-04-13T11:36:34.095+02:00] Failed to ping the primary.
    [W000 2022-04-13T11:36:40.146+02:00] Failed to ping the primary.
    [W000 2022-04-13T11:36:46.255+02:00] Failed to ping the primary.
    [W000 2022-04-13T11:36:52.311+02:00] Failed to ping the primary.
    [W000 2022-04-13T11:36:55.352+02:00] Failed to detect the pre-FSFO callout suc file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc', or error file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err', after 25 seconds passed.
    [W000 2022-04-13T11:36:55.352+02:00] Will continue Fast-Start Failover since pre-FSFO callout failure action is CONTINUE
    [S006 2022-04-13T11:36:55.352+02:00] Fast-Start Failover started...
    2022-04-13T11:36:55.352+02:00 Initiating Fast-Start Failover to database "DB21_SITE2"...
    [S006 2022-04-13T11:36:55.352+02:00] Initiating Fast-start Failover.
    2022-04-13T11:36:55.362+02:00 Performing failover NOW, please wait...
    2022-04-13T11:37:18.566+02:00 Failover succeeded, new primary is "DB21_SITE2".
    2022-04-13T11:37:18.566+02:00 Failover processing complete, broker ready.
    2022-04-13T11:37:18.566+02:00
    [S006 2022-04-13T11:37:18.566+02:00] Fast-Start Failover finished...
    [W000 2022-04-13T11:37:18.566+02:00] Failover succeeded. Restart pinging.
    [W000 2022-04-13T11:37:18.582+02:00] Primary database has changed to DB21_SITE2.
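The two observer logs differ in one decisive line: "Will not continue Fast-Start Failover since pre-FSFO callout failure action is STOP" versus "Will continue Fast-Start Failover since pre-FSFO callout failure action is CONTINUE". A small helper can look for the most recent of these messages in an observer logfile. This is only a sketch: the function name `precallout_outcome` and its return strings are hypothetical, and it relies solely on the message texts shown in the logs above.

```shell
#!/bin/bash
# Hypothetical helper: report how the observer handled the most recent
# pre-callout failure, based on the message texts in the observer log.
precallout_outcome() {
    local logfile="$1" last
    # Keep only the latest decision line written by the observer.
    last=$(grep 'pre-FSFO callout failure action is' "$logfile" | tail -n 1)
    case "$last" in
        *'Will not continue'*) echo "STOPPED" ;;
        *'Will continue'*)     echo "CONTINUED" ;;
        *)                     echo "NONE" ;;
    esac
}
```

Calling `precallout_outcome` with the path of the observer logfile prints STOPPED, CONTINUED, or NONE when no pre-callout failure was logged.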
Conclusion
In short, when working with FSFO callout scripts, make sure the parameter FastStartFailoverActionOnPreCalloutFailure is set according to your needs.