By Mouhamadou Diaw

In a previous blog I talked about FSFO callout scripts, a new Data Guard broker feature in Oracle 21c.

This feature lets us execute tasks before and after a fast-start failover. By default, the automatic failover does not happen if the pre-script fails. That is not always what we want: sometimes the automatic failover should continue even if the pre-tasks did not execute successfully.

Oracle provides a parameter for this: FastStartFailoverActionOnPreCalloutFailure. It accepts two values:
STOP: the FSFO does not happen if there is no .suc file (the pre-tasks failed)
CONTINUE: the FSFO continues even if the pre-tasks fail

In this blog I run some tests with this parameter and show the results. Below is the configuration I used, the same as the one in my previous blog:

DGMGRL> show configuration
Configuration - db21
  Protection Mode: MaxPerformance
  Members:
  DB21_SITE1 - Primary database
    DB21_SITE2 - (*) Physical standby database
Fast-Start Failover: Enabled in Potential Data Loss Mode
Configuration Status:
SUCCESS   (status updated 17 seconds ago)
DGMGRL>

FastStartFailoverActionOnPreCalloutFailure=STOP

The first tests are done with the parameter set to STOP. Below are my callout scripts.

fsfocallout.ora script with the value=STOP

oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfocallout.ora | grep -v ^#
FastStartFailoverPreCallout=fsfo_precallout
FastStartFailoverPreCalloutTimeout=25
FastStartFailoverPreCalloutSucFileName=fsfo_precallout.suc
FastStartFailoverPreCalloutErrorFileName=fsfo_precallout.err
FastStartFailoverActionOnPreCalloutFailure=STOP
FastStartFailoverPostCallout=fsfo_postcallout
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]

fsfo_precallout script with errors inside

oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfo_precallout
#! /bin/bash
if [ 1 -lt 100 ]
 then
   touch /temp/test
   echo "starting fun observer" > /temp/test
   echo "starting fun observer" > /temp/test
   touch  /u01/app/oracle/aadmin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc
else
  touch /u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err
fi
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]

As you can see, I made some deliberate mistakes in the script (/temp instead of /tmp, aadmin instead of admin) so that the pre-tasks do not finish successfully.
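
For comparison, here is a minimal sketch of how a pre-callout script could report its outcome reliably, so that either the .suc or the .err file always exists when the script ends. The function names pre_tasks and run_precallout and the log file fsfo_pre_tasks.log are hypothetical, chosen only for illustration; clearing markers from a previous run is a precaution I assume is harmless, not something the broker requires.

```shell
#!/bin/bash
# Sketch only: pre_tasks and run_precallout are hypothetical names.

# Placeholder for the real pre-failover work; must return non-zero on failure.
pre_tasks() {
  echo "starting pre-failover tasks" > "${TMPDIR:-/tmp}/fsfo_pre_tasks.log"
}

# Run the pre-tasks, then leave one marker file in the callout directory:
# fsfo_precallout.suc on success, fsfo_precallout.err on failure.
run_precallout() {
  callout_dir="$1"
  suc_file="$callout_dir/fsfo_precallout.suc"
  err_file="$callout_dir/fsfo_precallout.err"

  # Precaution (assumption): clear markers left by an earlier run.
  rm -f "$suc_file" "$err_file"

  if pre_tasks; then
    touch "$suc_file"
  else
    touch "$err_file"
    return 1
  fi
}

# In a real callout script the last line would be something like:
# run_precallout /u01/app/oracle/admin/prod20/broker_files/config_db21/callout
```

Unlike the deliberately broken script above, a failing pre-task here produces the .err file instead of leaving the observer waiting until the timeout.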

Now let’s simulate a Failover to validate the expected behavior

SQL> select db_unique_name,open_mode from v$database;
DB_UNIQUE_NAME                 OPEN_MODE
------------------------------ --------------------
db21_site1                     READ WRITE
SQL> shut abort
ORACLE instance shut down.
SQL>

After the shutdown abort of the primary, we can see in the observer log file that the automatic failover did not happen, because the parameter FastStartFailoverActionOnPreCalloutFailure is set to STOP.

[W000 2022-04-13T11:07:07.786+02:00] Fast-Start Failover is not enabled or can't be checked. Retry after 15 seconds.
[W000 2022-04-13T11:07:22.792+02:00] Standby database has changed to DB21_SITE2.
[W000 2022-04-13T11:07:22.794+02:00] Try to connect to the primary.
[W000 2022-04-13T11:07:22.794+02:00] Try to connect to the primary DB21_SITE1.
[W000 2022-04-13T11:07:24.028+02:00] Connection to the primary restored!
[W000 2022-04-13T11:07:24.034+02:00] The standby DB21_SITE2 is ready to be a FSFO target
[W000 2022-04-13T11:07:26.036+02:00] Disconnecting from database DB21_SITE1.
[W000 2022-04-13T11:24:02.493+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:24:02.494+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 15 seconds
[W000 2022-04-13T11:24:03.496+02:00] Try to connect to the primary.
[W000 2022-04-13T11:24:05.797+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:24:06.799+02:00] Try to connect to the primary.
[W000 2022-04-13T11:24:19.665+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:24:19.665+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-13T11:24:19.666+02:00] Succeeded to parse FSFO callout config file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfocallout.ora'
[W000 2022-04-13T11:24:19.666+02:00] Try to connect to the standby.
[W000 2022-04-13T11:24:19.666+02:00] Check if the standby is ready for failover.
[W000 2022-04-13T11:24:19.685+02:00] Doing pre-FSFO callout.
[W000 2022-04-13T11:24:23.746+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:29.821+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:36.020+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:41.040+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:41.040+02:00] Failed to detect the pre-FSFO callout suc file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc', or error file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err', after 25 seconds passed.
[W000 2022-04-13T11:24:41.040+02:00] Will not continue Fast-Start Failover since pre-FSFO callout failure action is STOP
[W000 2022-04-13T11:24:41.040+02:00] Returning to primary ping state.
[W000 2022-04-13T11:24:41.040+02:00] Try to connect to the primary.
[W000 2022-04-13T11:24:43.274+02:00] Primary database cannot be reached.
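
When the log is long, the observer's decision can be pulled out with a small helper. This is only a sketch: the function name precallout_decision is hypothetical, the path of the observer log file is whatever your observer was started with, and the pattern relies on the message wording shown in the log of this test.

```shell
#!/bin/bash
# Sketch only: precallout_decision is a hypothetical helper name.
# Usage: precallout_decision /path/to/observer.log
# Prints the failure action (for example STOP) from the most recent
# "pre-FSFO callout failure action is ..." line, or nothing if there is none.
precallout_decision() {
  sed -n 's/.*pre-FSFO callout failure action is \([A-Z][A-Z]*\).*/\1/p' "$1" | tail -n 1
}
```

An empty result simply means the observer did not log a decision about a failed pre-callout in that file.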

FastStartFailoverActionOnPreCalloutFailure=CONTINUE

If for any reason I want the FSFO to happen even if the pre-tasks fail, I have to explicitly set the value to CONTINUE.

Let’s do the same tests, this time with the parameter set to CONTINUE

oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfocallout.ora | grep -v ^#
FastStartFailoverPreCallout=fsfo_precallout
FastStartFailoverPreCalloutTimeout=25
FastStartFailoverPreCalloutSucFileName=fsfo_precallout.suc
FastStartFailoverPreCalloutErrorFileName=fsfo_precallout.err
FastStartFailoverActionOnPreCalloutFailure=CONTINUE
FastStartFailoverPostCallout=fsfo_postcallout
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]

In the observer log file, we can see that, as expected, the automatic failover happens because the parameter is set to CONTINUE.

[W000 2022-04-13T11:36:15.409+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:36:15.410+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 15 seconds
[W000 2022-04-13T11:36:16.410+02:00] Try to connect to the primary.
[W000 2022-04-13T11:36:18.949+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:36:19.950+02:00] Try to connect to the primary.
[W000 2022-04-13T11:36:30.063+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:36:30.063+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-13T11:36:30.072+02:00] Succeeded to parse FSFO callout config file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfocallout.ora'
[W000 2022-04-13T11:36:30.072+02:00] Try to connect to the standby.
[W000 2022-04-13T11:36:30.072+02:00] Check if the standby is ready for failover.
[W000 2022-04-13T11:36:30.087+02:00] Doing pre-FSFO callout.
[W000 2022-04-13T11:36:34.095+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:40.146+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:46.255+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:52.311+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:55.352+02:00] Failed to detect the pre-FSFO callout suc file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc', or error file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err', after 25 seconds passed.
[W000 2022-04-13T11:36:55.352+02:00] Will continue Fast-Start Failover since pre-FSFO callout failure action is CONTINUE
[S006 2022-04-13T11:36:55.352+02:00] Fast-Start Failover started...
2022-04-13T11:36:55.352+02:00
Initiating Fast-Start Failover to database "DB21_SITE2"...
[S006 2022-04-13T11:36:55.352+02:00] Initiating Fast-start Failover.
2022-04-13T11:36:55.362+02:00
Performing failover NOW, please wait...
2022-04-13T11:37:18.566+02:00
Failover succeeded, new primary is "DB21_SITE2".
2022-04-13T11:37:18.566+02:00
Failover processing complete, broker ready.
2022-04-13T11:37:18.566+02:00
[S006 2022-04-13T11:37:18.566+02:00] Fast-Start Failover finished...
[W000 2022-04-13T11:37:18.566+02:00] Failover succeeded. Restart pinging.
[W000 2022-04-13T11:37:18.582+02:00] Primary database has changed to DB21_SITE2.

Conclusion

In short, when dealing with FSFO callout scripts, make sure that the parameter FastStartFailoverActionOnPreCalloutFailure is set according to your needs.
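
A quick way to check the value before relying on it is to read it from fsfocallout.ora. The helper below is a minimal sketch: the function name precallout_failure_action is hypothetical, and the fallback to STOP when the parameter is absent reflects the default behavior described at the start of this blog, not something the script verifies against the broker.

```shell
#!/bin/bash
# Sketch only: precallout_failure_action is a hypothetical helper name.
# Usage: precallout_failure_action /path/to/fsfocallout.ora
# Prints the configured FastStartFailoverActionOnPreCalloutFailure value.
# Commented-out lines (starting with #) never match the pattern.
precallout_failure_action() {
  action=$(sed -n 's/^FastStartFailoverActionOnPreCalloutFailure=//p' "$1")
  # Assumption: with no explicit setting, failover stops on pre-callout failure.
  echo "${action:-STOP}"
}
```

With the second configuration file used in this blog, the helper would print CONTINUE.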