By Mouhamadou Diaw
In a previous blog I talked about the FSFO callout scripts, a new Data Guard broker feature in Oracle 21c.
This feature allows us to execute some tasks before and after a fast-start failover. By default, the automatic failover will not happen if the pre-script fails. That may not be what we want: sometimes we want the automatic failover to continue even if the pre-tasks did not execute successfully.
Oracle has a parameter for this: FastStartFailoverActionOnPreCalloutFailure. It accepts two values:
STOP: the FSFO will not happen if there is no .suc file (the pre-tasks failed)
CONTINUE: the FSFO will continue even if the pre-tasks fail
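The two values can be modelled with a few lines of bash. This is only an illustrative sketch of the behavior described above, not the observer's actual code; the function name fsfo_decision and its arguments are hypothetical.

```bash
#!/bin/bash
# Illustrative model only: this is NOT Oracle's observer code.
# It mirrors the STOP/CONTINUE behavior described in the text.
# fsfo_decision is a hypothetical helper name.
fsfo_decision() {
    local suc_file="$1"   # path where the .suc file is expected
    local action="$2"     # STOP or CONTINUE

    if [ -e "$suc_file" ]; then
        echo "failover"      # pre-tasks succeeded
    elif [ "$action" = "CONTINUE" ]; then
        echo "failover"      # pre-tasks failed, failover proceeds anyway
    else
        echo "no failover"   # STOP: the failover is cancelled
    fi
}
```

For example, calling fsfo_decision with a path to a missing .suc file and STOP prints "no failover", while the same call with CONTINUE prints "failover".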
In this blog I run some tests with this parameter and show the results. Below is the configuration I used, the same as in my previous blog:
DGMGRL> show configuration

Configuration - db21

  Protection Mode: MaxPerformance
  Members:
  DB21_SITE1 - Primary database
    DB21_SITE2 - (*) Physical standby database

Fast-Start Failover: Enabled in Potential Data Loss Mode

Configuration Status:
SUCCESS   (status updated 17 seconds ago)

DGMGRL>
FastStartFailoverActionOnPreCalloutFailure=STOP
The first tests are done with the parameter set to STOP. Below are my callout scripts.
The fsfocallout.ora file with the value set to STOP:
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfocallout.ora | grep -v ^#
FastStartFailoverPreCallout=fsfo_precallout
FastStartFailoverPreCalloutTimeout=25
FastStartFailoverPreCalloutSucFileName=fsfo_precallout.suc
FastStartFailoverPreCalloutErrorFileName=fsfo_precallout.err
FastStartFailoverActionOnPreCalloutFailure=STOP
FastStartFailoverPostCallout=fsfo_postcallout
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]
The fsfo_precallout script, with errors inside:
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfo_precallout
#! /bin/bash
if [ 1 -lt 100 ]
then
  touch /temp/test
  echo "starting fun observer" > /temp/test
  echo "starting fun observer" > /temp/test
  touch /u01/app/oracle/aadmin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc
else
  touch /u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err
fi
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]
As you can see, I introduced some mistakes in the script (/temp instead of /tmp, aadmin instead of admin) so that the pre-tasks will not finish successfully and no .suc file is created in the callout directory.
Now let's simulate a failover to validate the expected behavior:
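For comparison, here is a minimal sketch of how a pre-callout could report its outcome through the .suc and .err files. It is not the script used in these tests: run_precallout and do_pre_tasks are hypothetical names, the callout directory is passed as an argument, and clearing old marker files first is my own assumption rather than documented broker behavior.

```bash
#!/bin/bash
# Sketch only: run_precallout and do_pre_tasks are hypothetical names,
# not part of the Data Guard broker.

do_pre_tasks() {
    # Placeholder for the real pre-failover work.
    echo "starting fsfo pre-tasks" > "${TMPDIR:-/tmp}/fsfo_pre_test"
}

run_precallout() {
    local callout_dir="$1"   # directory that holds fsfocallout.ora

    # Assumption: remove markers left over from an earlier run.
    rm -f "$callout_dir/fsfo_precallout.suc" "$callout_dir/fsfo_precallout.err"

    if do_pre_tasks; then
        touch "$callout_dir/fsfo_precallout.suc"   # success marker
    else
        touch "$callout_dir/fsfo_precallout.err"   # error marker
    fi
}
```

The marker file names match FastStartFailoverPreCalloutSucFileName and FastStartFailoverPreCalloutErrorFileName from the fsfocallout.ora file shown earlier.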
SQL> select db_unique_name,open_mode from v$database;

DB_UNIQUE_NAME                 OPEN_MODE
------------------------------ --------------------
db21_site1                     READ WRITE

SQL> shut abort
ORACLE instance shut down.
SQL>
After the shutdown abort of the primary, the observer logfile shows that the automatic failover did not happen, because the parameter FastStartFailoverActionOnPreCalloutFailure is set to STOP:
[W000 2022-04-13T11:07:07.786+02:00] Fast-Start Failover is not enabled or can't be checked. Retry after 15 seconds.
[W000 2022-04-13T11:07:22.792+02:00] Standby database has changed to DB21_SITE2.
[W000 2022-04-13T11:07:22.794+02:00] Try to connect to the primary.
[W000 2022-04-13T11:07:22.794+02:00] Try to connect to the primary DB21_SITE1.
[W000 2022-04-13T11:07:24.028+02:00] Connection to the primary restored!
[W000 2022-04-13T11:07:24.034+02:00] The standby DB21_SITE2 is ready to be a FSFO target
[W000 2022-04-13T11:07:26.036+02:00] Disconnecting from database DB21_SITE1.
[W000 2022-04-13T11:24:02.493+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:24:02.494+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 15 seconds
[W000 2022-04-13T11:24:03.496+02:00] Try to connect to the primary.
[W000 2022-04-13T11:24:05.797+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:24:06.799+02:00] Try to connect to the primary.
[W000 2022-04-13T11:24:19.665+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:24:19.665+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-13T11:24:19.666+02:00] Succeeded to parse FSFO callout config file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfocallout.ora'
[W000 2022-04-13T11:24:19.666+02:00] Try to connect to the standby.
[W000 2022-04-13T11:24:19.666+02:00] Check if the standby is ready for failover.
[W000 2022-04-13T11:24:19.685+02:00] Doing pre-FSFO callout.
[W000 2022-04-13T11:24:23.746+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:29.821+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:36.020+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:41.040+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:41.040+02:00] Failed to detect the pre-FSFO callout suc file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc', or error file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err', after 25 seconds passed.
[W000 2022-04-13T11:24:41.040+02:00] Will not continue Fast-Start Failover since pre-FSFO callout failure action is STOP
[W000 2022-04-13T11:24:41.040+02:00] Returning to primary ping state.
[W000 2022-04-13T11:24:41.040+02:00] Try to connect to the primary.
[W000 2022-04-13T11:24:43.274+02:00] Primary database cannot be reached.
FastStartFailoverActionOnPreCalloutFailure=CONTINUE
If for any reason I want the FSFO to happen even if the pre-tasks fail, I have to explicitly set the value to CONTINUE.
Let's run the same test, this time with the parameter set to CONTINUE:
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfocallout.ora | grep -v ^#
FastStartFailoverPreCallout=fsfo_precallout
FastStartFailoverPreCalloutTimeout=25
FastStartFailoverPreCalloutSucFileName=fsfo_precallout.suc
FastStartFailoverPreCalloutErrorFileName=fsfo_precallout.err
FastStartFailoverActionOnPreCalloutFailure=CONTINUE
FastStartFailoverPostCallout=fsfo_postcallout
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]
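Switching between the two values is a one-line change in fsfocallout.ora. A small helper like the one below can make that change repeatable. It is only a sketch: set_precallout_action is a hypothetical name, and it assumes the parameter line already exists in the file and is not commented out.

```bash
#!/bin/bash
# Sketch only: set_precallout_action is a hypothetical helper.
# Assumes the parameter line is already present and uncommented.
set_precallout_action() {
    local cfg="$1"      # path to fsfocallout.ora
    local action="$2"   # STOP or CONTINUE
    local tmp

    case "$action" in
        STOP|CONTINUE) ;;
        *) echo "action must be STOP or CONTINUE" >&2; return 1 ;;
    esac

    tmp=$(mktemp) || return 1
    # Rewrite only the matching line; all other lines pass through unchanged.
    if sed "s/^FastStartFailoverActionOnPreCalloutFailure=.*/FastStartFailoverActionOnPreCalloutFailure=$action/" "$cfg" > "$tmp"; then
        # Write back in place so the file keeps its owner and permissions.
        cat "$tmp" > "$cfg"
    fi
    rm -f "$tmp"
}
```

Called as set_precallout_action fsfocallout.ora CONTINUE from the callout directory, it leaves the other parameters untouched.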
In the observer logfile, we can see that, as expected, the automatic failover happens because the parameter is set to CONTINUE:
[W000 2022-04-13T11:36:15.409+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:36:15.410+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 15 seconds
[W000 2022-04-13T11:36:16.410+02:00] Try to connect to the primary.
[W000 2022-04-13T11:36:18.949+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:36:19.950+02:00] Try to connect to the primary.
[W000 2022-04-13T11:36:30.063+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:36:30.063+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-13T11:36:30.072+02:00] Succeeded to parse FSFO callout config file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfocallout.ora'
[W000 2022-04-13T11:36:30.072+02:00] Try to connect to the standby.
[W000 2022-04-13T11:36:30.072+02:00] Check if the standby is ready for failover.
[W000 2022-04-13T11:36:30.087+02:00] Doing pre-FSFO callout.
[W000 2022-04-13T11:36:34.095+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:40.146+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:46.255+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:52.311+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:55.352+02:00] Failed to detect the pre-FSFO callout suc file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc', or error file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err', after 25 seconds passed.
[W000 2022-04-13T11:36:55.352+02:00] Will continue Fast-Start Failover since pre-FSFO callout failure action is CONTINUE
[S006 2022-04-13T11:36:55.352+02:00] Fast-Start Failover started...
2022-04-13T11:36:55.352+02:00
Initiating Fast-Start Failover to database "DB21_SITE2"...
[S006 2022-04-13T11:36:55.352+02:00] Initiating Fast-start Failover.
2022-04-13T11:36:55.362+02:00
Performing failover NOW, please wait...
2022-04-13T11:37:18.566+02:00
Failover succeeded, new primary is "DB21_SITE2".
2022-04-13T11:37:18.566+02:00
Failover processing complete, broker ready.
2022-04-13T11:37:18.566+02:00
[S006 2022-04-13T11:37:18.566+02:00] Fast-Start Failover finished...
[W000 2022-04-13T11:37:18.566+02:00] Failover succeeded. Restart pinging.
[W000 2022-04-13T11:37:18.582+02:00] Primary database has changed to DB21_SITE2.
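In both tests the observer writes one line that states its decision: "Will not continue Fast-Start Failover ..." or "Will continue Fast-Start Failover ...". A quick way to check which branch was taken is to search the observer logfile for these messages. The sketch below assumes the message wording stays as shown in the logs of this post; precallout_outcome is a hypothetical helper name and the logfile path is passed as an argument.

```bash
#!/bin/bash
# Sketch only: precallout_outcome is a hypothetical helper.
# It relies on the message wording seen in the observer logs of this post.
precallout_outcome() {
    local log="$1"   # path to the observer logfile
    local last

    # Keep only the most recent decision line, if any.
    last=$(grep -E "Will (not )?continue Fast-Start Failover" "$log" | tail -n 1)

    case "$last" in
        *"Will not continue"*) echo "stopped" ;;
        *"Will continue"*)     echo "continued" ;;
        *)                     echo "unknown" ;;
    esac
}
```

It prints "stopped", "continued", or "unknown" when no decision line is found in the file.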
Conclusion
In short, when working with FSFO callout scripts, make sure the parameter FastStartFailoverActionOnPreCalloutFailure is set according to your needs: STOP cancels the automatic failover when the pre-tasks fail, while CONTINUE lets it proceed.