By Mouhamadou Diaw
An observer is an OCI client that connects to the primary and target standby databases using the same SYS credentials you used when you connected to the Oracle Data Guard configuration with DGMGRL.
An observer is highly recommended in any Data Guard environment, and it is mandatory when fast-start failover is configured.
Since Oracle 12.2 we can have up to 3 observers, and since Oracle 21c the maximum is 4. One important point: even with multiple observers, only one of them is the master and all the others are backup observers. Only the master observer can initiate a fast-start failover.
The question we often ask is where to host the observers. Does the support of multiple observers settle this question?
In this blog I test several scenarios to get an idea of where to put the observers.
I will suppose that I have 3 datacenters:
-The primary datacenter hosting the primary server oraadserver
-The secondary datacenter hosting the standby server oraadserver2
-The third datacenter, where I have the server oraadserver3, which I can use for an observer for example
Fast-start failover is already configured, and I have 3 observers
DGMGRL> show configuration verbose
Configuration - db21
  Protection Mode: MaxPerformance
  Members:
  DB21_SITE1 - Primary database
    DB21_SITE2 - (*) Physical standby database
  (*) Fast-Start Failover target
  Properties:
    FastStartFailoverThreshold      = '15'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    ObserverPingInterval            = '0'
    ObserverPingRetry               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'DB21_CFG'
    ConfigurationSimpleName         = 'db21'
    DrainTimeout                    = '0'
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Lag Limit:          30 seconds
  Threshold:          15 seconds
  Ping Interval:      3000 milliseconds
  Ping Retry:         0
  Active Target:      DB21_SITE2
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver1
                      oraadserver21
                      oraadserver31
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configuration Status:
SUCCESS
DGMGRL>
Case 1 : The master observer is running on oraadserver so the observer is located in the primary datacenter
DGMGRL> show observer
Configuration - db21
  Fast-Start Failover:     ENABLED
  Primary:            DB21_SITE1
  Active Target:      DB21_SITE2
Observer "oraadserver1" - Master
  Host Name:                    oraadserver
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          0 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver21" - Backup
  Host Name:                    oraadserver2
  Last Ping to Primary:         2 seconds ago
  Last Ping to Target:          2 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver2.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver31" - Backup
  Host Name:                    oraadserver3
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          2 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
DGMGRL>
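Since the whole discussion depends on which host runs the master observer, it can be handy to extract that from the show observer output with a script. The following is only a minimal sketch: it assumes the output has been captured as text with one field per line, and the helper names parse_show_observer and master_host are hypothetical, not part of any Oracle tool.

```python
import re

# Matches header lines such as: Observer "oraadserver1" - Master
OBSERVER_RE = re.compile(r'^Observer "([^"]+)" - (Master|Backup)$')


def parse_show_observer(text):
    """Return one dict (name, role, host) per observer found in the text."""
    observers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = OBSERVER_RE.match(line)
        if match:
            current = {"name": match.group(1), "role": match.group(2), "host": None}
            observers.append(current)
        elif current is not None and line.startswith("Host Name:"):
            current["host"] = line.split(":", 1)[1].strip()
    return observers


def master_host(text):
    """Return the host name of the master observer, or None if there is none."""
    for observer in parse_show_observer(text):
        if observer["role"] == "Master":
            return observer["host"]
    return None


# Shortened copy of the output shown above (assumed to be captured as text).
SAMPLE = """\
Observer "oraadserver1" - Master
  Host Name:                    oraadserver
  Last Ping to Primary:         0 seconds ago
Observer "oraadserver21" - Backup
  Host Name:                    oraadserver2
Observer "oraadserver31" - Backup
  Host Name:                    oraadserver3
"""

print(master_host(SAMPLE))
```

With the sample above, the master observer is reported on oraadserver, which is also the primary server in this test.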
The first test simulates the loss of the primary datacenter, to see whether a fast-start failover happens. Losing the primary datacenter means that I lose both the primary database and the master observer.
Ok, let’s power off the primary server
[root@oraadserver ~]# poweroff
In the log file of an observer located in one of the remaining datacenters (oraadserver3) we can see the following lines
[W000 2022-04-15T12:48:16.563+02:00] Primary database cannot be reached.
[W000 2022-04-15T12:48:16.563+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 2 seconds
[W000 2022-04-15T12:48:17.563+02:00] Try to connect to the primary.
[W000 2022-04-15T12:48:19.891+02:00] Primary database cannot be reached.
[W000 2022-04-15T12:48:19.891+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-15T12:48:19.891+02:00] Try to connect to the standby.
[W000 2022-04-15T12:48:19.891+02:00] Check if the standby is ready for failover.
[W000 2022-04-15T12:48:19.899+02:00] Fast-Start Failover is not possible because this observer is not the master.
[W000 2022-04-15T12:48:20.902+02:00] Try to connect to the primary.
[W000 2022-04-15T12:48:28.908+02:00] Primary database cannot be reached.
[W000 2022-04-15T12:48:28.908+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 7 seconds
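The key line in this log is the one saying that the observer is not the master. As a rough sketch, a backup observer log could be scanned for that message as follows. It assumes the log has one entry per line in the "[tag timestamp] message" layout seen above; the function name blocked_failover_attempts is hypothetical.

```python
import re

# Layout seen in the observer log: [W000 2022-04-15T12:48:19.899+02:00] message
LINE_RE = re.compile(r"^\[(\w+) ([^\]]+)\] (.*)$")

NOT_MASTER = (
    "Fast-Start Failover is not possible because this observer is not the master."
)


def blocked_failover_attempts(log_text):
    """Return the timestamps at which a non-master observer gave up on a failover."""
    timestamps = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match and match.group(3) == NOT_MASTER:
            timestamps.append(match.group(2))
    return timestamps


# Three entries copied from the log excerpt above.
SAMPLE_LOG = """\
[W000 2022-04-15T12:48:19.891+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-15T12:48:19.899+02:00] Fast-Start Failover is not possible because this observer is not the master.
[W000 2022-04-15T12:48:20.902+02:00] Try to connect to the primary.
"""

print(blocked_failover_attempts(SAMPLE_LOG))
```

A non-empty result on a backup observer means that this observer saw the primary disappear for longer than the threshold but was not allowed to act.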
As expected, the fast-start failover did not happen because the master observer was down. But the question is why another observer was not promoted to master. Since I have 3 observers, I was expecting that when the master crashes, a backup observer would become the master.
I then restart the primary server and confirm that DB21_SITE1 is still the primary database
DGMGRL> show configuration
Configuration - db21
  Protection Mode: MaxPerformance
  Members:
  DB21_SITE1 - Primary database
    DB21_SITE2 - (*) Physical standby database
Fast-Start Failover: Enabled in Potential Data Loss Mode
Configuration Status:
SUCCESS   (status updated 31 seconds ago)
DGMGRL>
Ok, we restart everything and still have the master observer in the primary datacenter
DGMGRL> show fast_start failover
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Protection Mode:    MaxPerformance
  Lag Limit:          30 seconds
  Threshold:          15 seconds
  Ping Interval:      3000 milliseconds
  Ping Retry:         0
  Active Target:      DB21_SITE2
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver1
                      oraadserver21
                      oraadserver31
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES
  Oracle Error Conditions:
    (none)
DGMGRL>
Now let’s kill the master observer without crashing the datacenter (we only crash the observer, not the primary database)
[oracle@oraadserver ~]$ ps -ef | grep -i observer
oracle   12816     1  0 12:55 ?        00:00:01 /u01/app/oracle/product/dbhome_1/bin/dgmgrl START OBSERVER NONAME FILE IS 'fsfo.dat'
oracle   12988 12959  0 12:57 pts/2    00:00:00 grep --color=auto -i observer
[oracle@oraadserver ~]$
[oracle@oraadserver ~]$ kill -9 12816
[oracle@oraadserver ~]$
We can see that in this case an observer located in another datacenter was promoted to master a few minutes later. A fast-start failover will now happen if we crash the primary datacenter.
DGMGRL> show fast_start failover
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Protection Mode:    MaxPerformance
  Lag Limit:          30 seconds
  Threshold:          15 seconds
  Ping Interval:      3000 milliseconds
  Ping Retry:         0
  Active Target:      DB21_SITE2
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver21
                      oraadserver1
                      oraadserver31
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES
  Oracle Error Conditions:
    (none)
DGMGRL>
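In the two show fast_start failover outputs, the only difference is the position of the (*) marker in the Observers list: it moved from oraadserver1 to oraadserver21. A small sketch to detect such a change between two captured outputs could look like this. It assumes the output is available as text with one field per line; the function name master_observer_name is hypothetical.

```python
def master_observer_name(text):
    """Return the observer flagged with (*) in the Observers list, or None."""
    in_block = False
    for raw in text.splitlines():
        entry = raw.strip()
        if entry.startswith("Observers:"):
            # The first observer sits on the same line as the field label.
            in_block = True
            entry = entry[len("Observers:"):].strip()
        elif in_block and ":" in entry:
            # Next "Field: value" line, so the Observers list is over.
            break
        if in_block and entry.startswith("(*)"):
            return entry[len("(*)"):].strip()
    return None


# Shortened copies of the outputs before and after killing the master observer.
BEFORE = """\
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver1
                      oraadserver21
                      oraadserver31
  Shutdown Primary:   TRUE
"""

AFTER = """\
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver21
                      oraadserver1
                      oraadserver31
  Shutdown Primary:   TRUE
"""

old, new = master_observer_name(BEFORE), master_observer_name(AFTER)
if old != new:
    print("master observer changed from", old, "to", new)
```

Comparing the value over time is one way to see how long a promotion actually takes in your own environment.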
So it seems that if we lose the master observer and the primary database at the same time, no backup observer is promoted to master.
Case 2 : The master observer is running on oraadserver2 so the observer is located in the secondary datacenter
In this second test, the master observer is in the same datacenter as the standby database. Let’s simulate a crash of the secondary datacenter by crashing the standby server and see what happens
DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Protection Mode:    MaxPerformance
  Lag Limit:          30 seconds
  Threshold:          15 seconds
  Ping Interval:      3000 milliseconds
  Ping Retry:         0
  Active Target:      DB21_SITE2
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver21
                      oraadserver1
                      oraadserver31
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES
  Oracle Error Conditions:
    (none)
DGMGRL>
Let’s poweroff the standby server
[root@oraadserver2 ~]# poweroff
As expected, there was no fast-start failover: I lost both the standby database and the master observer, and no backup observer was promoted.
What is also important is that my primary database was shut down by Oracle. Indeed, in the alert log of the primary database we can see the following lines
Thread 1 advanced to log sequence 33 (LGWR switch),  current SCN: 77729099
  Current log# 1 seq# 33 mem# 0: /u01/app/oracle/oradata/DB21/onlinelog/o1_mf_1_hx1xy9yc_.log
  Current log# 1 seq# 33 mem# 1: /u01/app/oracle/fast_recovery_area/DB21/onlinelog/o1_mf_1_hx1xybv4_.log
2022-04-15T13:09:26.832279+02:00
ARC0 (PID:12144): Archived Log entry 907 added for B-1101901028.T-1.S-32 ID 0x465cfcd1 LAD:1 [krse.c:4912]
2022-04-15T13:10:01.882983+02:00
Fast-Start Failover reconfiguration in progress.
2022-04-15T13:10:04.874482+02:00
DMON: FSFP network call timeout. Killing process FSFP.
2022-04-15T13:10:04.898659+02:00
Process termination requested for pid 12003 [source = rdbms], [info = 2] [request issued by pid: 11934, uid: 54323]
2022-04-15T13:10:07.914848+02:00
Starting background process FSFP
2022-04-15T13:10:07.986554+02:00
FSFP started with pid=7, OS id=13725
2022-04-15T13:10:11.906564+02:00
Primary has heard from neither observer nor target standby within FastStartFailoverThreshold seconds.
It is likely an automatic failover has already occurred.
Primary is shutting down.
2022-04-15T13:10:11.911704+02:00
Errors in file /u01/app/oracle/diag/rdbms/db21_site1/DB21/trace/DB21_lg00_11908.trc:
ORA-16830: primary isolated from fast-start failover partners longer than FastStartFailoverThreshold seconds: shutting down
USER (ospid: 11908): terminating the instance due to ORA error 16830
2022-04-15T13:10:12.031189+02:00
System state dump requested by (instance=1, osid=11908 (LG00)), summary=[abnormal instance termination].
2022-04-15T13:10:12.031406+02:00
Memory (Avail / Total) = 792.82M / 3789.53M
Swap (Avail / Total) = 3072.00M /  3072.00M
2022-04-15T13:10:12.125885+02:00
System State dumped to trace file /u01/app/oracle/diag/rdbms/db21_site1/DB21/trace/DB21_diag_11877.trc
2022-04-15T13:10:12.699552+02:00
Dumping diagnostic data in directory=[cdmp_20220415131012], requested by (instance=1, osid=11908 (LG00)), summary=[abnormal instance termination].
2022-04-15T13:10:13.866769+02:00
Instance terminated by USER, pid = 11908
2022-04-15T13:12:58.049262+02:00
This means that if your master observer is located in the same datacenter as the standby server and your standby datacenter crashes,
-No automatic failover will happen
-Your primary database will be shut down
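The shutdown follows from the rule quoted in the alert log: the primary heard from neither the observer nor the target standby within FastStartFailoverThreshold seconds. The sketch below is only a simplified model of that message, not Oracle's implementation. It assumes FastStartFailoverPmyShutdown is TRUE as in this configuration, and it assumes, based on what was observed in this test, that the two backup observers still reaching the primary do not count. The function name primary_is_isolated is hypothetical.

```python
def primary_is_isolated(secs_since_master_observer,
                        secs_since_target_standby,
                        threshold_secs=15):
    """Model of the ORA-16830 condition described in the alert log.

    The primary is considered isolated only when BOTH the master observer
    and the target standby have been silent for longer than the threshold.
    threshold_secs defaults to the FastStartFailoverThreshold used in this blog.
    """
    return (secs_since_master_observer > threshold_secs
            and secs_since_target_standby > threshold_secs)


# Case 2: standby and master observer are lost together.
print(primary_is_isolated(20, 20))   # True
# Only the master observer is lost (case 3): the standby is still reachable.
print(primary_is_isolated(20, 1))    # False
# Only the standby is lost: the master observer is still reachable.
print(primary_is_isolated(1, 20))    # False
```

This is why hosting the master observer next to the standby is risky: a single datacenter failure silences both partners of the primary at once.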
Case 3 : The master observer is running on oraadserver3 so the observer is located in the third datacenter
DGMGRL> show observer
Configuration - db21
  Fast-Start Failover:     ENABLED
  Primary:            DB21_SITE1
  Active Target:      DB21_SITE2
Observer "oraadserver31" - Master
  Host Name:                    oraadserver3
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          1 second ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver1" - Backup
  Host Name:                    oraadserver
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          0 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver21" - Backup
  Host Name:                    oraadserver2
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          0 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver2.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
DGMGRL>
Now let’s crash the third datacenter, which only hosts the master observer; no primary or standby database is running in this datacenter.
[root@oraadserver3 ~]# poweroff
A few minutes later, a backup observer was automatically promoted to master.
DGMGRL> show observer
Configuration - db21
  Fast-Start Failover:     ENABLED
  Primary:            DB21_SITE1
  Active Target:      DB21_SITE2
Observer "oraadserver1" - Master
  Host Name:                    oraadserver
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          2 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver21" - Backup
  Host Name:                    oraadserver2
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          2 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver2.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver31" - Backup
  Host Name:                    oraadserver3
  Last Ping to Primary:         59 seconds ago
  Last Ping to Target:          59 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
To summarize, we can see that
Primary database and master observer in the same datacenter
-loss of datacenter = no automatic failover, because no backup observer is promoted to master
Standby database and master observer in the same datacenter
-loss of datacenter = no automatic failover, because no backup observer is promoted to master + shutdown of the primary database
Master observer in a third datacenter
-loss of datacenter = a backup observer is promoted to master
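The three results can also be written down as a small lookup table, for example to document the expected behaviour in a runbook. This is just an illustrative sketch that records what was observed in the tests above on this particular setup; the names OUTCOMES and backup_promoted_after_loss are hypothetical.

```python
# Observed result when the datacenter hosting the master observer is lost.
# Keys name what shares the datacenter with the master observer.
OUTCOMES = {
    "primary": {
        "backup_promoted": False,
        "consequence": "no automatic failover",
    },
    "standby": {
        "backup_promoted": False,
        "consequence": "no automatic failover, primary database shut down",
    },
    "third": {
        "backup_promoted": True,
        "consequence": "a backup observer becomes master",
    },
}


def backup_promoted_after_loss(master_location):
    """Return True if a backup observer took over in the tested scenario."""
    return OUTCOMES[master_location]["backup_promoted"]


for location, result in OUTCOMES.items():
    print(location, "->", result["consequence"])
```

Only the third-datacenter placement kept a working master observer after the datacenter loss in these tests.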
Conclusion
I will conclude with a question
Where will you put your master observer if you have
2 datacenters?
3 datacenters?
I hope this blog will help
Saurabh
01.11.2023
I will conclude with a question
Where will you put your master observer if you have
2 datacenters? Answer: oraadserver2
3 datacenters? Answer: oraadserver3