Introduction
Oracle Data Guard is a brilliant component of Oracle Database Enterprise Edition. On its own, it’s a strong argument in favor of Enterprise Edition. This is because Data Guard is a proven Disaster Recovery solution: it’s integrated, reliable and well known by DBAs. And it comes at no additional cost if you don’t need the Active Guard feature. Several days ago, I helped colleagues of mine troubleshooting a Data Guard configuration: let’s summarize how I would proceed to make my standby database back to sync if I have a problem. In this example, I consider having a simple environment with one primary and one standby in Maximum Availability. If you have multiple standby databases, the tasks are the same.
How to check your Data Guard setup?
The easiest way to check your Data Guard setup is by using the Data Guard CLI called the “broker” from one of the servers. The SHOW CONFIGURATION LAG will display 2 key things: your actual configuration (who is primary, who is standby) and the lag between the databases. The lag should be 0 in most cases:
. oraenv <<< DBTEST
dgmgrl / "show configuration lag;"
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Wed Feb 19 15:06:17 2025
Version 19.20.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "DBTEST_IN"
Connected as SYSDG.
Configuration - dtg
Protection Mode: MaxAvailability
Members:
dbtest_in - Primary database
dbtest_cr - Physical standby database
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated 26 seconds ago)
If the lag is bigger than a couple of seconds or if you have errors in your configuration, you will need to do some troubleshooting.
This is an example of a Data Guard configuration not working correctly:
Configuration - dtg
Protection Mode: MaxAvailability
Members:
dbtest_in - Primary database
Error: ORA-16810: multiple errors or warnings detected for the member
dbtest_cr - Physical standby database
Warning: ORA-16854: apply lag could not be determined
Fast-Start Failover: Disabled
Configuration Status:
ERROR (status updated 56 seconds ago)
Check the Fast Recovery Area on your standby database
A common problem you may have on a standby database is the Fast Recovery Area (FRA) being full. You probably know that a standby database will apply the changes from the primary without waiting for the archivelog to be shipped, but the archivelog is shipped anyway to the standby. This is because there is no guarantee that the standby database is always up and running, so archivelogs must be transported to the other site for later apply if needed. Another thing is that you probably enabled Flashback Database on both databases, and archivelogs are required for a Flashback Database operation. These shipped archivelogs will naturally land in the FRA, and unless you configured a deletion policy, they will never be deleted. As you probably don’t do backups on the standby database, nothing could flag these archivelogs as reclaimable (meaning that they are now useless and can be deleted). If your FRA is quite big, you may discover this problem several weeks or months after the initial setup.
Check the FRA usage on your standby with this query:
select sum(PERCENT_SPACE_USED-PERCENT_SPACE_RECLAIMABLE) "Real FRA usage %" from v$flash_recovery_area_usage;
Real FRA usage %
----------------
32.86
If the FRA is almost full, you can remove older archivelogs, for example those older than 2 days if your standby has a 1-day lag:
rman target /
delete force noprompt archivelog all completed before 'sysdate-2';
exit;
Then check again the FRA and the lag of your Data Guard setup.
Check standby_file_management parameter
Another thing that can break your sync is the standby_file_management parameter having an incorrect value. In most cases, it must be set to AUTOMATIC: it means that any file created on the primary will be created on the standby. It’s the way it is supposed to work:
show parameter standby_file_management
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
standby_file_management string AUTO
If standby_file_management is set to MANUAL, your standby will not be in sync anymore as soon as a new datafile is created on the primary. You will need to manually create the file on the standby to continue the sync. Not very convenient.
The MANUAL mode exists because some older configurations had different filesystems on primary and on standby and didn’t use OMF, meaning that standby database cannot guess where to put the new file.
Both primary and standby databases must have this parameter set to AUTO nowadays.
Cross test connexions
Your Data Guard setup can only work if your databases are able to communicate together. If you are not so sure if something changed on your network, for example a new firewall rule, check your connexions from both servers using the SYS account.
From your primary:
sqlplus sys@DBTEST_CR as sysdba
sqlplus sys@DBTEST_IN as sysdba
From your standby:
sqlplus sys@DBTEST_CR as sysdba
sqlplus sys@DBTEST_IN as sysdba
Check password file
When your standby database is MOUNTED, the only way to authenticate the SYS user is by using the password file. If you changed the SYS password on the primary, it will be changed inside the database (and replicated) as well as in the local password file, but the password file on the standby site won’t be updated. You must then copy the password file from the primary database to the standby database. Copy is done with a scp command, for example from my primary server:
srvctl config database -db DBTEST_IN | grep Password
Password file: /u01/app/odaorahome/oracle/product/19.0.0.0/dbhome_4/dbs/orapwDBTEST
scp `srvctl config database -db OP1 | grep Password | awk '{print $3;}'` oracle@oda-x11-cr:`srvctl config database -db OP1 | grep Password | awk '{print $3;}'`
A restart of the standby database may be needed.
Check alert_DBTEST.log and drcDBTEST.log on both servers
Never miss an error reported in the alert_DBTEST.log on both sides. I would recommend disabling the Data Guard configuration, doing a tail -f on both alert_DBTEST.log files, and enabling back the configuration:
. oraenv <<< DBTEST
dgmgrl / "disable configuration;"
sleep 60
dgmgrl / "enable configuration;"
There are also dedicated trace files for Data Guard, at the same place as alert_DBTEST.log: drcDBTEST.log. You may find additional information for troubleshooting your configuration in these files.
Remove and recreate the configuration
Data Guard configuration is just a couple of parameters stored in a file on both sides. It’s easy to drop and create again this configuration without actually rebuilding the standby database. If you want to make sure that nothing survives from your old configuration, just remove the broker files before creating the configuration again.
From the primary:
dgmgrl sys
edit configuration set protection mode as maxperformance;
remove configuration;
exit;
sqlplus / as sysdba
sho parameter dg_broker_config_file
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
dg_broker_config_file1 string /u01/app/odaorahome/oracle/pro
duct/19.0.0.0/dbhome_4/dbs/dr1
DBTEST_IN.dat
dg_broker_config_file2 string /u01/app/odaorahome/oracle/pro
duct/19.0.0.0/dbhome_4/dbs/dr2
DBTEST_IN.dat
alter system set dg_broker_start=FALSE;
host rm /u01/app/odaorahome/oracle/product/19.0.0.0/dbhome_4/dbs/dr1DBTEST_IN.dat
host rm /u01/app/odaorahome/oracle/product/19.0.0.0/dbhome_4/dbs/dr2DBTEST_IN.dat
exit
From the standby:
sqlplus / as sysdba
sho parameter dg_broker_config_file
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
dg_broker_config_file1 string /u01/app/odaorahome/oracle/pro
duct/19.0.0.0/dbhome_4/dbs/dr1
DBTEST_CR.dat
dg_broker_config_file2 string /u01/app/odaorahome/oracle/pro
duct/19.0.0.0/dbhome_4/dbs/dr2
DBTEST_CR.dat
alter system set dg_broker_start=FALSE;
host rm /u01/app/odaorahome/oracle/product/19.0.0.0/dbhome_4/dbs/dr1DBTEST_CR.dat
host rm /u01/app/odaorahome/oracle/product/19.0.0.0/dbhome_4/dbs/dr2DBTEST_CR.dat
alter system set dg_broker_start=TRUE;
exit
From the primary:
sqlplus / as sysdba
alter system set dg_broker_start=TRUE;
exit;
dgmgrl sys
create configuration DTG as primary database is 'DBTEST_IN' connect identifier is 'DBTEST_IN';
add database 'DBTEST_CR' as connect identifier is 'DBTEST_CR';
enable configuration;
edit database 'DBTEST_CR' set property LogXptMode='SYNC';
edit database 'DBTEST_IN' set property LogXptMode='SYNC';
edit database 'DBTEST_CR' set property StandbyFileManagement='AUTO';
edit database 'DBTEST_IN' set property StandbyFileManagement='AUTO';
EDIT DATABASE 'DBTEST_CR' SET PROPERTY 'ArchiveLagTarget'=1200;
EDIT DATABASE 'DBTEST_IN' SET PROPERTY 'ArchiveLagTarget'=1200;
Edit database 'DBTEST_CR' set property StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oda-x11-cr)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=DBTEST_CR)(INSTANCE_NAME=DBTEST)(SERVER=DEDICATED)))';
Edit database 'DBTEST_IN' set property StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oda-x11-in)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=DBTEST_IN)(INSTANCE_NAME=DBTEST)(SERVER=DEDICATED)))';
edit configuration set protection mode as maxavailability;
show configuration lag;
exit;
Note that the StaticConnectIdentifier property is only mandatory when using a port different than 1521.
Recover standby database from service
If you still struggle to get your standby back in sync, because too many archivelogs are missing or because the archivelogs are not on the primary site anymore, you can use this nice RMAN command on your standby:
sqlplus / as sysdba
alter system set dg_broker_start=false;
exit;
srvctl stop database -db DBTEST_CR
sleep 10
srvctl start database -db DBTEST_CR -o mount
rman target /
recover database from service 'DBTEST_IN';
exit;
sqlplus / as sysdba
alter system set dg_broker_start=true;
exit;
This RECOVER DATABASE FROM SERVICE will do an incremental backup on the primary to recover the standby without needing the missing archivelogs. It’s convenient and much faster than rebuilding the standby from scratch.
Check SCN
In the good old days of Data Guard on Oracle 9i, the broker didn’t exist and you had to configure everything yourself. At this time, I used to have a look at the SCN on both databases for monitoring the lag. Nothing changed regarding the SCN: on a primary, you will never see the same SCN each time you query its value. This is because the query itself will increase the SCN by 1, as well as other background queries are running:
select current_scn from v$database;
CURRENT_SCN
-----------
271650667
select current_scn from v$database;
CURRENT_SCN
-----------
271650674
select current_scn from v$database;
CURRENT_SCN
-----------
271650675
select current_scn from v$database;
CURRENT_SCN
-----------
271650678
On a standby database, the SCN can only increase if changes are pushed by a primary. And for sure, the SCN will always be lower than the one on the primary. If your standby database is not opened, meaning that you don’t have the Active Guard option, you will query the same SCN for a couple of minutes, and you will see big jumps in figures from time to time:
select current_scn from v$database;
CURRENT_SCN
-----------
271650664
select current_scn from v$database;
CURRENT_SCN
-----------
271650664
select current_scn from v$database;
CURRENT_SCN
-----------
271651042
select current_scn from v$database;
CURRENT_SCN
-----------
271651042
Conclusion
With these few tips, troubleshooting Data Guard might be easier now.
Ramy
12.03.2025Very helpful 👌
Many thanks