Introduction
I was recently reviewing and updating the “Oracle Backup and Recovery Workshop” from dbi services. It’s all about RMAN, a powerful tool that always impresses my students. But I must admit that this workshop is less popular than before. Today, backing up an Oracle database is not a hot topic anymore. First, nothing is really new in RMAN compared to a few years ago. Hardware reliability has also increased significantly, and data centers are more secure than ever, both of which make failures less likely to happen. Another point is that Disaster Recovery has become a standard in most projects, meaning that for a lot of failures, enabling the standby database will solve the problem in a minute or two. Finally, Oracle-embedded flashback technologies bring you faster solutions than restoring a backup when the database itself is not broken (these technologies are also part of the workshop). That said, it doesn’t mean that you don’t need a strong backup strategy. It is still your very last protection for dramatic scenarios, and you need to test it from time to time. Let’s review what’s important and how to check and test your backup strategy.
Why do you still need a reliable backup at any moment?
Let’s recall the main reasons for having a reliable backup:
- you may lose everything on your primary and secondary sites
- you may have logically corrupted data from a software error or a human error, also replicated to your standby database
- you may want to recover partial data that flashback technologies cannot bring back
- you may have a security breach widely altering your data (corrupted storage, ransomware, and so on)
For sure, these events are rather unlikely to happen, but as a DBA, you’re supposed to have a plan for any kind of disaster. And a solid RMAN backup is your most trusted ally.
Why you probably won’t use an RMAN restore in most cases
You probably have a Disaster Recovery (DR) setup, meaning 2 sites with database replication using Data Guard (for Enterprise Edition) or Dbvisit Standby (for Standard Edition 2). In most cases, if you need a quick solution to bring your database back to life after a crash, you will failover to the DR database. It takes a couple of minutes, and if you test your DR database on a regular basis (I mean doing a switchover at least once a year), this operation will succeed without any doubt. A failover is adequate when your primary server is not responding or when your primary database has corrupted files. Doing a failover is always the best option if you want minimum data loss.
If you actually need to lose data, I mean if data has been logically corrupted by an error (incorrect SQL commands, an application bug), you will probably use the Oracle-embedded flashback technologies. Flashback relies on multiple different technologies for different purposes. If you drop a table, you can bring it back from the recycle bin (FLASHBACK TABLE TO BEFORE DROP); if you delete rows from a table, you can query it as of a past point in time (AS OF TIMESTAMP), for example. If the whole database needs to go back in time, and if you’re using Enterprise Edition, you will use FLASHBACK DATABASE. FLASHBACK DATABASE, once enabled, is a mechanism that generates flashback logs, a kind of automatic “INCR -1” backup. With this feature, you can go back to any point in time from a couple of minutes to several hours ago. It’s much faster than an equivalent RMAN restore, and you can do multiple flashback operations to find the desired Point In Time.
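As a quick illustration, here is what these flashback features look like in SQL (the table name and the timestamps are hypothetical, adjust them to your case):

```sql
-- Bring back a dropped table from the recycle bin
FLASHBACK TABLE emp TO BEFORE DROP;

-- Query a table as it was 15 minutes ago, e.g. to re-insert deleted rows
SELECT * FROM emp AS OF TIMESTAMP SYSTIMESTAMP - INTERVAL '15' MINUTE;

-- Rewind the whole database (Enterprise Edition, flashback logging enabled,
-- database mounted), then open it with RESETLOGS
FLASHBACK DATABASE TO TIMESTAMP SYSTIMESTAMP - INTERVAL '1' HOUR;
ALTER DATABASE OPEN RESETLOGS;
```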
Restoring a backup is what you do when nothing else is possible, simply because putting older datafiles back on disk from backupsets and recovering them is slower.
Probably the most important: make sure RPO and RTO are aligned with the company strategy
As a DBA, you must ensure that these metrics, defined by the company, are met:
- Recovery Point Objective: how far back in time you must be able to recover your data
- Recovery Time Objective: the maximum time allowed to restore the database to any Point In Time within the RPO
You should consider these RPO/RTO as a contract between the DBA and the management. The DBA will require resources, like adequate infrastructure, disk capacity and licenses to reach these objectives.
Regarding RPO, it’s mainly a matter of storage capacity. Regarding RTO, it involves the database edition, the number of licenses/cores and the storage bandwidth.
You must test your backup on a regular basis, just to ensure that you still meet the RTO as the amount of data grows month after month.
Too often, RPO and RTO are defined by the DBA himself. In other words, the DBA does what’s possible with the resources he has. This is definitely not the best approach.
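As a rough sanity check against your RTO, you can estimate how long a restore would take based on the throughput observed during recent backups. This is only a sketch, not an official formula: real restore speed depends on your storage, channels and compression, so treat the result as an order of magnitude:

```sql
-- Estimate the time needed to read back the last database backups,
-- assuming restore throughput is comparable to backup read throughput
select round(sum(output_bytes)/1024/1024) "Backup MB",
       round(avg(input_bytes/elapsed_seconds)/1024/1024,1) "Observed MB/s",
       round(sum(output_bytes)/avg(input_bytes/elapsed_seconds)/60,1) "Estimated min"
from v$rman_backup_job_details
where input_type like 'DB%'
  and status = 'COMPLETED'
  and start_time >= sysdate-7;
```

If the estimated duration gets close to your RTO, it’s time to ask for more resources before a real disaster proves the point.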
Check and test your backups
First, you should never trust your backup! Not because it’s not reliable, but because backups are done while the database is open, meaning that backups are not consistent. A restore is only possible if a complete set of backups (full + incremental + archivelogs) is available, along with an adequate controlfile. Green lights from your backup monitoring tool or the word “successful” at the end of your backup log are simply not enough. The only trustworthy backup strategy is the one you test on a regular basis.
These are the checks and tests I would do to make sure my backup is OK.
Check RMAN views
With this statement, you can figure out the backup strategy and see at a glance if it works fine:
set lines 180
set pages 300
col min for 999.9
col status for a20
alter session set NLS_DATE_FORMAT="DD/MM-HH24:MI";
select start_time "Start",
       round(input_bytes/1024/1024,1) "Source MB",
       round(output_bytes/1024/1024) "Backup MB",
       input_type "Type",
       status "Status",
       round(elapsed_seconds/60,1) "Min",
       round(input_bytes/1024/1024/elapsed_seconds,1) "read MB/s",
       round(output_bytes/1024/1024/elapsed_seconds,1) "write MB/s",
       round(compression_ratio,1) RATIO
from v$rman_backup_job_details
where start_time >= SYSDATE-7
order by 1 desc;
Start Source MB Backup MB Type Status Min read MB/s write MB/s RATIO
-------------- ---------- ---------- ------------- ------------ ------ ---------- ---------- ----------
25/08-15:10:06 1443 380 ARCHIVELOG COMPLETED .7 36.1 9.5 3.8
25/08-14:10:06 123 50 ARCHIVELOG COMPLETED .1 20.6 8.3 2.5
25/08-13:10:05 33 27 ARCHIVELOG COMPLETED .1 6.5 5.4 1.2
25/08-12:10:05 32 27 ARCHIVELOG COMPLETED .1 8 6.8 1.2
25/08-11:10:05 46 31 ARCHIVELOG COMPLETED .1 7.7 5.1 1.5
25/08-10:10:05 44 31 ARCHIVELOG COMPLETED .1 7.4 5.1 1.4
25/08-09:10:04 31 27 ARCHIVELOG COMPLETED .1 5.1 4.5 1.1
25/08-08:10:05 32 27 ARCHIVELOG COMPLETED .1 8 6.8 1.2
25/08-07:10:05 30 26 ARCHIVELOG COMPLETED .1 4.9 4.4 1.1
25/08-06:10:05 30 26 ARCHIVELOG COMPLETED .1 5.9 5.3 1.1
25/08-05:10:04 30 27 ARCHIVELOG COMPLETED .1 5 4.4 1.1
25/08-04:10:05 32 27 ARCHIVELOG COMPLETED .1 8 6.8 1.2
25/08-03:10:05 38 29 ARCHIVELOG COMPLETED .1 6.3 4.9 1.3
25/08-02:10:05 30 27 ARCHIVELOG COMPLETED .1 5 4.4 1.1
25/08-01:10:05 30 26 ARCHIVELOG COMPLETED .1 5.9 5.3 1.1
25/08-00:10:05 39 30 ARCHIVELOG COMPLETED .1 7.8 6 1.3
24/08-22:30:06 133858 17371 DB INCR COMPLETED 26.3 84.8 11 7.7
24/08-22:10:05 1238 376 ARCHIVELOG COMPLETED .5 41.3 12.5 3.3
24/08-21:10:05 29 26 ARCHIVELOG COMPLETED .1 7.2 6.6 1.1
24/08-20:10:05 29 26 ARCHIVELOG COMPLETED .1 4.1 3.8 1.1
24/08-19:10:05 34 28 ARCHIVELOG COMPLETED .1 5.6 4.6 1.2
24/08-18:10:05 39 29 ARCHIVELOG COMPLETED .1 7.8 5.8 1.3
24/08-17:10:05 1037 204 ARCHIVELOG COMPLETED .5 35.8 7 5.1
24/08-16:10:05 42 31 ARCHIVELOG COMPLETED .1 6 4.4 1.4
...
In this example, everything is fine (Status=COMPLETED): a full backup (INCR level 0) is done every night, an archivelog backup is done every hour, and compression is enabled. Here, read MB/s and write MB/s give you estimated values. If you use incremental backups (INCR level 1), you can identify them with the “Backup MB” column: the figure should be much lower than that of an INCR level 0.
Having a catalog is definitely more convenient as you can get an overview for all your databases from a unique statement:
set lines 180
set pages 300
col min for 999.9
col status for a20
alter session set NLS_DATE_FORMAT="DD/MM-HH24:MI:SS";
select db_name,
       start_time "Start",
       round(input_bytes/1024/1024,1) "Source MB",
       round(output_bytes/1024/1024,1) "Backup MB",
       input_type "Type",
       status "Status",
       round(elapsed_seconds/60,1) "Min",
       round(input_bytes/1024/1024/elapsed_seconds,1) "read MB/s",
       round(output_bytes/1024/1024/elapsed_seconds,1) "write MB/s",
       round(compression_ratio,1) RATIO
from rc_rman_backup_job_details
where start_time >= SYSDATE-7
order by 1,2 desc;
...
But this query is not enough to make sure that you can restore your database.
Check if retention is the one you expect
It’s not rare that a DBA is surprised when checking backup retention and discovers a much lower value than expected. Retention can be changed in a minute, often to solve storage capacity issues. This could have dramatic consequences if you miss the RPO your boss signed up for. But most probably, nobody will know about it…
rman target /
SHOW RETENTION POLICY;
RMAN configuration parameters for database with db_unique_name ROP_S1 are:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
With this retention, you should be able to restore the database to a consistent state at any point between now and 7 days ago. It implies keeping database backupsets for more than 7 days, because you need the FULL or INCR 0 backup taken before those 7 days to reach the oldest Point In Time. And, for sure, all the INCR 1 and archivelog backups done after this FULL or INCR 0.
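As a complement, RMAN can directly report how the current backups relate to the retention policy:

```
REPORT NEED BACKUP;
REPORT OBSOLETE;
```

REPORT NEED BACKUP lists the datafiles that need a new backup to satisfy the retention policy, and REPORT OBSOLETE lists the backups no longer needed under this policy: both outputs should match what you expect from your strategy.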
But checking the retention is not enough to make sure you have everything needed for a restore.
Check if backups are known by RMAN
When deleting backups, this retention can be overridden if needed, so there is no guarantee that backups are still in the catalog. You should be able to find backups within this retention: backups of the database (FULL or INCR 0), but also backups of archivelogs and INCR 1:
LIST BACKUP OF DATABASE COMPLETED BETWEEN 'sysdate -7' AND 'sysdate';
...
LIST BACKUP OF ARCHIVELOG COMPLETED BETWEEN 'sysdate -7' AND 'sysdate';
...
This is pretty verbose, but you will find the various backupsets and what’s inside.
Unfortunately, this is still not enough to make sure you can restore your database.
Do a crosscheck to validate that backup pieces are available
LIST BACKUP only looks at the backup references in the catalog; it doesn’t guarantee that the backupset files (backup pieces) are available. Therefore, a crosscheck is needed to match the references with the files on disk:
CROSSCHECK BACKUPSET;
You can also run the crosscheck on a subset of backups only, for example the most recent ones:
CROSSCHECK BACKUPSET COMPLETED AFTER 'sysdate -1';
...
Or backups between 2 dates:
CROSSCHECK BACKUPSET COMPLETED BETWEEN 'sysdate -7' AND 'sysdate -6';
...
All backupsets must be AVAILABLE. If not, they are EXPIRED, meaning no longer on disk. This is not normal behavior.
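If the crosscheck flags some backupsets as EXPIRED, you can list them to investigate and, once the root cause is understood, remove the dangling references from the catalog:

```
LIST EXPIRED BACKUPSET;
DELETE EXPIRED BACKUPSET;
```

Only delete the expired references after understanding why the pieces disappeared: a misconfigured cleanup job or a storage issue found here may be silently breaking your RPO.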
Is this enough to make sure you can restore? Not at all: available backup pieces don’t mean the restore is possible.
Do a restore database validate with and without a Point In Time
Without any downtime or extra disk usage, you can do a RESTORE DATABASE VALIDATE. It will read the backupsets from the backup location and validate that they are OK for restoring the datafiles.
run {
ALLOCATE CHANNEL C1 DEVICE TYPE DISK;
ALLOCATE CHANNEL C2 DEVICE TYPE DISK;
ALLOCATE CHANNEL C3 DEVICE TYPE DISK;
RESTORE DATABASE VALIDATE;
}
This RESTORE DATABASE VALIDATE is also possible with a Point In Time. RMAN will then simulate a full restore of the adequate backupsets within the retention.
run {
SET UNTIL TIME "TO_DATE('10-OCT-2025 14:00:00','DD-MON-YYYY HH24:MI:SS')";
ALLOCATE CHANNEL C1 DEVICE TYPE DISK;
ALLOCATE CHANNEL C2 DEVICE TYPE DISK;
ALLOCATE CHANNEL C3 DEVICE TYPE DISK;
RESTORE DATABASE VALIDATE;
}
Note that it restores the datafiles to /dev/null, meaning nowhere. This is also why a RECOVER DATABASE VALIDATE does not exist, and it’s the main limit of this test: you will be able to restore the database, but you might not be able to recover the datafiles and open it.
This test is nice, but not good enough: the recover is mandatory to open the database.
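You can partially compensate for this limit by also validating the archivelog backups needed for the recovery phase (the time window below is just an example):

```
run {
 ALLOCATE CHANNEL C1 DEVICE TYPE DISK;
 RESTORE ARCHIVELOG FROM TIME 'sysdate-1' VALIDATE;
}
```

This validates that the archivelog backupsets can be read, but it still doesn’t prove that the recovery itself will succeed.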
Do a restore AND recover on another server with or without a Point In Time
The best test is a complete restore/recover on another server. Assuming this server has the same system settings, the same Oracle version and the same filesystems, it’s rather easy. First, identify the latest controlfile and spfile autobackup, then restore the spfile and the controlfile from this backupset. RMAN is able to start an instance without a spfile, for the sole purpose of restoring the spfile:
su - oracle
echo "BMC3PRD:/u02/app/oracle/product/19.0.0.0/dbhome_1:Y" >> /etc/oratab
ls -lrt /backup/BMC3PRD/ | tail -n 1
-rw-r-----. 1 oracle oinstall 19136512 Sep 4 09:20 o1_mf_s_1156169494_lr8gd9h4_.bkp
. oraenv <<< BMC3PRD
rman target /
startup nomount;
startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/u02/app/oracle/product/19.0.0.0/dbhome_1/dbs/initBMC3PRD.ora'
starting Oracle instance without parameter file for retrieval of spfile
Oracle instance started
Total System Global Area 1070428224 bytes
Fixed Size 8904768 bytes
Variable Size 281018368 bytes
Database Buffers 771751936 bytes
Redo Buffers 8753152 bytes
restore spfile from '/backup/BMC3PRD/o1_mf_s_1156169494_lr8gd9h4_.bkp';
Starting restore at 04-SEP-25
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=433 device type=DISK
channel ORA_DISK_1: restoring spfile from AUTOBACKUP /backup/BMC3PRD/o1_mf_s_1156169494_lr8gd9h4_.bkp
channel ORA_DISK_1: SPFILE restore from AUTOBACKUP complete
Finished restore at 04-SEP-25
startup force nomount;
restore controlfile from '/backup/BMC3PRD/o1_mf_s_1156169494_lr8gd9h4_.bkp';
Starting restore at 04-SEP-25
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=26 device type=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:05
output file name=+RECO/BMC3PRD_CR/CONTROLFILE/current.486.1210937913
Finished restore at 04-SEP-25
startup force mount;
run {
ALLOCATE CHANNEL C1 DEVICE TYPE DISK;
ALLOCATE CHANNEL C2 DEVICE TYPE DISK;
ALLOCATE CHANNEL C3 DEVICE TYPE DISK;
ALLOCATE CHANNEL C4 DEVICE TYPE DISK;
RESTORE DATABASE ;
RECOVER DATABASE ;
ALTER DATABASE OPEN RESETLOGS;
}
...
Testing a restore/recover at a random Point In Time within your retention window is also quite easy: identify the first spfile/controlfile autobackup after your target Point In Time, and specify an UNTIL clause in your restore/recover database block:
rman target /
startup nomount;
startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/u02/app/oracle/product/19.0.0.0/dbhome_1/dbs/initBMC3PRD.ora'
starting Oracle instance without parameter file for retrieval of spfile
Oracle instance started
Total System Global Area 1070428224 bytes
Fixed Size 8904768 bytes
Variable Size 281018368 bytes
Database Buffers 771751936 bytes
Redo Buffers 8753152 bytes
restore spfile from '/backup/BMC3PRD/o1_mf_s_1156169494_124efa11_.bkp';
...
startup force nomount;
restore controlfile from '/backup/BMC3PRD/o1_mf_s_1156169494_124efa11_.bkp';
...
startup force mount;
run {
SET UNTIL TIME "TO_DATE('10-SEP-2025 09:27:42','DD-MON-YYYY HH24:MI:SS')";
ALLOCATE CHANNEL C1 DEVICE TYPE DISK;
ALLOCATE CHANNEL C2 DEVICE TYPE DISK;
ALLOCATE CHANNEL C3 DEVICE TYPE DISK;
ALLOCATE CHANNEL C4 DEVICE TYPE DISK;
RESTORE DATABASE ;
RECOVER DATABASE ;
ALTER DATABASE OPEN RESETLOGS;
}
...
This time, this test is enough to make sure your backups are OK, because you just did the exact same procedure as if you were facing a real disaster case.
Duplicate from backup with or without a Point In Time
You may not want to restore the database with the same name as the production database, for obvious reasons. You can then restore it under a different name thanks to the DUPLICATE command. This command is normally used for duplicating a target (source) database to an auxiliary (destination) database, but you can use it without any target, pointing only to a folder containing the backupsets. Starting the instance without a spfile is still possible with RMAN, but you’ll need to connect using the target keyword first:
echo "RESTO:/u02/app/oracle/product/19.0.0.0/dbhome_1:Y" >> /etc/oratab
. oraenv <<< RESTO
rman target /
startup nomount;
exit;
rman auxiliary /
run {
ALLOCATE AUXILIARY CHANNEL C1 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL C2 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL C3 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL C4 DEVICE TYPE DISK;
DUPLICATE DATABASE TO RESTO BACKUP LOCATION '/backup/BMC3PRD/' NOFILENAMECHECK;
}
It also works with a Point In Time:
rman auxiliary /
run {
SET UNTIL TIME "TO_DATE('10-OCT-2025 16:30:00','DD-MON-YYYY HH24:MI:SS')";
ALLOCATE AUXILIARY CHANNEL C1 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL C2 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL C3 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL C4 DEVICE TYPE DISK;
DUPLICATE DATABASE TO RESTO BACKUP LOCATION '/backup/BMC3PRD/' NOFILENAMECHECK;
}
All these commands work fine when using OMF, and I would recommend using OMF everywhere. If you don’t use OMF, you will need to take care of file names and use path conversions within the RMAN commands.
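For example, without OMF, a restore could relocate the datafiles with SET NEWNAME (the target path below is hypothetical; %b stands for the original file name without its directory):

```
run {
 SET NEWNAME FOR DATABASE TO '/u02/oradata/RESTO/%b';
 RESTORE DATABASE;
 SWITCH DATAFILE ALL;
 RECOVER DATABASE;
}
```

For a DUPLICATE, the DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT parameters serve the same purpose.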
Please, never run a DUPLICATE on a server where a database with the same name as the source database exists: at some point, RMAN needs to restore the controlfile with the name of the source database before switching to the new name, and this can be hazardous…
This test is as good as the previous one: it can definitely validate your backups as if you were restoring your production database.
Conclusion
I would summarize this blog post with these 3 things:
- you still need a strong backup strategy: do not underestimate how important your backups still are
- RPO and RTO are decided by your company, and the DBA asks for the resources allowing him to reach these targets. He must warn the management if the resources cannot guarantee these RPO/RTO
- do regular and extensive checks of your backups, and simulate a disaster scenario at least once a year: this is good for validating RPO/RTO and for validating the reliability of your backup strategy. It’s also good training, as you normally won’t do restore/recover operations very often