By Franck Pachot
.
Do you remember last year, when 12c arrived with multitenant, David Hueber warned us about the fact that a single PDB can, under certain conditions, generate a complete system downtime? We are beta testers and opened a SR for that. Now one year later the first patchset is out and obviously I checked if the issue was fixed. It’s a patchset afterall, which is expected to fix issues before than bringing new features.
So the issue was that when the SYSTEM tablespace is lost in a PDB, then we cannot restore it without shutting down the whole CDB. This is because we cannot take the SYSTEM tablespace offline, and we cannot close the PDB as a checkpoint cannot be done. There is no SHUTDOWN ABORT for a PDB that can force to it. Conclusion: if you loose one SYSTEM tablespace, either you accept to wait for a maintenance window before bring it back online, or you have to stop the whole CDB with a shutdown abort.
When I receive a new release, I like to check new parameters, even the undocumented ones. And in 12.1.0.2 there is a new underscore parameter _enable_pdb_close_abort which has the description ‘Enable PDB shutdown abort (close abort)’. Great. It has a default value of false but maybe this is how the bug has been addressed.
Before trying that parameter, let’s reproduce the case:
Here are my datafiles:
RMAN> report schema;
using target database control file instead of recovery catalog
Report of database schema for database with db_unique_name CDB_SITE1
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 790 SYSTEM YES /u01/app/oracle/oradata/CDB/system01.dbf
3 680 SYSAUX NO /u01/app/oracle/oradata/CDB/sysaux01.dbf
4 215 UNDOTBS1 YES /u01/app/oracle/oradata/CDB/undotbs01.dbf
5 250 PDB$SEED:SYSTEM NO /u01/app/oracle/oradata/CDB/pdbseed/system01.dbf
6 5 USERS NO /u01/app/oracle/oradata/CDB/users01.dbf
7 540 PDB$SEED:SYSAUX NO /u01/app/oracle/oradata/CDB/pdbseed/sysaux01.dbf
8 250 PDB1:SYSTEM NO /u01/app/oracle/oradata/CDB/PDB1/system01.dbf
9 570 PDB1:SYSAUX NO /u01/app/oracle/oradata/CDB/PDB1/sysaux01.dbf
10 5 PDB1:USERS NO /u01/app/oracle/oradata/CDB/PDB1/PDB1_users01.dbf
11 250 PDB2:SYSTEM NO /u01/app/oracle/oradata/CDB/PDB2/system01.dbf
12 570 PDB2:SYSAUX NO /u01/app/oracle/oradata/CDB/PDB2/sysaux01.dbf
13 5 PDB2:USERS NO /u01/app/oracle/oradata/CDB/PDB2/PDB2_users01.dbf
List of Temporary Files
=======================
File Size(MB) Tablespace Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1 197 TEMP 32767 /u01/app/oracle/oradata/CDB/temp01.dbf
2 100 PDB$SEED:TEMP 32767 /u01/app/oracle/oradata/CDB/pdbseed/pdbseed_temp012014-06-15_09-46-11-PM.dbf
3 20 PDB1:TEMP 32767 /u01/app/oracle/oradata/CDB/PDB1/temp012014-06-15_09-46-11-PM.dbf
4 20 PDB2:TEMP 32767 /u01/app/oracle/oradata/CDB/PDB2/temp012014-06-15_09-46-11-PM.dbf
then I just remove the PDB2 SYSTEM datafile:
rm /u01/app/oracle/oradata/CDB/PDB2/system01.dbf
And I go to sqlplus in order to check the state of my pdb. Remeber, I want to see if I can restore the datafile without doing a shutdown abort on my CDB instance.
SQL*Plus: Release 12.1.0.2.0 Production on Mon Jul 27 20:31:45 2014
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to an idle instance.
SQL> select name,open_mode from v$pdbs;
select name,open_mode from v$pdbs
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0
Oh… that’s bad… Let’s look at the alert.log:
Errors in file /u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_ckpt_21620.trc:
ORA-01243: system tablespace file suffered media failure
ORA-01116: error in opening database file 11
ORA-01110: data file 11: '/u01/app/oracle/oradata/CDB/PDB2/system01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
USER (ospid: 21620): terminating the instance due to error 1243
System state dump requested by (instance=1, osid=21620 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_diag_21608_20140727202844.trc
2014-07-27 20:28:49.596000 +02:00
Instance terminated by USER, pid = 21620
The CKPT process has terminated the instance. The whole CDB is down.
That’s worse. In 12.1.0.1 we had to bring down the instance, but at least we were able to choose the time and warn the users. Not here. In 12.1.0.2 it crashes immediately when a checkpoint occurs.
I’ve opened a bug for that (Bug 19001390 – PDB SYSTEM TABLESPACE MEDIA FAILURE CAUSES THE WHOLE CDB TO CRASH) which is expected to be fixed for the next release (12.2).
Ok the good news is that once the CDB is down, recovery is straightforward:
Recovery Manager: Release 12.1.0.2.0 - Production on Mon Jul 27 21:36:22 2014
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
connected to target database (not started)
RMAN> startup
Oracle instance started
database mounted
database opened
Total System Global Area 838860800 bytes
Fixed Size 2929936 bytes
Variable Size 616565488 bytes
Database Buffers 213909504 bytes
Redo Buffers 5455872 bytes
RMAN> list failure;
using target database control file instead of recovery catalog
Database Role: PRIMARY
List of Database Failures
=========================
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
3353 CRITICAL OPEN 27-JUL-14 System datafile 11: '/u01/app/oracle/oradata/CDB/PDB2/system01.dbf' is missing
245 HIGH OPEN 27-JUL-14 One or more non-system datafiles need media recovery
RMAN> advise failure;
Database Role: PRIMARY
List of Database Failures
=========================
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
3353 CRITICAL OPEN 27-JUL-14 System datafile 11: '/u01/app/oracle/oradata/CDB/PDB2/system01.dbf' is missing
245 HIGH OPEN 27-JUL-14 One or more non-system datafiles need media recovery
analyzing automatic repair options; this may take some time
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=132 device type=DISK
analyzing automatic repair options complete
Mandatory Manual Actions
========================
no manual actions available
Optional Manual Actions
=======================
1. If file /u01/app/oracle/oradata/CDB/PDB2/system01.dbf was unintentionally renamed or moved, restore it
2. Automatic repairs may be available if you shutdown the database and restart it in mount mode
3. If you restored the wrong version of data file /u01/app/oracle/oradata/CDB/PDB2/sysaux01.dbf, then replace it with the correct one
4. If you restored the wrong version of data file /u01/app/oracle/oradata/CDB/PDB2/PDB2_users01.dbf, then replace it with the correct one
Automated Repair Options
========================
Option Repair Description
------ ------------------
1 Restore and recover datafile 11; Recover datafile 12; Recover datafile 13
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/cdb/CDB/hm/reco_3711091289.hm
RMAN> repair failure;
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/cdb/CDB/hm/reco_3711091289.hm
contents of repair script:
# restore and recover datafile
restore ( datafile 11 );
recover datafile 11;
# recover datafile
recover datafile 12, 13;
Do you really want to execute the above repair (enter YES or NO)? YES
executing repair script
Starting restore at 27-JUL-14
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00011 to /u01/app/oracle/oradata/CDB/PDB2/system01.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/CDB/FECFFDC5F6D31F5FE043D74EA8C0715F/backupset/2014_07_28/o1_mf_nnndf_TAG20140728T150921_9xdlw21n_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/CDB/FECFFDC5F6D31F5FE043D74EA8C0715F/backupset/2014_07_28/o1_mf_nnndf_TAG20140728T150921_9xdlw21n_.bkp tag=TAG20140728T150921
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 27-JUL-14
Starting recover at 27-JUL-14
using channel ORA_DISK_1
starting media recovery
media recovery complete, elapsed time: 00:00:01
Finished recover at 27-JUL-14
Starting recover at 27-JUL-14
using channel ORA_DISK_1
starting media recovery
media recovery complete, elapsed time: 00:00:00
Finished recover at 27-JUL-14
repair failure complete
RMAN> alter pluggable database PDB2 open;
Statement processed
I whish that one day the PDB will have true isolation so that I can give DBA rights to the application owner on his PDB. But that means that, at least:
- A PDB failure cannot crash the CDB instance.
- A PDB admin cannot create datafiles anywhere on my server.
- A PDB admin cannot run anything as the instance owner user (usually oracle)