By Mouhamadou Diaw
With Oracle 12.2, in a Data Guard environment corrupted data blocks can be automatically replaced with uncorrupted copies of those blocks.
There are just some requirements:
• The physical standby database must be operating in real-time query mode, which requires an Oracle Active Data Guard license.
• The physical standby database must be running real-time apply.
Automatic block media recovery works in two directions depending on whether the corrupted blocks are encountered on the primary or on the standby.
This recovery can happen if corrupted data are encountered in the primary database or in the secondary database.
In this blog we are going to see how this feature works.
Below the configuration we are using
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
DGMGRL> show configuration; Configuration - ORCL_DR Protection Mode: MaxAvailability Members: ORCL_SITE - Primary database ORCL_SITE1 - Physical standby database ORCL_SITE2 - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS (status updated 19 seconds ago) DGMGRL> |
We can verifiy the status of our databases
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
|
DGMGRL> show database 'ORCL_SITE' ; Database - ORCL_SITE Role: PRIMARY Intended State: TRANSPORT- ON Instance(s): ORCL Database Status: SUCCESS DGMGRL> show database 'ORCL_SITE1' ; Database - ORCL_SITE1 Role: PHYSICAL STANDBY Intended State: APPLY- ON Transport Lag: 0 seconds (computed 1 second ago) Apply Lag: 0 seconds (computed 1 second ago) Average Apply Rate: 5.00 KByte/s Real Time Query: ON Instance(s): ORCL Database Status: SUCCESS DGMGRL> show database 'ORCL_SITE2' ; Database - ORCL_SITE2 Role: PHYSICAL STANDBY Intended State: APPLY- ON Transport Lag: 0 seconds (computed 1 second ago) Apply Lag: 0 seconds (computed 1 second ago) Average Apply Rate: 4.00 KByte/s Real Time Query: ON Instance(s): ORCL Database Status: SUCCESS DGMGRL> |
The feature works for any protection mode, but in our case the tests are done with a MaxAvailability mode
If corrupt data blocks are on the primary database, then the primary automatically searches for good copies of those blocks on a standby and, if they are found, has them shipped back to the primary.
This only requirement is that the primary requires a LOG_ARCHIVE_DEST_n to the standby. That already should be the case in a Data Guard environment.
If corrupted block is located on the standby, the standby will automatically request uncorrupted copies of those blocks to the primary. The condition for this mechanism to work is
• The LOG_ARCHIVE_CONFIG parameter is configured with a DG_CONFIG list and a LOG_ARCHIVE_DEST_n parameter is configured for the primary database.
or
• The FAL_SERVER parameter is configured and its value contains an Oracle Net service name for the primary database.
In this demonstration we will simulate data corruption in the primary database. But the scenario is the same for a corrupted blocks at standby side and will work same.
From the primary we can verify that LOG_ARCHIVE_DEST_n is set.
1
2
3
4
5
6
7
|
SQL> select dest_name,DESTINATION,status from v$archive_dest where destination is not null ; DEST_NAME DESTINATION STATUS -------------------- ------------------------------ --------- LOG_ARCHIVE_DEST_1 USE_DB_RECOVERY_FILE_DEST VALID LOG_ARCHIVE_DEST_2 ORCL_SITE1 VALID LOG_ARCHIVE_DEST_3 ORCL_SITE2 VALID |
For the demonstration let’s consider a table of user SCOTT in a tablespace mytab
1
2
3
4
5
6
7
|
SQL> select table_name,tablespace_name from dba_tables where owner= 'SCOTT' and table_name= 'EMPBIS' ; TABLE_NAME TABLESPACE_NAME --------------- ------------------------------ EMPBIS MYTAB SQL> |
Now let’s identify blocks for the table on the primary and let’s corrupt them.
1
2
3
4
5
6
7
|
SQL> select * from ( select distinct dbms_rowid.rowid_block_number(rowid) from scott.empbis); DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) ------------------------------------ 131 SQL> |
And then let’s run following command to corrupt corresponding blocks.
1
2
3
4
5
|
[oracle@primaserver ORCL]$ dd of= /u01/app/oracle/oradata/ORCL/mytab01 .dbf bs=8192 seek=131 conv=notrunc count=1 if = /dev/zero 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 0.000315116 s, 26.0 MB /s [oracle@primaserver ORCL]$ |
In a normal environment, accessing to corrupted data by a SELECT will return errors.
But in our case on the primary if we flush the buffer_cache, and and we do a select on the table, rows are returned without errors.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
|
13:41:18 SQL> alter system flush buffer_cache; System altered. 13:41:22 SQL> select * from scott.empbis; EMPNO ENAME JOB MGR HIREDATE SAL COMM ---------- ---------- --------- ---------- --------- ---------- ---------- DEPTNO ---------- 7369 SMITH CLERK 7902 17- DEC -80 800 20 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 EMPNO ENAME JOB MGR HIREDATE SAL COMM ---------- ---------- --------- ---------- --------- ---------- ---------- DEPTNO ---------- 7566 JONES MANAGER 7839 02-APR-81 2975 20 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 EMPNO ENAME JOB MGR HIREDATE SAL COMM ---------- ---------- --------- ---------- --------- ---------- ---------- DEPTNO ---------- 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 7839 KING PRESIDENT 17-NOV-81 5000 10 EMPNO ENAME JOB MGR HIREDATE SAL COMM ---------- ---------- --------- ---------- --------- ---------- ---------- DEPTNO ---------- 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 7876 ADAMS CLERK 7788 23-MAY-87 1100 20 7900 JAMES CLERK 7698 03- DEC -81 950 30 EMPNO ENAME JOB MGR HIREDATE SAL COMM ---------- ---------- --------- ---------- --------- ---------- ---------- DEPTNO ---------- 7902 FORD ANALYST 7566 03- DEC -81 3000 20 7934 MILLER CLERK 7782 23-JAN-82 1300 10 14 rows selected. 13:41:27 SQL> |
Indeed Oracle automatically recovered corrupted blocks. And in the primary alert log at the same time we can see that a recovery was done.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
ALTER SYSTEM: Flushing buffer cache inst=0 container=0 global 2018-01-26T13:41:26.540640+01:00 Hex dump of ( file 2, block 131) in trace file /u01/app/oracle/diag/rdbms/orcl_site/ORCL/trace/ORCL_ora_3359 .trc Corrupt block relative dba: 0x00800083 ( file 2, block 131) Completely zero block found during multiblock buffer read Reading datafile '/u01/app/oracle/oradata/ORCL/mytab01.dbf' for corruption at rdba: 0x00800083 ( file 2, block 131) Reread ( file 2, block 131) found same corrupt data (no logical check) Automatic block media recovery requested for ( file # 2, block# 131) 2018-01-26T13:41:26.545798+01:00 Corrupt Block Found TIME STAMP (GMT) = 01 /26/2018 13:41:25 CONT = 0, TSN = 7, TSNAME = MYTAB RFN = 2, BLK = 131, RDBA = 8388739 OBJN = 74352, OBJD = 74352, OBJECT = EMPBIS, SUBOBJECT = SEGMENT OWNER = SCOTT, SEGMENT TYPE = Table Segment 2018-01-26T13:41:27.002934+01:00 Automatic block media recovery successful for ( file # 2, block# 131) 2018-01-26T13:41:27.005015+01:00 Automatic block media recovery successful for ( file # 2, block# 131) |
Conclusion:
We have seen, that an active DataGuard environment may help with corrupted data.