As a consultant working for customers, I am often in the situation that I have an answer to a problem, but the recommended solution cannot be implemented due to some restriction. For example, the recommendation would be to adjust the code, but that is not feasible. In such cases you are forced to help without code changes.
Recently I was confronted with the following issue: A process takes too long. Digging deeper I could see that most of the time was spent on this SQL:
DELETE FROM COM_TAB WHERE 1=1
The execution plan looked as follows:
--------------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |            |       |       | 16126 (100)|          |
|   1 |  DELETE               | COM_TAB    |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| PK_COM_TAB |    10M|   306M| 16126   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------
My initial reaction was of course to say that deleting all data in a table with a DELETE statement is not a good idea. It is better to turn the DML into DDL and use e.g. TRUNCATE TABLE. Chris Saxon covers all the options for deleting lots of rows from a table fast in his blog here.
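For illustration, the DDL alternative would look like this (a sketch; whether the application can tolerate the implicit commit of DDL and the loss of rollback would have to be checked):

```sql
-- TRUNCATE is DDL: it deallocates the table's extents instead of deleting
-- row by row, so it generates almost no undo/redo and cannot be rolled back.
truncate table COM_TAB;

-- Variant that keeps the allocated extents, useful if the table
-- is immediately reloaded to the same size afterwards:
truncate table COM_TAB reuse storage;
```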
In this case changing the SQL was not possible, so what are the alternatives?
As I got involved long after the issue had happened, I checked the ASH data in the AWR history:
SQL> select SQL_EXEC_START, session_state, event, count(*)*10 secs_in_state
  2  from dba_hist_active_sess_history where sql_id='53gwjb0gjn1np'
  3  group by sql_exec_start, session_state, event order by 1,4 desc;

SQL_EXEC_START      SESSION EVENT                                    SECS_IN_STATE
------------------- ------- ---------------------------------------- -------------
19.06.2020 10:13:02 WAITING free buffer waits                                  560
19.06.2020 10:13:02 WAITING enq: CR - block range reuse ckpt                   370
19.06.2020 10:13:02 ON CPU                                                     130
19.06.2020 10:13:02 WAITING reliable message                                    10
19.06.2020 10:56:01 WAITING enq: CR - block range reuse ckpt                   550
19.06.2020 10:56:01 WAITING free buffer waits                                  230
19.06.2020 10:56:01 ON CPU                                                     140
19.06.2020 10:56:01 WAITING log file switch (checkpoint incomplete)             60
19.06.2020 11:39:38 WAITING enq: CR - block range reuse ckpt                   610
19.06.2020 11:39:38 WAITING free buffer waits                                  180
19.06.2020 11:39:38 ON CPU                                                     170
19.06.2020 11:39:38 WAITING log file switch (checkpoint incomplete)             80
19.06.2020 11:39:38 WAITING write complete waits                                40
19.06.2020 12:23:47 WAITING enq: CR - block range reuse ckpt                   450
19.06.2020 12:23:47 WAITING free buffer waits                                  280
19.06.2020 12:23:47 ON CPU                                                     150
19.06.2020 12:23:47 WAITING log file switch (checkpoint incomplete)             90
19.06.2020 12:23:47 WAITING write complete waits                                30
19.06.2020 12:23:47 WAITING log buffer space                                    10
So obviously the DBWR had a problem writing dirty blocks to disk and freeing up space in the cache. When the issue happened, the following parameter was set:
filesystemio_options='ASYNCH'
Changing it to
filesystemio_options='SETALL'
improved the situation a lot, but caused waits on “db file sequential read”.
I.e. with filesystemio_options=’ASYNCH’ we cache lots of repeatedly touched blocks in the filesystem cache, but suffer from slower (non-direct) writes by the DB writer. With filesystemio_options=’SETALL’ we gain from the DB writer doing direct IO, but have to read repeatedly touched blocks from disk more often.
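For reference, checking and changing the parameter can be sketched as follows (filesystemio_options is a static parameter, so the change only takes effect after an instance restart):

```sql
-- Check the current setting (SQL*Plus):
show parameter filesystemio_options

-- Change it in the spfile; takes effect after the next restart:
alter system set filesystemio_options='SETALL' scope=spfile;
```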
The table had just one index: the primary key index.
So what to do here?
Several recommendations came to mind:
– With filesystemio_options=’ASYNCH’: increase the redo logs so that no checkpoint happens while the statement is running
– With filesystemio_options=’SETALL’: increase the buffer cache to keep blocks in memory for longer and avoid single block IOs
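The redo log check behind the first recommendation can be sketched like this (the paths and sizes below are assumptions for illustration, not values from the actual system):

```sql
-- How big are the current redo log groups?
select group#, bytes/1024/1024 as mb, status from v$log;

-- How often do we switch logs (many switches per hour => logs too small)?
select trunc(first_time,'HH24') as hour, count(*) as switches
from v$log_history
group by trunc(first_time,'HH24')
order by 1;

-- Hypothetical example: add larger groups, then drop the old, smaller ones:
alter database add logfile group 5 ('/u01/oradata/ORCL/redo05.log') size 4g;
alter database add logfile group 6 ('/u01/oradata/ORCL/redo06.log') size 4g;
```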
The most interesting question was: why does the optimizer decide to go over the index here in the first place? With a bad clustering factor it would make more sense to do a full table scan than to use the index. And this was actually validated with a hint:
DELETE /*+ FULL(COM_TAB) */ FROM COM_TAB WHERE 1=1
improved the situation.
An improvement should also be achievable by using an index organized table (IOT) here: as we only have a primary key index on the table, we would just wipe out the data in the index and would not have to visit the same table block repeatedly. The best approach, however, is to create a testcase and reproduce the issue. Here’s what I did:
I created 2 tables
TDEL_GOOD_CF
TDEL_BAD_CF
which both have more blocks than the DB cache can hold. As the names suggest, one table had an index with a good clustering factor and the other one an index with a bad clustering factor:
SQL> select table_name, blocks, num_rows from tabs where table_name like 'TDEL_%';

TABLE_NAME                           BLOCKS   NUM_ROWS
-------------------------------- ---------- ----------
TDEL_BAD_CF                          249280     544040
TDEL_GOOD_CF                         248063     544040
Remark: To use lots of blocks I stored only 2 rows per block by using a high PCTFREE.
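A setup along these lines can reproduce such tables (a sketch; the column layout, pad size and exact PCTFREE are my assumptions, not the original DDL — the point is that a high PCTFREE leaves most of each block reserved, so only about 2 rows fit per block):

```sql
-- Good clustering factor: rows are inserted in primary key order,
-- so adjacent key values sit in the same table blocks.
create table TDEL_GOOD_CF (id number, pad varchar2(350)) pctfree 90;
insert into TDEL_GOOD_CF
  select rownum, rpad('x',350,'x') from dual connect by level <= 544040;
alter table TDEL_GOOD_CF add constraint PK_TDEL_GOOD_CF primary key (id);

-- Bad clustering factor: load the rows in an order uncorrelated with the key,
-- so walking the index in key order jumps between table blocks.
create table TDEL_BAD_CF (id number, pad varchar2(350)) pctfree 90;
insert into TDEL_BAD_CF
  select rownum, rpad('x',350,'x') from dual connect by level <= 544040
  order by dbms_random.value;
alter table TDEL_BAD_CF add constraint PK_TDEL_BAD_CF primary key (id);
commit;
```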
SQL> select index_name, leaf_blocks, clustering_factor from ind where table_name like 'TDEL_%';

INDEX_NAME                       LEAF_BLOCKS CLUSTERING_FACTOR
-------------------------------- ----------- -----------------
PK_TDEL_BAD_CF                          1135            532313
PK_TDEL_GOOD_CF                         1135            247906
The database cache size was much smaller than the blocks in the table:
SQL> select bytes/8192 BLOCKS_IN_BUFFER_CACHE from v$sgastat where name='buffer_cache';

BLOCKS_IN_BUFFER_CACHE
----------------------
                 77824
Test with filesystemio_options=’SETALL’:
SQL> set autotrace trace timing on
SQL> delete from TDEL_BAD_CF where 1=1;

544040 rows deleted.

Elapsed: 00:01:08.76

Execution Plan
----------------------------------------------------------
Plan hash value: 2076794500

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |                |   544K|  2656K|   315   (2)| 00:00:01 |
|   1 |  DELETE               | TDEL_BAD_CF    |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| PK_TDEL_BAD_CF |   544K|  2656K|   315   (2)| 00:00:01 |
----------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
         88  recursive calls
    2500712  db block gets
       1267  consistent gets
     477213  physical reads
  388185816  redo size
        195  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
     544040  rows processed
Note the 477213 physical reads (blocks read), i.e. almost twice the number of blocks in the table.
The ASH-data looked as follows:
select sql_id, sql_plan_line_id, session_state, event, p1, count(*)
from v$active_session_history
where sql_id='ck5fw78yqh93g'
group by sql_id, sql_plan_line_id, session_state, event, p1
order by 6;

SQL_ID        SQL_PLAN_LINE_ID SESSION EVENT                                    P1   COUNT(*)
------------- ---------------- ------- -------------------------------- ---------- ----------
ck5fw78yqh93g                1 WAITING db file scattered read                    7          1
ck5fw78yqh93g                1 ON CPU                                            7         11
ck5fw78yqh93g                1 WAITING db file sequential read                   7         56
P1 is the file ID when doing IO. File ID 7 is the USERS tablespace, which contains my table and index.
So obviously Oracle didn’t consider the clustering factor when building the plan with the index. The cost of 315 is just the cost for the INDEX FAST FULL SCAN:
Fast Full Index Scan Cost ~ ((LEAF_BLOCKS/MBRC) x MREADTIM)/ SREADTIM + CPU
REMARK: I do not have system statistics gathered.
LEAF_BLOCKS=1135
MBRC=8
MREADTIM=26ms
SREADTIM=12ms
Fast Full Index Scan Cost ~ ((1135/8) x 26) / 12 + CPU = 307 + CPU = 315
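The IO part of that arithmetic can be verified directly. As a side note, without system statistics the commonly documented noworkload formulas apply, and with the defaults (IOSEEKTIM=10ms, IOTFRSPEED=4096) and an 8K block size they yield exactly the 12ms and 26ms used above:

```sql
-- SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE/IOTFRSPEED      = 10 + 8192/4096   = 12
-- MREADTIM = IOSEEKTIM + MBRC*DB_BLOCK_SIZE/IOTFRSPEED = 10 + 8*8192/4096 = 26
select round((1135/8) * 26 / 12) as io_cost from dual;

-- IO_COST
-- -------
--     307
```

Adding the CPU component gives the 315 shown in the plan.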
The cost of accessing the table is not considered at all, i.e. going through the index and from there to the table to delete the rows results in visiting the same table block several times.
Here the test with the table having a better clustering factor on the index:
SQL> delete from TDEL_GOOD_CF where 1=1;

544040 rows deleted.

Elapsed: 00:00:30.48

Execution Plan
----------------------------------------------------------
Plan hash value: 4284904063

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |                 |   544K|  2656K|   315   (2)| 00:00:01 |
|   1 |  DELETE               | TDEL_GOOD_CF    |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| PK_TDEL_GOOD_CF |   544K|  2656K|   315   (2)| 00:00:01 |
-----------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
        115  recursive calls
    2505121  db block gets
       1311  consistent gets
     249812  physical reads
  411603188  redo size
        195  bytes sent via SQL*Net to client
        385  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          9  sorts (memory)
          0  sorts (disk)
     544040  rows processed

select sql_id, sql_plan_line_id, session_state, event, p1, count(*)
from v$active_session_history
where sql_id='0nqk3fmcwrrzm'
group by sql_id, sql_plan_line_id, session_state, event, p1
order by 6;

SQL_ID        SQL_PLAN_LINE_ID SESSION EVENT                                    P1   COUNT(*)
------------- ---------------- ------- -------------------------------- ---------- ----------
0nqk3fmcwrrzm                1 ON CPU                                            7          3
0nqk3fmcwrrzm                1 WAITING db file sequential read                   7         26
I.e. it ran much faster with the better clustering factor and only had to do half the physical reads.
Here is the test with the full table scan on the table whose index has the bad clustering factor:
cbleile@orcl@orcl> delete /*+ FULL(T) */ from TDEL_BAD_CF T where 1=1;

544040 rows deleted.

Elapsed: 00:00:08.39

Execution Plan
----------------------------------------------------------
Plan hash value: 4058645893

-----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |             |   544K|  2656K| 67670   (1)| 00:00:01 |
|   1 |  DELETE            | TDEL_BAD_CF |       |       |            |          |
|   2 |   TABLE ACCESS FULL| TDEL_BAD_CF |   544K|  2656K| 67670   (1)| 00:00:01 |
-----------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
        161  recursive calls
    1940687  db block gets
     248764  consistent gets
     252879  physical reads
  269882276  redo size
        195  bytes sent via SQL*Net to client
        401  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
     544040  rows processed

select sql_id, sql_plan_line_id, session_state, event, p1, count(*)
from v$active_session_history
where sql_id='4272c7xv86d0k'
group by sql_id, sql_plan_line_id, session_state, event, p1
order by 6;

SQL_ID        SQL_PLAN_LINE_ID SESSION EVENT                                    P1   COUNT(*)
------------- ---------------- ------- -------------------------------- ---------- ----------
4272c7xv86d0k                2 ON CPU                                            7          2
4272c7xv86d0k                1 ON CPU                                            7          3
4272c7xv86d0k                2 WAITING db file scattered read                    7          3
I.e. if Oracle considered the clustering factor here and did the delete with a full table scan, it would obviously run much faster.
Last test with an IOT:
cbleile@orcl@orcl> delete from TDEL_IOT where 1=1;

544040 rows deleted.

Elapsed: 00:00:06.90

Execution Plan
----------------------------------------------------------
Plan hash value: 515699456

-------------------------------------------------------------------------------------------
| Id  | Operation             | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |                   |   544K|  2656K| 66065   (1)| 00:00:01 |
|   1 |  DELETE               | TDEL_IOT          |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| SYS_IOT_TOP_77456 |   544K|  2656K| 66065   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
        144  recursive calls
     521556  db block gets
     243200  consistent gets
     243732  physical reads
  241686612  redo size
        194  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
     544040  rows processed

select sql_id, sql_plan_line_id, session_state, event, p1, count(*)
from v$active_session_history
where sql_id='cf6nj64yybkpq'
group by sql_id, sql_plan_line_id, session_state, event, p1
order by 6;

SQL_ID        SQL_PLAN_LINE_ID SESSION EVENT                                    P1   COUNT(*)
------------- ---------------- ------- -------------------------------- ---------- ----------
cf6nj64yybkpq                2 ON CPU                                            7          1
cf6nj64yybkpq                1 ON CPU                                            7          1
cf6nj64yybkpq                2 WAITING db file scattered read                    7          3
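The IOT used for this last test could have been created along these lines (a sketch; the actual DDL is not in the post, and the column layout and PCTFREE are assumptions). In an IOT all row data lives in the primary key index segment, which is why the delete only has to wipe out index blocks:

```sql
-- Index organized table: the PK B*-tree IS the table; no separate heap segment.
create table TDEL_IOT (
  id  number,
  pad varchar2(350),
  constraint PK_TDEL_IOT primary key (id)
) organization index pctfree 90;

insert into TDEL_IOT
  select rownum, rpad('x',350,'x') from dual connect by level <= 544040;
commit;
```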
As we were not allowed to adjust the code or replace the table with an IOT the measures to improve this situation were to
– set filesystemio_options=’SETALL’
REMARK: That change needs thorough testing, as it may have negative effects on other SQL that gains from the filesystem cache.
– add a hint with a SQL-Patch to force a full table scan
REMARK: Creating a SQL-Patch to add the hint
FULL(TDEL_BAD_CF)
to the statement was not easily possible, because Oracle leaves this hint unresolved in the DML:
var rv varchar2(32);
declare
  v_sql CLOB;
begin
  select sql_text into v_sql from dba_hist_sqltext where sql_id='ck5fw78yqh93g';
  :rv := dbms_sqldiag.create_sql_patch(
           sql_text    => v_sql,
           hint_text   => 'FULL(TDEL_BAD_CF)',
           name        => 'force_fts_when_del_all',
           description => 'force fts when del all rows');
end;
/
print rv

delete from TDEL_BAD_CF where 1=1;

544040 rows deleted.

Elapsed: 00:01:01.79

Execution Plan
----------------------------------------------------------
Plan hash value: 2076794500

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |                |   544K|  2656K|   315   (2)| 00:00:01 |
|   1 |  DELETE               | TDEL_BAD_CF    |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| PK_TDEL_BAD_CF |   544K|  2656K|   315   (2)| 00:00:01 |
----------------------------------------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (N - Unresolved (1))
---------------------------------------------------------------------------
   1 -  DEL$1
         N -  FULL(TDEL_BAD_CF)

Note
-----
   - SQL patch "force_fts_when_del_all" used for this statement
I.e. according to the Note the SQL patch was used, but the Hint Report showed the hint as “Unresolved”.
So I had to use the full hint specification:
FULL(@"DEL$1" "TDEL_BAD_CF"@"DEL$1")
To get this full specification you can do an explain plan of the hinted statement and look at the outline data:
SQL> explain plan for
  2  delete /*+ FULL(TDEL_BAD_CF) */ from TDEL_BAD_CF where 1=1;

Explained.

SQL> select * from table(dbms_xplan.display(format=>'+OUTLINE'));
...
Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"DEL$1" "TDEL_BAD_CF"@"DEL$1")
      OUTLINE_LEAF(@"DEL$1")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */
So here’s the script to create the SQL Patch correctly:
var rv varchar2(32);
declare
  v_sql CLOB;
begin
  select sql_text into v_sql from dba_hist_sqltext where sql_id='ck5fw78yqh93g';
  :rv := dbms_sqldiag.create_sql_patch(
           sql_text    => v_sql,
           hint_text   => 'FULL(@"DEL$1" "TDEL_BAD_CF"@"DEL$1")',
           name        => 'force_fts_when_del_all',
           description => 'force fts when del all rows');
end;
/
print rv

SQL> delete from TDEL_BAD_CF where 1=1;

544040 rows deleted.

Elapsed: 00:00:06.57

Execution Plan
----------------------------------------------------------
Plan hash value: 4058645893

-----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |             |   544K|  2656K| 67670   (1)| 00:00:01 |
|   1 |  DELETE            | TDEL_BAD_CF |       |       |            |          |
|   2 |   TABLE ACCESS FULL| TDEL_BAD_CF |   544K|  2656K| 67670   (1)| 00:00:01 |
-----------------------------------------------------------------------------------

Note
-----
   - SQL patch "force_fts_when_del_all" used for this statement

Statistics
----------------------------------------------------------
        207  recursive calls
    1940517  db block gets
     248759  consistent gets
     252817  physical reads
  272061432  redo size
        195  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
         16  sorts (memory)
          0  sorts (disk)
     544040  rows processed
Summary: this is a specific corner case where the Oracle optimizer should consider the clustering factor when calculating the cost of DML plans, but it doesn’t. The workaround was to hint the statement directly or, without modifying the code, to apply the hint through a SQL-Patch.