As a consultant I am often in the situation that I have an answer to a problem, but the recommended solution cannot be implemented due to restrictions on the customer side. E.g. the recommendation would be to adjust the code, but that is not feasible. In such cases you are forced to help without changing the code.

Recently I was confronted with the following issue: a process took too long. Digging deeper I could see that most of the time was spent on this SQL:

DELETE FROM COM_TAB WHERE 1=1 

The execution plan looked as follows:

--------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |                    |       |       | 16126 (100)|          |
|   1 |  DELETE               | COM_TAB            |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| PK_COM_TAB         |    10M|   306M| 16126   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

My initial reaction was of course to say that deleting all data in a table with a DELETE statement is not a good idea. It is better to turn the DML into DDL and use e.g. TRUNCATE TABLE. All options for deleting lots of rows in a table fast are described by Chris Saxon in his blog here.
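Just to illustrate the DDL alternative (assuming all rows really have to go and the application allows it):

TRUNCATE TABLE COM_TAB;
-- or, to keep the already allocated extents for a subsequent reload:
TRUNCATE TABLE COM_TAB REUSE STORAGE;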

In this case changing the SQL was not possible, so what are the alternatives?

As I was involved in this long after the issue had happened, I checked the ASH data in the AWR history (DBA_HIST_ACTIVE_SESS_HISTORY keeps one sample per 10 seconds, hence the count(*)*10 to get seconds):

SQL> select SQL_EXEC_START, session_state, event, count(*)*10 secs_in_state FROM dba_hist_active_sess_history where sql_id='53gwjb0gjn1np'
  2  group by sql_exec_start, session_state, event order by 1,4 desc;

SQL_EXEC_START      SESSION EVENT                                                            SECS_IN_STATE
------------------- ------- ---------------------------------------------------------------- -------------
19.06.2020 10:13:02 WAITING free buffer waits                                                          560
19.06.2020 10:13:02 WAITING enq: CR - block range reuse ckpt                                           370
19.06.2020 10:13:02 ON CPU                                                                             130
19.06.2020 10:13:02 WAITING reliable message                                                            10
19.06.2020 10:56:01 WAITING enq: CR - block range reuse ckpt                                           550
19.06.2020 10:56:01 WAITING free buffer waits                                                          230
19.06.2020 10:56:01 ON CPU                                                                             140
19.06.2020 10:56:01 WAITING log file switch (checkpoint incomplete)                                     60
19.06.2020 11:39:38 WAITING enq: CR - block range reuse ckpt                                           610
19.06.2020 11:39:38 WAITING free buffer waits                                                          180
19.06.2020 11:39:38 ON CPU                                                                             170
19.06.2020 11:39:38 WAITING log file switch (checkpoint incomplete)                                     80
19.06.2020 11:39:38 WAITING write complete waits                                                        40
19.06.2020 12:23:47 WAITING enq: CR - block range reuse ckpt                                           450
19.06.2020 12:23:47 WAITING free buffer waits                                                          280
19.06.2020 12:23:47 ON CPU                                                                             150
19.06.2020 12:23:47 WAITING log file switch (checkpoint incomplete)                                     90
19.06.2020 12:23:47 WAITING write complete waits                                                        30
19.06.2020 12:23:47 WAITING log buffer space                                                            10

So obviously the DBWR had a problem writing dirty blocks to disk fast enough to provide free buffers in the cache. When the issue above happened, the following parameter was set:

filesystemio_options='ASYNCH'

Changing it to

filesystemio_options='SETALL'

improved the situation a lot, but caused waits on “db file sequential read”.

I.e. with filesystemio_options='ASYNCH' we cache lots of repeatedly touched blocks in the filesystem cache, but suffer from slower (non-direct) writes by the DB writer. With filesystemio_options='SETALL' we gain from direct IO by the DB writer, but have to read repeatedly touched blocks from disk more often.
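For reference, filesystemio_options is a static parameter, so such a change has to go through the SPFILE and requires an instance restart:

ALTER SYSTEM SET filesystemio_options='SETALL' SCOPE=SPFILE;
-- restart the instance for the change to take effect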

The table had just one index, the primary key index.

So what to do here?

Several recommendations came to mind (a rough sketch of both follows below):

– With filesystemio_options=’ASYNCH’: increase the redo logs so that no checkpoint is triggered while the statement is running
– With filesystemio_options=’SETALL’: increase the buffer cache to keep blocks in memory for longer and avoid single block IOs
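A minimal sketch of both measures (all sizes are purely illustrative and have to be derived from the actual system; the ADD LOGFILE variant assumes Oracle Managed Files, the db_cache_size change assumes manual SGA management):

-- ASYNCH variant: larger redo log groups, so that no log switch (and hence no forced
-- checkpoint) happens while the big delete is running; repeat per required group:
ALTER DATABASE ADD LOGFILE SIZE 4G;

-- SETALL variant: a larger buffer cache, so that repeatedly touched blocks stay in memory
-- (with ASMM/AMM one would increase sga_target/memory_target instead):
ALTER SYSTEM SET db_cache_size=16G SCOPE=BOTH;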

The most interesting question, however, was: why does the optimizer decide to drive the delete over the index here? With a bad clustering factor it would make more sense to do a full table scan than to use the index. This was actually validated with a hint:

DELETE /*+ FULL(COM_TAB) */ FROM COM_TAB WHERE 1=1

The hinted statement improved the situation.

An improvement should also be achievable by using an Index Organized Table here: as we only have a primary key index on the table, we would just wipe out the data in the index and would not have to visit the same table block repeatedly. The best approach, however, is to create a testcase and reproduce the issue. Here's what I did:

I created two tables

TDEL_GOOD_CF
TDEL_BAD_CF

which have more blocks than fit in the database buffer cache. As the names suggest, one table has an index with a good clustering factor and the other an index with a bad clustering factor:

SQL> select table_name, blocks, num_rows from tabs where table_name like 'TDEL_%';

TABLE_NAME                           BLOCKS   NUM_ROWS
-------------------------------- ---------- ----------
TDEL_BAD_CF                          249280     544040
TDEL_GOOD_CF                         248063     544040

Remark: To use lots of blocks I stored only 2 rows per block by using a high PCTFREE.
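The tables were built along these lines (a sketch only; the exact column layout and PCTFREE value are just one way to end up with roughly two rows per block and the two different clustering factors):

-- good clustering factor: rows arrive in primary key order
CREATE TABLE tdel_good_cf (id NUMBER, pad VARCHAR2(1000)) PCTFREE 90;
INSERT INTO tdel_good_cf
SELECT level, RPAD('x', 300, 'x') FROM dual CONNECT BY level <= 544040;
ALTER TABLE tdel_good_cf ADD CONSTRAINT pk_tdel_good_cf PRIMARY KEY (id);

-- bad clustering factor: same data, inserted in random key order
CREATE TABLE tdel_bad_cf (id NUMBER, pad VARCHAR2(1000)) PCTFREE 90;
INSERT INTO tdel_bad_cf
SELECT id, pad FROM tdel_good_cf ORDER BY dbms_random.value;
COMMIT;
ALTER TABLE tdel_bad_cf ADD CONSTRAINT pk_tdel_bad_cf PRIMARY KEY (id);

EXEC dbms_stats.gather_table_stats(user, 'TDEL_GOOD_CF');
EXEC dbms_stats.gather_table_stats(user, 'TDEL_BAD_CF');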

SQL> select index_name, leaf_blocks, clustering_factor from ind where table_name like 'TDEL_%';

INDEX_NAME                       LEAF_BLOCKS CLUSTERING_FACTOR
-------------------------------- ----------- -----------------
PK_TDEL_BAD_CF                          1135            532313
PK_TDEL_GOOD_CF                         1135            247906

The database buffer cache was much smaller than the number of blocks in each table:

SQL> select bytes/8192 BLOCKS_IN_BUFFER_CACHE from v$sgastat where name='buffer_cache';

BLOCKS_IN_BUFFER_CACHE
----------------------
                 77824

Test with filesystemio_options=’SETALL’:

SQL> set autotrace trace timing on
SQL> delete from TDEL_BAD_CF where 1=1;

544040 rows deleted.

Elapsed: 00:01:08.76

Execution Plan
----------------------------------------------------------
Plan hash value: 2076794500

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |                |   544K|  2656K|   315	 (2)| 00:00:01 |
|   1 |  DELETE               | TDEL_BAD_CF    |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| PK_TDEL_BAD_CF |   544K|  2656K|   315	 (2)| 00:00:01 |
----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         88  recursive calls
    2500712  db block gets
       1267  consistent gets
     477213  physical reads
  388185816  redo size
        195  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
     544040  rows processed

Note the 477213 physical reads (blocks read), i.e. almost twice the number of blocks in the table.
The ASH data looked as follows:

select sql_id, sql_plan_line_id, session_state, event, p1, count(*)
from v$active_session_history
where sql_id='ck5fw78yqh93g'
group by sql_id,sql_plan_line_id, session_state, event, p1
order by 6;


SQL_ID	      SQL_PLAN_LINE_ID SESSION EVENT                                    P1   COUNT(*)
------------- ---------------- ------- -------------------------------- ---------- ----------
ck5fw78yqh93g                1 WAITING db file scattered read                    7          1
ck5fw78yqh93g                1 ON CPU                                            7         11
ck5fw78yqh93g                1 WAITING db file sequential read                   7         56

P1 is the file number when doing IO. File ID 7 is the USERS tablespace, which contains my table and index.
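The mapping of the P1 value to the tablespace can be verified like this:

SELECT file_id, tablespace_name FROM dba_data_files WHERE file_id = 7;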

So obviously Oracle didn’t consider the clustering factor when building the plan with the index. The cost of 315 is just the cost for the INDEX FAST FULL SCAN:

Fast Full Index Scan Cost ~ ((LEAF_BLOCKS/MBRC) x MREADTIM)/ SREADTIM + CPU

REMARK: I have no workload system statistics gathered, i.e. the noworkload defaults apply (with IOSEEKTIM=10ms and IOTFRSPEED=4096 bytes/ms this results in SREADTIM=12ms and MREADTIM=26ms for an 8K block size and MBRC=8).

LEAF_BLOCKS=1135
MBRC=8
MREADTIM=26ms
SREADTIM=12ms

Fast Full Index Scan Cost ~ ((1135/8) x 26)/ 12 + CPU = 307 + CPU = 315

The cost of accessing the table is not considered at all. I.e. driving the delete through the index and visiting the table from there results in reading the same table block several times.
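As a cross-check, applying the same noworkload model to the table blocks gives roughly the full table scan cost we will see in the full scan test below:

Full Table Scan Cost ~ ((BLOCKS/MBRC) x MREADTIM)/ SREADTIM + CPU = ((249280/8) x 26)/ 12 + CPU = 67513 + CPU ~ 67670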

Here is the test with the table that has the better clustering factor on its index:

SQL> delete from TDEL_GOOD_CF where 1=1;

544040 rows deleted.

Elapsed: 00:00:30.48

Execution Plan
----------------------------------------------------------
Plan hash value: 4284904063

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |                 |   544K|  2656K|   315   (2)| 00:00:01 |
|   1 |  DELETE               | TDEL_GOOD_CF    |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| PK_TDEL_GOOD_CF |   544K|  2656K|   315   (2)| 00:00:01 |
-----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
        115  recursive calls
    2505121  db block gets
       1311  consistent gets
     249812  physical reads
  411603188  redo size
        195  bytes sent via SQL*Net to client
        385  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          9  sorts (memory)
          0  sorts (disk)
     544040  rows processed


select sql_id, sql_plan_line_id, session_state, event, p1, count(*)
from v$active_session_history
where sql_id='0nqk3fmcwrrzm'
group by sql_id,sql_plan_line_id, session_state, event, p1
order by 6;

SQL_ID	      SQL_PLAN_LINE_ID SESSION EVENT                                    P1   COUNT(*)
------------- ---------------- ------- -------------------------------- ---------- ----------
0nqk3fmcwrrzm                1 ON CPU                                            7          3
0nqk3fmcwrrzm                1 WAITING db file sequential read                   7         26

I.e. with the better clustering factor the delete ran much faster and needed only about half the physical reads.

Here is the test with a full table scan on the table whose index has the bad clustering factor:

cbleile@orcl@orcl> delete /*+ FULL(T) */ from TDEL_BAD_CF T where 1=1;

544040 rows deleted.

Elapsed: 00:00:08.39

Execution Plan
----------------------------------------------------------
Plan hash value: 4058645893

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |             |   544K|  2656K| 67670   (1)| 00:00:01 |
|   1 |  DELETE            | TDEL_BAD_CF |       |       |            |          |
|   2 |   TABLE ACCESS FULL| TDEL_BAD_CF |   544K|  2656K| 67670   (1)| 00:00:01 |
----------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
        161  recursive calls
    1940687  db block gets
     248764  consistent gets
     252879  physical reads
  269882276  redo size
        195  bytes sent via SQL*Net to client
        401  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
     544040  rows processed

select sql_id, sql_plan_line_id, session_state, event, p1, count(*)
from v$active_session_history
where sql_id='4272c7xv86d0k'
group by sql_id,sql_plan_line_id, session_state, event, p1
order by 6;

SQL_ID	      SQL_PLAN_LINE_ID SESSION EVENT                                    P1   COUNT(*)
------------- ---------------- ------- -------------------------------- ---------- ----------
4272c7xv86d0k                2 ON CPU                                            7          2
4272c7xv86d0k                1 ON CPU                                            7          3
4272c7xv86d0k                2 WAITING db file scattered read                    7          3

I.e. if Oracle considered the clustering factor here and did the delete with a full table scan, it would obviously run much faster.

Last test with an IOT:
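The IOT was built along the same lines as the heap tables above (a sketch; the unnamed inline primary key is the reason for the SYS_IOT_TOP_% index name in the plan below):

CREATE TABLE tdel_iot (id NUMBER PRIMARY KEY, pad VARCHAR2(1000))
ORGANIZATION INDEX PCTFREE 90;
INSERT INTO tdel_iot
SELECT level, RPAD('x', 300, 'x') FROM dual CONNECT BY level <= 544040;
COMMIT;
EXEC dbms_stats.gather_table_stats(user, 'TDEL_IOT');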

cbleile@orcl@orcl> delete from TDEL_IOT where 1=1;

544040 rows deleted.

Elapsed: 00:00:06.90

Execution Plan
----------------------------------------------------------
Plan hash value: 515699456

-------------------------------------------------------------------------------------------
| Id  | Operation             | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |                   |   544K|  2656K| 66065   (1)| 00:00:01 |
|   1 |  DELETE               | TDEL_IOT          |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| SYS_IOT_TOP_77456 |   544K|  2656K| 66065   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
        144  recursive calls
     521556  db block gets
     243200  consistent gets
     243732  physical reads
  241686612  redo size
        194  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
     544040  rows processed

select sql_id, sql_plan_line_id, session_state, event, p1, count(*)
from v$active_session_history
where sql_id='cf6nj64yybkpq'
group by sql_id,sql_plan_line_id, session_state, event, p1
order by 6;

SQL_ID	      SQL_PLAN_LINE_ID SESSION EVENT                                    P1   COUNT(*)
------------- ---------------- ------- -------------------------------- ---------- ----------
cf6nj64yybkpq                2 ON CPU                                            7          1
cf6nj64yybkpq                1 ON CPU                                            7          1
cf6nj64yybkpq                2 WAITING db file scattered read                    7          3

As we were not allowed to adjust the code or replace the table with an IOT, the measures to improve the situation were to
– set filesystemio_options=’SETALL’
(REMARK: that change needs thorough testing as it may have negative effects on other SQL which benefits from the filesystem cache)
– add a hint through a SQL patch to force a full table scan

REMARK: Creating a SQL-Patch to add the hint

FULL(TDEL_BAD_CF)

to the statement was not easily possible, because Oracle does not consider this hint in DML:

var rv varchar2(32);
declare
   v_sql CLOB;
begin
   select sql_text into v_sql from dba_hist_sqltext where sql_id='ck5fw78yqh93g';
   :rv:=dbms_sqldiag.create_sql_patch(
             sql_text  => v_sql,
             hint_text=>'FULL(TDEL_BAD_CF)',
             name=>'force_fts_when_del_all',
             description=>'force fts when del all rows');
end;
/
print rv

delete from TDEL_BAD_CF where 1=1;

544040 rows deleted.

Elapsed: 00:01:01.79

Execution Plan
----------------------------------------------------------
Plan hash value: 2076794500

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |                |   544K|  2656K|   315	 (2)| 00:00:01 |
|   1 |  DELETE               | TDEL_BAD_CF    |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| PK_TDEL_BAD_CF |   544K|  2656K|   315	 (2)| 00:00:01 |
----------------------------------------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (N - Unresolved (1))
---------------------------------------------------------------------------

   1 -	DEL$1
	 N -  FULL(TDEL_BAD_CF)

Note
-----
   - SQL patch "force_fts_when_del_all" used for this statement

I.e. according to the Note the SQL patch was used, but the Hint Report showed the hint as “Unresolved”.

So I had to use the full hint specification:

FULL(@"DEL$1" "TDEL_BAD_CF"@"DEL$1")

To get this full specification you can do an explain plan of the hinted statement and look at the outline data:

SQL> explain plan for
  2  delete /*+ FULL(TDEL_BAD_CF) */ from TDEL_BAD_CF where 1=1;

Explained.

SQL> select * from table(dbms_xplan.display(format=>'+OUTLINE'));

...

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"DEL$1" "TDEL_BAD_CF"@"DEL$1")
      OUTLINE_LEAF(@"DEL$1")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */
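
Before recreating the patch under the same name, the first, unresolved patch has to be dropped:

exec dbms_sqldiag.drop_sql_patch(name => 'force_fts_when_del_all');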

So here’s the script to create the SQL Patch correctly:

var rv varchar2(32);
declare
   v_sql CLOB;
begin
   select sql_text into v_sql from dba_hist_sqltext where sql_id='ck5fw78yqh93g';
   :rv:=dbms_sqldiag.create_sql_patch(
             sql_text  => v_sql,
             hint_text=>'FULL(@"DEL$1" "TDEL_BAD_CF"@"DEL$1")',
             name=>'force_fts_when_del_all',
             description=>'force fts when del all rows');
end;
/
print rv

SQL> delete from TDEL_BAD_CF where 1=1;

544040 rows deleted.

Elapsed: 00:00:06.57

Execution Plan
----------------------------------------------------------
Plan hash value: 4058645893

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |             |   544K|  2656K| 67670   (1)| 00:00:01 |
|   1 |  DELETE            | TDEL_BAD_CF |       |       |            |          |
|   2 |   TABLE ACCESS FULL| TDEL_BAD_CF |   544K|  2656K| 67670   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Note
-----
   - SQL patch "force_fts_when_del_all" used for this statement

Statistics
----------------------------------------------------------
        207  recursive calls
    1940517  db block gets
     248759  consistent gets
     252817  physical reads
  272061432  redo size
        195  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
         16  sorts (memory)
          0  sorts (disk)
     544040  rows processed

Summary: This is a specific corner case where the Oracle optimizer should consider the clustering factor when calculating the cost of a DML plan, but it doesn't. The workaround in this case was to hint the statement or, without modifying the code, to add a SQL patch that applies the hint.