By Franck Pachot

In a previous blog, I discussed the difference between rownum and row_number(), in particular how both implicitly add a first_rows(n) mode for the optimizer. That reminded me of an issue I encountered, which concerns both approaches, and which I forgot to blog about. It was on an Exadata: a nice full table scan with SmartScan was taking a long time, and forcing an index access – with a very bad index – was faster. The query had a very selective predicate (which is exactly where SmartScan should shine), but the predicate could not be used by the index access.

In order to explain this, I’ll do the same query but without any predicates. It’s on a simple TEST table with 1 million rows.

select /*+ gather_plan_statistics */ * from (
   select * from TEST order by n 
 ) where rownum < 10

And here is the execution plan with its execution statistics:

---------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | E-Rows | A-Rows | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |        |     10 |    2137 |   2135 |
|*  1 |  COUNT STOPKEY                          |      |        |     10 |    2137 |   2135 |
|   2 |   VIEW                                  |      |   1000K|     10 |    2137 |   2135 |
|*  3 |    SORT ORDER BY STOPKEY                |      |   1000K|     10 |    2137 |   2135 |
|*  4 |     TABLE ACCESS STORAGE FULL FIRST ROWS| TEST |   1000K|   1000K|    2137 |   2135 |
---------------------------------------------------------------------------------------------

It looks good. SmartScan is used (TABLE ACCESS STORAGE). We read 1 million rows, sort them, and keep only the top 10. But it takes longer than expected. Let's have a look at some SmartScan statistics:

NAME                                                                   VALUE
---------------------------------------------------------------- -----------
cell IO uncompressed bytes                                        17,596,416
cell blocks processed by cache layer                                   3,329
cell blocks processed by data layer                                    2,148
cell blocks processed by txn layer                                     3,329
cell num fast response sessions                                            1
cell num fast response sessions continuing to smart scan                   1
cell physical IO bytes eligible for predicate offload             17,498,112
cell physical IO interconnect bytes                               31,273,680
cell physical IO interconnect bytes returned by smart scan        15,848,144
cell scans                                                                 2
physical read bytes                                               17,506,304
physical read total bytes                                         32,923,648
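
These figures come from the session statistics. A query along these lines retrieves them (a sketch, assuming select privilege on the v$ views):

```sql
-- session statistics for the current session
select n.name, s.value
from v$statname n
join v$mystat   s on s.statistic# = n.statistic#
where n.name like 'cell%'
   or n.name like 'physical read%'
order by n.name;
```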

Something is wrong here. We have 17MB eligible for SmartScan, which matches the 2135 block reads we see in the execution plan (I have the default 8k block size).
But 31MB went through the interconnect. Where do those additional 16MB come from? According to the execution plan, I'm doing only SmartScan here.
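
A quick sanity check of those numbers (block and byte counts copied from the plan and statistics above):

```python
BLOCK_SIZE = 8192                       # default 8k block size

# The 2135 block reads in the plan account for the ~17MB
# 'eligible for predicate offload':
print(2135 * BLOCK_SIZE)                # 17489920

# Bytes that crossed the interconnect but were NOT returned by SmartScan:
interconnect = 31_273_680               # cell physical IO interconnect bytes
by_smartscan = 15_848_144               # ...returned by smart scan
print(interconnect - by_smartscan)      # 15425536 -> the additional ~15-16MB
```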

Well, that's easy to check. A full table scan must read all blocks up to the high water mark.

SQL> select blocks,blocks*8192 BYTES from user_tables where table_name='TEST';

    BLOCKS      BYTES
---------- ----------
      4013   32882245

So my table has 4013 blocks and is 32MB. Not only is the execution plan above not optimal, it is also wrong: it is simply impossible to full scan my table with only 2137 block reads. I have to read 4013 blocks, which is more than 30MB.
So in the statistics above, the correct values are 'physical read total bytes' and 'cell physical IO interconnect bytes'. But it seems that only 17MB of data were read through SmartScan.
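
As a quick check, 4013 blocks of 8k amount to roughly the 'physical read total bytes' reported above:

```python
BLOCK_SIZE = 8192
blocks_below_hwm = 4013                 # blocks up to the high water mark
print(blocks_below_hwm * BLOCK_SIZE)    # 32874496 -> ~32MB, consistent with
                                        # 'physical read total bytes' (32,923,648)
```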

Here we need to know a bit more about an Exadata optimization that is not very well documented. The clue is 'FIRST ROWS' in the execution plan, right after 'STORAGE'. If it's a special feature, we can check how to enable or disable it with an underscore parameter. Names and descriptions are in sys.x$ksppi:

SQL> select KSPPINM,KSPPDESC from sys.x$ksppi where KSPPDESC like '%first%rows%';

KSPPINM
--------------------------------------------------------------------------------
KSPPDESC
--------------------------------------------------------------------------------
_kcfis_fast_response_enabled
Enable smart scan optimization for fast response (first rows)

_sort_elimination_cost_ratio
cost ratio for sort eimination under first_rows mode

_optimizer_rownum_pred_based_fkr
enable the use of first K rows due to rownum predicate

_optimizer_fkr_index_cost_bias
Optimizer index bias over FTS/IFFS under first K rows mode

From its description, _kcfis_fast_response_enabled is related to SmartScan and to 'first rows' (which we have in the execution plan, and which comes from the rownum < 10 predicate).
So let’s disable it:


alter session set "_kcfis_fast_response_enabled"=false;

and execute again, with execution plan and statistics.

---------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | E-Rows | A-Rows | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |        |     10 |    4019 |   4017 |
|*  1 |  COUNT STOPKEY                          |      |        |     10 |    4019 |   4017 |
|   2 |   VIEW                                  |      |   1000K|     10 |    4019 |   4017 |
|*  3 |    SORT ORDER BY STOPKEY                |      |   1000K|     10 |    4019 |   4017 |
|*  4 |     TABLE ACCESS STORAGE FULL FIRST ROWS| TEST |   1000K|   1000K|    4019 |   4017 |
---------------------------------------------------------------------------------------------

The first observation is that execution is much faster. And this time we see the 4000 blocks of my table that we have to read.

NAME                                                                                VALUE
---------------------------------------------------------------- ------------------------
cell IO uncompressed bytes                                                     33,120,256
cell blocks processed by cache layer                                                5,913
cell blocks processed by data layer                                                 4,043
cell blocks processed by txn layer                                                  5,913
cell physical IO bytes eligible for predicate offload                          32,915,456
cell physical IO interconnect bytes                                            29,833,624
cell physical IO interconnect bytes returned by smart scan                     29,825,432
cell scans                                                                              2
physical read bytes                                                            32,923,648
physical read total bytes                                                      32,923,648

All statistics are correct now: 32MB eligible for SmartScan, and 32MB returned by SmartScan.

So what happened?

When we use first_rows(10), either explicitly or implied by rownum < 10, Oracle knows that we need only 10 rows. And Exadata has an optimization that avoids SmartScan when only a few rows are needed, because SmartScan has a startup overhead. This is the 'Enable smart scan optimization for fast response (first rows)' feature, visible in the 'cell num fast response sessions' statistic above. No SmartScan occurs when the result is expected to come quickly. However, if it takes longer than expected, the session can switch to SmartScan later ('cell num fast response sessions continuing to smart scan'). But there are two problems with that.
First, the statistics are wrong. The statistics from the first phase (the non-SmartScan one) are simply lost. We see it in the execution plan, where about 2000 reads are missing, and in 'physical read bytes', where 16MB are missing.
Second, and worse, this optimization is not good at all in our case. It doesn't matter that we need only 10 rows: we still have to read 1 million rows, because we must sort them before filtering. And 1 million rows is not a good candidate for the 'fast response (first rows)' optimization. Here, 16MB were read without SmartScan, and when I checked the wait events, they were all single-block reads. The fastest machine chose the slowest way to get the data…
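
Putting numbers on both problems (values copied from the plans and statistics above):

```python
# Reads missing from the first plan (the non-SmartScan phase is not counted):
table_blocks = 4013                     # blocks up to the high water mark
plan_reads   = 2135                     # Reads column of the first plan
print(table_blocks - plan_reads)        # 1878 -> the ~2000 missing reads

# Bytes missing from 'physical read bytes' in the first run:
total_bytes = 32_923_648                # physical read total bytes
read_bytes  = 17_506_304                # physical read bytes
print(total_bytes - read_bytes)         # 15417344 -> the ~16MB missing
```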

This is just one example of first-rows issues. There are many others (see Jonathan Lewis's blog). In general, the first_rows_n optimization is good when it avoids sorting all the rows, by using an index which is maintained in order. And by the way, if you see 'first rows' without an ORDER BY, it's suspicious: who wants the top-n of a random row set?
In order to have better control over first-rows optimizations, I prefer to avoid rownum and use the row_number() function or the 12c FETCH FIRST ROWS syntax. See the previous blog about that.
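
For reference, here is how the same top-n could be written in those two ways (a sketch on the same TEST table; rownum < 10 keeps the first 9 rows, so the equivalents below filter accordingly):

```sql
-- analytic function: number the rows by n, then filter on that number
select * from (
  select t.*, row_number() over (order by n) rn
  from TEST t
)
where rn < 10;

-- 12c row-limiting clause
select *
from TEST
order by n
fetch first 9 rows only;
```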

Update

Thanks to Hemant K Chitale's comment, I've fixed the units (this test case was in MB).