Last night, I had an interesting discussion with one of my MVP French friend that faces a weird situation where a query that uses a statement level RECOMPILE hint produces an execution plan that disappointed him. He told me we can simulate the same situation by using the AdventureWorks database and Sales.SalesOrderHeader table.

First, we have to add a nonclustered index on the orderdate column as follows:

CREATE NONCLUSTERED INDEX [idx_sales_salesorderheader_orderdate] ON [Sales].[SalesOrderHeader]
(
       [OrderDate] ASC
)

 

Now, let me show you the query. In fact we compare two queries and two potential behaviours. Indeed, the first query will use a local variable and an inequality operator while the second query will be pretty the same except we add the recompile hint option.

declare @date datetime = getdate()
 
select top 1 SalesOrderID
from sales.salesorderheader
where orderdate >= @date
order by SalesOrderID
go
 
declare @date datetime = getdate()
 
select top 1 SalesOrderID
from sales.salesorderheader
where orderdate >= @date
order by SalesOrderID
option(recompile)

 

Let’s compare the estimated execution plan of the both queries. In fact, the estimated plan produced by the query optimizer is the same in both cases.

blog_35_-_1_-_estimated_execution_plan

Next, let’s compare their real execution plans.

–> Concerning the first query:

blog_35_-_2_-_real_execution_plan_query_1

–> Concerning the second query (with recompile hint):

blog_35_-_3_-_real_execution_plan_query_2
Interesting, isn’t it? For the same query except the recompile option the query optimizer has decided to use an index scan operator in the first case and an index seek in the second case. At this point, of course we supposed that the recompile hint affects the query optimizer decision but how far? Well, the response lies in the way the query optimizer handles the parameter @date.

In the first query, the query optimizer is not aware of the @date parameter value at the compile time.

 

blog_35_-_4_-_real_execution_plan_query_1_-_predicate

It means that it will not be able to use the density information and instead it will use the standard guess of 30% selectivity for inequality comparisons. If we take a look at the statistic object of the primary key we may see that 30% of the total rows is equal to 31465 * 0.3 = 9439

 

dbcc show_statistics (‘sales.salesorderheader’, ‘PK_SalesOrderHeader_SalesOrderID’) with stat_header;

 

blog_35_-_6_-_statistic_primary_key

But at this point, we may wonder why SQL Server is using the primary key rather than the index on the orderdate column for example. In fact, the primary key is a good candidate for this query includes an ORDER BY clause (order by SalesOrderId). It means that data is already ordered by SalesOrderId and the query optimizer doesn’t need to perform an extra step that consists in sorting data before using the top operator as shown below:

declare @date datetime = getdate()
 
select top 1 SalesOrderID
from sales.salesorderheader
where orderdate >= @date
order by SalesOrderID
 
select top 1 SalesOrderID
from sales.salesorderheader with (index (idx_sales_salesorderheader_orderdate) )
where orderdate >= @date
order by SalesOrderID
go

 

blog_35_-_7_-_query_1_-_sort

Let’s continue with the second query and notice how SQL Server is handling the @date parameter this time: the local variable value is transformed as parameter as shown below:

blog_35_-_8_-_real_execution_plan_query_2_-_predicate

Hmm… does it mean that SQL Server is aware of local variable value when using the statement RECOMPILE hint? In fact, yes it does and this detail changes completely the story because in this case SQL Server is able to use the histogram of the index idx_salesorderheader_orderdate. However, we are in such situation where the predicate value is not represented on the histogram and SQL Server will use an estimate of 1 that implies to use an index seek operator. Likewise, we may wonder why SQL Server has decided to seek the index idx_salesorderheader_orderdate with a TopN Sort operator here. Once again, the query optimizer is cost-based and this strategy is surely the less costly. Let’s demonstrate by comparing the both strategies:

declare @date datetime = getdate()
 
select top 1 SalesOrderID
from sales.salesorderheader with (index (PK_SalesOrderHeader_SalesOrderID) )
where orderdate >= @date
order by SalesOrderID
option(recompile)
 
select top 1 SalesOrderID
from sales.salesorderheader
where orderdate >= @date
order by SalesOrderID
option(recompile)

 

blog_35_-_9_-_real_execution_plan_query__-_force_using_pk
Let’s take a look at the statistics related on both CPU and IO consumption of the both queries

–> Concerning the first query (clustered index scan operator)

Table ‘SalesOrderHeader’. Scan count 1, logical reads 689, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
(1 row(s) affected)
 
SQL Server Execution Times:
   CPU time = 16 ms, elapsed time = 2 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.

 

–> Concerning the second query (index seek operator). Yes, in this case you have only to read 2 pages (the index root page and one data page at leaf level)

Table ‘SalesOrderHeader’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
(1 row(s) affected)
 
SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

 

The bottom line: in certain cases we may be surprised by the choices made by the query optimizer but most of the time, we can trust it and this is what I wanted to point out in the blog post! The truth often lies elsewhere 🙂

Happy optimization!

By David Barbarin