Since SQL Server 2014 CTP2, Microsoft has introduced a new kind of index which is the Range index. We have now two ways for indexing an In-Memory table: using either an Hash index or a Range index.

These two indexes are slightly different.
In fact, a Hash index is a set of buckets, 8-bytes memory pointers, which points to the actual row data or row chain data. SQL Server 2014 uses a hash function to map values to buckets. To minimize Hash collisions the number of bucket should be twice the number of index values. A Hash collision occurs when the Hash function returns the same bucket value for two different data value for example “Audi” and “Mercedes”. Hash indexes are optimized for equality predicates.

In contrast, for Range indexes, the leaf level of a Bw-Tree gives you a pointer to the first data row in the row chain. Like with hash indexes, rows with the same indexed value form a row chain. This kind of index is particularly efficient with inequality predicates like ‹ or › .If you want to know more about Range Indexes, please read the excellent blog of my colleague David Barbarin.

To compare these two kind of indexes I will create a new database with first two tables, one In-Memory table with a hash index on a DateTime column and another one (disk-based table) with a non-clustered index on the same column. I will fill these tables with one millions records. Here is the script I used:

-- database creation
use master
go
if exists(select * from sys.databases where name = 'Test_InMemoryOLTP_Optimized')
drop database Test_InMemoryOLTP_Optimized
go
create database Test_InMemoryOLTP_Optimized
go
--create a disk table
use Test_InMemoryOLTP_Optimized
go
if exists(select * from sys.objects where name = 'Person_OnDisk')
drop table dbo.Person_OnDisk
go
create table dbo.Person_OnDisk
(
Person_OnDisk_ID int not null primary key,
Person_OnDisk_Name nvarchar(200) not null,
Person_OnDisk_Date datetime not null,
index Person_OnDisk_Person_OnDisk_Date nonclustered (Person_OnDisk_Date)
)
--enable database for memory optimized tables
-- add memory_optimized_data filegroup
alter database Test_InMemoryOLTP_Optimized
add filegroup Test_InMemory_mod contains MEMORY_OPTIMIZED_DATA
go
--add container to the filegroup
alter database Test_InMemoryOLTP_Optimized
add file (name='InMemory_mod', filename='d:\InMemory\InMemory_mod')
to filegroup Test_InMemory_mod
go
use Test_InMemoryOLTP_Optimized
go
if exists(select * from sys.objects where name = 'Person_InMemory')
drop table dbo.Person_InMemory
go
create table dbo.Person_InMemory
(
Person_InMemory_ID int not null primary key
nonclustered hash with (bucket_count = 2000000),
Person_InMemory_Name nvarchar(200) COLLATE Latin1_General_100_BIN2 NOT NULL,
Person_InMemory_Date datetime not null index Person_InMemory_Person_InMemory_Date
nonclustered hash (Person_InMemory_Date) with (bucket_count = 2000000)
)
with (memory_optimized = on, durability = schema_and_data)
--fill in disk table
set nocount on
go
begin tran
declare @cpt int = 0
While @cpt < 1000000
begin
insert into dbo.Person_OnDisk values
(@cpt, 'Name_'+cast(@cpt as varchar(10)), dateadd(dd,round(@cpt/10000,0,1),'2014-01-01'))
set @cpt += 1
end
commit
--fill in In-Memory table
set nocount on
go
begin tran
declare @cpt int = 0
While @cpt < 1000000
begin
insert into dbo.Person_InMemory values
(@cpt, 'Name_'+cast(@cpt as varchar(10)), dateadd(dd,round(@cpt/10000,0,1),'2014-01-01'))
set @cpt += 1
end
commit

First remark, filling in the In-Memory table is three time faster than for disk-based table, respectively on my machine 5 seconds for the former against 15 seconds for the latter. You can also see that I decided to persist my In-Memory table data as I used the durability option schema_and_data instead of schema_only which means that after a restart of my SQL Server service the data will not be persistent, so definitively lost.

Now, I will try to search for records where date is equal to 14-01-2014 and then I will compare results between my In-Memory and my disk-based tables. I will use IO and Time statistics options to get a better view of each query performance:

-- use equality operator with disk table and In-Memory table with Hash index
USE Test_InMemoryOLTP_Optimized
GO
 
SET NOCOUNT ON
SET STATISTICS IO ON
SET STATISTICS TIME ON
 
SELECT *
FROM dbo.Person_OnDisk
WHERE Person_OnDisk_Date = '2014-01-14'
 
SELECT *
FROM dbo.Person_InMemory
WHERE Person_InMemory_Date = '2014-01-14'

 

The result of those queries is:
Table on disk:
Table ‘Person_OnDisk’. Scan count 1, logical reads 5812, physical reads 3, read-ahead reads 5805, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 1631 ms.

 

Table In-Memory:

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 128 ms.

 

The disk table made multiple logical reads (5812), three physical reads and 5805 read-aheads whereas the In_Memory table as it is in memory did not perform any disk I/O’s activity.
Concerning the execution time, my disk-based table consumed 31 ms of CPU against and ran in 1631 ms against no CPU and 128 ms for my In-Memory table…
In-Memory table is definitively faster than my disk-based table.

Let see now if I used an inequality predicate. So I am using the same queries with an ›:

SET NOCOUNT ON
SET STATISTICS IO ON
SET STATISTICS TIME ON
 
SELECT *
FROM dbo.Person_OnDisk
WHERE Person_OnDisk_Date > '2014-01-14' and Person_OnDisk_Date '2014-02-20'
 
 
SELECT *
FROM dbo.Person_InMemory
WHERE Person_InMemory_Date > '2014-01-14' and Person_InMemory_Date '2014-02-20'

 

The result is:
Disk table:
Table ‘Person_OnDisk’. Scan count 1, logical reads 5812, physical reads 3, read-ahead reads 5805, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 171 ms, elapsed time = 3150 ms.

In-Memory table:
SQL Server Execution Times:
CPU time = 204 ms, elapsed time = 2153 ms.

I/O’s for both tables did not change, but concerning the In-Memory table we saw that the gain is not very interesting compare to equality predicate see before. The Hash index is not tuned for inequality predicate. I will now change the Hash index of my In-Memory table to a Range index. To do it, I will have to drop and recreate my table, as it is not possible to alter an In-Memory table, and to reload data.

 

--Change the Hash index by a Range index
if exists(select * from sys.objects where name = 'Person_InMemory')
drop table dbo.Person_InMemory
go
 
create table dbo.Person_InMemory
(
Person_InMemory_ID int not null primary key
nonclustered hash with (bucket_count = 2000000),
Person_InMemory_Name nvarchar(200) COLLATE Latin1_General_100_BIN2 NOT NULL,
Person_InMemory_Date datetime not null index Person_InMemory_Person_InMemory_Date
nonclustered (Person_InMemory_Date)
)
with (memory_optimized = on, durability = schema_and_data)

 

In-Memory table:
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 2155 ms.

The CPU time is now more than four time lower with a Range Index compare to a Hash index.
If I take a look at the execution plan I can see the optimizer is using my range index defined in my script with an index seek operation against a table scan when using an hash index with an equality operator.

Before creating an In-Memory table remember that you will have to think about which kind of predicates will be used in queries to choose the right index: Hash or Range or even both. In addition, don’t forget that an In-Memory table accepts a maximum of eight indexes. Finally, if you need to create another index for an In-Memory table you will have to drop and recreate the table.
I hope this blog will help. If you want to know more about SQL Server 2014 In-Memory OLTP please joins us to our next Event in June, details and inscription here.


Thumbnail [60x60]
by
Stéphane Savorgnano