dbi services Blog
Welcome to the dbi services Blog! This IT blog focuses on database, middleware, and OS technologies such as Oracle, Microsoft SQL Server & SharePoint, EMC Documentum, MySQL, PostgreSQL, Sybase, Unix/Linux, etc. The dbi services blog represents the view of our consultants, not necessarily that of dbi services. Feel free to comment on our blog postings.
SQL Server 2014: Rebuilding Clustered Columnstore Indexes
I was surprised that you have to rebuild indexes that are stored In Memory. My previous posting "SQL Server 2014 - New Features: xVelocity memory optimized columnstore index" explains all processes connected with columnstore indexes. In this article, I will explain why, when, and how to rebuild them.
Why do we need to rebuild the Clustered Columnstore Indexes?
When a row is deleted, SQL Server marks the row as logically deleted. Physically, the row is still there.
This is true for the columnstore, but if the row is in the deltastore, the row is directly logically and physically deleted.
In addition to the DELETE query, an UPDATE query is a deletion followed by an insert.
We can easily consider that a lot of rows are marked as deleted during the lifetime of a database. The next question is: How can we determine when a rebuild must be started?
The traditional question: When and How?
A few specifically interesting Dynamic Management Views (DMVs) are delivered with SQL Server 2014 for Clustered Columnstore Indexes.These views help us understand what is happening in this new feature.
The most important DMV is sys.column_store_row_groups. This view has two interesting columns:
- total_rows is the total number of rows stored
- deleted_rows is the total number of rows marked as deleted
You can find more information here.
A comparison between these 2 columns gives us an idea of when to rebuild a Clustered Columnstore Index.
When the number of deleted rows increases and represents a large percentage of the total rows, performance problems can occur on the table.
To reduce the size of the table and of course reduce the disk I/O during a read, the best way is to really delete all rows marked as deleted.
Then, rebuilding the index is required.
The following example will help you understand this problem.
Rebuilding a Clustered Columnstore Index
I have created the Clustered Columnstore Index CCSI_Person_Person on Person's table from the AdventureWorks database:
My first action is an update from a title for all rows with an "e":
10446 rows are updated.
My second action is a delete from same rows, and such as for the update, I have 10446 rows in the column delete_rows:
Afterwards, I change my clause WHERE and I delete all rows with an "a":
18269 rows are marked as deleted.
Now, using a Select command and retrieve a total count of 1703 rows.
With sys.column_store_row_groups and a simple calculation, I have 91% of rows marked as deleted, kind of phantom rows...
If I open the Index properties, no rebuild operation is needed as the index fragmentation is at 0%.
This is good, I do not perform a rebuild because I have fragmentation.
Columnstore index is not a row-store index type and is stored in memory. Do not forget that!
You can directly rebuild with a right-click on the index:
Or also via T-SQL script: ALTER INDEX CCSI_Person_Person REBUILD PARTITION = ALL
Using my script, I will add two options: ONLINE and DROP_EXISTING
DROP_EXISTING is not possible!
For ONLINE, same result as for DROP_EXISTING: impossible
Let's check if my rebuilding really deletes all rows marked as deleted:
As expected all rows previously marked deleted are now physically deleted.
With Clustered Columnstore Indexes, we need to plan Index Rebuild Operation on a regular basis. The best way to do this is to insert this in your index work maintenance plan.
For your information, the rebuilding task in the maintenance plan does not include Clustered Columnstore Indexes rebuilds.
I hope an option will be added in the RTM (Release To Manufacture) version!