In this blog post, I would like to share with you a weird issue I faced when I implemented a sliding Windows partition scenario on SQL Server 2014 SP1 at one of my customer. The idea was to keep SQL Server audit records from several SQL Server instances into a global archiving table during two years in order to meet the customer audit specifications. Data older than year – 2 should be deleted. Based on a first implementation, we estimated the total data size to 800GB per year.
So we decided to implement the sliding Windows scenario as follows:
- Right-based partition function is used in this context
- Two partitions exist to store current and year – 1 audit data
- Partition that contains data older than one year will be switch and then drop
- One additional empty partition exists to avoid data movement that may lead to high resource consumption (see my previous blog post here)
In addition, we also implemented two stored procedures in order to automate sliding partition process based on the Microsoft article here.
And here came the (weird) issue. I always used the following (simplified) script so far to get table information regardless if a table is partitioned or not.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | SELECT OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName, OBJECT_NAME(p.object_id) AS ObjectName, i. name as index_name, i.index_id, p.partition_number, g. name AS [filegroup_name], au.data_space_id, p. rows , p.data_compression_desc FROM sys.partitions AS p JOIN sys.allocation_units AS au ON au.container_id = p.partition_id -- p.hobt_id JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id JOIN sys.filegroups AS g ON g.data_space_id = au.data_space_id WHERE p.object_id = object_id( 'dbo.t_sql_logins' ) AND p.index_id = 1; GO |
Here the output of the initial scenario in my case:
Let’s say that the partition 1 contains Year-1 data. Si I included this query to detect dynamically which partition to switch inside the stored procedure that deals with the left side of the partitioned table (that includes switch and merge operations) and this is exactly where the problem occurs.
The new scenario is as follows:
After switching the old partition and then merge the existing left boundary, I noticed metadata inconsistency regarding the following error message. In short, this message indicates my query doesn’t detect correctly which partition hosts data to delete because it seems to not be correctly aligned with archive table. They are not on the same filegroup.
ALTER TABLE SWITCH statement failed. index ‘security_audit.dbo.test_archive.PK_test_archive’ is in filegroup ‘FG_AUDIT_SQL_03’ and partition 1 of index ‘security_audit.dbo.test.PK_test’ is in filegroup ‘FG_AUDIT_SQL_02’
Fortunately, Dan Guzman (MVP Data Platform) puts me on the right track with the following query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | SELECT OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName , OBJECT_NAME(p.object_id) AS ObjectName , i. name AS IndexName , p.index_id AS IndexID , p.partition_number AS PartitionNumber , fg2. name AS FileGroupName , fg2.data_space_id , p. rows AS Rows , p.data_compression_desc FROM sys.partitions AS p INNER JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id INNER JOIN sys.data_spaces AS ds ON ds.data_space_id = i.data_space_id INNER JOIN sys.partition_schemes AS ps ON ps.data_space_id = ds.data_space_id INNER JOIN sys.partition_functions AS pf ON pf.function_id = ps.function_id INNER JOIN sys.destination_data_spaces AS dds2 ON dds2.partition_scheme_id = ps.data_space_id AND dds2.destination_id = p.partition_number INNER JOIN sys.filegroups AS fg2 ON fg2.data_space_id = dds2.data_space_id WHERE p.object_id in ( object_id( 'dbo.t_sql_logins' ) ) AND P.index_id = 1 |
You may notice that the output is not pretty the same than the previous query.
This time the partition number 1 seems to be more accurate with the FG_AUDIT_SQL_03 value. So why this difference between the two queries? If we take a further look at the both queries, we may see a noticeable difference with the use of sys.allocation_units DMV in the first query versus sys.destination_data_spaces DMV in the second one.
Let’s divide and simplify the above queries into small pieces and let’s have a look at the following result:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | SELECT partition_id, partition_number, hobt_id FROM sys.partitions WHERE object_id in ( object_id( 'dbo.t_sql_logins' ) ); SELECT allocation_unit_id, container_id, data_space_id FROM sys.allocation_units WHERE container_id IN ( 72057594040811520, 72057594040877056, 72057594041073664); SELECT name , data_space_id FROM sys.filegroups; SELECT destination_id, data_space_id FROM sys.destination_data_spaces; |
Red arrows concern the first query and the green arrows concern the second one. The arrows represent the value used for joining DMVs together. You may notice the filegroup name is different between the both queries which makes me think that the data_space_id value from the sys.allocation_units DMV is inconsistent in this case. In addition, we may notice two same data_space_id column values for each different value of container_id. The point here is that if I rebuild the corresponding index, this inconsistency disappears as shown below:
After further investigations, I found out the following Microsoft KB3095958 that talks about metadata inconsistency after a partition switch and SQL Server 2014 SP1. Applying this KB on my environment fixed this specific issue in my case.
Happy partitioning!
By David Barbarin