SQL Server has CPU limitations depending on the Edition. If your VM is not configured properly you can reach these limits very easily.
In a previous post, I described the effect VM misconfigurations can have on performance because not all available cores are used as expected.
See: SQL Server CPU limitation for Express and Standard Edition
In this article I will share a similar case but this time in a context where the Automatic soft-NUMA feature comes into play.
Standard Edition limitations
In the article mentioned just above we saw that the Standard edition is limited to 4 sockets.
If you decide to give your VM 8 cores but it is also configured with 8 sockets, you will find yourself limited to 4 sockets, therefore 4 cores.
It can look like this from the Task Manager:
Non-Uniform Memory Access (NUMA) is a computer system architecture that is used with multiprocessor designs in which some regions of memory have greater access latencies.
Some memory regions are connected directly to one or more processors, with all processors connected to each other through various types of interconnection fabric. For large multiprocessor systems, this arrangement results in less contention for memory and increased system performance.
As illustrated above accessing local memory is faster than accessing the memory associated with other NUMA nodes.
SQL Server includes optimizations that recognize and adapt to a computer’s NUMA topology.
SQL Server Automatic Soft-NUMA
There are benefits to having Soft-NUMA enabled. Some internal processes are partitioned by soft-NUMA. SQL Server creates one I/O Completion Thread by NUMA node.
If you have hardware NUMA you can have an increased number of Lazy Writers workers. This can be helpful for large systems to improve performance and scalability.
See: How It Works: Soft NUMA, I/O Completion Thread, Lazy Writer Workers and Memory Nodes
Soft-NUMA applies only for CPU Scheduling and Connectivity. Memory locality is not impacted.
Standard Edition limitations and Automatic Soft-NUMA
The origin of this article is a case that I encountered with a customer.
Let’s look at sys.dm_os_sys_info to see the resources available for the SQL Server instance.
SELECT cpu_count, socket_count, hyperthread_ratio , scheduler_count , softnuma_configuration_desc, numa_node_count FROM sys.dm_os_sys_info
The VM is configured with 12 logical cores, which are visible for SQL Server as “cpu_count”. The cores are distributed over 6 sockets.
Because we have more than 8 cores Automatic soft-NUMA comes into play. It is enabled and 2 soft NUMA nodes are created. There is no hardware NUMA here.
So we have 8 cores and 2 NUMA nodes. We could expect to have 2 NUMAnodes of 4 cores each. This can be checked with sys.dm_os_schedulers.
SELECT parent_node_id, scheduler_id , status, is_online FROM sys.dm_os_schedulers WHERE scheduler_id < 100
We can see that the 2 NUMA nodes are composed of 6 schedulers each but that 4 schedulers are deactivated because of licensing.
The second NUMA node has only 2 schedulers; schedulers with id 6 and 7.
This situation is very interesting. We would have preferred to have 2 nodes of 4 cores.
I was curious about the scheduler load across NUMA nodes. I came up with the following query on sys.dm_os_workers.
SELECT COUNT(*) AS WorkerCount , parent_node_id , COUNT(DISTINCT scheduler_id) AS SchedulersCount , COUNT(*)/COUNT(DISTINCT scheduler_id) AS AvgWorkersBySchedulers FROM sys.dm_os_workers AS w JOIN sys.dm_os_schedulers AS s ON w.scheduler_address = s.scheduler_address WHERE s.status = 'VISIBLE ONLINE' GROUP BY parent_node_id
The second NUMA node with only 2 schedulers has about 40% of all workers.
There seems to be a load difference between the 2 NUMA nodes.
Going a little bit further we can have look at tasks and associated sessions per scheduler.
SELECT COUNT(*) AS Tasks , s.parent_node_id , t.scheduler_id , s.is_idle , s.load_factor , STRING_AGG(COALESCE(CAST(t.session_id AS VARCHAR), 'n/a'), ', ') AS session_id FROM sys.dm_os_tasks AS t LEFT JOIN sys.dm_os_workers AS w on w.task_address = t.task_address JOIN sys.dm_os_schedulers AS s on s.scheduler_address = w.scheduler_address WHERE s.status = 'VISIBLE ONLINE' GROUP BY s.parent_node_id, t.scheduler_id, s.is_idle, s.load_factor
We can see a larger number of tasks on schedulers 6 and 7 which belong to the second NUMA node. The CPU load is on average higher on these two schedulers.
What happens is that new connections made to SQL Server are assigned to NUMA nodes in a round-robin fashion.
Indeed with sys.dm_exec_connections, we can see that the distribution of connections is almost balanced on the 2 NUMA nodes.
SELECT COUNT(*) AS NodeConnections , node_affinity FROM sys.dm_exec_connections GROUP BY node_affinity
In this particular case, we simply reconfigured the VM to have only 2 sockets and 6 cores per socket. The 12 cores can thus be used without being constrained by the limitation of 4 sockets maximum in the Standard edition.
Things look better now.
The configuration change was done together with a change to the MAXDOP setting. We have set the MAXDOP value to the number of cores per NUMA node so in this case 6.
A slight decrease in CPU usage has been observed on this VM. As there is little activity on this SQL Server instance the change did not make a glaring difference.
Since the SQL Server connection assignment is done in a round-robin manner, having an uneven number of schedulers per NUMA node can result in an uneven CPU load.
In this customer context, there is not a very important SQL activity and the consequences of this sub-optimal NUMA configuration are negligible.
I found this subject interesting, beyond the license limitation it is a reminder that it is important to be aware of the NUMA configuration of your SQL Server environment.
I hope the few SQL queries above can help you understand a bit more about how the NUMA architecture works from a SQL Server perspective.