SQL Server has CPU limitations depending on the Edition. If your VM is not configured properly you can reach these limits very easily.
In a previous post, I described the effect VM misconfigurations can have on performance because not all available cores are used as expected.
See: SQL Server CPU limitation for Express and Standard Edition

In this article I will share a similar case but this time in a context where the Automatic soft-NUMA feature comes into play.

Standard Edition limitations

In the article mentioned just above we saw that the Standard edition is limited to 4 sockets.
If you decide to give your VM 8 cores but it is also configured with 8 sockets, you will find yourself limited to 4 sockets, therefore 4 cores.
It can look like this from the Task Manager:

NUMA architecture

Non-Uniform Memory Access (NUMA) is a computer system architecture that is used with multiprocessor designs in which some regions of memory have greater access latencies.
Some memory regions are connected directly to one or more processors, with all processors connected to each other through various types of interconnection fabric. For large multiprocessor systems, this arrangement results in less contention for memory and increased system performance.

A NUMA architecture divides memory and processors into groups, called “NUMA nodes”.

As illustrated above accessing local memory is faster than accessing the memory associated with other NUMA nodes.
SQL Server includes optimizations that recognize and adapt to a computer’s NUMA topology.

SQL Server Automatic Soft-NUMA

Starting with version 2016, during startup, SQL Server interrogates the hardware layout and automatically configures Soft NUMA on systems reporting 8 or more CPUs per NUMA node.

There are benefits to having Soft-NUMA enabled. Some internal processes are partitioned by soft-NUMA. SQL Server creates one I/O Completion Thread by NUMA node.
If you have hardware NUMA you can have an increased number of Lazy Writers workers. This can be helpful for large systems to improve performance and scalability.
See: How It Works: Soft NUMA, I/O Completion Thread, Lazy Writer Workers and Memory Nodes

Soft-NUMA applies only for CPU Scheduling and Connectivity. Memory locality is not impacted.

Standard Edition limitations and Automatic Soft-NUMA

The origin of this article is a case that I encountered with a customer.
Let’s look at sys.dm_os_sys_info to see the resources available for the SQL Server instance.

SELECT
	cpu_count, socket_count, hyperthread_ratio
	, scheduler_count
	, softnuma_configuration_desc, numa_node_count
FROM sys.dm_os_sys_info

The VM is configured with 12 logical cores, which are visible for SQL Server as “cpu_count”. The cores are distributed over 6 sockets.
Because we have more than 8 cores Automatic soft-NUMA comes into play. It is enabled and 2 soft NUMA nodes are created. There is no hardware NUMA here.

We also get the information from the ErrorLog.

We can notice we have only 8 schedulers on the instance instead of 12. This is caused by the Standard Edition limitation of 4 sockets (of 2 cores in this case).

So we have 8 cores and 2 NUMA nodes. We could expect to have 2 NUMAnodes of 4 cores each. This can be checked with sys.dm_os_schedulers.

SELECT
	parent_node_id, scheduler_id
	, status, is_online
FROM sys.dm_os_schedulers
WHERE scheduler_id < 100

We can see that the 2 NUMA nodes are composed of 6 schedulers each but that 4 schedulers are deactivated because of licensing.
The second NUMA node has only 2 schedulers; schedulers with id 6 and 7.
This situation is very interesting. We would have preferred to have 2 nodes of 4 cores.

I was curious about the scheduler load across NUMA nodes. I came up with the following query on sys.dm_os_workers.

SELECT
	COUNT(*) AS WorkerCount
	, parent_node_id
	, COUNT(DISTINCT scheduler_id) AS SchedulersCount
	, COUNT(*)/COUNT(DISTINCT scheduler_id) AS AvgWorkersBySchedulers
FROM sys.dm_os_workers AS w
	JOIN sys.dm_os_schedulers AS s
		ON w.scheduler_address = s.scheduler_address
WHERE s.status = 'VISIBLE ONLINE'
GROUP BY parent_node_id

The second NUMA node with only 2 schedulers has about 40% of all workers.
There seems to be a load difference between the 2 NUMA nodes.

Compared with another instance having a similar configuration but in Enterprise Edition, the workers seem better distributed on the schedulers.

Going a little bit further we can have look at tasks and associated sessions per scheduler.

SELECT COUNT(*) AS Tasks
	, s.parent_node_id
	, t.scheduler_id
	, s.is_idle
	, s.load_factor
	, STRING_AGG(COALESCE(CAST(t.session_id AS VARCHAR), 'n/a'), ', ') AS session_id
FROM sys.dm_os_tasks AS t
	LEFT JOIN sys.dm_os_workers AS w
		on w.task_address = t.task_address
			JOIN sys.dm_os_schedulers AS s
				on s.scheduler_address = w.scheduler_address
WHERE s.status = 'VISIBLE ONLINE'
GROUP BY s.parent_node_id, t.scheduler_id, s.is_idle, s.load_factor

We can see a larger number of tasks on schedulers 6 and 7 which belong to the second NUMA node. The CPU load is on average higher on these two schedulers.

Connections round-robin

What happens is that new connections made to SQL Server are assigned to NUMA nodes in a round-robin fashion.

Indeed with sys.dm_exec_connections, we can see that the distribution of connections is almost balanced on the 2 NUMA nodes.

SELECT COUNT(*) AS NodeConnections
	, node_affinity
FROM sys.dm_exec_connections
GROUP BY node_affinity

Fix

In this particular case, we simply reconfigured the VM to have only 2 sockets and 6 cores per socket. The 12 cores can thus be used without being constrained by the limitation of 4 sockets maximum in the Standard edition.
Things look better now.

The configuration change was done together with a change to the MAXDOP setting. We have set the MAXDOP value to the number of cores per NUMA node so in this case 6.
A slight decrease in CPU usage has been observed on this VM. As there is little activity on this SQL Server instance the change did not make a glaring difference.

Conclusion

Since the SQL Server connection assignment is done in a round-robin manner, having an uneven number of schedulers per NUMA node can result in an uneven CPU load.

In this customer context, there is not a very important SQL activity and the consequences of this sub-optimal NUMA configuration are negligible.

I found this subject interesting, beyond the license limitation it is a reminder that it is important to be aware of the NUMA configuration of your SQL Server environment.
I hope the few SQL queries above can help you understand a bit more about how the NUMA architecture works from a SQL Server perspective.


Thumbnail [60x60]
by
Steven Naudet