Beginning with Update 11, Microsoft SQL Server 2022 introduced a breaking change that limits the maximum number of logical processors SQL Server supports in a single NUMA node to 64. When SQL Server runs on a system in which more than 64 logical processors are present in a single NUMA node, the corresponding SQL Server service will not start until the condition is corrected.
vSphere provides a simplified, flexible mechanism for controlling the number of cores per socket and the resulting NUMA topology presented to a VM. You should use these configuration options to appropriately size your Microsoft SQL Server VMs to ensure that, regardless of the number of logical processors allocated to the VM, no more than 64 logical processors are present in a single NUMA node.
By explicitly controlling the NUMA topology presented to the VM, you can both avoid the disruption associated with the breaking change and take advantage of the benefits of SMT.
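To make the constraint concrete, the short Python sketch below checks whether an evenly split vCPU topology keeps every vNUMA node at or below the 64-logical-processor limit. This is our own illustration, not part of any VMware or Microsoft tooling; the function names and sample VM sizes are assumptions.

```python
import math

# Limit introduced by the SQL Server 2022 update: no more than 64 logical
# processors per NUMA node, otherwise the service will not start.
MAX_LP_PER_NUMA_NODE = 64

def logical_processors_per_node(total_vcpus: int, vnuma_nodes: int) -> int:
    """Logical processors in each vNUMA node, assuming an even split."""
    return math.ceil(total_vcpus / vnuma_nodes)

def topology_is_safe(total_vcpus: int, vnuma_nodes: int) -> bool:
    """True when no vNUMA node exceeds the 64-logical-processor limit."""
    return logical_processors_per_node(total_vcpus, vnuma_nodes) <= MAX_LP_PER_NUMA_NODE

# Boundary illustration (hypothetical VM sizes):
print(topology_is_safe(128, 2))   # True  -> 64 logical processors per node
print(topology_is_safe(130, 2))   # False -> 65 logical processors per node
```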
Configuration recommendation
The following is a high-level description of how customers can safely assign logical processors to their SQL Server workloads without disabling hyperthreading.
The example presented here is based on the currently shipping version of vSphere (8.0 at the time of this writing), but the concepts and options are available in all recent versions of vSphere.
On vSphere, logical processors can be assigned to a VM in one of the following ways:
• Let the hypervisor determine the optimal topology (default behavior)
• Manually specify the number of cores per socket and the number of desired NUMA nodes (recommended solution)
Let the hypervisor determine the optimal topology (default behavior)
This option, shown in the screenshot below, results in a 2-socket, 2-NUMA-node topology (72 logical processors per node). This configuration triggers the breaking change and prevents the SQL Server VM from starting.
Warning: This problem arises whether or not hyperthreading is enabled, because the ESXi host contains more than 64 physical cores per socket. Therefore, Broadcom does not recommend this option.
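Expressed with the same arithmetic as the sketch above, the default split in this example looks like the following (the values mirror the screenshot; the variable names are ours):

```python
# Default topology: 144 vCPUs spread evenly across 2 vNUMA nodes.
total_vcpus, vnuma_nodes = 144, 2
per_node = total_vcpus // vnuma_nodes   # 72 logical processors per node
print(per_node > 64)                    # True -> the SQL Server service will not start
```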
Manually specify the number of cores per socket and the number of desired NUMA nodes
The screenshot below shows that we selected 48 cores per socket, a value below the problematic 64. This means that a 3-socket topology, with no more than 48 logical processors per socket, will be visible to the guest operating system. (Also refer to Configure the CPU Topology of a Virtual Machine.)
If we additionally select the number of NUMA nodes we want to present (4, in this case), ESXi scales the number of cores per socket down to 36 so that the desired total number of logical processors is still presented to the VM. The guest operating system therefore sees 4 NUMA nodes, each with 36 logical processors.
The screenshot below illustrates this manually configured topology that the SQL Server VM sees, irrespective of the hardware's hyperthreading state and the ESXi host's underlying physical NUMA topology.
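The arithmetic behind this manual configuration can be sketched as follows. This is a simplified model of what the UI selections produce in this example; the variable names are ours, and the actual placement decisions are made by ESXi.

```python
TOTAL_VCPUS = 144

# Selecting 48 cores per socket determines the number of virtual sockets:
virtual_sockets = TOTAL_VCPUS // 48                     # 3 sockets of 48 logical processors

# Selecting 4 NUMA nodes instead makes the cores-per-socket value follow:
desired_numa_nodes = 4
cores_per_socket = TOTAL_VCPUS // desired_numa_nodes    # 36 per node/socket

print(virtual_sockets, cores_per_socket)                # 3 36
print(cores_per_socket <= 64)                           # True -> within the SQL Server limit
```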
Lastly, Microsoft SQL Server receives and works with this same presentation. As seen in the screenshots below, SQL Server's soft-NUMA feature accepts the presented topology and uses it to construct its own NUMA layout.
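If you want to confirm the layout from inside the guest, one option is to query the sys.dm_os_nodes DMV. The sketch below does this from Python with pyodbc; the connection string, driver name, and server name are placeholders you would need to adapt to your environment.

```python
import pyodbc

# Placeholder connection details -- adjust driver, server, and authentication.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=sqlvm01;"
    "Trusted_Connection=yes;TrustServerCertificate=yes;"
)

# sys.dm_os_nodes reports the NUMA nodes SQL Server has built from the
# presented topology (the dedicated admin connection node is excluded here).
rows = conn.cursor().execute(
    "SELECT node_id, online_scheduler_count "
    "FROM sys.dm_os_nodes "
    "WHERE node_state_desc NOT LIKE '%DAC%';"
).fetchall()

for node_id, schedulers in rows:
    print(f"NUMA node {node_id}: {schedulers} online schedulers")
```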
With this sample configuration, you can successfully run large-capacity SQL Server workloads on vSphere and retain the benefits of hyperthreading without triggering the breaking change.
When designing the VM topology, it is crucial to consider the host's physical compute and NUMA characteristics. Even though the solution above works around the limitation by building additional, smaller vNUMA nodes, you still need to make sure the vNUMA nodes and their vCPU counts divide evenly into, and fit within, the physical sockets.
Example: Dual socket EPYC 9654 with SMT, where each socket has 96 cores and 192 threads.
In this case, to utilize the whole host, you would optimally create 4 vNUMA nodes of 48 vCPUs each, so that 2 vNUMA nodes fit into each physical socket. This symmetrical sizing, based on the physical core count, ensures optimal performance by not oversubscribing a socket.
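A small validation along these lines can be sketched as follows. The host values match the EPYC example above; the function and its checks are our own illustration, not a VMware sizing tool.

```python
PHYSICAL_SOCKETS = 2      # dual-socket EPYC 9654 host
CORES_PER_SOCKET = 96     # physical cores per socket (192 threads with SMT)

def vnuma_layout_fits(vnuma_nodes: int, vcpus_per_node: int) -> bool:
    """True when whole vNUMA nodes pack evenly into the physical sockets
    and no node exceeds the 64-logical-processor SQL Server limit."""
    if vcpus_per_node > 64:
        return False                          # would trigger the breaking change
    if vnuma_nodes % PHYSICAL_SOCKETS != 0:
        return False                          # nodes cannot be spread evenly across sockets
    nodes_per_socket = vnuma_nodes // PHYSICAL_SOCKETS
    # The vCPUs placed on one socket should fit within its physical cores.
    return nodes_per_socket * vcpus_per_node <= CORES_PER_SOCKET

print(vnuma_layout_fits(4, 48))   # True  -> 2 nodes of 48 vCPUs per socket
print(vnuma_layout_fits(2, 96))   # False -> 96 vCPUs per node exceeds the limit
```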