Users may experience the following symptoms on high-performance SQL Server virtual machines (VMs):
- Degraded SQL Server performance despite high CPU and RAM allocations.
- Internal SQL Server blocking or high wait times.
- High Disk Read/Write latency reported within the Guest OS, even when storage array latency appears healthy.
- Concerns regarding uneven NUMA core distribution or sub-optimal CPU/RAM settings.
VMware vSphere ESX 8.x
While NUMA misconfiguration can lead to performance degradation due to remote memory access, performance issues in high-I/O SQL workloads are often caused by:
1. Storage I/O Bottlenecks: A single SCSI controller reaching its maximum queue depth when managing multiple high-I/O virtual disks.
2. Storage Latency: Underperforming storage paths or misconfigured VAAI (vStorage APIs for Array Integration) causing command failures.
3. Outdated Firmware: Missing BIOS updates on the physical host affecting resource management and stability.
To resolve these issues, follow the validation and configuration steps below:
1. Verify NUMA Alignment
Review the ESXi logs or use `esxtop` (Memory view - NHN) to ensure:
- The VM's virtual NUMA (vNUMA) topology aligns with the host's physical NUMA topology.
- All memory is showing as "Local" in the NUMA Home Node (NHN).
- There are no frequent active migrations of vCPUs across NUMA nodes.
2. Optimize SCSI Controller Configuration
For high-I/O VMs like SQL Server, VMware recommends distributing disk I/O across multiple controllers:
- Use Multiple PVSCSI Adapters:** Configure the VM with up to four (4) VMware Paravirtual SCSI (PVSCSI) adapters.
- Distribute Load:** Place the Operating System, SQL Data files, and Transaction Logs on separate virtual SCSI controllers to leverage multiple I/O queues.
3. Address Storage Latency
Check `vmkernel` logs for storage-related warnings:
- Monitor for `ScsiVsi` or `StorageDeviceVsi` throttle warnings.
- Ensure VAAI is fully supported and functional for all LUNs. Use `esxcli storage core device vaai status get` to verify.
- Verify that the datastore capacity does not exceed 85% to maintain optimal metadata performance.
4. Hardware and Guest OS Tuning
- Update BIOS: Ensure the physical host is running the latest BIOS/firmware version recommended by the hardware vendor (e.g., Cisco UCS, Dell PowerEdge).
- SQL MaxDOP: Configure the 'Max Degree of Parallelism' in SQL Server to half the number of physical cores, up to a maximum of 8.
- Allocation Unit Size: Use a 64KB allocation unit size when formatting partitions for SQL Server data and logs.