During periods of high storage throughput, NFS traffic experiences sustained latency spikes of 20-25 ms.
vSphere ESX
The environment uses blade servers, where physical network resources are shared at the chassis level.
The NFS VMkernel interface is currently configured to use the same specific uplink (vmnicX) on multiple hosts.
Although the virtual switch has multiple uplinks available, traffic is statically pinned to, or fails over onto, that same vmnicX.
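As a quick audit, the effective teaming policy of the NFS port group can be read on every host with pyVmomi. This is a minimal sketch, assuming the VMkernel port group is named "NFS" and using placeholder vCenter connection details; adjust both to the environment.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        for pg in host.configManager.networkSystem.networkInfo.portgroup:
            if pg.spec.name == "NFS":  # assumed port group name
                # computedPolicy is the effective policy after vSwitch inheritance
                order = pg.computedPolicy.nicTeaming.nicOrder
                print(f"{host.name}: active={order.activeNic} standby={order.standbyNic}")
    view.Destroy()
    Disconnect(si)

If every host reports the same active vmnic, all NFS traffic is converging on one uplink, which is the condition described below.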
In a blade chassis, multiple blades share the same physical I/O modules (IOMs). Because multiple hosts simultaneously drive high-bandwidth NFS traffic through the same uplink adapter (vmnicX), they contend for the same physical lane on the chassis backplane. This aggregation of traffic onto a single physical link saturates it, producing the observed latency spikes.
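One way to corroborate the saturation hypothesis is to sample real-time transmit throughput on the suspect uplink across all hosts. The sketch below uses pyVmomi's PerformanceManager with the standard net.transmitted.average counter; the vmnic name ("vmnic2") and connection details are placeholder assumptions.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()
    perf = content.perfManager

    # Resolve "net.transmitted.average" (KBps) to its numeric counter id.
    counter_id = next(c.key for c in perf.perfCounter
                      if f"{c.groupInfo.key}.{c.nameInfo.key}.{c.rollupType}"
                      == "net.transmitted.average")

    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        spec = vim.PerformanceManager.QuerySpec(
            entity=host,
            metricId=[vim.PerformanceManager.MetricId(counterId=counter_id,
                                                      instance="vmnic2")],  # suspect uplink
            intervalId=20,  # 20-second real-time samples
            maxSample=15)   # roughly the last five minutes
        for result in perf.QueryPerf(querySpec=[spec]):
            for series in result.value:
                if series.value:
                    print(f"{host.name}: peak TX on vmnic2 = {max(series.value)} KBps")
    view.Destroy()
    Disconnect(si)

Per-host figures that individually look moderate can still sum past the capacity of the shared IOM lane, which is what drives the latency.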
Rebalance the NFS load by distributing the active uplinks across the cluster: configure different hosts to prefer different vmnics/uplinks for the NFS VMkernel port group, as sketched below. Spreading the VMkernel traffic across multiple uplinks prevents saturation of any single I/O module lane, effectively increasing the available aggregate bandwidth and eliminating the bottleneck.
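A minimal remediation sketch, again with pyVmomi: rotate the active uplink for the NFS port group across hosts so that consecutive hosts prefer different vmnics. The port group name ("NFS") and the uplink pair (vmnic2/vmnic3, assumed to terminate on different IOMs) are placeholders for illustration.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    uplinks = ["vmnic2", "vmnic3"]  # assumed to map to different IOM lanes
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for i, host in enumerate(sorted(view.view, key=lambda h: h.name)):
        netsys = host.configManager.networkSystem
        for pg in netsys.networkInfo.portgroup:
            if pg.spec.name != "NFS":  # assumed port group name
                continue
            spec = pg.spec
            active = uplinks[i % len(uplinks)]  # alternate the preferred uplink per host
            standby = [u for u in uplinks if u != active]
            teaming = spec.policy.nicTeaming or vim.host.NetworkPolicy.NicTeamingPolicy()
            teaming.nicOrder = vim.host.NetworkPolicy.NicOrderPolicy(
                activeNic=[active], standbyNic=standby)
            spec.policy.nicTeaming = teaming
            netsys.UpdatePortGroup(pgName=pg.spec.name, portgrp=spec)
            print(f"{host.name}: NFS active uplink -> {active}")
    view.Destroy()
    Disconnect(si)

Only the active/standby order changes, so failover protection is preserved while the hosts are statically spread across both IOM lanes.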
This issue is present in ESXi releases prior to 8.x.