Nutanix clusters may experience significant CVM (Controller VM) networking latencies. While the physical NICs may not show active errors, the following indicators are present:
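The ESXi host reports a large number of "Receive missed errors" in the NIC statistics (nicstats).
vsish commands reveal that the VMXNET3 "1st ring" is full or running out of buffers.
Environment: vSphere ESXi 8.x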
The issue is typically two-fold, involving both physical and virtual buffer exhaustion:
Physical Layer: The physical NIC's internal Receive (RX) Ring Buffer overflows when the host cannot process incoming packets fast enough.
Virtual Layer: Even after increasing physical RX buffers, the virtual interface (VMXNET3) within the Guest OS (CVM) may still drop packets if its internal ring buffers are too small to handle the traffic bursts between the hypervisor and the VM.
Step 1: Check and Increase the Physical NIC RX Buffer
The ESXi host reports a large number of "Receive missed errors" in the NIC statistics. In ESXi, "Receive missed errors" typically signal a bottleneck where the physical NIC’s internal Receive (RX) Ring Buffer overflows. This happens when the rate of incoming traffic exceeds the host's ability to "drain" the buffer and process packets.
Example:
NIC statistics for vmnic2:
Receive missed errors: 22418684
NIC statistics for vmnic3:
Receive missed errors: 66450
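The counter shown in the example can be collected for several uplinks at once. The following is a minimal sketch, assuming the counter appears in the `esxcli network nic stats get` output on a line of the form "Receive missed errors: N"; the helper name `missed_errors` and the vmnic list are illustrative, not part of any Nutanix or VMware tooling.

```shell
# Hypothetical helper: read `esxcli network nic stats get` output on stdin
# and print only the "Receive missed errors" counter.
missed_errors() {
    awk -F': *' '/Receive missed errors/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Run the live check only where esxcli is available (that is, on an ESXi host).
# vmnic2 and vmnic3 are taken from the example above; adjust to your uplinks.
if command -v esxcli >/dev/null 2>&1; then
    for nic in vmnic2 vmnic3; do
        printf '%s: %s\n' "$nic" "$(esxcli network nic stats get -n "$nic" | missed_errors)"
    done
fi
```

Running the loop twice a few minutes apart shows whether the counter is still growing or only reflects past drops.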
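The vmnic's RX buffer size must be increased.
To view the NIC statistics, including "Receive missed errors":
esxcli network nic stats get -n vmnicX
To identify the current RX buffer size:
esxcli network nic ring current get -n vmnicX
To identify the maximum RX buffer size the NIC supports:
esxcli network nic ring preset get -n vmnicX
To increase the RX buffer on the NICs:
esxcli network nic ring current set -n vmnicX -r xxxx
where
X is the vmnic number (for example, vmnic2 and vmnic3)
xxxx is the RX ring size to set, up to the maximum value the NIC supports.
Refer to KB 415206 for more information.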
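The RX ring change in this step can be sketched as a dry run that compares the current ring size with the preset maximum before anything is modified. This is an illustrative sketch, not the procedure from KB 415206: the helper name `rx_ring_size` is hypothetical, and it assumes the RX value is printed on a line beginning with "RX:" or "Max RX:", which may differ between ESXi builds.

```shell
# Hypothetical helper: print the RX ring size from the output of
# `esxcli network nic ring current get` or `esxcli network nic ring preset get`.
# Assumes a line starting with "RX:" or "Max RX:" (labelling may vary by build).
rx_ring_size() {
    awk -F': *' '/^ *(Max )?RX:/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Report only; the set command is left commented out so nothing changes by accident.
if command -v esxcli >/dev/null 2>&1; then
    for nic in vmnic2 vmnic3; do
        cur=$(esxcli network nic ring current get -n "$nic" | rx_ring_size)
        max=$(esxcli network nic ring preset get -n "$nic" | rx_ring_size)
        echo "$nic: current RX=$cur, preset maximum RX=$max"
        # After review, apply with:
        # esxcli network nic ring current set -n "$nic" -r "$max"
    done
fi
```

Keeping the set command commented out makes the script safe to run for inspection; apply the change on one host at a time.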
Step 2: Verify VMXNET3 Buffer Status
Log in to the ESXi host via SSH and identify the port number for the affected CVM. Run the following command to check for buffer exhaustion:
vsish -e get /net/portsets/<Switch_Name>/ports/<PortNumber>/vmxnet3/rxSummary | grep "1st ring"
If "# of times the 1st ring is full" is greater than 0, the Guest OS buffers must be increased.
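The check in this step can be scripted so that the counter is compared against 0 automatically. The sketch below assumes the rxSummary output contains a line of the form "# of times the 1st ring is full:N"; the helper `first_ring_full` and the variables `SWITCH` and `PORT` are hypothetical names used for illustration. On ESXi, `net-stats -l` lists port numbers, switch names, and client names, which is one way to find the CVM's port.

```shell
# Hypothetical helper: print how many times the VMXNET3 1st ring was full,
# given `vsish -e get .../vmxnet3/rxSummary` output on stdin.
first_ring_full() {
    awk -F: '/1st ring is full/ { gsub(/[^0-9]/, "", $NF); print $NF; exit }'
}

# SWITCH and PORT are placeholders set by the operator, for example after
# looking up the CVM's port with: net-stats -l
if command -v vsish >/dev/null 2>&1 && [ -n "${SWITCH:-}" ] && [ -n "${PORT:-}" ]; then
    full=$(vsish -e get "/net/portsets/$SWITCH/ports/$PORT/vmxnet3/rxSummary" | first_ring_full)
    if [ "${full:-0}" -gt 0 ]; then
        echo "Port $PORT: 1st ring full $full times; increase the Guest OS ring buffers"
    else
        echo "Port $PORT: no 1st ring exhaustion recorded"
    fi
fi
```

Matching on "1st ring is full" rather than "1st ring" skips the "1st ring size" line that the plain grep also returns.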
Step 3: Increase VMXNET3 Ring Buffer in Guest OS
Adjust the VMXNET3 ring buffer values within the Nutanix CVM (Guest OS level). Increasing these values allows the VM to handle larger bursts of traffic.
Note: Please refer to Broadcom KB 324556 for specific OS commands to tune rx-ring-size.
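For a Linux guest such as the CVM, ring sizes are commonly inspected with `ethtool -g` and changed with `ethtool -G`. The following is a minimal sketch under stated assumptions, not the exact procedure from the Broadcom KB: the helper `rx_ring_values` and the `IFACE` variable are hypothetical, and the parser assumes the usual `ethtool -g` layout in which the first "RX:" line is the pre-set maximum and the second is the current setting.

```shell
# Hypothetical helper: print "<current RX> <maximum RX>" from `ethtool -g` output.
# Assumes the first "RX:" line is the pre-set maximum and the second is the current value.
rx_ring_values() {
    awk '/^RX:/ { n++; if (n == 1) max = $2; else if (n == 2) cur = $2 }
         END { print cur, max }'
}

# IFACE is a placeholder for the guest interface name (for example eth0).
# The live call runs only when the operator sets it explicitly.
if [ -n "${IFACE:-}" ] && command -v ethtool >/dev/null 2>&1; then
    set -- $(ethtool -g "$IFACE" | rx_ring_values)
    echo "$IFACE: current RX=$1, maximum RX=$2"
    # After review, raise the ring size with:
    # ethtool -G "$IFACE" rx "$2"
fi
```

A change made with `ethtool -G` does not persist across a reboot unless it is also added to the guest's network configuration.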
Step 4: Validate Driver and Firmware Compliance
Ensure the physical NIC driver (e.g., bnxtnet) and firmware are running versions supported by the VMware Compatibility Guide (HCL). Discrepancies between installed versions and the HCL can lead to inefficient buffer management. Contact your hardware vendor to align driver/firmware versions with the HCL.
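The installed driver and firmware versions can be read from `esxcli network nic get` before comparing them with the HCL. The sketch below assumes the "Driver Info" section prints "Driver:", "Firmware Version:", and "Version:" lines; the helper name `driver_info` is hypothetical, and the version strings in any sample output are placeholders rather than recommended values.

```shell
# Hypothetical helper: print Driver, Firmware Version, and Version as key=value
# pairs from `esxcli network nic get -n vmnicX` output on stdin.
driver_info() {
    awk -F': *' '/^ *(Driver|Firmware Version|Version):/ {
        sub(/^ +/, "", $1)      # drop leading indentation from the label
        print $1 "=" $2
    }'
}

# Collect the values for each uplink so they can be checked against the HCL.
if command -v esxcli >/dev/null 2>&1; then
    for nic in vmnic2 vmnic3; do
        echo "== $nic =="
        esxcli network nic get -n "$nic" | driver_info
    done
fi
```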