Nutanix clusters report latency alerts between CVMs (Controller VMs). Alert: "X ms latency is experienced from CVM <Source CVM IP> to CVM <Destination CVM IP>. Please check the network configuration and possible bottlenecks in the network."

Article ID: 435468

Products

VMware vSphere ESXi

Issue/Introduction

Nutanix clusters may experience significant CVM (Controller VM) network latency. While the physical NICs may not show active errors, the following indicators are present:

  • Nutanix alerts regarding CVM communication latency or "CVM backplane network slowness."

  • ESXi hosts show high "Receive missed errors" in NIC stats:

    NIC statistics for vmnicX:
          Packets received: 2040984856
          Packets sent: 2894479722
          Bytes received: 1170176003221
          Receive missed errors: 2894472345

  • Running vsish commands reveals that the VMXNET3 "1st ring" is full or running out of buffers.

Environment

vSphere ESXi 8.x

Cause

The issue is typically twofold, involving buffer exhaustion at both the physical and virtual layers:

  1. Physical Layer: The physical NIC's internal Receive (RX) Ring Buffer overflows when the host cannot process incoming packets fast enough.

  2. Virtual Layer: Even after increasing physical RX buffers, the virtual interface (VMXNET3) within the Guest OS (CVM) may still drop packets if its internal ring buffers are too small to handle the traffic bursts between the hypervisor and the VM.

Resolution

Step 1: Increase the Physical NIC RX Ring Buffer
The ESXi host reports large "Receive missed errors" in its NIC stats. In ESXi, "Receive missed errors" typically signal a bottleneck where the physical NIC's internal Receive (RX) ring buffer overflows. This happens when the rate of incoming traffic exceeds the host's ability to drain the buffer and process packets.

Example:

NIC statistics for vmnic2:
Receive missed errors: 22418684
NIC statistics for vmnic3:
Receive missed errors: 66450

Increase the vmnic's RX ring buffer size.

To view the currently configured RX ring size:

esxcli network nic ring current get -n vmnicX

To view the maximum ring sizes the NIC supports:

esxcli network nic ring preset get -n vmnicX

To increase the RX ring size:

esxcli network nic ring current set -n vmnicX -r xxxx

where

vmnicX is the affected vmnic (here, vmnic2 and vmnic3)
xxxx is the desired RX ring size, up to the maximum reported by the preset command
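
For illustration, a complete check-and-set sequence might look like the following. This is a sketch only: vmnic2 and the ring size 4096 are example values, and the actual maximum depends on the NIC and driver.

# Show the maximum ring sizes the NIC supports
esxcli network nic ring preset get -n vmnic2

# Show the currently configured ring sizes
esxcli network nic ring current get -n vmnic2

# Raise the RX ring to the supported maximum (example value)
esxcli network nic ring current set -n vmnic2 -r 4096

# Confirm the change, then watch whether "Receive missed errors" keeps growing
esxcli network nic ring current get -n vmnic2
esxcli network nic stats get -n vmnic2 | grep -i missed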

Please refer to KB 415206 for more information.


Step 2: Verify VMXNET3 Buffer Status
Log into the ESXi host via SSH and identify the port number for the affected CVM. Run the following command to check for buffer exhaustion:

vsish -e get /net/portsets/<Switch_Name>/ports/<PortNumber>/vmxnet3/rxSummary | grep "1st ring"

If the "# of times the 1st ring is full" counter is greater than 0, the Guest OS ring buffers must be increased.
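
If the CVM's port number is not known, one way to find it is to list the active switch ports and match on the VM's display name. This is a sketch: "NTNX" is a placeholder pattern for the CVM's name.

# List all switch ports; the output includes PortNum, SwitchName, and ClientName
net-stats -l | grep NTNX

Substitute the PortNum and SwitchName from the matching row into the vsish path above.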

Step 3: Increase VMXNET3 Ring Buffer in Guest OS
Adjust the VMXNET3 ring buffer values within the Nutanix CVM (Guest OS level). Increasing these values allows the VM to handle larger bursts of traffic.

Note: Please refer to Broadcom KB 324556 for specific OS commands to tune rx-ring-size.
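
For illustration, on a typical Linux guest the VMXNET3 ring sizes can be inspected and raised with ethtool. This is a sketch only: eth0 and the size 4096 are example values, and changes to a Nutanix CVM should follow Nutanix guidance and the commands in KB 324556.

# Show current and maximum supported ring sizes for the interface
ethtool -g eth0

# Raise the RX ring size (example value; must not exceed the reported maximum)
ethtool -G eth0 rx 4096

Note that ethtool changes typically do not persist across reboots unless made persistent through the distribution's network configuration.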

Step 4: Validate Driver and Firmware Compliance
Ensure the physical NIC (e.g., bnxtnet) is running versions supported by the VMware Compatibility Guide (HCL). Discrepancies between installed versions and the HCL can lead to inefficient buffer management. Contact your hardware vendor to align driver/firmware versions with the HCL.
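
As a starting point for the HCL comparison, the installed driver and firmware versions can be read directly from the host. This is a sketch; vmnic2 is an example name.

# Show the Driver Info section (driver name, driver version, firmware version)
esxcli network nic get -n vmnic2

# Show the PCI VID:DID:SVID:SSID identifiers used to look the NIC up on the HCL
vmkchdev -l | grep vmnic2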

Additional Information

Related KB: Large packet loss (dropped packets in Virtual Machines) in the guest OS using VMXNET3 in ESXi