Packet Loss and Buffer Exhaustion on F5 BIG-IP VE in Azure VMware Solution (AVS)

Article ID: 441778

Products

VMware NSX
VMware vSphere ESXi 8.0

Issue/Introduction

Users may experience the following symptoms on an F5 BIG-IP Virtual Edition (VE) or a similar high-throughput appliance running in an Azure VMware Solution (AVS) or NSX-T environment:

  • Significant packet loss on Tier-1 (T1) routers or specific Edge Virtual Machines (e.g., EVM01).
  • High CPU utilization reported in vCenter (~140% or higher, which Turbo Boost/Hyper-Threading accounting makes possible) while the guest OS reports lower usage (~60%).
  • Network performance degradation and intermittent latency for inbound traffic.
  • Statistics show steady increments in the following counters:
    • Guest/vNIC: 'running out of buffers' and '1st ring is full'.
    • ESXi Host/pNIC: 'outOfBuffer' or 'Receive missed errors' on physical uplinks (e.g., Mellanox vmnic0).
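A steady increment is easiest to confirm by taking two snapshots of the same counters some time apart and comparing them. The following is a minimal sketch, assuming the counters have been saved as plain 'name: value' text (for example from 'esxcli network nic stats get -n vmnic0' for the pNIC). The helper names and the sample values are illustrative assumptions, not output captured from a real host.

```python
import re


def parse_counters(text):
    """Parse 'name: value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def growing_counters(before, after, watch):
    """Return {name: delta} for watched counters that increased between snapshots."""
    first, second = parse_counters(before), parse_counters(after)
    return {
        name: second[name] - first[name]
        for name in watch
        if name in first and name in second and second[name] > first[name]
    }


# Illustrative values only (assumption), not taken from a real system.
SNAPSHOT_1 = """
   Packets received: 1000000
   Receive missed errors: 120
"""
SNAPSHOT_2 = """
   Packets received: 1700000
   Receive missed errors: 450
"""

# Counter names as quoted in the symptom list above.
WATCH = ["Receive missed errors", "running out of buffers", "1st ring is full"]

print(growing_counters(SNAPSHOT_1, SNAPSHOT_2, WATCH))
```

Counters that are absent from either snapshot are skipped, so the same watch list can be reused for guest-side and host-side captures.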

Environment

VMware vSphere ESXi 8.0
VMware NSX

Cause

This issue is typically caused by buffer exhaustion at both the virtual NIC (VMXNET3) and physical NIC layers. High traffic volume (e.g., >700K PPS) can fill the Rx ring buffers faster than the guest and host network stacks can drain them. This is often exacerbated by:

  1. CPU Contention: Insufficient guaranteed CPU cycles for the VM to process the interrupt-driven network load.
  2. Storage Latency: High storage I/O latency (observed spikes up to 21 ms) can cause VMkernel world stalls, delaying network packet processing.
  3. Resource Constraints: Jitter in CPU scheduling prevents the Rx rings from being cleared in time.
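The relationship between packet rate, ring size, and stall duration can be estimated with simple arithmetic. The sketch below assumes that all traffic lands on a single Rx ring, that each packet consumes one descriptor, and that a stall halts draining entirely. These are simplifying assumptions, and the ring sizes are example values rather than documented defaults.

```python
def ring_headroom_ms(ring_size, pps):
    """Milliseconds a ring of ring_size descriptors can absorb at pps with no draining."""
    return ring_size / pps * 1000.0


def packets_during_stall(pps, stall_ms):
    """Packets that arrive while processing is stalled for stall_ms milliseconds."""
    return pps * stall_ms / 1000.0


PPS = 700_000   # traffic rate cited in this article
STALL_MS = 21   # storage latency spike cited in this article, treated here as a full stall (assumption)

for ring in (512, 1024, 4096):  # example ring sizes (assumption)
    print(f"ring={ring:5d} absorbs {ring_headroom_ms(ring, PPS):.2f} ms of stall")

print(f"packets arriving during a {STALL_MS} ms stall: {packets_during_stall(PPS, STALL_MS):.0f}")
```

Under these assumptions, even a 4096-entry ring covers only a few milliseconds at this rate, which is why larger buffers alone may not be enough and the scheduling and reservation changes in the Resolution section also matter.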

Resolution

To resolve the packet loss, apply a combination of resource reservations and performance tunings to the Virtual Machine and Host configuration.

  1. Increase Guest OS Ring Buffers
    Increasing the buffer size provides a larger cushion for traffic bursts.
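    • Windows: In Device Manager, under VMXNET3 Properties > Advanced, increase Small Rx Buffers and Rx Ring #1 Size to their maximum values (typically 8192 and 4096, respectively).
    • Linux/F5: Check the current and maximum ring sizes with 'ethtool -g <interface>', then raise the Rx ring with 'ethtool -G <interface> rx 4096'.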
       
  2. Configure Resource Reservations
    Ensure the VM has guaranteed access to physical CPU cores to reduce scheduling delays.
    • Set a CPU Reservation in vCenter that meets or exceeds the workload's peak requirement.
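    • Set Latency Sensitivity to High (VM Edit Settings > VM Options > Advanced > Latency Sensitivity). This setting requires the VM's memory to be fully reserved and works best with a full CPU reservation.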
       
  3. Apply High-Performance vNIC Tunings
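    Add the following parameters to the VM's '.vmx' configuration (requires power-off), replacing X with the index of the affected adapter (e.g., ethernet0):
    • 'ethernetX.pnicFeatures = "4"': Enables Receive Side Scaling (RSS) on the vNIC so that receive processing is spread across multiple queues and vCPUs.
    • 'ethernetX.ctxPerDev = "3"': Enables multi-threaded execution for the vNIC transmit path (one thread per queue), allowing load distribution across multiple physical cores.
    • 'sched.cpu.latencySensitivity = "high"': Ensures the scheduler prioritizes this VM's execution threads. This is the '.vmx' equivalent of the Latency Sensitivity setting in step 2.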
       
  4. Verify Physical NIC Status
    If outOfBuffer errors persist on the physical NIC despite VM tunings, ensure the ESXi host is not oversubscribed and check for physical link-level issues or firmware mismatches on the host uplinks.
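After applying step 1 on a Linux guest, the new ring size can be verified by comparing the current Rx value against the pre-set maximum reported by 'ethtool -g <interface>'. The following is a minimal sketch that parses saved output of that command. The function names are hypothetical and the sample text is illustrative, not captured from a real BIG-IP VE.

```python
import re


def parse_ring_settings(ethtool_output):
    """Return (max_rx, current_rx) parsed from `ethtool -g` style output."""
    section = None
    values = {}
    for line in ethtool_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Pre-set maximums"):
            section = "max"
        elif stripped.startswith("Current hardware settings"):
            section = "current"
        else:
            # Match only the plain "RX:" line, not "RX Mini:" or "RX Jumbo:".
            match = re.match(r"^RX:\s*(\d+)$", stripped)
            if match and section:
                values[section] = int(match.group(1))
    return values.get("max"), values.get("current")


def rx_ring_at_maximum(ethtool_output):
    """True when the current Rx ring size equals the pre-set maximum."""
    max_rx, current_rx = parse_ring_settings(ethtool_output)
    return max_rx is not None and current_rx == max_rx


# Illustrative output only (assumption); real values depend on the driver and guest.
SAMPLE = """Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        2048
RX Jumbo:       4096
TX:             4096
Current hardware settings:
RX:             1024
RX Mini:        128
RX Jumbo:       256
TX:             512
"""

max_rx, current_rx = parse_ring_settings(SAMPLE)
print(f"Rx ring: current={current_rx}, maximum={max_rx}, at maximum={rx_ring_at_maximum(SAMPLE)}")
```

If the current value is still below the maximum after the change, the setting may not have been applied or may not have persisted across a reboot.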

Additional Information