VCF/NSX Transport Node Reporting "Abnormally" in Aria Operations with High LRO Aborts

Article ID: 438188

Products

VMware NSX

Issue/Introduction

Customers using Broadcom BCM5741x or BCM5751x NICs in a VMware Cloud Foundation (VCF) or NSX environment may observe the following symptoms:

  • Aria Operations (formerly vROps) Alert: esx-### TransportNode is acting abnormally since [Timestamp]
  • NSX Manager Alarms: "Transport Node Controller/Manager Connectivity is not UP" or "TRANSPORT NODE DOWN" alerts for the same host.
  • Performance Issues: Intermittent network connectivity loss, high latency, or RDP/ping failures for VMs on the NSX overlay (Geneve) network.
  • Recovery: The issue eventually resolves on its own. A vMotion of the affected VMs or host maintenance also resolves it.

Verification of the Issue:

  • Review the ESXi support bundle to confirm whether LRO aborts are occurring on the vmnics used for the NSX datapath.

    Run the following command from the root directory of the log bundle (ag is The Silver Searcher; grep -E accepts the same pattern):

    ag "NIC statistics for vmnic|LRO aborts rx: [1-9]" commands/nicinfo.sh.txt

If the "LRO aborts rx" counters are non-zero and high on vmnics used for NSX traffic, proceed to the workaround.
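The check above can also be scripted so that each counter is reported next to its NIC. The following is a minimal sketch, not part of the official procedure. The function name lro_aborts is hypothetical, and the layout of nicinfo.sh.txt (a "NIC statistics for vmnicN" header followed by an "LRO aborts rx: <count>" line whose last field is the count) is assumed from the search pattern used above.

```shell
# lro_aborts FILE
# Prints "<vmnic> <count>" for every NIC whose "LRO aborts rx" counter is non-zero.
# Assumption: each counter line follows its "NIC statistics for vmnicN" header
# and ends with the numeric count.
lro_aborts() {
    awk '
        /NIC statistics for vmnic/ {
            # Remember the NIC named in the most recent header line.
            match($0, /vmnic[0-9]+/)
            nic = substr($0, RSTART, RLENGTH)
        }
        /LRO aborts rx: [1-9]/ {
            # The last whitespace-separated field is taken as the abort count.
            print nic, $NF
        }
    ' "$1"
}

# Example (run from the root directory of the log bundle):
#   lro_aborts commands/nicinfo.sh.txt
```

Only NICs with a non-zero counter are printed, so empty output suggests that no LRO aborts were recorded in that bundle.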

Environment

VMware Aria Operations (formerly vRealize Operations)

VMware NSX

Cause

This issue is caused by a hardware limitation on Broadcom BCM5741x and BCM5751x adapters, which do not support hardware Large Receive Offload (LRO) for GENEVE-encapsulated traffic. When GENEVE traffic is present, the NIC attempts hardware LRO but aborts the operation, which adds CPU overhead in the software stack and leads to management connectivity drops. See also: Performance drops on BCM5741x NICs with GENEVE traffic.

Resolution

Workaround:

To resolve the performance and connectivity issues, you must disable hardware LRO in the bnxtnet driver, which forces the host to use software LRO.

Caution: Do not disable hardware LRO on vmnics used for vSAN storage traffic, as this can cause storage performance degradation.

  • Log in to the ESXi host CLI via SSH.
  • Set the disable_tpa parameter for the bnxtnet module. The value is a comma-separated list with one entry per port, in vmnic port order: 0 leaves hardware LRO enabled and 1 disables it.
  • Example: To disable LRO on vmnic2 and vmnic3 while keeping it enabled on vmnic0 and vmnic1:

         esxcli system module parameters set -m bnxtnet -p 'disable_tpa=0,0,1,1'

  • Reboot the ESXi host for the changes to take effect.
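The disable_tpa value used in the steps above can be assembled from a port count and the list positions to disable. This is a minimal sketch under stated assumptions: build_disable_tpa is a hypothetical helper, and it assumes that list positions start at 0 and follow the vmnic port order as in the example (vmnic0 is position 0). It only prints the esxcli command for review and does not run it.

```shell
# build_disable_tpa PORT_COUNT [POSITION...]
# Prints a comma-separated list of PORT_COUNT flags: 1 at each listed
# position (hardware LRO disabled), 0 everywhere else (hardware LRO enabled).
# Assumption: positions are 0-based and follow the vmnic port order.
build_disable_tpa() {
    count=$1
    shift
    out=""
    i=0
    while [ "$i" -lt "$count" ]; do
        flag=0
        for idx in "$@"; do
            if [ "$idx" -eq "$i" ]; then
                flag=1
            fi
        done
        # Append the flag, adding a comma only after the first entry.
        out="${out}${out:+,}${flag}"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Reproduces the example: 4 ports, hardware LRO disabled on positions 2 and 3.
value=$(build_disable_tpa 4 2 3)
echo "esxcli system module parameters set -m bnxtnet -p 'disable_tpa=${value}'"
```

After the reboot, running esxcli system module parameters list -m bnxtnet should show the stored disable_tpa value, which is one way to confirm the change was applied.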

Additional Information

Performance drops on BCM5741x NICs with GENEVE traffic

Increased application latency detected in an NSX Environment on an NSX-T prepared host