Rx Errors observed on edge nodes

Article ID: 381857

Products

VMware NSX

Issue/Introduction

  • The rx_errors counter can be observed increasing on the Edge nodes. This can be verified with the command: get physical-port fp-ethX stats
    "rx_bytes": 50938244949145,
    "rx_drop_no_match": 131405052,
    "rx_errors": 1518865,
    "rx_misses": 0,
    "rx_nombufs": 0,
    "rx_packets": 44566952565,
    "tx_bytes": 48976905544684,
    "tx_drops": 0,
    "tx_errors": 0,
    "tx_packets": 42340387881
  • When checked from the CLI of the ESXi host that is hosting the Edge node, the switchport stats (in vsish) show that the Rx errors count corresponds to the counter "number of times packets are dropped by rx try lock queueing":
    stats of a vmxnet3 vNIC rx queue {
     LRO pkts rx ok:0
     LRO bytes rx ok:0
     pkts rx ok:45234120478
     bytes rx ok:51673765088900
     unicast pkts rx ok:45100243811
     unicast bytes rx ok:51662023759860
     multicast pkts rx ok:106833514
     multicast bytes rx ok:9166380254
     broadcast pkts rx ok:27043153
     broadcast bytes rx ok:2574948786
     running out of buffers:0
     pkts receive error:0
     1st ring size:4096
     2nd ring size:4096
     # of times the 1st ring is full:0
     # of times the 2nd ring is full:0
     fail to map a rx buffer:0
     request to page in a buffer:0
     # of times rx queue is stopped:0
     failed when copying into the guest buffer:0
     # of pkts dropped due to large hdrs:0
     # of pkts dropped due to max number of SG limits:0
     pkts rx via data ring ok:0
     bytes rx via data ring ok:0
     Whether rx burst queuing is enabled:0
     current backend burst queue length:0
     maximum backend burst queue length so far:0
     aggregate number of times packets are requeued:0
     aggregate number of times packets are dropped by PktAgingList:0
     # of pkts dropped due to large inner (encap) hdrs:0
     number of times packets are dropped by burst queue:0
     number of times packets are dropped by rx try lock queueing:1519081
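The two counters above can be cross-checked with a small script. This is a minimal sketch, assuming the full `get physical-port fp-ethX stats` output has been saved as a JSON object and the vsish rx queue stats have been saved as plain text; the function names are hypothetical and are not part of NSX or ESXi. The sample values are trimmed from the outputs shown above.

```python
import json

# Exact name of the vsish counter that matches the Edge rx_errors counter.
TRY_LOCK_DROPS = "number of times packets are dropped by rx try lock queueing"


def parse_edge_port_stats(text: str) -> dict:
    """Parse saved `get physical-port fp-ethX stats` output (assumed to be a JSON object)."""
    return json.loads(text)


def parse_vsish_rxq_stats(text: str) -> dict:
    """Parse 'counter name:value' lines from saved vsish vmxnet3 rx queue stats."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon; lines without "name:number" (e.g. the header) are skipped.
        name, sep, value = line.strip().rpartition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def summarize(edge: dict, vsish: dict) -> dict:
    """Compare the Edge rx_errors counter with the host-side try-lock drop counter."""
    rx_errors = edge["rx_errors"]
    rx_packets = edge["rx_packets"]
    try_lock_drops = vsish.get(TRY_LOCK_DROPS, 0)
    return {
        "rx_errors": rx_errors,
        "try_lock_drops": try_lock_drops,
        # Fraction of received packets that the Edge counted as errors.
        "error_ratio": rx_errors / rx_packets if rx_packets else 0.0,
        # The two outputs are captured at slightly different times, so expect close, not equal.
        "difference": try_lock_drops - rx_errors,
    }


if __name__ == "__main__":
    edge_text = '{"rx_errors": 1518865, "rx_packets": 44566952565}'
    vsish_text = """stats of a vmxnet3 vNIC rx queue {
     pkts rx ok:45234120478
     number of times packets are dropped by rx try lock queueing:1519081
"""
    s = summarize(parse_edge_port_stats(edge_text), parse_vsish_rxq_stats(vsish_text))
    print(f"{s['error_ratio']:.4%}", s["difference"])  # 0.0034% 216
```

A small, steadily growing difference is expected because the two commands are run at different moments; the point of the check is that both counters track each other.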

Environment

VMware NSX 4.x
VMware NSX-T Data Center 3.x

Cause

  • Explanation of the "number of times packets are dropped by rx try lock queueing" counter in the switchport stats:

    When multiple vmkernel networking threads try to deliver packets to the same vNIC concurrently, some serialization is needed. To avoid having threads spin-wait for a lock, which wastes CPU cycles, those packets are instead queued and the thread can continue with other work. However, each queue has a size limit, and when this limit is reached, packets are dropped.
    A maximum queue size is needed to avoid having too much packet buffer memory queued in one place, which would deplete the available packet memory and could start affecting traffic for other vNICs or vmknics.

  • RX errors: These drops can occur when multiple pollWorlds deliver packets to a particular vNIC queue. There is an upper bound (default 256) on how many packets can be queued before they are processed for Rx delivery. If the number of incoming packets exceeds this limit, the excess packets are dropped.
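The mechanism described above can be modeled as a bounded queue guarded by a try-lock. The sketch below is a conceptual Python model, not ESXi source code: the class and method names are hypothetical, `bound` stands in for the /Net/Vmxnet3RxPollBound limit, and the model is intended only for single-threaded walk-throughs (it ignores races that a real concurrent implementation must handle).

```python
import threading
from collections import deque


class TryLockRxQueue:
    """Hypothetical model of try-lock delivery into one vNIC rx queue (not ESXi code)."""

    def __init__(self, bound: int = 256):
        self.bound = bound            # stands in for /Net/Vmxnet3RxPollBound
        self.lock = threading.Lock()  # serializes delivery into the vNIC queue
        self.pending = deque()        # packets parked by threads that lost the lock race
        self.delivered = 0
        self.dropped = 0              # stands in for the "rx try lock queueing" drop counter

    def _deliver(self, pkt) -> None:
        self.delivered += 1

    def _drain(self) -> None:
        # The lock holder also delivers everything other threads parked meanwhile.
        while self.pending:
            self._deliver(self.pending.popleft())

    def submit(self, pkt) -> None:
        if self.lock.acquire(blocking=False):
            try:
                self._deliver(pkt)
                self._drain()
            finally:
                self.lock.release()
        elif len(self.pending) < self.bound:
            self.pending.append(pkt)  # park the packet instead of spinning on the lock
        else:
            self.dropped += 1         # queue is at its bound: the packet is dropped


if __name__ == "__main__":
    q = TryLockRxQueue(bound=256)
    q.lock.acquire()       # simulate another pollWorld currently holding the queue
    for i in range(300):
        q.submit(i)        # 256 packets are parked, the remaining 44 are dropped
    q.lock.release()
    q.submit(300)          # lock is free: deliver this packet and drain the backlog
    print(q.delivered, q.dropped)  # 257 44
```

With `bound=512`, the same burst of 300 contended packets is fully queued and nothing is dropped, which is the effect the Resolution section relies on.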

Resolution

  • Try increasing the queue size to 512 or 1024 on the host using the following command (shown here for 512):
    esxcfg-advcfg -s 512 /Net/Vmxnet3RxPollBound
  • The default value is 256. The current value can be verified with the following command:
    esxcfg-advcfg -g /Net/Vmxnet3RxPollBound
  • Once the new size is set, the vNIC must be reset, or the Edge VM must be powered off and on, for the change to take effect.
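To confirm that the change is effective, rx_errors from `get physical-port fp-ethX stats` can be sampled a few times to check that it has stopped increasing. This is a minimal sketch, assuming the samples are collected manually as (seconds, counter) pairs; the function name and the sample values are hypothetical.

```python
from typing import List, Tuple


def rx_error_rates(samples: List[Tuple[float, int]]) -> List[float]:
    """Per-second increase of rx_errors between consecutive (timestamp_s, rx_errors) samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("samples must be in increasing time order")
        # A smaller value suggests the counter was reset (e.g. after a vNIC reset or
        # VM power cycle); count from zero instead of reporting a negative rate.
        delta = c1 - c0 if c1 >= c0 else c1
        rates.append(delta / (t1 - t0))
    return rates


if __name__ == "__main__":
    # Hypothetical samples: two before the change, two after the Edge VM power cycle.
    samples = [(0, 1000), (60, 1300), (120, 0), (180, 0)]
    print(rx_error_rates(samples))  # [5.0, 0.0, 0.0]
```

A rate that stays at zero after the reset suggests that the larger queue bound absorbs the contention; a rate that keeps rising is a reason to open a Support case as described below.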

Additional Information

Increasing the queue size beyond the default may result in higher latency.
A Support case may be required if the issue is not fixed by increasing the queue size, or if the change results in higher-than-expected latency.
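As a rough, back-of-envelope illustration of this latency trade-off (a general queueing estimate, not a measured property of ESXi): if a queue holding up to B packets is drained at R packets per second, a packet arriving at the tail of a full queue waits roughly B/R before delivery. The drain rate of 10^6 packets per second below is a hypothetical figure chosen only for the arithmetic.

```latex
t_{\text{wait}} \approx \frac{B}{R}, \qquad
\frac{256}{10^{6}\ \text{pkt/s}} = 256\ \mu\text{s}, \qquad
\frac{1024}{10^{6}\ \text{pkt/s}} \approx 1.02\ \text{ms}
```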

If you are contacting Broadcom support about this issue, please provide the following:

  • NSX Manager support bundles.
  • ESXi host support bundles for the hosts running the affected Edge nodes.
  • Text of any error messages seen in the NSX GUI or on the command line that are pertinent to the investigation.

Handling Log Bundles for offline review with Broadcom support.