tx_drops reported on the bond interface of Bare Metal Edge.

Article ID: 378762

Products

VMware NSX

Issue/Introduction

  • For bare metal edges, when the fp-eth* interfaces are bonded, the bond interface statistics may report increments in the "tx_drops" counter during periods of high link load.

  • There may or may not be an observable impact (slowness, packet drops, etc.) on the data plane.

  • From the edge log bundle, in the edge folder, the file "physical-ports-stats" reports stats like the following:

    $ egrep -i 'name|tx_drops' physical-ports-stats
            "name": "fp-eth0",
            "tx_drops": 0,
            "name": "fp-eth1",
            "tx_drops": 59,
            "name": "fp-eth2",
            "tx_drops": 0,
            "name": "fp-eth3",
            "tx_drops": 0,
            "name": "bond-######8392",
            "tx_drops": 9450816953,

  • As the stats above show, the individual fp-eth* interfaces do not report any tx_drops; only the bond interface does (the one-liner below pairs each interface name with its counter for a quick comparison).
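
    A minimal sketch for extracting that pairing in one pass, assuming the "physical-ports-stats" layout shown above (each "tx_drops" field follows its interface's "name" field):

    $ awk -F'"' '/"name"/ {n=$4} /"tx_drops"/ {d=$3; gsub(/[^0-9]/, "", d); print n": "d}' physical-ports-stats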

  • These tx_drops are not originating from LACP PDUs:

    "Device Name": "bond-######8392",
            "Name": "lag2",
            "Slaves": [
                {
                    "LACP drops": 3,
                    "Name": "fp-eth0",
                    "Rx LACP errors": 0,
                    "Rx LACP pdus": 869641,
                    "Tx LACP errors": 0,
                    "Tx LACP pdus": 869628
                },
                {
                    "LACP drops": 3,
                    "Name": "fp-eth1",
                    "Rx LACP errors": 0,
                    "Rx LACP pdus": 869638,
                    "Tx LACP errors": 0,
                    "Tx LACP pdus": 869627
                }
            ],
            "name": "bond-######8392",
            "rx_bytes": 82009966648582,
            "rx_drop_no_match": 27180991,
            "rx_errors": 0,
            "rx_misses": 3248,
            "rx_nombufs": 0,
            "rx_packets": 302103348921,
            "tx_bytes": 3900809584547731,
            "tx_drops": 9450816953,
            "tx_errors": 0,
            "tx_packets": 2698624678379

  • Looking at the traffic counters for lag2 / bond-######8392, there is a large discrepancy between the two member ports (fp-eth0 and fp-eth1) for tx traffic. In this case, almost all of it egresses fp-eth1, as the quick calculation after these counters shows:

    "name": "fp-eth0",
    "rx_packets": 150912676465,
    "tx_packets": 869636

    "name": "fp-eth1",
    "rx_packets": 151190672283,
    "tx_packets": 2698623808449

Environment

VMware NSX

Cause

  • For a bond interface, tx_drops is a software counter maintained by the dataplane, whereas the other counters are read from the physical NICs. This is why tx_drops is seen only on the bond interface and is not reported on the physical fp-eth0 and fp-eth1 interfaces.
  • In this case, because tx traffic is almost entirely hashed to fp-eth1, there may be a jumbo flow going out on that interface.
  • The most likely reason for drops on the bond is a burst of traffic, or a jumbo flow concentrated on a single member interface (a single interface cannot absorb all the traffic sent to the bond).
  • A single TCP session or other large single flow (e.g. a VPN session) cannot be hashed / spread across two interfaces, because the bond selects the egress member per flow (see the illustration after this list).
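
A minimal illustration of the per-flow member selection (a sketch with hypothetical header values; a real bond hashes L2/L3/L4 header fields, but the principle is the same):

    $ flow="192.0.2.10,198.51.100.20,443,55001"                        # hypothetical src-ip,dst-ip,src-port,dst-port tuple
    $ idx=$(( $(printf '%s' "$flow" | cksum | cut -d' ' -f1) % 2 ))    # hash the tuple, pick one of two members
    $ echo "every packet of this flow egresses fp-eth$idx"

Because the tuple is constant for the life of the flow, the computed index never changes, so the entire flow lands on a single member interface.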

Resolution

During the time of the issue, the following (among other things) can be reviewed to further isolate the cause of the tx_drops.

  • Review the utilization (from vROps or any other monitoring tool) of the individual fp-eth* interfaces and determine whether tx traffic is largely using a particular interface.
  • If so, perform a packet capture on that interface to understand what kind of traffic is generating the load (see the sketch after this list).
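
A sketch of a capture plus a quick top-talkers summary (the capture syntax is assumed from the NSX edge node CLI and should be verified on your version; the file name is hypothetical):

    # On the edge node CLI: capture on the busy member interface
    > start capture interface fp-eth1 file fp-eth1-tx.pcap

    # After copying the capture off the edge: list IP conversations by volume
    $ tshark -r fp-eth1-tx.pcap -q -z conv,ip

A single conversation dominating the summary is consistent with the jumbo-flow cause described above.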

Depending on the type of traffic identified, the system may need to be re-architected so that flows are spread more consistently across the individual fp-eth* interfaces.