Poor performance for traffic going through an NSX Edge when using ESXi 6.5 or above and pNIC software LRO

Article ID: 326332


Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • Poor performance for traffic going through an NSX Edge when using ESXi 6.5 or above
 
  • Hardware LRO is disabled or not supported by the pNIC (a quick filter for the LRO column is sketched after the legend below):
#esxcli network nic queue loadbalancer list
NIC     RxQPair  RxQNoFeature  PreEmptibleQ  RxQLatency  RxDynamicLB  DynamicQPool  NumaIOAwareLB  RSS  LRO  GeneveOAM
------  -------  ------------  ------------  ----------  -----------  ------------  -------------  ---  ---  ---------
vmnic0  UA       ND            UA            UA          NA           UA            NA             UA   UA   UA
vmnic1  UA       ND            UA            UA          NA           UA            NA             UA   UA   UA
vmnic2  UA       ND            UA            UA          NA           UA            NA             UA   UA   UA
vmnic3  UA       ND            UA            UA          NA           UA            NA             UA   UA   UA


Where:
 - U: Setting unsupported by device
 - S: Setting supported by device
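
As a quick filter, the LRO column alone can be extracted from the output above. This sketch assumes the column layout shown here (LRO is the 10th column), which may vary between ESXi builds; a leading "U" means the device does not support hardware LRO:
#esxcli network nic queue loadbalancer list | awk 'NR>2 {print $1, $10}'
vmnic0 UA
vmnic1 UA
vmnic2 UA
vmnic3 UA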
 
  • Software LRO is enabled:
#esxcli system settings advanced list -o /Net/NetpollSwLRO 
   Path: /Net/NetpollSwLRO
   Type: integer
   Int Value: 1 <--- 1: enabled, 0: disabled
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Whether to perform SW LRO on pkts in netpoll
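
To read just the current and default values without the full property listing, the same output can be filtered (a minor convenience; grep is available in the ESXi shell):
#esxcli system settings advanced list -o /Net/NetpollSwLRO | grep "Int Value"
   Int Value: 1
   Default Int Value: 1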
  • The output of #net-stats -A -t WwQqihV shows pktsizeout values higher than the NSX Edge vNIC MTU (see the note after the output below):
{"name": "TEST-EDGE.eth1", "switch": "DvsPortset-0", "id": 67108882, "mac": "00:50:56:b7:06:b6", "rxmode": 0, "tunemode": 0, "uplink": "false",
  "txpps": 9061, "txmbps": 5.2, "txsize": 72, "txeps": 0.00, "rxpps": 11230, "rxmbps": 136.0, "rxsize": 1513, "rxeps": 0.00,
  "vnic": { "type": "vmxnet3", "ring1sz": 512, "ring2sz": 128, "tsopct": 0.0, "tsotputpct": 0.0, "txucastpct": 100.0, "txeps": 0.0,
    "lropct": 0.0, "lrotputpct": 0.0, "rxucastpct": 100.0, "rxeps": 0.0,
    "maxqueuelen": 0, "requeuecnt": 0.0, "agingdrpcnt": 0.0,
    "txdisc": 0.0, "qstop": 0.0, "txallocerr": 0.0, "txtsosplit": 0.0, "r1full": 0.0, "r2full": 0.0, "sgerr": 0.0},
  "rxqueue": { "count": 1, "details": [
    {"intridx": 0, "pps": 11230, "mbps": 136.0, "errs": 0.0} ]},
  "txqueue": { "count": 1, "details": [
    {"intridx": 0, "pps": 9061, "mbps": 5.2, "errs": 0.0} ]},
  "intr": { "count": 2, "details": [ 7471, 0] },
  "sys": [ "151055" ],
  "vcpu": [ "120137" ],
  "histos":[
    { "name": "pktsizein", "min": 60, "max": 102 ,"mean": 72, "count": 9061,
      "values":[[66, 46.6], [512, 53.4], [1024, 0.0], [1518, 0.0], [4096, 0.0], [9018, 0.0], [16402, 0.0], [32786, 0.0], [65554, 0.0], [131072, 0.0], [262144, 0.0], [262145, 0.0]] },
    { "name": "pktsizeout", "min": 60, "max": 4410 ,"mean": 1661, "count": 10191,
      "values":[[66, 0.0], [512, 0.0], [1024, 0.0], [1518, 89.8], [4096, 10.2], [9018, 0.0], [16402, 0.0], [32786, 0.0], [65554, 0.0], [131072, 0.0], [262144, 0.0], [262145, 0.0]] },
    { "name": "clusterin", "min": 1, "max": 2 ,"mean": 1, "count": 6958,
      "values":[[1, 69.8], [2, 30.2], [4, 0.0], [8, 0.0], [16, 0.0], [32, 0.0], [64, 0.0], [128, 0.0], [265, 0.0], [512, 0.0], [1024, 0.0], [2048, 0.0], [4096, 0.0], [8192, 0.0], [8193, 0.0]] },
    { "name": "clusterout", "min": 1, "max": 2 ,"mean": 1, "count": 6958,
      "values":[[1, 53.5], [2, 46.5], [4, 0.0], [8, 0.0], [16, 0.0], [32, 0.0], [64, 0.0], [128, 0.0], [265, 0.0], [512, 0.0], [1024, 0.0], [2048, 0.0], [4096, 0.0], [8192, 0.0], [8193, 0.0]] },
    { "name": "pktintervalin", "min": 0, "max": 34601 ,"mean": 110, "count": 9061,
      "values":[[0, 23.2], [10, 0.0], [25, 0.0], [50, 0.0], [100, 0.1], [250, 76.5], [500, 0.2], [750, 0.0], [1000, 0.0], [2000, 0.0], [5000, 0.0], [10000, 0.0], [20000, 0.0], [25000, 0.0], [50000, 0.0], [75000, 0.0], [100000, 0.0], [500000, 0.0], [500001, 0.0]] },
    { "name": "pktintervalout", "min": 0, "max": 34581 ,"mean": 98, "count": 10190,
      "values":[[0, 31.7], [10, 0.0], [25, 0.0], [50, 0.0], [100, 0.1], [250, 68.1], [500, 0.1], [750, 0.0], [1000, 0.0], [2000, 0.0], [5000, 0.0], [10000, 0.0], [20000, 0.0], [25000, 0.0], [50000, 0.0], [75000, 0.0], [100000, 0.0], [500000, 0.0], [500001, 0.0]] } \
]},
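
Note: assuming the Edge vNIC uses the default 1500-byte MTU, the pktsizeout histogram above (max 4410, mean 1661, 10.2% of packets in the 1518-4096 bucket) shows segments larger than the MTU being delivered to the vNIC, which is the signature of upstream LRO aggregation. If the statistics need to be collected over a fixed window for offline review, the standard -i interval option can be used (the output path here is arbitrary):
#net-stats -A -t WwQqihV -i 10 > /tmp/net-stats.json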

 
  • A packet capture taken on the ESXi host, inbound on the Edge vNIC connected to VXLAN (in most cases a transit VXLAN), reveals TCP segments with incorrect checksums (see the note after the capture for locating the port ID):
#pktcap-uw --switchport EDGE_VXLAN_VNIC_PORT_ID --dir 1 --ng -o - | tcpdump -envr - | grep incorrect
    11.11.11.11.17229 > 22.22.22.22.rsync: Flags [S], cksum 0x5626 (incorrect -> 0x3bdf), seq 4049220372, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 4], length 0
    11.11.11.11.17229 > 22.22.22.22.rsync: Flags [.], cksum 0x561a (incorrect -> 0x3569), ack 2641433779, win 1825, length 0
    11.11.11.11.17229 > 22.22.22.22.rsync: Flags [P.], cksum 0x5628 (incorrect -> 0x8dbe), seq 0:14, ack 1, win 1825, length 14
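
Note: EDGE_VXLAN_VNIC_PORT_ID is a placeholder for the Edge vNIC switchport ID. It is reported as "id" in the per-vNIC net-stats output above (67108882 for TEST-EDGE.eth1 in this example), or it can be looked up by listing all switchports and matching the Edge VM's vNIC in the client-name column, for example:
#net-stats -l | grep TEST-EDGE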


Cause

The issue occurs on ESXi 6.5 and above when the pNIC or its driver does not support hardware LRO, so "Uplink Software LRO" (a feature introduced in ESXi 6.5) performs the LRO operation instead, aggregating multiple TCP segments into larger ones for delivery.
When those larger segments reach the Edge's vNIC, the vmxnet3 backend finds that the guest OS did not request LRO (i.e. #ethtool -K ethX lro off on Linux) and re-segments them to match the Edge vNIC MTU. However, the resulting segments are marked as checksum-verified in the vmxnet3 backend receive descriptors even though no correct checksum was actually inserted. As a result, the segments carry an invalid TCP checksum and are forwarded by the Edge with that incorrect checksum, causing the destination to drop them. The dropped segments are eventually retransmitted, degrading performance.
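
If there is doubt about the guest-side setting, the LRO state can be confirmed from inside a Linux guest with ethtool (eth1 here only matches the earlier example; on the Edge it is expected to be off):
#ethtool -k eth1 | grep large-receive-offload
large-receive-offload: off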

Resolution

The issue is fixed in NSX for vSphere 6.3.6 and NSX for vSphere 6.4.2.

Workaround:

To work around the issue, disable Software LRO on the ESXi hosts where the Edge VMs are running, then reboot each host for the change to take effect.

To disable Software LRO: #esxcli system settings advanced set -o /Net/NetpollSwLRO -i 0
To re-enable Software LRO: #esxcli system settings advanced set -o /Net/NetpollSwLRO -i 1
To verify the Software LRO setting: #esxcli system settings advanced list -o /Net/NetpollSwLRO
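
After the reboot, one way to confirm the workaround took effect is to repeat the capture from the Symptoms section; with Software LRO disabled, the filter should return no segments flagged with incorrect checksums:
#pktcap-uw --switchport EDGE_VXLAN_VNIC_PORT_ID --dir 1 --ng -o - | tcpdump -envr - | grep incorrect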