VM VXLAN traffic fails on a host prepared for NSX

Article ID: 324199

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • The ESXi host is prepared for NSX-T Data Center.
  • VMs running on the host generate VXLAN traffic using third-party software.
  • The VXLAN traffic fails at transmit.
  • On the ESXi host, the vmxnet3 driver statistics show the "encap (outer) header errors" counter incrementing, for example:
#net-stats -l | grep VXLAN_VM
2100663515          4       0 DvsPortset-1     00:50:56:01:03:9f  VXLAN_VM

#vsish -e get /net/portsets/DvsPortset-1/ports/100663515/vmxnet3/txSummary
stats of a vmxnet3 vNIC tx queue {
   <snip>
   failed to split a giant tso pkt:0
   giant non-tso pkts requiring more than 1 pkt handle:0
   encap (outer) header errors:228   <<<<<
   encap (inner) tso header errors:0
   number of memory region lookup pass in Tx.:0
   <snip>
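
To confirm that the counter is actually incrementing rather than showing a stale value, it can be sampled twice. The following is a minimal sketch, assuming the txSummary output has the format shown above. The helper name outer_encap_errors and the 10-second interval are illustrative and are not part of any VMware tooling.

```shell
# Print the "encap (outer) header errors" value from vsish txSummary
# output supplied on stdin. Hypothetical helper, not a VMware tool.
outer_encap_errors() {
    awk -F: '/encap \(outer\) header errors/ { print $2 + 0 }'
}

# Example usage on the ESXi host (port path taken from the example above):
#   PORT=/net/portsets/DvsPortset-1/ports/100663515
#   before=$(vsish -e get "$PORT/vmxnet3/txSummary" | outer_encap_errors)
#   sleep 10
#   after=$(vsish -e get "$PORT/vmxnet3/txSummary" | outer_encap_errors)
#   [ "$after" -gt "$before" ] && echo "outer header errors are incrementing"
```

A rising value while the VM is sending VXLAN traffic matches the symptom described in this article.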


Environment

VMware NSX 4.0.0.1
VMware NSX-T Data Center

Cause

This issue was introduced when overlay offload functionality was added.
Two scenarios are affected:
  • VXLAN traffic is using standard VXLAN ports 4789 or 8472
  • VXLAN traffic is using non-standard VXLAN ports i.e. not 4789 or 8472
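
Which scenario applies depends on the UDP destination port used by the guest's VXLAN traffic. The sketch below classifies a port number according to the two cases above. The function name vxlan_port_class is hypothetical, and the usage example assumes the guest uses a Linux kernel vxlan device, which may not be true for every third-party product.

```shell
# Classify a UDP destination port according to the two scenarios above.
# Hypothetical helper for illustration only.
vxlan_port_class() {
    case "$1" in
        4789|8472) echo "standard" ;;
        *)         echo "non-standard" ;;
    esac
}

# Example usage in a Linux guest (assumes a kernel vxlan interface exists):
#   port=$(ip -d link show type vxlan | sed -n 's/.*dstport \([0-9]*\).*/\1/p' | head -n 1)
#   vxlan_port_class "$port"
```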

Resolution

For VXLAN traffic using standard VXLAN ports, this issue is resolved in ESXi 6.7 Patch Release ESXi670-202111001 and in ESXi 7 Update 3, both available from VMware downloads.

For VXLAN traffic using non-standard VXLAN ports, this issue is resolved by a fix in the guest vmxnet3 driver.

This configuration is typically seen with Linux VMs. The vmxnet3 driver version is tied to the Linux kernel version, so a Linux kernel upgrade may be required to get this fix.
The issue is confirmed resolved in RHEL 8. For other operating systems, consult the OS vendor.

Workaround:
For all scenarios, with both standard and non-standard VXLAN ports, this issue can be worked around by disabling tunnel offload in the VM guest OS.
Note that standard network stack offloads continue to function as normal.

#ethtool -K eth0 tx-udp_tnl-segmentation off && ethtool -K eth0 tx-udp_tnl-csum-segmentation off
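
After running the command above, the result can be checked with ethtool -k (lowercase), which lists the current feature state. The following is a minimal sketch that parses that listing. The function name check_tnl_offload is hypothetical, and eth0 is a placeholder for the guest interface that carries the VXLAN traffic.

```shell
# Read `ethtool -k <iface>` output on stdin and succeed only if both
# UDP tunnel segmentation offloads are reported as off.
# Hypothetical helper for illustration only.
check_tnl_offload() {
    awk -F': ' '
        /^[[:space:]]*tx-udp_tnl(-csum)?-segmentation:/ {
            seen++
            if ($2 !~ /^off/) bad++
        }
        END { exit ((seen == 2 && bad == 0) ? 0 : 1) }
    '
}

# Example usage in the Linux guest:
#   ethtool -k eth0 | check_tnl_offload && echo "tunnel offload is disabled"
```

Settings applied with ethtool -K do not normally survive a reboot, so the workaround may need to be reapplied at boot using the distribution's network configuration mechanism.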