Consistent VXLAN Packet Loss in NSX Environment with BFD enabled.

Article ID: 342381

Products

VMware NSX Networking, VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Packet loss between the VTEPs of ESXi hosts, as well as on VXLAN traffic.
  • VXLAN traffic for a single VTEP on a destination host arrives on any of that host's VMNICs, rather than only on the VMNIC linked to that VTEP.
This issue occurs in environments where:
  • More than one VTEP is configured per host.
  • BFD is enabled within the NSX environment.
  • Data plane learning is enabled on the physical network.


Environment

VMware NSX for vSphere 6.4.x
VMware ESXi 6.7.x
VMware ESXi 6.5.x

Cause

  • The issue occurs when the physical network (e.g. Cisco ACI) uses data packets for IP discovery, rather than ARP snooping alone, and BFD is enabled within NSX.
  • BFD is enabled in the NSX environment when the vRNI "Virtual Infrastructure latency" monitoring feature is enabled.
  • Within NSX, when a host has multiple VTEPs, each VTEP is linked with a corresponding VMNIC. BFD traffic, however, does not follow this mapping. BFD tunnel traffic from both VTEPs leaves the ESXi host based on the VXLAN routing table, using the VMK interface specified for the destination network.
  • As a result of this BFD behavior, traffic leaves multiple VMNICs with the same source IP address (the VTEP IP) but different source MAC addresses (the MAC address of the VMK it leaves from).
  • Data plane learning on the physical network picks up these packets and detects that the VTEP IP address is seen behind multiple MAC addresses. The physical network then directs traffic for that IP to whichever interface the most recently detected MAC address was learned on.
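The learning behavior described above can be illustrated with a toy model. The following Python sketch is only an illustration. It assumes a physical switch that keeps one endpoint entry per IP address and overwrites it with the most recently seen MAC address and port. This is a simplification, because real implementations such as Cisco ACI endpoint learning are more involved. The model shows how alternating VXLAN and BFD packets carrying the same VTEP IP make the learned location flap. The `Packet` type, the `learn` function, and all IP, MAC, and port values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str   # source VTEP IP
    src_mac: str  # MAC of the VMK the packet left from
    port: str     # physical switch port (uplink VMNIC) it arrived on


def learn(packets):
    """Toy data plane learning (hypothetical model): last packet wins per IP.

    Returns the final endpoint table and the number of times an already
    learned IP moved to a different (MAC, port) location.
    """
    table = {}
    moves = 0
    for pkt in packets:
        new_entry = (pkt.src_mac, pkt.port)
        old_entry = table.get(pkt.src_ip)
        if old_entry is not None and old_entry != new_entry:
            moves += 1
        table[pkt.src_ip] = new_entry
    return table, moves


# Hypothetical VTEP 10.1.1.11 on vmk3/VMNIC0. Its VXLAN data leaves vmk3,
# but its BFD packets follow the VXLAN routing table out of vmk4/VMNIC1.
vxlan = Packet("10.1.1.11", "00:50:56:aa:aa:03", "eth1/1")
bfd = Packet("10.1.1.11", "00:50:56:aa:aa:04", "eth1/2")

table, moves = learn([vxlan, bfd, vxlan, bfd])
print(table["10.1.1.11"], moves)  # ('00:50:56:aa:aa:04', 'eth1/2') 3
```

Each BFD packet relocates the VTEP IP to the other port. VXLAN traffic returning to that IP is then sent toward the wrong VMNIC until the next VXLAN packet relearns it.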
Example #1: Based on the output below, BFD traffic leaves from both VMKs (vmk3 for the default route and vmk4 for 10.1.1.0/24) with the same source VTEP IP addresses, which causes the conflict.
[root@exmplesxi01:~] esxcli network ip route ipv4 list -N vxlan
Network        Netmask         Gateway            Interface    Source
--------       --------        ---------          ---------    ----------
default        0.0.0.0         10.0.0.254         vmk3         DHCP
10.1.1.0       255.255.255.0   0.0.0.0            vmk4         MANUAL
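To see which VMK a given BFD session uses, the routing table above can be evaluated with a longest-prefix match. The following Python sketch parses the Example #1 output and assumes that the VXLAN netstack selects the egress interface by standard longest-prefix matching. The peer VTEP addresses and the helper names `parse_routes` and `egress_vmk` are hypothetical.

```python
import ipaddress

# Routing table from Example #1 (esxcli network ip route ipv4 list -N vxlan).
ROUTES_EXAMPLE_1 = """\
Network        Netmask         Gateway            Interface    Source
--------       --------        ---------          ---------    ----------
default        0.0.0.0         10.0.0.254         vmk3         DHCP
10.1.1.0       255.255.255.0   0.0.0.0            vmk4         MANUAL
"""


def parse_routes(text):
    """Parse esxcli route output into (network, interface) pairs."""
    routes = []
    for line in text.splitlines()[2:]:  # skip the header and separator rows
        fields = line.split()
        if len(fields) < 4:
            continue
        network, netmask, _gateway, iface = fields[:4]
        if network == "default":
            network = "0.0.0.0"
        routes.append((ipaddress.IPv4Network(f"{network}/{netmask}"), iface))
    return routes


def egress_vmk(routes, dst):
    """Return the interface of the longest prefix that contains dst."""
    addr = ipaddress.IPv4Address(dst)
    matches = [(net, iface) for net, iface in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


routes = parse_routes(ROUTES_EXAMPLE_1)
for peer in ["10.1.1.21", "10.2.2.21"]:  # hypothetical peer VTEPs
    print(peer, "->", egress_vmk(routes, peer))
# 10.1.1.21 -> vmk4
# 10.2.2.21 -> vmk3
```

The egress VMK depends only on the destination, not on which local VTEP the BFD packet is sourced from. For this reason, both VTEP IPs can appear on both VMNICs.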


Example #2: Based on the output below, all BFD traffic leaves from a single VMK (vmk4). However, the conflict still occurs, because standard VXLAN traffic from the other VTEP uses the other VMNIC.
[root@exmplesxi02:~] esxcli network ip route ipv4 list -N vxlan
Network        Netmask         Gateway            Interface    Source
--------       --------        ---------          ---------    ----------
default        0.0.0.0         10.0.0.254         vmk4         MANUAL
10.1.1.0       255.255.255.0   0.0.0.0            vmk4         MANUAL
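Why Example #2 still conflicts can be checked by listing the source MAC addresses each VTEP IP will appear behind. The following Python sketch makes two assumptions. First, VXLAN data from a VTEP always leaves that VTEP's own VMK. Second, BFD packets keep the VTEP's source IP but carry the MAC of whichever VMK the routing table selects. The `VTEPS` mapping, its addresses, and both helper functions are hypothetical.

```python
# Hypothetical two-VTEP host: VMK -> (VTEP IP, VMK MAC). Illustrative values only.
VTEPS = {
    "vmk3": ("10.1.1.11", "00:50:56:aa:aa:03"),
    "vmk4": ("10.1.1.12", "00:50:56:aa:aa:04"),
}


def macs_seen_per_vtep_ip(vteps, bfd_egress_vmks):
    """Map each VTEP IP to the set of source MACs the physical network sees."""
    seen = {}
    for ip, mac in vteps.values():
        macs = {mac}  # regular VXLAN traffic leaves the VTEP's own VMK
        for egress in bfd_egress_vmks:
            # BFD traffic keeps this VTEP's IP but uses the egress VMK's MAC
            macs.add(vteps[egress][1])
        seen[ip] = macs
    return seen


def conflicting_ips(vteps, bfd_egress_vmks):
    """Return the VTEP IPs that appear behind more than one MAC address."""
    seen = macs_seen_per_vtep_ip(vteps, bfd_egress_vmks)
    return sorted(ip for ip, macs in seen.items() if len(macs) > 1)


print(conflicting_ips(VTEPS, {"vmk4"}))          # Example #2: ['10.1.1.11']
print(conflicting_ips(VTEPS, {"vmk3", "vmk4"}))  # Example #1: ['10.1.1.11', '10.1.1.12']
```

Passing an empty set models BFD being disabled, and it returns no conflicting IPs.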

Resolution

A fix is planned for upcoming ESXi releases.

Workaround:
To work around the issue, do one of the following:
  • Disable the "Virtual Infrastructure latency" feature in vRNI, which in turn disables BFD traffic in the NSX environment.
  • Adjust the physical network to use ARP snooping instead of full data plane learning.


Additional Information

For a previous issue related to the use of vRNI's "Virtual Infrastructure latency" feature, see:
https://docs.vmware.com/en/VMware-NSX-Data-Center-for-vSphere/6.4/com.vmware.nsx.troubleshooting.doc/GUID-9FD905ED-00B0-445E-BB99-CEFEEEDCDE6B.html