VMware Virtual SAN (vSAN) container connectivity issues when used with NSX

Article ID: 375803

Products

VMware NSX
VMware vSphere ESXi 7.0
VMware vSphere ESXi 8.0
VMware vSAN 8.x
VMware vSAN 7.x
VMware NSX-T Data Center

Issue/Introduction

Symptoms:

  • The issue appears on the network used by the vSAN File Service (vSAN FS) node container VMs when NSX is installed.
  • The behavior points to a layer-2 forwarding issue rather than an ARP issue.
  • Some NSX Edge nodes have the wrong VTEP address for the MAC address belonging to the vSAN FS node(s).
  • ARP returns the correct IP-to-MAC mapping.
  • The ESXi hosts, on the other hand, do not appear to have this problem.
  • Pings between the vSAN FS VMs are answered, which indicates a discrepancy between the MAC tables on the hosts and the MAC table on the Edge.
  • MAC learning is enabled, as per the vSAN FS setup recommendations.

 

Environment

VMware NSX 3.2.0.1 and later
Any compatible ESXi version (vSAN File Services was introduced in ESXi 7.0)

Cause

The issue can be described in multiple steps: 

1. The container startup scripts run arping with the -A option, which is the wrong option here because it generates ARP reply packets.

2. Because the replication mode is MTEP replication and the VTEPs of the two Edges are in a different subnet from the ESXi hosts (as per best practice), the ESXi host picks one Edge to perform MTEP replication.

3. When that Edge receives the ARP reply, it updates its MAC-to-VTEP mapping. Because the packet is an ARP reply with a unicast target MAC address in the payload, the Edge does not use the routing domain for replication; it uses the infrastructure (logical switch/segment) instead. As a result, the ARP reply is not replicated to the other Edge in the cluster, which never learns the updated MAC-to-VTEP mapping correctly. This causes the issue.
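
The difference between the arping mode named in step 1 and the mode used by the command in the Resolution section can be summarized in a short sketch. It relies only on the documented behavior of the iputils arping -U and -A options; the function name arp_opcode_for_flag is a hypothetical helper introduced for illustration and is not part of vSAN FS or NSX.

```shell
#!/bin/sh
# Sketch only: arp_opcode_for_flag is a hypothetical helper for illustration.

# Map an arping mode flag to the ARP opcode it emits (per the iputils arping manual).
arp_opcode_for_flag() {
    case "$1" in
        -U) echo "request" ;;   # unsolicited ARP, sent as ARP REQUEST
        -A) echo "reply" ;;     # same as -U, but sent as ARP REPLY
        *)  echo "unknown"; return 1 ;;
    esac
}

arp_opcode_for_flag -A   # mode used by the container startup scripts
arp_opcode_for_flag -U   # mode used by the manual command in the Resolution
```

If tcpdump happens to be available on the interface in question (an assumption, it may not be installed), running tcpdump -n -e -i eth0 arp while the announcement is sent is one way to confirm which opcode is actually on the wire.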


 

Resolution

Because the startup script cannot be modified, the solution is to manually run the following command inside the container. In this example, 192.168.1.1 is the IP address of the eth0 interface of the vSAN FS container VM; replace it with the actual address.

/usr/sbin/arping -b -c 1 -U -I eth0 192.168.1.1
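
When the command has to be issued for several containers, a small wrapper can reduce the chance of mistyping the address. The following is a minimal sketch, not a supported tool: build_arping_cmd is a hypothetical helper name, and the address check is deliberately rough. It only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Sketch only: build_arping_cmd is a hypothetical helper, not part of vSAN FS or NSX.

# Print the arping command shown above for the given interface and IPv4 address.
build_arping_cmd() {
    iface="$1"
    ip_addr="$2"

    # Rough sanity check: digits and dots only, with at least three dots.
    case "$ip_addr" in
        *[!0-9.]*) echo "invalid IPv4 address: $ip_addr" >&2; return 1 ;;
    esac
    case "$ip_addr" in
        *.*.*.*) ;;
        *) echo "invalid IPv4 address: $ip_addr" >&2; return 1 ;;
    esac

    # -U sends unsolicited ARP requests, -b keeps them MAC-level broadcasts,
    # -c 1 sends a single packet, -I selects the interface.
    printf '/usr/sbin/arping -b -c 1 -U -I %s %s\n' "$iface" "$ip_addr"
}
```

For example, build_arping_cmd eth0 192.168.1.1 prints the exact command shown above; once the output has been checked, it can be executed inside the container with sh -c "$(build_arping_cmd eth0 192.168.1.1)".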


The issue is fixed in NSX version 4.2.1.

Additional Information

The issue appears more likely to occur with NSX Bare Metal Edges.