Symptoms:
NSX for vSphere 6.x
When using Unicast or Hybrid replication modes, ESXi hosts send Broadcast, Unknown Unicast and Multicast (BUM) traffic via designated 'proxy VTEPs' - also called UTEPs or MTEPs - in the destination cluster. Traffic sent to a designated proxy VTEP has the 'replication flag' set, and that host is responsible for replicating these frames to all other VTEPs on the same network segment. If one or more hosts in the cluster are in a bad state from a VXLAN perspective, they may not be replicating BUM traffic as they are supposed to.
Because each ESXi host in a cluster determines a proxy VTEP independently, it's possible that some hosts may communicate successfully while others cannot, depending on which proxy VTEP each host has selected.
To confirm that this is the case, identify which source/destination ESXi host pairs fail to pass BUM traffic, then check whether every failing path uses the same UTEP/MTEP for replication. Always test within the same VXLAN, because a different UTEP/MTEP can be selected for each one.
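The pattern check described above can be sketched in a few lines of Python. This is only an illustration, not a VMware tool: it assumes you have already recorded, for each failing test, the source host, the destination host and the proxy VTEP IP that the source host selected for that VXLAN. The host names and the second IP address in the sample data are hypothetical.

```python
from collections import Counter

def failures_per_proxy(failures):
    """Count failing data paths per proxy VTEP IP.

    failures: list of (source_host, dest_host, proxy_vtep_ip) tuples,
    collected by hand for a single VXLAN.
    """
    return Counter(proxy for _, _, proxy in failures)

def common_proxy(failures):
    """Return the proxy VTEP IP shared by every failing path, or None."""
    proxies = {proxy for _, _, proxy in failures}
    return proxies.pop() if len(proxies) == 1 else None

# Hypothetical observations: every failing path replicates via the same VTEP.
FAILURES = [
    ("esx-a", "esx-d", "192.168.1.102"),
    ("esx-b", "esx-e", "192.168.1.102"),
    ("esx-c", "esx-d", "192.168.1.102"),
]

print(failures_per_proxy(FAILURES))
print(common_proxy(FAILURES))
```

If `common_proxy` returns an address, the host that owns that VTEP is the first candidate to examine. If it returns `None`, the failures are spread across several proxy VTEPs and the cause probably lies elsewhere.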
This command displays the UTEP/MTEP selected by an ESXi host for a given VXLAN network.
# net-vdl2 -M vtep -s DVSWITCH1 -n 5002
VTEP count: 12
<snip>
Segment ID: 192.168.1.0
VTEP IP: 192.168.1.102
Flags: 1(MTEP)
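When the command has to be run on many hosts, picking the proxy VTEP out of the output by eye becomes tedious. The following Python sketch extracts it from saved output. It assumes the output contains the "Segment ID", "VTEP IP" and "Flags" lines in the order shown in the excerpt above, and that a UTEP is flagged in the same parenthesised form as an MTEP. Both are assumptions, since the excerpt is abbreviated; adjust the parser to the output of your build.

```python
import re

def find_proxy_vteps(output):
    """Return one dict per VTEP entry whose Flags line marks it as a proxy."""
    proxies = []
    segment = None
    vtep_ip = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Segment ID:"):
            segment = line.split(":", 1)[1].strip()
        elif line.startswith("VTEP IP:"):
            vtep_ip = line.split(":", 1)[1].strip()
        elif line.startswith("Flags:"):
            # "(MTEP)" appears in the excerpt; "(UTEP)" is an assumed variant.
            match = re.search(r"\((MTEP|UTEP)\)", line)
            if match and vtep_ip is not None:
                proxies.append(
                    {"segment": segment, "vtep_ip": vtep_ip, "role": match.group(1)}
                )
            vtep_ip = None  # do not carry an IP over to the next entry
    return proxies

# Sample text copied from the excerpt above (without the <snip> marker).
SAMPLE = """\
VTEP count: 12
Segment ID: 192.168.1.0
VTEP IP: 192.168.1.102
Flags: 1(MTEP)
"""

print(find_proxy_vteps(SAMPLE))
```

Saving the command output from each source host to a file and running it through a parser like this gives the per-host proxy VTEP needed to compare failing and working paths.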
Notes:
If all problematic data paths use the same proxy VTEP, examine that host carefully to determine whether it is the source of the problem. Look for VXLAN configuration issues reported in the NSX UI, missing or duplicate VTEPs, and similar issues.
To work around the issue, put the host into maintenance mode and remove it from the cluster so that NSX can no longer use it as a proxy VTEP in that network segment.