Communication between VXLAN-backed guest VMs drops shortly after a vMotion operation. The root cause may be that the underlying multicast-based physical network is not configured to support the multicast group correctly. This article describes a procedure for validating the physical network with a multicast group ping.
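How to test multicast operation: identify the VTEP interface, look up the multicast group IP of the affected VXLAN network, then ping that group from the VTEP interface.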
~ # esxcli network ip interface ipv4 get -N vxlan
Name  IPv4 Address  IPv4 Netmask   IPv4 Broadcast  Address Type  DHCP DNS
----  ------------  -------------  --------------  ------------  --------
vmk5  10.20.32.37   255.255.255.0  10.20.32.255    DHCP          false
Where vmk5 is the VTEP interface for VXLAN.
~ # net-vdl2 -l | grep -A5 5549
VXLAN network: 5549
Multicast IP: 225.x.x.6
Control plane: Disabled
MAC entry count: 3
ARP entry count: 0
Port count: 1
- Multicast Group IP is 225.x.x.6
- VTEP interface is vmk5
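The lookup above can also be scripted when many VXLAN networks have to be checked. The following is a minimal Python sketch, not part of the original procedure. It assumes the net-vdl2 output has the "VXLAN network:" and "Multicast IP:" layout shown above, and the helper names multicast_ip_for_vni and group_ping_command are hypothetical. It only builds the vmkping command string in the form used in this article; it does not run anything on the host.

```python
import re

def multicast_ip_for_vni(net_vdl2_output, vni):
    """Return the 'Multicast IP' listed under 'VXLAN network: <vni>', or None."""
    in_block = False
    for line in net_vdl2_output.splitlines():
        header = re.match(r"\s*VXLAN network:\s*(\S+)", line)
        if header:
            # A new network block starts; track whether it is the one we want.
            in_block = header.group(1) == str(vni)
            continue
        if in_block:
            m = re.match(r"\s*Multicast IP:\s*(\S+)", line)
            if m:
                return m.group(1)
    return None

def group_ping_command(vtep_interface, multicast_ip):
    """Build the group ping command in the form used in this article."""
    return "vmkping ++netstack=vxlan -I {} {}".format(vtep_interface, multicast_ip)

# Sample text copied from the net-vdl2 output above (the IP is masked in the source).
SAMPLE = """\
VXLAN network: 5549
Multicast IP: 225.x.x.6
Control plane: Disabled
MAC entry count: 3
ARP entry count: 0
Port count: 1
"""

if __name__ == "__main__":
    ip = multicast_ip_for_vni(SAMPLE, 5549)
    print(group_ping_command("vmk5", ip))
    # prints: vmkping ++netstack=vxlan -I vmk5 225.x.x.6
```

The VTEP interface name is passed in by hand here because it comes from a different command (esxcli network ip interface ipv4 get -N vxlan).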
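Failed operation: only the local host responds to the group ping. The sample commands below use vmk9 and 225.x.x.184; substitute the VTEP interface and multicast group IP identified on your host.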
~ # vmkping ++netstack=vxlan -I vmk9 225.x.x.184
PING 225.x.x.184 (225.x.x.184): 56 data bytes
64 bytes from 10.20.32.150: icmp_seq=0 ttl=64 time=0.070 ms
64 bytes from 10.20.32.150: icmp_seq=1 ttl=64 time=0.060 ms
64 bytes from 10.20.32.150: icmp_seq=2 ttl=64 time=0.115 ms
When multicast is working, every ESXi host that is a member of the multicast group responds to the ping. For each sequence number, replies after the first are reported as duplicates (DUP!):
~ # vmkping ++netstack=vxlan -I vmk9 225.x.x.184
PING 225.x.x.184 (225.x.x.184): 56 data bytes
64 bytes from 10.20.32.150: icmp_seq=0 ttl=64 time=0.068 ms
64 bytes from 10.20.32.24: icmp_seq=0 ttl=64 time=0.291 ms (DUP!)
64 bytes from 10.20.32.37: icmp_seq=0 ttl=64 time=0.381 ms (DUP!)
64 bytes from 10.20.32.150: icmp_seq=1 ttl=64 time=0.466 ms
64 bytes from 10.20.32.37: icmp_seq=1 ttl=64 time=0.520 ms (DUP!)
64 bytes from 10.20.32.24: icmp_seq=1 ttl=64 time=0.531 ms (DUP!)
64 bytes from 10.20.32.150: icmp_seq=2 ttl=64 time=0.067 ms
64 bytes from 10.20.32.37: icmp_seq=2 ttl=64 time=0.260 ms (DUP!)
64 bytes from 10.20.32.24: icmp_seq=2 ttl=64 time=0.357 ms (DUP!)
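Telling the two cases apart comes down to whether any address other than the local VTEP appears among the replies. The following is a minimal Python sketch of that check, assuming the reply lines have the "64 bytes from <ip>: icmp_seq=<n>" form shown above. The function names are hypothetical, and the local VTEP address (10.20.32.150 in the sample output) must be supplied by the caller.

```python
import re

# Matches reply lines such as:
#   64 bytes from 10.20.32.24: icmp_seq=0 ttl=64 time=0.291 ms (DUP!)
REPLY = re.compile(r"bytes from (\d+\.\d+\.\d+\.\d+): icmp_seq=(\d+)")

def responders(vmkping_output):
    """Return the set of source IPs that answered the group ping."""
    hosts = set()
    for line in vmkping_output.splitlines():
        m = REPLY.search(line)
        if m:
            hosts.add(m.group(1))
    return hosts

def remote_hosts_responded(vmkping_output, local_ip):
    """True if at least one host other than local_ip replied."""
    return bool(responders(vmkping_output) - {local_ip})

# Excerpts of the two sample outputs shown above.
FAILED = """\
PING 225.x.x.184 (225.x.x.184): 56 data bytes
64 bytes from 10.20.32.150: icmp_seq=0 ttl=64 time=0.070 ms
64 bytes from 10.20.32.150: icmp_seq=1 ttl=64 time=0.060 ms
"""

WORKING = """\
PING 225.x.x.184 (225.x.x.184): 56 data bytes
64 bytes from 10.20.32.150: icmp_seq=0 ttl=64 time=0.068 ms
64 bytes from 10.20.32.24: icmp_seq=0 ttl=64 time=0.291 ms (DUP!)
64 bytes from 10.20.32.37: icmp_seq=0 ttl=64 time=0.381 ms (DUP!)
"""

if __name__ == "__main__":
    local = "10.20.32.150"  # local VTEP address in the sample output
    print(remote_hosts_responded(FAILED, local))   # prints: False
    print(remote_hosts_responded(WORKING, local))  # prints: True
    print(sorted(responders(WORKING)))
```

A result of False only shows that no remote VTEP answered the group ping. It does not identify which part of the physical network is dropping the multicast traffic.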