Expected behavior
Symptoms:
ARP Snooping = enabled
VMware tool = disabled
TOFU = disabled
ARP Binding Limit = 1
ARP ND Binding Limit Timeout = 10 (Minutes)
VM1 - web-sv-02a
IP - 172.16.10.18
VNI - 67589
MAC - 00:50:56:ae:92:cd
Host - sa-esxi-05.vclass.local
VM2 - web-sv-01a
IP - 172.16.10.19
VNI - 67589
MAC - 00:50:56:ae:11:91
Host - sa-esxi-04.vclass.local
ARP entries on each host before the IP change:
sa-esxi-05.vclass.local> get logical-switch 67589 arp-table
Logical Switch ARP Table
--------------------------------------------------
Host Kernel Entry
==================================================
IP MAC Flags
LCP Remote Entry
==================================================
IP MAC
172.16.10.19 00:50:56:ae:11:91
LCP Local Entry
==================================================
IP MAC
172.16.10.18 00:50:56:ae:92:cd
sa-esxi-04.vclass.local> get logical-switch 67589 arp-table
Logical Switch ARP Table
--------------------------------------------------
Host Kernel Entry
==================================================
IP MAC Flags
LCP Remote Entry
==================================================
IP MAC
172.16.10.18 00:50:56:ae:92:cd
LCP Local Entry
==================================================
IP MAC
172.16.10.19 00:50:56:ae:11:91
After the IP change (in this scenario, the IP addresses of both VM1 and VM2 are changed as follows):
VM1 172.16.10.18 -> 172.16.10.19 (changed IP) - 00:50:56:ae:92:cd
VM2 172.16.10.19 -> 172.16.10.20 (changed IP) - 00:50:56:ae:11:91
Now initiate traffic by pinging the default gateway from each VM.
The updated ARP entries on each host are shown below:
sa-esxi-05.vclass.local> get logical-switch 67589 arp-table
Logical Switch ARP Table
--------------------------------------------------
Host Kernel Entry
==================================================
IP MAC Flags
LCP Remote Entry
==================================================
IP MAC
172.16.10.20 00:50:56:ae:11:91
LCP Local Entry
==================================================
IP MAC
172.16.10.18 00:50:56:ae:92:cd --->> Stale entry; it takes 10 minutes to expire because ARP ND Binding Limit Timeout = 10
172.16.10.19 00:50:56:ae:92:cd --->> Correct entry
sa-esxi-04.vclass.local> get logical-switch 67589 arp-table
Logical Switch ARP Table
--------------------------------------------------
Host Kernel Entry
==================================================
IP MAC Flags
LCP Remote Entry
==================================================
IP MAC
172.16.10.18 00:50:56:ae:92:cd --->> Stale entry; it takes 10 minutes to expire because ARP ND Binding Limit Timeout = 10
172.16.10.19 00:50:56:ae:92:cd --->> Correct entry
LCP Local Entry
==================================================
IP MAC
172.16.10.19 00:50:56:ae:11:91 --->> Stale entry; it takes 10 minutes to expire because ARP ND Binding Limit Timeout = 10
172.16.10.20 00:50:56:ae:11:91 --->> Correct entry
With these ARP entries on the respective hosts, VM traffic is impacted until the old ARP entries expire.
This behavior is expected: in the case of TOFU, the old IP entry must expire first.
In the scenario above, the same MAC address is bound to two IP addresses. If a user pings 172.16.10.20 from one of the VMs on sa-esxi-04, the ping does not complete, and VM traffic is impacted until the old ARP entries expire.
This is expected behavior when a Segment profile with ARP ND Binding Limit Timeout = 10 (Minutes) and TOFU is applied to a Segment.
The timeout value can be set between 5 and 120 minutes.
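The condition described above (one MAC address bound to more than one IP in the same table section) can be spotted programmatically. The following is a minimal Python sketch, not part of NSX: the function name `macs_with_multiple_ips` is hypothetical, and the parser assumes the text layout shown in the listings above (a section title ending in "Entry" followed by "IP MAC" rows). The sample text is the sa-esxi-04 listing with the annotations removed.

```python
import re
from collections import defaultdict

# Matches a data row such as "172.16.10.18 00:50:56:ae:92:cd"; trailing text is ignored.
ENTRY_RE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)


def macs_with_multiple_ips(arp_table_text):
    """Return {(section, mac): [ips]} for every MAC bound to more than one IP.

    Hypothetical helper: it assumes the section titles end with "Entry",
    as in the arp-table listings in this article.
    """
    section = None
    bindings = defaultdict(list)
    for line in arp_table_text.splitlines():
        stripped = line.strip()
        if stripped.endswith("Entry"):
            section = stripped  # e.g. "LCP Remote Entry"
            continue
        match = ENTRY_RE.match(line)
        if match and section is not None:
            ip, mac = match.groups()
            key = (section, mac.lower())
            if ip not in bindings[key]:
                bindings[key].append(ip)
    return {key: ips for key, ips in bindings.items() if len(ips) > 1}


SAMPLE = """\
Logical Switch ARP Table
--------------------------------------------------
Host Kernel Entry
==================================================
IP MAC Flags
LCP Remote Entry
==================================================
IP MAC
172.16.10.18 00:50:56:ae:92:cd
172.16.10.19 00:50:56:ae:92:cd
LCP Local Entry
==================================================
IP MAC
172.16.10.19 00:50:56:ae:11:91
172.16.10.20 00:50:56:ae:11:91
"""

duplicates = macs_with_multiple_ips(SAMPLE)
for (section, mac), ips in duplicates.items():
    print(f"{section}: {mac} is bound to {', '.join(ips)}")
```

A MAC reported here is only a candidate for a stale binding; the sketch cannot tell which of the listed IPs is the current one.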
Workaround:
To avoid waiting for the binding timeout (10 minutes in this example, or 5 minutes at the minimum setting), do the following before the user changes the IP:
1. On ESX hosts, where 67589 is the VNI:
net-vdl2 -l -s nsxvswitch -n 67589 -M ARP -r
2. On VM:
ip neighbor flush all
After completing the steps above, the user can change the IP and start pinging.
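When the same workaround has to be repeated for several segments, it can help to generate the two command lines from the VNI instead of editing them by hand. The sketch below is a hypothetical Python helper (`workaround_commands` is not an NSX or ESXi tool). It only builds the command strings given in the steps above and does not run them. The switch name `nsxvswitch` is taken from the example and is assumed to be adjustable per environment.

```python
import shlex


def workaround_commands(vni, switch_name="nsxvswitch"):
    """Build the two workaround command lines for one VNI (nothing is executed).

    Hypothetical helper: the flags are copied verbatim from the steps above.
    """
    vni = int(vni)  # raises ValueError for non-numeric input
    if vni < 0:
        raise ValueError("VNI must be a non-negative integer")
    esx_cmd = ["net-vdl2", "-l", "-s", switch_name, "-n", str(vni), "-M", "ARP", "-r"]
    vm_cmd = ["ip", "neighbor", "flush", "all"]
    return {
        "esx_host": shlex.join(esx_cmd),  # step 1: run on the ESX hosts
        "vm": shlex.join(vm_cmd),         # step 2: run inside the VM
    }


commands = workaround_commands(67589)
print("On ESX hosts:", commands["esx_host"])
print("On VM:       ", commands["vm"])
```

Printing the commands rather than executing them keeps the operator in control of where and when each step is run.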
NOTE: The issue also exists for Network Functions Virtualization (NFV) VMs with HA functionality. GARPs sent during HA failover clear the old IP discovery binding.