VM Connectivity Failure on Specific VLAN Due to ARP Gateway Conflict

Article ID: 440216


Products

VMware vSphere ESXi

Issue/Introduction

  • In a vSphere environment, some VMs on a specific VLAN cannot ping the default gateway, while other VMs on the same host and VLAN can.
  • Running 'arp -a' on the VMs shows that the MAC address returned for the gateway entry ('_gateway' or the gateway's IP address) differs between working and non-working VMs, even though they share the same default gateway:

VM A's output might look like:

_gateway   192.##.##.1 at <04:50:56:##:##:ff> [ether] on <interfacename>

VM B's output might look like:

Interface: <192.0.2.##> -- <hexnumber>
Internet Address    Physical Address 
192.##.##.1           04:50:56:##:##:aa

Environment

  • vSphere
  • NSX
  • VCF NSX
  • OpenShift

Cause

Two different VMs or devices on your network are sending ARP responses for the gateway IP address, which indicates an IP conflict between the two devices. Each VM uses whichever response it receives first. VMs that get the MAC address of the actual gateway work. VMs that get the response from the other device fail.
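As an illustration of the cause described above, the following minimal Python sketch groups VMs by the gateway MAC address each one has cached. More than one group for the same gateway IP points to a conflict. The VM names and MAC addresses are hypothetical sample data, not values from the original case, and the function names are made up for this example.

```python
from collections import defaultdict


def group_vms_by_gateway_mac(observations):
    """Group VM names by the MAC address each VM has cached for the gateway.

    `observations` maps a VM name to the gateway MAC it reported.
    MACs are lowercased so that case differences do not create false groups.
    """
    groups = defaultdict(list)
    for vm_name, mac in observations.items():
        groups[mac.lower()].append(vm_name)
    return dict(groups)


def has_gateway_conflict(observations):
    """Return True when the VMs disagree about the gateway's MAC address."""
    return len(group_vms_by_gateway_mac(observations)) > 1


# Hypothetical sample data: three VMs that share one default gateway.
observations = {
    "vm-a": "04:50:56:00:00:ff",
    "vm-b": "04:50:56:00:00:AA",
    "vm-c": "04:50:56:00:00:ff",
}

groups = group_vms_by_gateway_mac(observations)
for mac, vms in sorted(groups.items()):
    print(mac, "->", ", ".join(sorted(vms)))
print("conflict:", has_gateway_conflict(observations))
```

The sketch only shows that the VMs disagree. Deciding which MAC belongs to the real gateway still requires checking the gateway device itself.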

Resolution

  1. Disconnect the offending device from the network.
  2. Reconfigure it so that it no longer uses the same IP address as the default gateway.
  3. This is not a VMware issue, but it can be identified by running 'arp -a' on the VMs and comparing the MAC address listed for the gateway IP between working and non-working VMs.
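The comparison in step 3 can be scripted once the 'arp -a' output has been collected from each VM. The sketch below is one possible approach and makes assumptions: the sample outputs, the gateway IP 192.0.2.1, and the interface name are hypothetical, and the parser only looks for a line containing the gateway IP followed by a MAC address. It accepts both colon-separated and dash-separated MAC addresses, since Windows prints dashes.

```python
import re

# Six hex pairs separated by ':' or '-' (Windows prints dashes).
MAC_RE = re.compile(r"\b([0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5})\b")


def gateway_mac(arp_output, gateway_ip):
    """Return the MAC that `arp -a` output lists for gateway_ip, or None.

    The MAC is normalized to lowercase with ':' separators so that Linux
    and Windows output can be compared directly.
    """
    # Match the IP only as a whole address, so 192.0.2.1 does not match 192.0.2.11.
    ip_re = re.compile(r"(?<![\d.])" + re.escape(gateway_ip) + r"(?![\d.])")
    for line in arp_output.splitlines():
        if not ip_re.search(line):
            continue
        match = MAC_RE.search(line)
        if match:
            return match.group(1).lower().replace("-", ":")
    return None


# Hypothetical sample output from a Linux VM.
linux_sample = "_gateway (192.0.2.1) at 04:50:56:00:00:ff [ether] on ens192"

# Hypothetical sample output from a Windows VM.
windows_sample = """Interface: 192.0.2.25 --- 0x5
  Internet Address      Physical Address      Type
  192.0.2.1             04-50-56-00-00-aa     dynamic
  192.0.2.11            04-50-56-00-00-bb     dynamic
"""

mac_a = gateway_mac(linux_sample, "192.0.2.1")
mac_b = gateway_mac(windows_sample, "192.0.2.1")
print(mac_a, mac_b, "MISMATCH" if mac_a != mac_b else "match")
```

A mismatch between a working and a non-working VM is the signature described in this article.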

Additional Information

Notes:

  • After resolving the conflict, you may need to take the NICs of the non-working VMs down and back up to clear their ARP caches.
  • If you are running a version of NSX that is affected by the JDK bug and your managers have been up for longer than a week, this scenario can cause you to hit the bug sooner because of the increased number of updates. A rolling reboot of the managers is recommended at this point. See NSX is Impacted by JDK-8330017: ForkJoinPool Stops ...
  • Example outputs above are from a Linux box for VM A, and a Windows box for VM B.
  • In the original case, the issue was caused by ARP spoofing/conflict, where a second device (OpenShift) was broadcasting the gateway IP.
  • Collecting arp info from NSX.
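
For the first note above (taking a NIC down and back up), a minimal Python sketch for a Linux guest might look like the following. It assumes the guest uses the iproute2 'ip' command and that the script runs as root. The interface name 'ens192' and the function names are hypothetical. The sketch defaults to a dry run that only prints the commands, because bringing an interface down will drop any remote session that uses it. Run it from the VM console rather than over SSH.

```python
import subprocess


def nic_bounce_commands(interface):
    """Return the iproute2 commands that take a Linux NIC down and back up."""
    # Basic sanity check on the interface name before building commands.
    stripped = interface.replace("-", "").replace("_", "").replace(".", "")
    if not stripped.isalnum():
        raise ValueError(f"unexpected interface name: {interface!r}")
    return [
        ["ip", "link", "set", "dev", interface, "down"],
        ["ip", "link", "set", "dev", interface, "up"],
    ]


def bounce_nic(interface, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for argv in nic_bounce_commands(interface):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(argv, check=True)


# Dry run only: prints the two commands without changing the interface.
bounce_nic("ens192")
```

Guests managed by a network manager service may need that service's own tooling instead, so treat this as a starting point.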