VMs may have intermittent L3 connectivity issues if a segment gateway IP is reused on different segments

Article ID: 327296

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
The issue is present when the following conditions are met:
  • VMs experience intermittent connectivity issues when communicating with other networks
  • Segment default gateway IPs are reused on segments attached to different router gateways
  • When connectivity problems are present, the VM has learned the wrong MAC address for its default gateway IP. The default gateway MAC should be in the form 02:50:56:xx:xx:xx.
  • On the ESXi host where the VM is running, the command net-vdl2 -M ip -s <NVDS_Name> -n <VNI> does not show the gateway IP in the protected IP list.
  Expected output when a VM on segment VNI 70001 is running on this ESXi host:

  # net-vdl2 -M ip -s N-VDS -n 70001
   IP entry count: 1
        IP: 192.168.5.1     << Segment default gateway IP
        MAC: 02:50:56:56:44:52
        Flags: 1(PROTECTED) << Protected
        vxlanID: 70001

 
  • The ESXi host vmkernel.log may contain entries similar to the following:
2019-10-21T22:43:52.647Z cpu13:2156327)WARNING: nsx_vdrb: VdrProcessRouteUpdateMessageAddRouteIpv6:314: [nsx@6876 comp="nsx-esx"]CP:Route Add: Entry 0: Nexthop not found in the route entry skipping entry
2019-10-21T22:43:52.648Z cpu29:2157923)WARNING: nsx_vdrb: VdrRefVxlanLifDynamic:615: [nsx@6876 comp="nsx-esx"]LIF:[I:0x4,L:5d77ffe3-87c9-47cb-bde6-a8ecab69e20a,LI:71681,LT:2] Failed static ARP addition for dvs=N-VDS, portId[67158024], VNI[70001]
2019-10-21T22:43:53.650Z cpu29:2157923)WARNING: nsx_vdrb: VdrLifIpOpDeferredProc:1593: [0x4:8e17fcc1-44b7-441d-ad3e-de0d3818adc1]: Failed static ND addition for dvs=N-VDS, portId[67158024], VNI[70001]: Operation already in progress

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
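
The log symptom above can be checked with a short script. The following is a minimal Python sketch, assuming the relevant vmkernel.log lines have already been copied off the host as text. The function name find_affected_vnis is hypothetical, and the matched strings are taken only from the example excerpts in this article, so other builds may word these warnings differently.

```python
import re

# Message fragments copied from the example vmkernel.log excerpts in this
# article. Other releases may word these warnings differently (assumption).
SIGNATURES = (
    "Failed static ARP addition",
    "Failed static ND addition",
)

# Matches the "VNI[70001]" token that appears in those warnings.
VNI_PATTERN = re.compile(r"VNI\[(\d+)\]")


def find_affected_vnis(log_lines):
    """Return a sorted list of VNIs named in matching nsx_vdrb warnings."""
    vnis = set()
    for line in log_lines:
        # Only consider messages logged by the nsx_vdrb module.
        if "nsx_vdrb" not in line:
            continue
        if not any(signature in line for signature in SIGNATURES):
            continue
        match = VNI_PATTERN.search(line)
        if match:
            vnis.add(int(match.group(1)))
    return sorted(vnis)
```

Passing the lines of a saved log file, for example find_affected_vnis(open("vmkernel.log")), returns the VNIs whose gateway entries failed to register. Those are the segments to inspect with net-vdl2.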


Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 2.x

Cause

ESXi hosts register the segment default gateway IP and corresponding MAC as static protected entries so that they cannot be overwritten.
Duplicate IP detection logic did not account for the possibility of a LIF IP being reused on a different VNI on a different router gateway.
When a default gateway IP is reused on a segment attached to a different router gateway, only one of the segment gateway IPs is added to the VDL2 IP table.
When the segment default gateway IP is not in the protected IP table, ARP poisoning can occur, breaking L3 communication for VMs on that segment.
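
To check whether an environment matches the triggering condition, the segment inventory can be grouped by gateway IP. The following is a minimal Python sketch, not an NSX API client. It assumes the inventory has already been exported into a list of dictionaries. The keys name, router, and gateway_cidr and the function name find_reused_gateway_ips are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
import ipaddress


def find_reused_gateway_ips(segments):
    """Map each gateway IP used under more than one router gateway to the
    (router, segment name) pairs that use it.

    Each item in `segments` is a dict with hypothetical keys:
      name         - segment display name
      router       - the router gateway the segment is attached to
      gateway_cidr - gateway address with prefix, e.g. "192.168.5.1/24"
    """
    by_ip = defaultdict(list)
    for segment in segments:
        # Drop the prefix length so that only the gateway address is compared.
        ip = ipaddress.ip_interface(segment["gateway_cidr"]).ip
        by_ip[ip].append(segment)

    reused = {}
    for ip, users in by_ip.items():
        routers = {segment["router"] for segment in users}
        # Only reuse across different router gateways matches this issue.
        if len(routers) > 1:
            reused[str(ip)] = sorted(
                (segment["router"], segment["name"]) for segment in users
            )
    return reused
```

Any IP returned by this function is a candidate for the workaround. An empty result means no gateway IP is shared between segments on different router gateways in the supplied inventory.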

Resolution

This issue is resolved in:

VMware NSX-T Data Center 2.4.3, available at VMware Downloads.

VMware NSX-T Data Center 2.5.1, available at VMware Downloads.

VMware NSX-T Data Center 3.0, available at VMware Downloads.




Workaround:
If an upgrade is not possible, the following workaround can be used:
Identify the segments that have duplicate gateway IPs.
Change the IPs so that they are unique. For example, use .1 for one segment and .254 for the other segment.
Under Networking -> Segments -> Edit Segment, edit the subnet and change the Gateway.
If the gateway IP is not being changed for one segment, it should still be refreshed to clear any incorrect data from the dataplane.
In that case, change the IP to a temporary IP and then change it back, e.g. .1 to .130 and back to .1.
Ensure that the VMs on the impacted segment have their default gateway IPs updated.
Verify the workaround by running net-vdl2 -M ip -s <NVDS_Name> -n <VNI> on an ESXi host running a VM on that VNI, and ensure that the gateway IP is listed and is protected.
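
The verification step can be scripted once the command output has been captured as text. The following is a minimal Python sketch that assumes the output follows the layout of the expected-output example in this article (IP, MAC, Flags, and vxlanID fields, one per line). The function names parse_ip_entries and gateway_is_protected are hypothetical, and running the command on the host is outside the scope of the sketch.

```python
import re

# Field lines as they appear in the example output in this article.
# The summary line "IP entry count: N" is intentionally not matched.
FIELD = re.compile(r"^\s*(IP|MAC|Flags|vxlanID):\s*(\S+)")


def parse_ip_entries(output):
    """Parse captured `net-vdl2 -M ip` text into a list of entry dicts."""
    entries = []
    for line in output.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "IP":
            # An "IP:" line starts a new entry.
            entries.append({"IP": value})
        elif entries:
            entries[-1][key] = value
    return entries


def gateway_is_protected(output, gateway_ip):
    """True if gateway_ip is listed, flagged PROTECTED, and carries a MAC
    in the 02:50:56:xx:xx:xx form described in this article."""
    for entry in parse_ip_entries(output):
        if entry["IP"] == gateway_ip:
            protected = "PROTECTED" in entry.get("Flags", "")
            mac_ok = entry.get("MAC", "").lower().startswith("02:50:56:")
            return protected and mac_ok
    return False
```

With the example output shown under Symptoms, gateway_is_protected(text, "192.168.5.1") evaluates to True. A False result after applying the workaround indicates that the gateway entry is still missing or unprotected on that host.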