Edge Node Degraded with TEP Tunnels Down Due to Duplicate IP Issue

Article ID: 402595

Updated On:

Products

VMware NSX

Issue/Introduction

  • Edge and host nodes may show a status of "degraded" in the NSX UI (due to tunnel disruption between impacted nodes)
  • Tunnel endpoints for Edge nodes may show a status of "down"
  • The NSX Edge configuration state may also show as "failed"
  • No other alarms within NSX or vCenter are present
  • The syslog (/var/log/syslog) of the Edge that owns the duplicated IP may show entries similar to those below. In this example, the Edge logs the ARP reply it sends for an IP it owns (192.168.0.2) from its interface with example MAC BB:BB:BB:BB:BB:BB. The Edge then receives an ARP reply for the same IP from MAC CC:CC:CC:CC:CC:CC and logs a duplicate IP error.
    <Timestamp> <EdgeName> NSX 5206 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="arp" level="INFO"] ARP reply sent to AA:AA:AA:AA:AA:AA for 192.168.0.2 from BB:BB:BB:BB:BB:BB on lrouter port <LrPortId>
    <Timestamp> <EdgeName> NSX 5206 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="arp" level="INFO"] ARP reply received for 192.168.0.2 from CC:CC:CC:CC:CC:CC on lrouter port <LrPortId>
    <Timestamp> <EdgeName> NSX 5206 SWITCHING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="neigh" tname="dp-learning3" level="ERROR" errorCode="EDG0400013"] Duplicate IP detected (<LrPortId>, 192.168.0.2) from CC:CC:CC:CC:CC:CC
  • In the NSX Manager logs (/var/log/syslog), an entry similar to the following may appear:
    <Timestamp><NSX_Manager_hostname> NSX 73877 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="l2AppUfo"] Merged duplicated Fib [VtepRecord [logicalId=LogicalSwitchId:[id=<LS_ID>, vni=67594], vtepIp=<vtep_ID>, vtepMac=<vtep_MAC>, transportNodeId=<TN_UUID>, vtepLabel=<vtepLabel>, segmentId=<Segment_ID>, encapType=TRANSPORT_BINDING_INVALID, mpEncapType=TRANSPORT_BINDING_GENEVE, runtimeEncapType=TRANSPORT_BINDING_INVALID, HaState=INVALID, timestamp=<Timestamp>, isTnConnectedfalse, isProxyRecord=false]] from added and deleted
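
The Edge syslog signature above can also be searched for programmatically. The following is a minimal Python sketch, assuming the "Duplicate IP detected" line has the same layout as the example above. The function name find_duplicate_ip_events and the regular expression are illustrative and are not part of NSX.

```python
import re
from typing import Iterable, List, Tuple

# Matches the datapathd "Duplicate IP detected" message shown in the example.
# Assumption: the message layout is "(<port id>, <ip>) from <mac>".
DUP_IP_RE = re.compile(
    r'errorCode="EDG0400013"\]\s+Duplicate IP detected\s+'
    r'\((?P<port>[^,]+),\s*(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\)\s+'
    r'from\s+(?P<mac>(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})'
)


def find_duplicate_ip_events(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (port_id, ip, conflicting_mac) for each duplicate IP log line."""
    events = []
    for line in lines:
        match = DUP_IP_RE.search(line)
        if match:
            events.append(
                (
                    match.group("port").strip(),
                    match.group("ip"),
                    match.group("mac").upper(),  # normalise case for comparison
                )
            )
    return events
```

On an Edge, the function could be fed the lines of /var/log/syslog (for example, by opening the file and passing the file object). The MAC it returns is the address of the other device that answered ARP for the TEP IP.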

Environment

VMware NSX

Cause

This issue can occur when an IP address that the Edges use for their TEP configuration is also in use by another device in the environment. Traffic destined for the duplicated IP may then reach either the Edge TEP or the other owner of the IP, leading to tunnel instability for the Edge node.

Resolution

Ensure the IPs used by the Edge TEPs are not duplicated within the environment. This is not an NSX issue per se but a network misconfiguration.
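
If the environment keeps an IP inventory (for example, an export from an IPAM tool), the TEP addresses can be cross-checked against it before deployment. The sketch below is only an illustration under that assumption. The (owner, IP) input format and the function name find_duplicate_ips are hypothetical and are not an NSX feature.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_ips(assignments: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return {ip: [owners]} for every IP claimed by more than one owner.

    `assignments` is a hypothetical inventory of (owner, ip) pairs, e.g.
    Edge TEP addresses plus addresses exported from an IPAM system.
    """
    owners_by_ip = defaultdict(set)
    for owner, ip in assignments:
        # ip_address() validates the string and normalises its representation.
        owners_by_ip[ipaddress.ip_address(ip.strip())].add(owner)
    return {
        str(ip): sorted(owners)
        for ip, owners in owners_by_ip.items()
        if len(owners) > 1
    }
```

An inventory check like this only finds conflicts that are recorded somewhere. A device configured with a static address outside the inventory would still need to be found on the network itself.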

Additional Information

  • Identifying the owner of the duplicate IP is best done with packet captures in the physical network. If the MAC cannot be attributed to a VM in the environment, powering off the Edge may allow the duplicated IP to be pinged and the ICMP traffic traced.
  • To check whether NSX is aware of the MAC, enter it in double quotes, e.g. "AA:BB:CC:DD:EE:FF", in the universal search of the NSX GUI.
  • To check whether any VM running in a specific vCenter owns the MAC, run the following command against that vCenter's Postgres database:
    psql -d VCDB -U postgres -c "select link_peer,mac_address from vpx_dvport where mac_address='<MAC>';"
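
When the lookup has to be repeated for several MACs, the command above can be assembled by a small helper that rejects malformed addresses first. This is a hedged sketch: the function name build_dvport_query_argv is hypothetical, and the query text is copied from the command above. The sketch leaves the letter case of the MAC unchanged, because this article does not state how the database stores it.

```python
import re
from typing import List

# Six colon-separated hex octets, e.g. AA:BB:CC:DD:EE:FF.
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def build_dvport_query_argv(mac: str) -> List[str]:
    """Build the psql argument list for the vpx_dvport MAC lookup.

    Raises ValueError if `mac` is not a full six-octet MAC address, which
    also keeps arbitrary text out of the SQL string.
    """
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    sql = (
        "select link_peer,mac_address from vpx_dvport "
        f"where mac_address='{mac}';"
    )
    # Returned as a list so it can be passed to subprocess.run() without a shell.
    return ["psql", "-d", "VCDB", "-U", "postgres", "-c", sql]
```

The returned list can be passed directly to subprocess.run() on the vCenter appliance. A truncated address such as one with only five octets is rejected before any query is built.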

Similar Issues / KBs: