VMware NSX Edge routing fails due to the corresponding interface/address missing in zclients


Article ID: 377814


Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

  • You recently rebooted an NSX edge node.
  • IPv4/IPv6 addresses are missing on the Edge node interfaces.
  • Routing fails because the corresponding interface is not in sync with zebra and potentially other protocol daemons (BGP/staticd/OSPF/PIM).
  • Any of the following conditions can cause the issue:
    1. Interfaces are present in kernel but missing in zebra.
    2. Interfaces are present in zebra but missing in zclients (BGP/staticd/OSPF/PIM).
    3. Interfaces are present but IPv4/IPv6 addresses are missing in zclients. 
  • In the NSX Edge node log /var/log/syslog you see entries similar to the following:

edge-1 bgpd 18625 - -  [EC 100663301] INTERFACE_STATE: Cannot find IF vti-804 in VRF 0
edge-1 ospfd 18824 - -  [EC 100663301] INTERFACE_STATE: Cannot find IF vti-804 in VRF 0

  • The following impact can be observed after a reboot or upgrade of an NSX Edge node:
    1. BGP peer neighborship is not established. (OR)
    2. OSPF adjacency is not established. (OR)
    3. Static routes are not active in the RIB. (OR)
    4. Multicast routing fails.
  • BGP-specific log entries seen in this scenario:

    /var/log/frr/frr.log

BGP: [EC 33554465] 100.83.141.41 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Start, BGP_Start, fd -1
BGP: [Event] Incoming BGP connection rejected from <IPv4-address-1> since it is not directly connected and TTL is 1

/var/log/syslog

nsx-edge-1 bgpd 22005 - - [EC 33554465] <IPv6-address-1> [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Start, BGP_Start, fd -1
nsx-edge-1 bgpd 22005 - - [EC 33554465] <IPv6-address-2> [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Start, BGP_Start, fd -1

 

  • Static route-specific log entries seen in this scenario:

/var/log/syslog

edge-1 staticd 33893 - - Static Route using downlink-8786 interface not installed because the interface does not exist in specified vrf
edge-1 staticd 33893 - - Static Route using downlink-8786 interface not installed because the interface does not exist in specified vrf

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
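
The log signatures above can be searched for in a saved copy of /var/log/syslog. The following is a minimal sketch in Python, not a VMware tool: the function name find_missing_interfaces and both regular expressions are assumptions derived only from the example lines in this article, so they may need adjusting if the log format differs in your environment.

```python
import re
from collections import defaultdict

# Patterns modelled on the example log lines in this article (assumption:
# the real log format matches these examples closely enough).
MISSING_IF = re.compile(
    r"(?P<daemon>\w+) \d+ - -\s+\[EC \d+\] INTERFACE_STATE: "
    r"Cannot find IF (?P<ifname>\S+) in VRF \d+"
)
STATIC_IF = re.compile(
    r"(?P<daemon>staticd) \d+ - -\s+Static Route using (?P<ifname>\S+) "
    r"interface not installed because the interface does not exist"
)


def find_missing_interfaces(lines):
    """Map each interface reported as missing to the daemons that reported it."""
    hits = defaultdict(set)
    for line in lines:
        for pattern in (MISSING_IF, STATIC_IF):
            match = pattern.search(line)
            if match:
                hits[match.group("ifname")].add(match.group("daemon"))
                break  # one signature per line is enough
    # Sort daemon names so the result is stable across runs.
    return {ifname: sorted(daemons) for ifname, daemons in hits.items()}
```

For example, calling find_missing_interfaces on an open file handle for a copy of the syslog returns a dictionary keyed by interface name, which shows at a glance which daemons are out of sync for which interface.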

Environment

VMware NSX-T Data Center 3.x
VMware NSX 4.x

Cause

The routing controller reads kernel interface notifications through a netlink socket.
During an Edge reboot, some netlink notifications might be dropped or missed.
As a result, the routing controller's view of the interfaces is missing configuration that is present in the kernel.
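
The three out-of-sync conditions listed under Issue/Introduction can be pictured as a comparison of interface tables. The following is an illustrative Python sketch only. It assumes hypothetical snapshots of the kernel, zebra, and zclient views as plain dictionaries mapping interface name to a set of addresses. It does not use any NSX or routing daemon API, and classify_interface is a name invented for this example.

```python
def classify_interface(name, kernel, zebra, zclient):
    """Classify one interface against the three conditions in this article.

    kernel, zebra, zclient: dict mapping interface name -> set of addresses
    (hypothetical snapshots, not data read from a real Edge node).
    """
    if name not in kernel:
        return "not present in kernel"
    if name not in zebra:
        # Condition 1: the kernel has the interface, zebra missed it.
        return "present in kernel but missing in zebra"
    if name not in zclient:
        # Condition 2: zebra has it, the protocol daemon does not.
        return "present in zebra but missing in zclient"
    if kernel[name] - zclient[name]:
        # Condition 3: the interface exists everywhere, addresses do not.
        return "interface present but addresses missing in zclient"
    return "in sync"
```

With sample data, an interface that exists in all three views but has an empty address set in the zclient view is reported under the third condition, which is the state the rescan workaround is meant to repair.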

Resolution

Workaround:

To recover the missing IPv4/IPv6 addresses, rescan the interfaces by running the following commands in the Edge node CLI as the admin user:


Edge> get logical-routers
Edge> vrf <vrf_id of SERVICE_ROUTER_TIER0>
Edge(tier0_sr)> set debug
Edge(tier0_sr)> start rescan interfaces
Edge(tier0_sr)> exit
 
Note: On VMware NSX versions earlier than 4.1.2, do not use this workaround on EVPN-enabled setups.

Additional Information

For additional information about scenarios in which BGP neighbors fail to establish because IPv4/IPv6 addresses are missing on NSX-T Edge node interfaces, refer to the following KB article:

https://knowledge.broadcom.com/external/article?articleNumber=322523