Edge BGP neighborship with the TOR and default Tier-0 SR routes are lost, causing loss of north-south communication

Article ID: 411909

Products

VMware NSX

Issue/Introduction

  • BFD on both the tunnel and management was lost.
  • Management and controller connectivity were lost.
  • Tier-1 SR routes were removed.
  • The logs show the RTEP uplink device being set to NULL, after which the node status changes from Up to Down:

20##-0#-2#T1#:3#:4#.7##Z set lswitch 50e1####-####-497c-####-5107####fc3f uplink device NULL
20##-0#-2#T1#:3#:4#.7##Z Self Node 1c8e####-283a-####-####-0050####b50a status changed from Up to Down (RTEP device down)

  • The RTEP lrport is then deleted and the node state returns to Up (Routing Down), but LCP does not reset op_state_up for T0.

20##-0#-2#T1#:3#:4#.9##Z Delete local rtep for lrport e58####7-####-446d-####-e88d####16f3 10.##.##.##
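To check a collected log excerpt for this sequence, a small script can look for the "RTEP device down" status change followed later by the local RTEP deletion. The following is a minimal sketch, assuming the log lines have already been read into a list of strings; the function name `has_rtep_signature` and the regular expressions are illustrative, derived only from the sample messages above, and are not part of any NSX tooling.

```python
import re

# Patterns derived from the sample log messages in this article (assumption:
# the message text is stable; IDs and timestamps vary and are matched loosely).
RTEP_DOWN = re.compile(
    r"Self Node \S+ status changed from Up to Down \(RTEP device down\)"
)
RTEP_DELETE = re.compile(r"Delete local rtep for lrport \S+")


def has_rtep_signature(lines):
    """Return True if an 'RTEP device down' status change is followed
    later in the log by a 'Delete local rtep' message."""
    seen_down = False
    for line in lines:
        if RTEP_DOWN.search(line):
            seen_down = True
        elif seen_down and RTEP_DELETE.search(line):
            return True
    return False


if __name__ == "__main__":
    # Redacted sample lines copied from the article.
    sample = [
        "20##-0#-2#T1#:3#:4#.7##Z set lswitch 50e1####-####-497c-####-5107####fc3f uplink device NULL",
        "20##-0#-2#T1#:3#:4#.7##Z Self Node 1c8e####-283a-####-####-0050####b50a status changed from Up to Down (RTEP device down)",
        "20##-0#-2#T1#:3#:4#.9##Z Delete local rtep for lrport e58####7-####-446d-####-e88d####16f3 10.##.##.##",
    ]
    print(has_rtep_signature(sample))
```

A match only indicates that the log sequence described here is present; it does not by itself confirm that the Tier-0 SR op_state is stuck.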

Environment

VMware NSX Datacenter.

Cause

RTEP is first unconfigured (rtep_ls->set_uplink_device(null)). This brings down the node status and sets the T0 SR op_state to false.

However, the HA FSM is not updated, and its state stays ACTIVE. The subsequent removal of the RTEP lrport does not trigger an HA FSM update either (the state is already ACTIVE), so the Tier-0 SR op_state is never recalculated.
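The sequence can be illustrated with a toy model. This is a simplified sketch and not the actual LCP code: the class `EdgeNodeModel`, its fields, and its methods are all hypothetical names, and the assumption that a restart recomputes op_state from the current node state is inferred from the workaround rather than documented.

```python
class EdgeNodeModel:
    """Toy model of the stale Tier-0 SR op_state (hypothetical, illustration only)."""

    def __init__(self):
        self.rtep_lrport_present = True
        self.rtep_uplink_set = True
        self.ha_state = "ACTIVE"
        self.t0_op_state_up = True

    def node_up(self):
        # The node is down only while an RTEP lrport exists without an uplink device.
        return (not self.rtep_lrport_present) or self.rtep_uplink_set

    def _recalculate_op_state(self):
        self.t0_op_state_up = self.node_up() and self.ha_state == "ACTIVE"

    def unset_rtep_uplink(self):
        # Step 1: RTEP is unconfigured. Node status goes down and op_state is
        # set to false, but the HA FSM is not updated and stays ACTIVE.
        self.rtep_uplink_set = False
        self.t0_op_state_up = False

    def delete_rtep_lrport(self):
        # Step 2: the RTEP lrport is removed and the node is Up again.
        self.rtep_lrport_present = False
        new_ha_state = "ACTIVE"
        # op_state is recalculated only on an HA FSM transition. The state is
        # already ACTIVE, so nothing happens and op_state stays false.
        if new_ha_state != self.ha_state:
            self.ha_state = new_ha_state
            self._recalculate_op_state()

    def restart_lcp(self):
        # Assumption: a restart recomputes op_state from the current state.
        self._recalculate_op_state()


if __name__ == "__main__":
    node = EdgeNodeModel()
    node.unset_rtep_uplink()
    node.delete_rtep_lrport()
    print(node.node_up(), node.ha_state, node.t0_op_state_up)
    node.restart_lcp()
    print(node.t0_op_state_up)
```

In this model the node ends up Up with the HA state ACTIVE while op_state remains false, which mirrors the condition described above, and only an explicit recalculation clears it.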

Resolution

Workaround:

If this issue occurs, restart LCP with the following command to recover:

restart service local-controller


Resolution:

This issue is resolved in NSX release 3.2.0 and later.