BFD tunnels showing as DOWN between one or more Edge TEPs and one or more Host TEPs

Article ID: 415609

Products

VMware NSX, VMware Cloud Foundation

Issue/Introduction

  • From the Edge page within the NSX UI, tunnels between one or more Edge Tunnel Endpoints (TEPs) and one or more Host TEPs are shown as DOWN.
  • The Edge nodes show a Degraded status.
  • All of the DOWN tunnels connect to TEPs belonging to ESXi Host Transport Nodes (TNs).
  • The Host TN page shows the same tunnels as UP, and the Host TNs themselves have a status of "Up" (see the verification example after this list).
  • TEP groups have been configured on your Edge nodes (for improved throughput and load sharing).
  • The tunnel-down condition is preceded by an event that triggers a full Controller sync. Examples of such events (you may not even realize one of these syncs has taken place):
    • MP upgrade
    • MP reboot
    • Host TN disconnect/reconnect
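
The mismatch between the Edge view and the Host TN view can also be confirmed outside the UI. The commands below are a minimal sketch: the manager FQDN, credentials, and Edge node UUID are placeholders, and the exact command set should be verified against the CLI/API reference for your NSX version.

  # From the NSX Manager API: list the tunnels reported by the Edge transport node,
  # including their status and BFD diagnostic information
  curl -k -u admin "https://nsx-mgr.example.com/api/v1/transport-nodes/<edge-node-uuid>/tunnels"

  # From the Edge node CLI: list BFD sessions and tunnel ports with their current state
  get bfd-sessions
  get tunnel-ports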

Environment

VMware NSX 4.2.1 - 4.2.3 (TEP groups introduced in 4.2.1)
VCF 9.0.x

Cause

  • This issue is caused by a bug introduced when TEP groups were first implemented in the 4.2.1 release.
  • Update operations performed on the TEP groups can delete the BFD tunnels from Edge TEP to Host TEP.
  • A full Controller sync triggers these TEP group updates, which is why the tunnels can end up deleted after one of the events listed above.

Resolution

The fix for this issue is included in NSX 4.2.4, NSX 4.2.3.1.1, and VCF 9.1.

Workaround:

  • Restart cfgAgent on the Host TN that owns the peer TEP of the DOWN tunnel (as seen from the Edge page): /etc/init.d/nsx-cfgagent restart (see the sketch after this list).
  • Alternatively, put the affected Edge node into and then back out of Maintenance Mode.
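
A minimal sketch of the cfgAgent workaround, assuming SSH access to the affected ESXi Host TN; the host name is a placeholder, and the status check relies on the standard ESXi init-script actions:

  # On the ESXi Host TN that owns the peer TEP of the DOWN tunnel
  ssh root@esxi-host.example.com
  /etc/init.d/nsx-cfgagent restart     # restart the NSX config agent
  /etc/init.d/nsx-cfgagent status      # confirm the agent is running again

After the restart (or after the Edge node exits Maintenance Mode), verify on the Edge page that the previously DOWN tunnels have transitioned back to UP.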