BFD tunnels showing as DOWN between one or more Edge TEPs and one or more Host TEPs
Article ID: 415609
Products
VMware NSX, VMware Cloud Foundation
Issue/Introduction
From the Edge page within the NSX UI, tunnels are seen as DOWN between one or more Edge TEPs and one or more Host TEPs.
The Edge nodes show a Degraded status.
All of the tunnels observed to be down terminate on TEPs belonging to ESXi Host Transport Nodes (TNs).
The Host TN page, however, shows the same tunnels as UP, and the Host TNs themselves also have a status of "Up".
TEP groups have been configured on your Edge nodes (for improved throughput and load sharing).
This tunnel-down condition is preceded by an event that triggers a full Controller sync. You may not even realize one of these Controller syncs has taken place. Examples of events that trigger a full Controller sync:
MP upgrade
MP reboot
Host TN disconnect/reconnect
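To confirm the symptom from the Edge side without clicking through the UI, the Edge node's tunnel list can be pulled from the NSX Manager API and filtered for DOWN tunnels. The sketch below is illustrative only: the response field names (`tunnels`, `status`, `remote_node_display_name`) and the endpoint `GET /api/v1/transport-nodes/<edge-node-id>/tunnels` are assumptions about the API's shape, and the sample payload uses made-up values.

```python
import json

# Sketch: parse the JSON returned by the (assumed) NSX Manager endpoint
#   GET /api/v1/transport-nodes/<edge-node-id>/tunnels
# and report tunnels whose status is DOWN, with the remote (Host TN) name,
# so you know which Host TN owns the peer TEP.
# Field names below are assumptions about the response shape.

def down_tunnels(payload: dict) -> list[tuple[str, str]]:
    """Return (tunnel_name, remote_node_display_name) for each DOWN tunnel."""
    return [
        (t.get("name", "?"), t.get("remote_node_display_name", "?"))
        for t in payload.get("tunnels", [])
        if t.get("status") == "DOWN"
    ]

# Example response fragment (illustrative values only):
sample = json.loads("""
{
  "tunnels": [
    {"name": "geneve-1", "status": "UP",   "remote_node_display_name": "esx01"},
    {"name": "geneve-2", "status": "DOWN", "remote_node_display_name": "esx02"}
  ]
}
""")

print(down_tunnels(sample))  # -> [('geneve-2', 'esx02')]
```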
Environment
VMware NSX 4.2.1 - 4.2.3 (TEP groups were introduced in 4.2.1)
VMware Cloud Foundation (VCF) 9.0.x
Cause
This issue is caused by a bug introduced when TEP groups were first implemented in the 4.2.1 release.
A full Controller sync triggers update operations on the TEP groups, and these update operations sometimes delete the BFD tunnels from Edge TEP to Host TEP.
Resolution
The fix for this issue is included in NSX versions 4.2.4 and 4.2.3.1.1, and in VCF 9.1.
Workaround:
Restart cfgAgent on the Host TN that owns the peer TEP of the DOWN tunnel (as seen from the Edge page): /etc/init.d/nsx-cfgagent restart
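When several Host TNs are affected, it can help to generate the restart commands in one pass rather than typing them host by host. The sketch below only prints the commands for review; the hostnames are placeholders, and it assumes SSH is enabled on the ESXi hosts. Only the `/etc/init.d/nsx-cfgagent restart` invocation itself comes from this article.

```shell
# Placeholder list of the ESXi Host TNs that own the peer TEPs of the
# DOWN tunnels (substitute your own hosts as identified on the Edge page).
HOSTS="esx01.example.com esx02.example.com"

# Print (do not execute) the per-host restart command so it can be
# reviewed before running it against each affected Host TN.
for h in $HOSTS; do
  echo "ssh root@$h /etc/init.d/nsx-cfgagent restart"
done
```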