BFD tunnels between ESXi hosts and NSX Edges intermittently go down.
On the ESXi host, the command nsxdp-cli bfd sessions list shows tunnels stuck in the Init and Down states.
On the ESXi host, the command nsxdp-cli bfd stats get shows the unidirectional packet counter incrementing.
Calling the GET API "/nsxapi/api/v1/transport-nodes/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx/tunnels" returns the error Control_Detection_Time_Expired.
In the NSX UI, the BFD Diagnostic Code for the down tunnels shows "Neighbor Signaled Session Down".
The default gateway of the ESXi hosts is a virtual router VM connected to the same vDS as the NSX Edges and the NSX TEP VMkernel (vmk) interfaces on the hosts.
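When many tunnels are involved, it can help to filter the tunnels API response for the ones that are not up. The following is a minimal sketch, not an official tool: it assumes the response is JSON with a "tunnels" array whose entries carry "name", "remote_ip", "status" and a nested "bfd" object with a "diagnostic" field. These field names and the sample data are assumptions for illustration; verify them against the actual output from your NSX Manager.

```python
import json

# Illustrative sample only. The field names ("tunnels", "name", "remote_ip",
# "status", "bfd", "diagnostic") are ASSUMED, not taken from NSX documentation.
SAMPLE_RESPONSE = """
{
  "tunnels": [
    {"name": "tunnel-a", "remote_ip": "192.0.2.11", "status": "UP",
     "bfd": {"diagnostic": "No Diagnostic"}},
    {"name": "tunnel-b", "remote_ip": "192.0.2.12", "status": "DOWN",
     "bfd": {"diagnostic": "Control_Detection_Time_Expired"}}
  ]
}
"""


def down_tunnels(response_text):
    """Return (name, remote_ip, diagnostic) for every tunnel whose status is not UP."""
    data = json.loads(response_text)
    result = []
    for tunnel in data.get("tunnels", []):
        if tunnel.get("status") != "UP":
            # Fall back to "unknown" if the assumed bfd/diagnostic fields are absent.
            diagnostic = tunnel.get("bfd", {}).get("diagnostic", "unknown")
            result.append((tunnel.get("name"), tunnel.get("remote_ip"), diagnostic))
    return result


if __name__ == "__main__":
    for name, remote_ip, diagnostic in down_tunnels(SAMPLE_RESPONSE):
        print(f"{name} -> {remote_ip}: {diagnostic}")
```

To use it, replace SAMPLE_RESPONSE with the saved output of the GET call and adjust the field names if they differ.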
VMware NSX
When the default gateway of the ESXi transport node (TN) hosts is a virtual router connected to the same vDS as the Edge nodes and the host TEP vmkernel interfaces, BFD packets sent from the TEP vmkernel interfaces are forwarded by the virtual router (the hosts' gateway) directly to the Edge node vNICs. Because these packets never traverse the ESXi uplink ports on the vDS, BFD can behave incorrectly or unexpectedly.
This condition can occur in VMware NSX environments with this gateway topology. To resolve the issue, move the virtual router VM to a different vDS.
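The condition described above reduces to one check: does the host TEP gateway VM share a vDS with an Edge node or a host TEP vmkernel interface? The sketch below expresses that check over a hypothetical inventory. The Attachment record, its role labels and the switch names are illustrative assumptions, not an NSX or vSphere API; in practice the data would come from your own vCenter inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    """HYPOTHETICAL record: which vDS a VM vNIC or VMkernel interface is attached to."""
    name: str
    role: str  # illustrative labels: "gateway", "edge", or "host_tep"
    vds: str


def shared_vds_conflicts(attachments):
    """Return sorted vDS names where the TEP gateway VM shares a switch with an Edge or host TEP."""
    gateway_switches = {a.vds for a in attachments if a.role == "gateway"}
    nsx_switches = {a.vds for a in attachments if a.role in ("edge", "host_tep")}
    return sorted(gateway_switches & nsx_switches)


# Problem topology: the router VM sits on the same vDS as the Edge and the host TEP.
inventory = [
    Attachment("router-vm", "gateway", "vds-01"),
    Attachment("edge-01", "edge", "vds-01"),
    Attachment("esx01-vmk10", "host_tep", "vds-01"),
]

# After moving the router VM to a different vDS, no conflict remains.
after_move = [
    Attachment("router-vm", "gateway", "vds-02"),
    Attachment("edge-01", "edge", "vds-01"),
    Attachment("esx01-vmk10", "host_tep", "vds-01"),
]

print(shared_vds_conflicts(inventory))
print(shared_vds_conflicts(after_move))
```

A non-empty result flags the topology described in this article; an empty result after the move corresponds to the resolution.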