This article explains how to troubleshoot BGP on NSX-T edges.
A BGP neighbor is showing as down, traffic cannot be routed to the neighbor, or the connection never comes up from NSX-T.
You will also not see the default route.
There are a couple of ways to diagnose this: in the UI and over SSH on the Edge Node.
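The two symptoms above can be checked mechanically once the neighbor state and the route prefixes have been collected from the Edge Node. The following is a minimal sketch, not an NSX-T tool: the function names and the input format (a state string plus a list of prefix strings) are assumptions made for illustration, and how you collect that data is left to you. The only protocol fact it relies on is that a BGP session exchanges routes only in the Established state.

```python
import ipaddress

def session_is_up(state: str) -> bool:
    """A BGP session only exchanges routes in the Established state."""
    return state.strip().lower() == "established"

def has_default_route(prefixes) -> bool:
    """True if any prefix is a default route (0.0.0.0/0 or ::/0)."""
    return any(
        ipaddress.ip_network(p, strict=False).prefixlen == 0 for p in prefixes
    )

def symptoms(state: str, prefixes) -> list:
    """Return which of the symptoms described in this article are present.

    `state` and `prefixes` are hypothetical inputs: values you have already
    copied out of the UI or the Edge Node CLI.
    """
    found = []
    if not session_is_up(state):
        found.append("BGP neighbor down")
    if not has_default_route(prefixes):
        found.append("default route missing")
    return found
```

Note that a neighbor state of "Active" is not a healthy state: in BGP it means the session is still trying to connect, so the sketch reports it as down.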
The possible causes of a BGP neighbor being down are:
State hung up
Service hung up
Host-level issue
Edge-level issue
Infrastructure-related issues between the ESX Host and BGP neighbor
The external BGP neighbor itself is having issues
To determine the cause, work through the resolution steps below; the step that brings the neighbor back up identifies the cause.
Clear the state at each point in the connection. If the neighbor comes up once the state is cleared, stale state was the cause.
Restarting the service clears the connection and refreshes the BGP state throughout NSX-T.
Moving the edge to another host can help determine whether there is a host-level issue.
Rebooting the edge node refreshes everything in NSX-T. This often provides a quick resolution, but it does not yield good data for a root-cause investigation. Root cause is obtained by going through each point in the BGP connection while the connection is down, restarting services and clearing state at each point to determine where things are actually getting hung up. If that data was not collected because BGP came back up after the edge node reboot, a root cause cannot be provided.
Rebooting the Active edge node in an Active/Standby pair causes the Standby node to become Active.
There can also be issues with the connection throughout the switching infrastructure and on the core router handling the BGP neighbor.
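The elimination approach described above can be sketched as a short loop: apply one step, check the neighbor, and stop at the first step that restores it. This is an illustration only. The step names, the least-to-most-disruptive ordering (taken from the order in which this article lists the steps), and the two callables `apply_step` and `neighbor_is_up` are all assumptions; neither callable corresponds to a real NSX-T API, and in practice each step is performed by hand.

```python
from typing import Callable, Optional, Tuple

# Each step is paired with the cause it points to if it restores the session.
# Order is an assumption: least disruptive first, as listed in this article.
STEPS = [
    ("clear state", "state hung up"),
    ("restart service", "service hung up"),
    ("move edge to another host", "host-level issue"),
    # A reboot refreshes every point at once, so it cannot isolate the cause.
    ("reboot edge node", "undetermined: reboot refreshed everything"),
]

def find_cause(
    apply_step: Callable[[str], None],
    neighbor_is_up: Callable[[], bool],
) -> Tuple[Optional[str], str]:
    """Apply steps in order and return (step, cause) for the first that works.

    If no step restores the neighbor, the remaining candidates are the
    switching infrastructure and the external BGP neighbor.
    """
    for step, cause in STEPS:
        apply_step(step)
        if neighbor_is_up():
            return step, cause
    return None, "infrastructure or external BGP neighbor issue"
```

Recording which step returned, and what the connection looked like before it, is what makes a root-cause statement possible later; jumping straight to the last step loses that information.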
It is important to know whether the issue is recurring. If it has happened only once and the neighbor had been up for quite some time beforehand, it is likely that something got hung up at some point in the connection and needs a refresh.