Intermittent BGP flapping occurs between the NSX Edge gateways and their BGP neighbors. The BGP session drops and then re-establishes automatically, typically within about a minute.
VMware NSX
VMware NSX-T Data Center
The issue is caused by excessively aggressive BGP timer settings.
Review of the Edge logs shows that the BGP neighbor goes down due to a Hold Timer Expired event:
2025-12-18T07:03:42.764Z NSX 11636 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" tid="11673" level="INFO"] Error message: In Router 310#####-####-####-b53b-########4b58, BGP neighbor e60#####-####-4af7-####-########9434(10.###.##.139) is down. Reason: BGP Notification send {Hold Timer Expired}.
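To gauge how often a given neighbor is affected, the Hold Timer Expired events can be counted from an exported Edge log. The following is a minimal sketch, assuming the log lines follow the same layout as the sample above; the function names are illustrative and not part of any NSX tooling.

```python
import re
from collections import Counter

# Pattern modeled on the sample log line above (assumption: other Edge
# versions may format this message differently).
HOLD_EXPIRED = re.compile(
    r"^(?P<ts>\S+)\s.*BGP neighbor (?P<neighbor>\S+?)\((?P<ip>[^)]*)\) is down\. "
    r"Reason: .*\{Hold Timer Expired\}"
)


def hold_timer_expirations(lines):
    """Return (timestamp, neighbor_ip) for each Hold Timer Expired event."""
    events = []
    for line in lines:
        match = HOLD_EXPIRED.search(line.strip())
        if match:
            events.append((match.group("ts"), match.group("ip")))
    return events


def count_by_neighbor(lines):
    """Count Hold Timer Expired events per neighbor IP address."""
    return Counter(ip for _, ip in hold_timer_expirations(lines))
```

Passing an open log file (or any iterable of lines) to count_by_neighbor returns a per-neighbor tally, which helps confirm whether the flapping is confined to peers configured with the aggressive timers.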
Analysis of the running configuration reveals the following timer values:
Example:
Keepalive Timer: 1 second
Hold Timer: 3 seconds
While BGP allows custom timers, a 3-second hold time is extremely aggressive. With this configuration, if no Keepalive or Update message arrives from the peer for 3 seconds, which corresponds to only three consecutive Keepalives being lost or delayed (due to transient network congestion, high CPU on the peer, or minor path latency), the hold timer expires and the BGP session is torn down immediately.
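The effect of the hold time can be illustrated with a simplified model: the session survives a period of silence from the peer only if that period is shorter than the hold time. The sketch below compares the 1/3-second values with the 60/180-second values; the class and method names are illustrative, and the model ignores keepalive jitter and message processing delays.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpTimers:
    keepalive_s: int  # interval between Keepalive messages, in seconds
    hold_s: int       # hold time, in seconds

    def keepalives_per_hold(self) -> float:
        """How many keepalive intervals fit into one hold time."""
        return self.hold_s / self.keepalive_s

    def survives_silence(self, silence_s: float) -> bool:
        """True if the peer can be silent for silence_s without hold timer expiry."""
        return silence_s < self.hold_s


aggressive = BgpTimers(keepalive_s=1, hold_s=3)
relaxed = BgpTimers(keepalive_s=60, hold_s=180)

for timers in (aggressive, relaxed):
    # Both profiles allow the same number of keepalive intervals per hold time;
    # what differs is the absolute length of disruption they can absorb.
    print(timers, timers.keepalives_per_hold(), timers.survives_silence(5))
```

Both profiles have the same hold-to-keepalive ratio, so the difference lies in the absolute window: under this model a 5-second disruption tears down the 1/3-second session but is absorbed by the 60/180-second one.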
To resolve the intermittent flapping, increase the BGP timers to allow for standard network variance.
Recommended Action: Modify the BGP Neighbor configuration to use the NSX default values or less aggressive custom values.
Recommended (Default) Values:
Keepalive Timer: 60 seconds
Hold Down Timer: 180 seconds
By increasing these timers, the BGP session will be able to tolerate minor packet drops or latency without resetting the connection.
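If the change is scripted rather than made in the UI, the new timer values can be prepared as a request body first. The sketch below only builds and sanity-checks that body and sends nothing. The field names keep_alive_time and hold_down_time are an assumption about the NSX Policy API BGP neighbor object and should be verified against the API reference for the NSX version in use. The 3x ratio check reflects common BGP practice, not a documented NSX requirement.

```python
import json


def bgp_timer_patch_body(keepalive_s: int = 60, hold_s: int = 180) -> dict:
    """Build a partial BGP neighbor body containing only the timer fields.

    Field names are assumed, not confirmed by this article; check them
    against the NSX API reference before use.
    """
    if keepalive_s <= 0 or hold_s <= 0:
        raise ValueError("timers must be positive")
    # Conventional guidance: hold time at least three times the keepalive.
    if hold_s < 3 * keepalive_s:
        raise ValueError("hold time should be at least 3x the keepalive interval")
    return {"keep_alive_time": keepalive_s, "hold_down_time": hold_s}


# Default arguments match the recommended values in this article.
print(json.dumps(bgp_timer_patch_body(), indent=2))
```

Keeping the body limited to the two timer fields makes the intended change easy to review before it is applied to each affected neighbor.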
Reference: For detailed steps on configuring BGP and timers in NSX, please refer to the administration guide: VMware NSX 4.0 - Configure BGP