NSX BGP sessions flap for VRF due to TCP port reuse

Article ID: 408123

Updated On:

Products

VMware NSX

Issue/Introduction

  • BGP sessions are configured between the same pair of source and destination IP addresses on more than one VRF on a given Edge Node.

  • At least one of these sessions is misconfigured with a mismatched AS number. Flapping of the misconfigured session is expected.

  • The correctly configured, established session is not expected to flap. However, when the misconfigured session initiates a TCP connection using the same source port as the established session, the established session flaps as well.
  • Packet captures and Edge Node logs may show BGP OPEN Message Error/Bad Peer AS notifications and FSM unexpected packet received errors.
  • Log lines similar to the below are encountered on the NSX Edge Node in /var/log/syslog:

    YYYY-MM-DDTHH:MM:SS.SSSZ <EdgeHostname> bgpd <PID> - - %NOTIFICATION: received from neighbor X.X.X.X 2/2 (OPEN Message Error/Bad Peer AS) X bytes
    YYYY-MM-DDTHH:MM:SS.SSSZ <EdgeHostname> bgpd <PID> - - [EC XXXXXXXX] X.X.X.X [FSM] unexpected packet received in state OpenSent
    YYYY-MM-DDTHH:MM:SS.SSSZ <EdgeHostname> bgpd <PID> - - %NOTIFICATION: sent to neighbor X.X.X.X 5/1 (Neighbor Events Error/Receive Unexpected Message in OpenSent State) X bytes
    YYYY-MM-DDTHH:MM:SS.SSSZ <EdgeHostname> bgpd <PID> - - BGP: X.X.X.X [FSM] Timer (holdtime timer expire)
    YYYY-MM-DDTHH:MM:SS.SSSZ <EdgeHostname> bgpd <PID> - - BGP: X.X.X.X [FSM] Hold_Timer_expired (Established->Clearing)
  • During the issue, packet captures indicate the unexpected reuse of a TCP port from an existing, stable BGP session by a newly configured BGP session for a different VRF on the same Edge Node.
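
To check an Edge Node's syslog for the signatures listed above, a small script can help. The following is a minimal sketch, not a VMware-provided tool: the function names `classify_lines` and `scan_file` are hypothetical, and the patterns are copied from the log lines quoted in this article.

```python
import re

# Patterns copied from the log lines quoted in this article.
SIGNATURES = {
    "bad_peer_as": re.compile(r"\(OPEN Message Error/Bad Peer AS\)"),
    "unexpected_in_opensent": re.compile(
        r"\[FSM\] unexpected packet received in state OpenSent"
    ),
    "hold_timer_expired": re.compile(
        r"\[FSM\] Hold_Timer_expired \(Established->Clearing\)"
    ),
}


def classify_lines(lines):
    """Map each signature name to the bgpd log lines that match it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        if " bgpd " not in line:
            continue  # only bgpd messages are relevant here
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_file(path):
    """Hypothetical helper: classify every line of a syslog file."""
    with open(path, errors="replace") as handle:
        return classify_lines(handle)
```

Calling `scan_file("/var/log/syslog")` on the Edge Node returns the matching lines grouped by signature. Seeing `bad_peer_as` and `hold_timer_expired` hits for the same neighbor address within a short interval is consistent with this issue, but it is not proof on its own; confirm with a packet capture.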

Environment

 VMware NSX

Cause

  • When the misconfigured session retries its connection, the operating system kernel on the Edge Node selects a TCP source port for the new attempt.

  • This source port selection is random, so the kernel may pick a port that is already in use by a stable BGP session in a different VRF.

  • Although each VRF is logically separate, an unexpected BGP OPEN message from the misconfigured session in one VRF, if it reuses a TCP port, can disrupt an otherwise stable BGP session in another VRF on the same Edge Node.
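
The condition described above is the same TCP 4-tuple (source IP, source port, destination IP, destination port) appearing in more than one VRF. The sketch below shows one way to spot it in connection records. It assumes the records have already been extracted, for example from packet captures; the `Conn` type and `find_cross_vrf_collisions` function are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Conn(NamedTuple):
    """One observed TCP connection, tagged with the VRF it belongs to."""
    vrf: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def find_cross_vrf_collisions(conns):
    """Return the 4-tuples that are seen in more than one VRF.

    The result maps (src_ip, src_port, dst_ip, dst_port) to the sorted
    list of VRF names in which that exact tuple was observed.
    """
    vrfs_by_tuple = defaultdict(set)
    for c in conns:
        key = (c.src_ip, c.src_port, c.dst_ip, c.dst_port)
        vrfs_by_tuple[key].add(c.vrf)
    return {
        key: sorted(vrfs)
        for key, vrfs in vrfs_by_tuple.items()
        if len(vrfs) > 1
    }
```

Any entry in the returned dictionary is a source port shared across VRFs for the same peer pair, which is the precondition for the flap described in this article.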

 

Resolution

  • This is a known issue impacting VMware NSX.


Workaround

  • To mitigate this issue, verify that all BGP peering configurations for VRFs on an Edge Node, and the underlying network connectivity (e.g., VLANs, IP addressing), are accurate before establishing BGP sessions.

  • Proactive Correction of Unstable Sessions: If a BGP session is persistently flapping due to misconfiguration (e.g., incorrect ASN), correct the configuration immediately. Deleting and re-creating the BGP session with the correct parameters can prevent it from destabilizing other active sessions on the same Edge Node.
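
The pre-check described above can be scripted against an inventory of planned peerings. The following is a rough sketch under stated assumptions: it does not call any NSX API, the `Peering` type and `audit_peerings` function are hypothetical, and the `expected_as` table (the AS number each neighbor actually uses) must come from your own source of truth.

```python
from typing import NamedTuple


class Peering(NamedTuple):
    """One planned BGP session on an Edge Node (hypothetical inventory record)."""
    vrf: str
    local_ip: str
    neighbor_ip: str
    remote_as: int  # AS number configured for the neighbor


def audit_peerings(peerings, expected_as):
    """Flag sessions whose configured remote AS differs from the expected one.

    expected_as maps (vrf, neighbor_ip) to the AS number the neighbor uses.
    Each finding is (peering, expected AS, other VRFs that use the same
    local/neighbor IP pair and could therefore be disrupted).
    """
    findings = []
    for p in peerings:
        expected = expected_as.get((p.vrf, p.neighbor_ip))
        if expected is None or expected == p.remote_as:
            continue  # no reference data, or the configuration matches
        at_risk = sorted(
            q.vrf
            for q in peerings
            if q.vrf != p.vrf
            and (q.local_ip, q.neighbor_ip) == (p.local_ip, p.neighbor_ip)
        )
        findings.append((p, expected, at_risk))
    return findings
```

A non-empty result lists the sessions to correct before they are enabled, along with the VRFs whose stable sessions share the same IP pair.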