BGP connection from the Edge VM on a VLAN-backed segment to the peer router is disrupted due to an incorrect uplink teaming policy.

Article ID: 380193

Updated On:

Products

VMware NSX

Issue/Introduction


BGP sessions from an Edge VM attached to a VLAN-backed segment are disrupted because the MAC addresses of the peer routers cannot be resolved: ARP replies are not received on the ESX uplinks.

Relevant logs: the Edge VM syslog shows whether BGP is down.

2024-10-14T15:18:13.104Z nsxmgr-07.com NSX 7 FABRIC [nsx@6876 comp="nsx-edge" subcomp="rcpm" s2comp="routing-service-realization" level="INFO"] Alarm for BGP ##:##:##:##::1, peer_uuid: ########-####-####-####-########1ba21f125abc in SR: ########-####-####-####-########-d4a3dfbdbf09, state=BGP_DOWN
2024-10-14T15:18:13.201Z nsxmgr-07.com NSX 7 FABRIC [nsx@6876 comp="nsx-edge" subcomp="rcpm" s2comp="routing-service-realization" level="INFO"] Alarm for BGP 10.249.##.##, peer_uuid: ########-####-####-####-########-655d050ec609 in SR: ########-####-####-####-########-d4a3dfbdbf09, state=BGP_DOWN
2024-10-14T15:18:13.273Z nsxmgr-07.com NSX 7 FABRIC [nsx@6876 comp="nsx-edge" subcomp="rcpm" s2comp="routing-service-realization" level="INFO"] Alarm for BGP 10.249.##.##, peer_uuid: ########-####-####-####-########-24067ec95f74 in SR: ########-####-####-####-########-d4a3dfbdbf09, state=BGP_DOWN
2024-10-14T15:18:13.751Z nsxmgr-07.com NSX 7 FABRIC [nsx@6876 comp="nsx-edge" subcomp="rcpm" s2comp="routing-service-realization" level="INFO"] Alarm for BGP ##:##:##:##::1, peer_uuid: ########-####-####-####-########-8cec-2e94b93e1d1e in SR: ########-####-####-####-########-d4a3dfbdbf09, state=BGP_DOWN
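
To find these alarms quickly in a larger syslog export, a short script can filter for them. The following is a minimal sketch that assumes the line layout matches the samples above; the function name `find_bgp_down` is illustrative and not part of any NSX tooling.

```python
import re

# Pattern derived from the sample alarm lines above (an assumption about
# the general layout, not an official log format specification).
ALARM_RE = re.compile(
    r'^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s*'
    r'.*?subcomp="rcpm".*?'
    r'Alarm for BGP (?P<peer>\S+), peer_uuid: (?P<peer_uuid>\S+) '
    r'in SR: (?P<sr>\S+), state=(?P<state>\w+)'
)


def find_bgp_down(lines):
    """Return (timestamp, peer, sr) for every line reporting state=BGP_DOWN."""
    hits = []
    for line in lines:
        match = ALARM_RE.search(line.strip())
        if match and match.group("state") == "BGP_DOWN":
            hits.append(
                (match.group("timestamp"), match.group("peer"), match.group("sr"))
            )
    return hits
```

Several peers on the same SR going down within the same second, as in the sample, suggests a shared cause such as the uplink mapping rather than a single failed neighbor.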

Environment

VMware NSX

Cause

Both Edge vNIC ports were originally pinned to individual ESX uplinks/vmnics using two vSphere distributed virtual port groups (DVPGs) with opposing teaming policies (dvpg1 -> Active: uplink1, dvpg2 -> Active: uplink2). This setup ensures that BGP traffic is directed to the correct peer ToR. It is the recommended design because, if pNIC1 fails, vNIC1 is not expected to fail over to pNIC2. Instead, once BGP over vNIC1/pNIC1 fails, Edge routing recovers by selecting vNIC2, which is linked to the other uplink interface within the Edge.

When a customer migrates the Edge vNICs from vSphere DVPGs to NSX VLAN-backed segments (the recommended configuration), they may create only a single replacement segment and rely on the default teaming policy. The default DVS teaming (source port ID based) is then applied to that one segment, so the vNICs are mapped to uplinks non-deterministically and may land on the wrong one. When that happens, the VLAN/IP configuration on the ToR reached through that uplink may not match the next-hop VLAN/IP, so ARP resolution for the next-hop IP fails and BGP goes down.
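
The difference between the two setups can be illustrated with a toy model. This sketch is not the switch's real algorithm: the modulo in `toy_source_port_uplink` is only a stand-in to show that, under source port ID teaming, placement follows the port ID rather than the intended design. The port IDs (34, 36) and all function names are hypothetical.

```python
# Intended design from the original DVPG setup: each vNIC pinned to one uplink.
INTENDED = {"vnic1": "uplink1", "vnic2": "uplink2"}
UPLINKS = ["uplink1", "uplink2"]


def toy_source_port_uplink(port_id, uplinks=UPLINKS):
    """Toy stand-in for source-port-ID teaming (NOT the real algorithm):
    the chosen uplink depends only on the virtual port ID."""
    return uplinks[port_id % len(uplinks)]


def misplaced_vnics(actual, intended=INTENDED):
    """Return the vNICs whose actual uplink differs from the intended one."""
    return sorted(v for v, uplink in intended.items() if actual.get(v) != uplink)


# Single segment with default teaming: hypothetical port IDs decide placement.
single_segment = {
    "vnic1": toy_source_port_uplink(34),
    "vnic2": toy_source_port_uplink(36),
}

# One segment per uplink with a named teaming policy: placement is explicit.
named_teaming = dict(INTENDED)

print(misplaced_vnics(single_segment))
print(misplaced_vnics(named_teaming))
```

In this toy run both vNICs end up on the same uplink in the single-segment case, so one of them would be sending ARP requests toward a ToR that does not carry its next-hop VLAN/IP.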

Resolution

Use a separate segment for each uplink and configure a named teaming policy for each of these segments so that Edge vNICs map correctly to uplinks and ToRs.
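
As a rough sketch of what this configuration looks like as data, the following builds an uplink profile with two named teaming policies and two segments that reference them. The field names (`named_teamings`, `advanced_config.uplink_teaming_policy_name`, and so on) reflect an understanding of the NSX API schema and should be verified against the API reference for your NSX version. All display names, policy names, and VLAN IDs are hypothetical, and required fields such as the transport zone path are omitted.

```python
import json


def pinned_teaming(name, uplink):
    """Named teaming policy with a single active uplink and no standby."""
    return {
        "name": name,
        "policy": "FAILOVER_ORDER",
        "active_list": [{"uplink_name": uplink, "uplink_type": "PNIC"}],
    }


def build_uplink_profile():
    """Uplink profile body: default teaming plus one named policy per uplink."""
    return {
        "resource_type": "UplinkHostSwitchProfile",
        "display_name": "edge-host-uplink-profile",  # hypothetical name
        "teaming": {
            "policy": "LOADBALANCE_SRCID",
            "active_list": [
                {"uplink_name": "uplink1", "uplink_type": "PNIC"},
                {"uplink_name": "uplink2", "uplink_type": "PNIC"},
            ],
        },
        "named_teamings": [
            pinned_teaming("edge-uplink1-only", "uplink1"),
            pinned_teaming("edge-uplink2-only", "uplink2"),
        ],
    }


def build_segment(name, vlan_id, teaming_name):
    """VLAN-backed segment body that references a named teaming policy."""
    return {
        "display_name": name,
        "vlan_ids": [str(vlan_id)],  # hypothetical VLAN ID
        "advanced_config": {"uplink_teaming_policy_name": teaming_name},
    }


segments = [
    build_segment("edge-uplink1-seg", 101, "edge-uplink1-only"),
    build_segment("edge-uplink2-seg", 102, "edge-uplink2-only"),
]

print(json.dumps(build_uplink_profile(), indent=2))
print(json.dumps(segments, indent=2))
```

Each Edge vNIC is then attached to the segment whose named policy pins it to the uplink facing the intended ToR, which restores the deterministic mapping the original DVPGs provided.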

Additional Information

Maintenance window guidelines: Reconfigure the NSX uplink profile.
Documentation: https://docs.vmware.com/en/VMware-NSX/4.2/installation/GUID-491A66DE-F5C9-4FC2-AA57-43589E49806F.html