<Date>T<Time>Z <Manager hostname> NSX 5443 MONITORING [nsx@6876 alarmId="<Alarm ID>" alarmState="OPEN" comp="nsx-manager" entId="<Ent ID>" errorCode="MP701099" eventFeatureName="routing" eventSev="HIGH" eventState="On" eventType="bgp_down" level="ERROR" nodeId="<Node UUID>" subcomp="monitoring"] In Router <Router UUID>, BGP neighbor <Neighbor ID> is down. Reason: Network or config error.
#get bgp neighbor summary
<Date>T<Time>Z <Hostname> NSX 591121 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.bgpd.1697551443.5514.160.6.gz
/var/log/vmware/top-mem.log shows that the BGP process has high memory usage, which grows linearly over time:
Wed Sep ## 16:30:06 UTC 202#
15054 frr 20 0 7427796 6.944g 2520 R 200.0 22.2 127860:14 15054 /usr/lib/frr/bgpd -d -A 127.0.0.1
Wed Sep ## 17:30:07 UTC 202#
15054 frr 20 0 7427796 6.965g 2524 R 200.0 22.3 127972:59 15054 /usr/lib/frr/bgpd -d -A 127.0.0.1
Wed Sep ## 18:30:07 UTC 202#
15054 frr 20 0 7427796 6.981g 2524 R 200.0 22.4 128056:36 15054 /usr/lib/frr/bgpd -d -A 127.0.0.1
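To check whether bgpd memory is trending upward on a given Edge, the samples in top-mem.log can be compared programmatically. The following is a minimal sketch, not a supported tool. It assumes the layout shown in the excerpt above (a timestamp line followed by top-style process lines with resident memory in the sixth column) and unredacted timestamps. The function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above: a timestamp line followed by
# top-style process lines whose sixth column is resident memory (RES).
TS_FORMAT = "%a %b %d %H:%M:%S UTC %Y"
# top prints RES in KiB when there is no suffix.
UNIT_TO_GIB = {"": 1 / 1024**2, "m": 1 / 1024, "g": 1.0, "t": 1024.0}


def res_to_gib(field):
    """Convert a top RES field such as '6.944g' or '2520' (KiB) to GiB."""
    match = re.fullmatch(r"([0-9.]+)([mgt]?)", field.lower())
    if match is None:
        raise ValueError(f"unrecognised RES value: {field!r}")
    return float(match.group(1)) * UNIT_TO_GIB[match.group(2)]


def bgpd_samples(lines):
    """Yield (timestamp, RES in GiB) for each bgpd line in the log."""
    current = None
    for line in lines:
        line = line.strip()
        try:
            # Timestamp lines update the time applied to following samples.
            current = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass
        fields = line.split()
        if current is not None and len(fields) > 5 and "/usr/lib/frr/bgpd" in line:
            yield current, res_to_gib(fields[5])


def growth_gib_per_hour(samples):
    """Average RES growth between the first and last sample."""
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    return (m1 - m0) / hours
```

For example, `growth_gib_per_hour(list(bgpd_samples(open("/var/log/vmware/top-mem.log"))))` returns the average growth in GiB per hour. A steadily positive value across many samples is consistent with the symptom described here, but it is not conclusive on its own.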
VMware NSX 4.x
VMware NSX-T Data Center 3.2.x
This issue occurs when the main BGP thread gets stuck in a loop after referencing a stale pointer.
The BGP process eventually runs out of memory, crashes, and is restarted automatically. Any BGP peering that was down comes back up once the service restarts.
This issue is resolved in VMware NSX 4.2.0, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
Workaround:
If an Edge is in a broken state with BGP down, the following steps will recover it in a planned manner:
The condition may recur at a later time.