Edge Node bgpd application crash in NSX-T 4.1.1 and 4.1.2.x

Article ID: 312636

Products

VMware NSX

Issue/Introduction

Symptoms:
  • You are running NSX-T 4.1.1 or 4.1.2.x
  • You see alarms similar to the following, reporting application crashes on Edge Nodes:
Application on NSX node <node_name> has crashed. The number of core files found is x. Collect the Support Bundle including core dump files and contact VMware Support team.
 
  • Core dumps for the BGP daemon can be seen on the reported Edge Node:
-rw-r--r--  1 root root 970K Mar 26 16:29 core.bgpd.xxxxxxx.xxxxx.xxx.gz
 
  • Entries similar to the following appear in /var/log/syslog on the NSX Edge node:
2024-03-25T20:19:49.192Z edge-node-1 bgpd 10582 - -  bgp_advertise_clean_subgroup+0x3f     10322a97aa4f     796c58d59000 /usr/lib/frr/bgpd (mapped at 0x10322a8c9000)
.
.
2024-03-25T20:19:49.849Z edge-node-1 bgpd 10582 - -  subgroup_process_announce_selected+0x350     10322a95a430     796c58d59040 /usr/lib/frr/bgpd (mapped at 0x10322a8c9000)
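
To check a large syslog for this signature, the two backtrace frames shown above can be searched for programmatically. The following is a minimal sketch, not a VMware-provided tool: the function name find_crash_frames and the regular expression are illustrative assumptions, and the pattern is built only from the sample lines in this article, so it may need adjusting if your syslog lines are formatted differently.

```python
import re

# Frame names taken from the syslog excerpt in this article.
CRASH_FRAMES = (
    "bgp_advertise_clean_subgroup",
    "subgroup_process_announce_selected",
)

# Assumed line layout, based on the sample lines only:
# "<timestamp> <host> bgpd <pid> ... <symbol>+0x<offset> ..."
FRAME_RE = re.compile(
    r"^(\S+)\s+(\S+)\s+bgpd\s+\d+\s.*?\b(\w+)\+0x[0-9a-fA-F]+"
)


def find_crash_frames(lines):
    """Return (timestamp, host, symbol) for each bgpd backtrace line
    whose symbol is one of the frames listed in CRASH_FRAMES."""
    hits = []
    for line in lines:
        match = FRAME_RE.match(line)
        if match and match.group(3) in CRASH_FRAMES:
            hits.append((match.group(1), match.group(2), match.group(3)))
    return hits
```

The function accepts any iterable of lines, so an open file object for a copy of /var/log/syslog taken from the support bundle can be passed in directly. A match only indicates that the log resembles the signature described here; the core dump still needs to be reviewed by VMware Support.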


Environment

VMware NSX

Cause

This issue is caused by a stale entry that is left behind after a cleanup operation on BGP sub-groups, in the very rare case where a withdraw update is being processed at the same time. The bgpd service crashes when it later attempts to access the stale entry.

Resolution

This is a known issue impacting NSX-T 4.1.1 and 4.1.2.x. The issue is fixed in VMware NSX 4.2 and later.
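
When auditing several Edge Nodes, the version statements in this article can be expressed as a simple check. The sketch below is illustrative only: the function names are hypothetical, the version string is assumed to be dot-separated integers, and treating every 4.1.1.x build as affected is an assumption, since the article names only 4.1.1 and 4.1.2.x.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.1.2.3' into (4, 1, 2, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version):
    """True for 4.1.1 and 4.1.2.x, the releases named in this article.
    Assumption: any build whose first three fields are 4.1.1 counts as 4.1.1."""
    return parse_version(version)[:3] in ((4, 1, 1), (4, 1, 2))


def has_fix(version):
    """True for 4.2 and later, where the article states the issue is fixed."""
    return parse_version(version)[:2] >= (4, 2)
```

Both functions return False for releases that this article does not mention, such as 4.1.0. That result means only that the article makes no statement about them, not that they are confirmed unaffected.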

Workaround:
There is currently no workaround for this issue, and it is very rarely encountered. If you encounter this issue, please raise a support request with VMware.

Once the logs have been gathered, clear the alarm by removing the core dump files:

  • On NSX appliance nodes, the following nsxcli command can be used to remove core and heap dump files:

nsxcli> del core-dump all
or
nsxcli> del core-dump <core-dump-file>