Application has crashed on NSX Edge node alarm with core dump generated on Edge VM

Article ID: 377121

Updated On: 10-16-2024

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

A core dump file is present in /var/log/core on the NSX Edge node, for example:
core.pimd.1722861488.<process-id>.160.X.gz
The Edge node's /var/log/dumpcore.log records the crash, for example:
1 2024-08-30T09:26:14.234Z <NSX_Edge_Hostname> NSX 3823286 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="INFO"] Core dump generation received by process: 3139677 [pimd]
1 2024-08-30T09:26:14.631Z <NSX_Edge_Hostname> NSX 3823286 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="INFO"] Deleted extra core files: /var/log/core/core.pimd.1724927784.3129177.160.6.gz
1 2024-08-30T09:26:14.633Z <NSX_Edge_Hostname> NSX 3823286 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.pimd.1725009974.3139677.160.6.gz
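The entries above can be picked out of dumpcore.log with a short script. The following is a minimal sketch in Python, not a supported tool. It assumes the core file name follows the layout core.<process>.<epoch-seconds>.<process-id>.<remaining fields>.gz, which is inferred only from the sample names in this article. The function name find_generated_cores is hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred only from the sample file names in this article:
#   core.<process>.<epoch-seconds>.<process-id>.<remaining fields>.gz
CORE_RE = re.compile(
    r"Core file generated: "
    r"(?P<path>\S*/core\.(?P<proc>[^.]+)\.(?P<epoch>\d+)\.(?P<pid>\d+)\.\S+)"
)


def find_generated_cores(lines):
    """Yield (process, pid, crash_time_utc, path) for each 'Core file generated' entry."""
    for line in lines:
        m = CORE_RE.search(line)
        if m:
            # Interpret the third name field as Unix epoch seconds (assumption).
            when = datetime.fromtimestamp(int(m["epoch"]), tz=timezone.utc)
            yield m["proc"], int(m["pid"]), when, m["path"]


# Sample entry copied from the dumpcore.log excerpt above.
SAMPLE = (
    '1 2024-08-30T09:26:14.633Z <NSX_Edge_Hostname> NSX 3823286 - '
    '[nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] '
    'Core file generated: /var/log/core/core.pimd.1725009974.3139677.160.6.gz'
)

for proc, pid, when, path in find_generated_cores([SAMPLE]):
    print(f"{when.isoformat()} {proc} (pid {pid}) -> {path}")
```

For the sample entry, the epoch field 1725009974 decodes to 2024-08-30 09:26:14 UTC, which matches the timestamp of the log entry and the pimd process ID 3139677 reported in the first dumpcore.log line. To scan a real log, pass an open file handle for /var/log/dumpcore.log instead of the sample list.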
Around the time of the crash, the Edge node's /var/log/frr/frr.log may contain entries similar to the following sample:
2024/08/30 09:26:14.078379 PIM: pim_mroute_msg: pim kernel upcall WHOLEPKT type=3 ip_p=0 from fd=10 for (S,G)=(<source_vm_IP>,<multicast_group_IP>) on pimreg vifi=0 size=10000
2024/08/30 09:26:14.078383 PIM: pim_ecmp_nexthop_search: (10.10.140.1,239.255.0.0)(default) current nexthop uplink-313 is valid, skipping new path selection
2024/08/30 09:26:14.896700 ZEBRA: connection closed socket [50]
2024/08/30 09:26:14.896726 ZEBRA: [EC 4043309117] Client 'system' encountered an error and is shutting down.
2024/08/30 09:26:14.896774 ZEBRA: Closing client 'system'
2024/08/30 09:26:14.896803 ZEBRA: connection closed socket [45]
2024/08/30 09:26:14.896820 ZEBRA: [EC 4043309117] Client 'pim' encountered an error and is shutting down.
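To check whether large multicast packets were seen shortly before a crash, the frr.log entries can be filtered by the size value in the pim_mroute_msg lines. The sketch below is illustrative only. It assumes the line format matches the sample above exactly, and the function name large_wholepkt_upcalls and the constant SIZE_THRESHOLD are hypothetical. The 10000-byte threshold is taken from the Cause section of this article. A match suggests a correlation with this issue; it does not confirm the root cause.

```python
import re

# Pattern inferred from the sample frr.log line in this article (assumption).
UPCALL_RE = re.compile(
    r"pim_mroute_msg: pim kernel upcall WHOLEPKT .*?"
    r"\(S,G\)=\((?P<source>[^,]+),(?P<group>[^)]+)\).*?\bsize=(?P<size>\d+)"
)

SIZE_THRESHOLD = 10000  # bytes, per the Cause section of this article


def large_wholepkt_upcalls(lines, threshold=SIZE_THRESHOLD):
    """Yield (source, group, size) for WHOLEPKT upcalls at or above the threshold."""
    for line in lines:
        m = UPCALL_RE.search(line)
        if m and int(m["size"]) >= threshold:
            yield m["source"], m["group"], int(m["size"])


# Sample entry modeled on the frr.log excerpt above, using the (S,G) pair
# that appears in the pim_ecmp_nexthop_search line.
SAMPLE_LINES = [
    "2024/08/30 09:26:14.078379 PIM: pim_mroute_msg: pim kernel upcall WHOLEPKT "
    "type=3 ip_p=0 from fd=10 for (S,G)=(10.10.140.1,239.255.0.0) "
    "on pimreg vifi=0 size=10000",
]

for source, group, size in large_wholepkt_upcalls(SAMPLE_LINES):
    print(f"source={source} group={group} size={size}")
```

To scan a real log, pass an open file handle for /var/log/frr/frr.log instead of SAMPLE_LINES and compare the timestamps of any matches with the core file generation time in dumpcore.log.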

Environment

NSX-T Data Center 3.x
NSX 4.x
VMs connected to NSX networks are members of a multicast group

Cause

A VM connected to an NSX segment sends a large multicast (UDP) packet, 10000 bytes or more, to a destination outside NSX.
The Edge node drops this packet.
The oversized packet causes a buffer overrun, and as a result the FRR service on the Edge node may crash and restart.

Resolution

This is a known issue that will be fixed in a future product release.

Currently, there is no workaround.
