Application has crashed on NSX Edge node alarm with core dump generated on Edge VM

Article ID: 377121

Updated On:

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

  • A core dump file is present in the NSX Edge node's /var/log/core directory, for example:
    core.pimd.1722861488.<process-id>.160.X.gz
  • The Edge node's /var/log/dumpcore.log records the crash, for example:
    1 2024-08-30T09:26:14.234Z <NSX_Edge_Hostname> NSX 3823286 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="INFO"] Core dump generation received by process: 3139677 [pimd]
    1 2024-08-30T09:26:14.631Z <NSX_Edge_Hostname> NSX 3823286 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="INFO"] Deleted extra core files: /var/log/core/core.pimd.1724927784.3129177.160.6.gz
    1 2024-08-30T09:26:14.633Z <NSX_Edge_Hostname> NSX 3823286 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.pimd.1725009974.3139677.160.6.gz
  • At the time of the crash, the Edge node's /var/log/frr/frr.log may contain entries similar to the following sample:
    2024/08/30 09:26:14.078379 PIM: pim_mroute_msg: pim kernel upcall WHOLEPKT type=3 ip_p=0 from fd=10 for (S,G)=(<source_vm_IP>,<multicast_group_IP>) on pimreg vifi=0 size=10000
    2024/08/30 09:26:14.078383 PIM: pim_ecmp_nexthop_search: (10.10.140.1,239.255.0.0)(default) current nexthop uplink-313 is valid, skipping new path selection
    2024/08/30 09:26:14.896700 ZEBRA: connection closed socket [50]
    2024/08/30 09:26:14.896726 ZEBRA: [EC 4043309117] Client 'system' encountered an error and is shutting down.
    2024/08/30 09:26:14.896774 ZEBRA: Closing client 'system'
    2024/08/30 09:26:14.896803 ZEBRA: connection closed socket [45]
    2024/08/30 09:26:14.896820 ZEBRA: [EC 4043309117] Client 'pim' encountered an error and is shutting down.
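
When correlating a core file with the log entries above, it can help to decode the core file name. The following is a minimal Python sketch, not an NSX tool. It assumes the name follows the layout core.<process>.<epoch-seconds>.<pid>.<remaining fields>.gz, which is inferred from the samples in this article rather than from product documentation. The function name parse_core_name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout (inferred from the sample names in this article):
#   core.<process>.<epoch-seconds>.<pid>.<remaining fields>.gz
CORE_NAME = re.compile(r"core\.(?P<process>[^.]+)\.(?P<epoch>\d+)\.(?P<pid>\d+)\.")

def parse_core_name(path):
    """Return (process, UTC timestamp, pid), or None if the name does not match."""
    match = CORE_NAME.search(path)
    if match is None:
        return None
    when = datetime.fromtimestamp(int(match["epoch"]), tz=timezone.utc)
    return match["process"], when.isoformat(), int(match["pid"])

print(parse_core_name("/var/log/core/core.pimd.1725009974.3139677.160.6.gz"))
# → ('pimd', '2024-08-30T09:26:14+00:00', 3139677)
```

For the sample file name, the decoded time and process ID match the 09:26:14Z dumpcore.log entry for process 3139677 [pimd].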

Environment

  • NSX-T Data Center 3.x
  • NSX 4.x
  • VMs connected to NSX networks are members of a multicast group

Cause

  • A VM connected to an NSX segment sends a large multicast (UDP) packet of 10000 bytes or more to a destination outside NSX.
  • The Edge node drops this packet.
  • The oversized packet causes a buffer overrun, which may crash and restart the FRR service on the Edge node.
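
To check whether a copy of frr.log (for example, from a support bundle) contains the condition described above, the log can be scanned for WHOLEPKT upcalls at or above 10000 bytes. The Python sketch below is illustrative only. It assumes the size= field in the pim_mroute_msg line reflects the packet size and that the line format matches the sample in this article. Other releases may log it differently. The names find_large_upcalls and THRESHOLD_BYTES are hypothetical.

```python
import re

# Matches the upcall line format shown in the frr.log sample in this article.
# Assumption: the size= field corresponds to the multicast packet size.
UPCALL = re.compile(r"pim kernel upcall WHOLEPKT .*\bsize=(?P<size>\d+)")
THRESHOLD_BYTES = 10000  # size named in the Cause section

def find_large_upcalls(lines, threshold=THRESHOLD_BYTES):
    """Return (line, size) pairs for WHOLEPKT upcalls at or above the threshold."""
    hits = []
    for line in lines:
        match = UPCALL.search(line)
        if match and int(match["size"]) >= threshold:
            hits.append((line.rstrip("\n"), int(match["size"])))
    return hits

sample = [
    "2024/08/30 09:26:14.078379 PIM: pim_mroute_msg: pim kernel upcall WHOLEPKT "
    "type=3 ip_p=0 from fd=10 for (S,G)=(<source_vm_IP>,<multicast_group_IP>) "
    "on pimreg vifi=0 size=10000",
    "2024/08/30 09:26:14.896700 ZEBRA: connection closed socket [50]",
]
for line, size in find_large_upcalls(sample):
    print(size, line)
```

To scan a real file, pass an open file object instead of the sample list, for example find_large_upcalls(open("frr.log")). A match only indicates that a large packet reached the Edge node. It does not by itself confirm that this issue caused the crash.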

Resolution

This is a known issue that will be fixed in a future product release.

Currently, there is no workaround.