In VMware NSX 4.1.x, the NSX Edge node generates a core dump at the path /var/log/core/core.dp-fp:<process-ID.timestamp>.gz
Log entries similar to the following may be found in the NSX Edge syslog:
2024-10-25T16:49:41.862Z <NSX-Edge-Node> kernel - - - [44513.610835] grsec: Segmentation fault occurred at 0000000000000000 in /opt/vmware/nsx-edge/sbin/datapathd[dp-fp:26:355289] uid/euid:0/0 gid/egid:124/124, parent /opt/vmware/edge/dpd/entrypoint.sh[entrypoint.sh:354038] uid/euid:0/0 gid/egid:124/124
2024-10-25T16:49:42.543Z <NSX-Edge-Node> NSX 357574 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.dp-fp:26.1729874981.354089.0.11.gz
The syslog may also contain errors reporting that a routing domain does not exist, together with messages about processing a full configuration from nsx-agent:
2024-10-25T18:09:05.082Z <NSX-Edge> datapath-systemd-helper 385708 - - 2024-10-25T18:09:05Z datapathd 385838 routing-domain tname="dp-ipc67" [ERROR] Routing domain <routing-domain-UUID> does not exist errorCode="EDG0400056"
2024-10-25T17:53:34.470Z <NSX-Edge> NSX 385838 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="dpc-pb" tname="dp-ipc67" level="INFO"] Processing full config msg version 38 from nsx-agent
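As a quick triage aid, the symptom entries above can be searched for programmatically. The following Python sketch matches the patterns shown in this article: the datapathd segmentation fault, the core file path, and the EDG0400056 routing domain error. The function name scan_syslog is hypothetical. It is a heuristic only, so a match suggests this issue but does not confirm it.

```python
import re

# Patterns copied from the example syslog entries in this article (hypothetical helper).
CORE_RE = re.compile(r"Core file generated: (/var/log/core/core\.dp-fp:\S+\.gz)")
SEGV_RE = re.compile(r"Segmentation fault occurred .* in /opt/vmware/nsx-edge/sbin/datapathd")
RD_RE = re.compile(r'Routing domain (\S+) does not exist errorCode="EDG0400056"')


def scan_syslog(lines):
    """Collect symptom matches from an iterable of syslog lines (heuristic, not a diagnosis)."""
    findings = {"core_files": [], "datapathd_segfaults": 0, "missing_routing_domains": set()}
    for line in lines:
        core = CORE_RE.search(line)
        if core:
            findings["core_files"].append(core.group(1))
        if SEGV_RE.search(line):
            findings["datapathd_segfaults"] += 1
        rd = RD_RE.search(line)
        if rd:
            findings["missing_routing_domains"].add(rd.group(1))
    return findings
```

For example, assuming the Edge syslog is at /var/log/syslog (adjust the path for your environment): `with open("/var/log/syslog", errors="replace") as fh: print(scan_syslog(fh))`.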
VMware NSX 4.1.x
This issue occurs when the Tier-1 (T1) gateway assigned to the Edge nodes has different transport zones than the Tier-0 (T0) gateway it connects to. As a result, the Edge node pulls a full configuration from the NSX Manager, which can lead to a race condition that crashes the datapath service (datapathd).
The race condition is resolved in VMware NSX 4.2.1 and later. However, the root cause of this issue is the T1 gateway being connected to the wrong T0 gateway.
To resolve this issue, reconfigure the T1 gateway so that it connects to the intended T0 gateway, one whose transport zones match its own.
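The cause above reduces to a consistency check: each T1 gateway should have the same transport zones as the T0 gateway it connects to. The Python sketch below expresses that check over a hypothetical, simplified Gateway record. It does not call the NSX API, and how you collect the transport zones for each gateway in your environment is an assumption left to you.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Gateway:
    """Hypothetical, simplified gateway record; not the NSX API schema."""
    name: str
    transport_zones: FrozenSet[str]
    tier0: Optional[str] = None  # for a T1: name of the T0 it connects to


def find_mismatched_t1s(
    tier0s: List[Gateway], tier1s: List[Gateway]
) -> List[Tuple[str, str, FrozenSet[str]]]:
    """Return (T1 name, T0 name, differing transport zones) for each mismatched T1."""
    t0_by_name = {g.name: g for g in tier0s}
    problems = []
    for t1 in tier1s:
        t0 = t0_by_name.get(t1.tier0)
        if t0 is None:
            continue  # T1 not connected to a known T0; out of scope for this check
        if t1.transport_zones != t0.transport_zones:
            # Symmetric difference: zones present on one gateway but not the other.
            problems.append((t1.name, t0.name, t1.transport_zones ^ t0.transport_zones))
    return problems
```

Each reported T1 is a candidate for reconnection to a T0 whose transport zones match.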