NSX Manager has reported an application crash on an ESXi host.
On the host, /var/log/vmkernel.log shows that a core dump file named nsx-exporter-zdump.000 was generated. A sample log snippet is provided below for reference.
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu36:#####76)User: ####: nsx-exporter: wantCoreDump:nsx-exporter signal:6 exitCode:0 coredump:enabled
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu36:#####76)UserDump: ####: nsx-exporter: Dumping cartel #####92 (from world #####76) to file /var/core/nsx-exporter-zdump.000 ...
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu36:#####76)UserDump: ####: nsx-exporter: Userworld(nsx-exporter) coredump complete.
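As a rough aid for spotting this symptom, the sketch below scans vmkernel.log lines for the "Dumping cartel" entry and returns the process name and dump path. The regular expression is modeled only on the sample line above and is an assumption; the exact layout may differ between ESXi builds. The function name find_core_dumps is hypothetical and is not part of any VMware tool.

```python
import re

# Pattern modeled on the sample vmkernel.log entry in this article.
# Assumption: other ESXi builds may format the line differently.
DUMP_RE = re.compile(
    r"UserDump: \S+ (?P<proc>[\w.-]+): Dumping cartel \S+ "
    r"\(from world \S+\) to file (?P<path>\S+)"
)


def find_core_dumps(lines):
    """Return (process, dump_path) pairs for each 'Dumping cartel' entry."""
    hits = []
    for line in lines:
        match = DUMP_RE.search(line)
        if match:
            hits.append((match.group("proc"), match.group("path")))
    return hits
```

For example, passing the lines of a copied vmkernel.log (such as open("vmkernel.log").readlines()) returns one pair per userworld core dump that matches the pattern.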
In /var/log/nsx-syslog, entries from around the time the core dump was generated show multiple pNIC bonding and teaming related events occurring shortly before the crash. Please refer to the sample log snippet below for additional context.
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####92]: NSX #####92 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####10" level="INFO"] [Heatmap][KCPGetPSTeam] Got 0 uplinks and 0 LAGs
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####92]: NSX #####92 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####10" level="INFO"] [Heatmap][KCPGetPSTeam] 0 pnics 0 lags
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####2]: NSX #####92 - [nsx@##76 comp="nsx-esx" subcomp="mpa-client" tid="#####11" level="INFO"] [TransportNodeStatusVertical] RespondMsg : Sent response with type (com.vmware.nsx.management.agg_service.transport_node_status.PnicBondsMsg) corelationId (########-####-####-###############59) trackingId (########-####-####-###############59)
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####036]: NSX #####036 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####036" level="INFO"] App (TransportNodeStatusVertical) registered status handler for msgType: vmware.nsx.agg_service.PnicBondsMsg
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####036]: NSX #####036 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####036" level="INFO"] App (TransportNodeStatusVertical) registered status handler for msgType: vmware.nsx.agg_service.PnicBondsMsg
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####036]: NSX #####036 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####036" level="INFO"] Successfully registered TRANSPORT_NODE_STATUS_VERTICAL:PNIC_BONDS with StatusSource
VMware NSX
This issue occurs when multiple threads process GetPnicBondsStatus at the same time. Because nsx_uplink_map_ is not protected against concurrent access, this results in a race condition that crashes nsx-exporter.
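The general pattern behind this kind of race can be illustrated with a minimal sketch in which every read and write of a shared map goes through one lock. This is not the NSX source code or the actual fix; nsx-exporter is not written in Python, and the class UplinkStatusCache and its methods are hypothetical names chosen for illustration only.

```python
import threading


class UplinkStatusCache:
    """Hypothetical stand-in for a component that owns a shared uplink map."""

    def __init__(self):
        self._uplink_map = {}          # plays the role of a shared uplink map
        self._lock = threading.Lock()  # guards every access to the map

    def update(self, uplink, state):
        # Writers take the lock so readers never see a half-updated map.
        with self._lock:
            self._uplink_map[uplink] = state

    def remove(self, uplink):
        with self._lock:
            self._uplink_map.pop(uplink, None)

    def get_pnic_bonds_status(self):
        # Readers copy the map under the lock and work on the snapshot,
        # so concurrent updates cannot invalidate what they iterate over.
        with self._lock:
            return dict(self._uplink_map)
```

Without the lock, one thread could iterate over the map while another inserts or removes an entry, which in a native process can corrupt memory and end in an abort such as the signal 6 seen in the vmkernel log.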
This issue is resolved in NSX 4.2.1.