NSX Manager reported Application Crashed on ESXi host during pNIC Bonding

Article ID: 416892

Products

VMware NSX

Issue/Introduction

NSX Manager has reported an application crash on an ESXi host.

On the ESXi host, /var/log/vmkernel.log shows that a core dump file named nsx-exporter-zdump.000 was generated in /var/core. A sample log snippet is provided below for reference.

YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu36:#####76)User: ####: nsx-exporter: wantCoreDump:nsx-exporter signal:6 exitCode:0 coredump:enabled
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu36:#####76)UserDump: ####: nsx-exporter: Dumping cartel #####92 (from world #####76) to file /var/core/nsx-exporter-zdump.000 ...
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu36:#####76)UserDump: ####: nsx-exporter: Userworld(nsx-exporter) coredump complete.
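
To check a copy of vmkernel.log for this signature, the lines above can be matched with a small script. The following is a minimal sketch in Python: the regular expression is modeled only on the sample lines shown in this article (real log lines may vary), and the function name find_exporter_dumps is hypothetical.

```python
import re

# Pattern modeled on the sample "Dumping cartel" line in this article.
# Assumption: real vmkernel.log lines follow the same layout.
DUMP_RE = re.compile(
    r"UserDump: .*?nsx-exporter: Dumping cartel \S+ \(from world \S+\) "
    r"to file (?P<path>/var/core/\S+)"
)


def find_exporter_dumps(lines):
    """Return the core dump paths named in nsx-exporter dump lines."""
    paths = []
    for line in lines:
        match = DUMP_RE.search(line)
        if match:
            paths.append(match.group("path"))
    return paths
```

The function accepts any iterable of strings, so it could be given the lines of a vmkernel.log file collected from the host.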

Around the time the core dump was generated, /var/log/nsx-syslog shows multiple pNIC bonding and teaming related events occurring shortly before the crash. Please refer to the sample log snippet below for additional context.

nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####92]: NSX #####92 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####10" level="INFO"] [Heatmap][KCPGetPSTeam] Got 0 uplinks and 0 LAGs
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####92]: NSX #####92 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####10" level="INFO"] [Heatmap][KCPGetPSTeam] 0 pnics 0 lags
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####2]: NSX #####92 - [nsx@##76 comp="nsx-esx" subcomp="mpa-client" tid="#####11" level="INFO"] [TransportNodeStatusVertical] RespondMsg : Sent response with type (com.vmware.nsx.management.agg_service.transport_node_status.PnicBondsMsg) corelationId (########-####-####-###############59) trackingId (########-####-####-###############59)
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####036]: NSX #####036 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####036" level="INFO"] App (TransportNodeStatusVertical) registered status handler for msgType: vmware.nsx.agg_service.PnicBondsMsg
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####036]: NSX #####036 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####036" level="INFO"] App (TransportNodeStatusVertical) registered status handler for msgType: vmware.nsx.agg_service.PnicBondsMsg
nsx-syslog.0:YYYY-MM-DDTHH:MM:SS.SSSZ In(182) nsx-exporter[#####036]: NSX #####036 - [nsx@##76 comp="nsx-esx" subcomp="agg-service" tid="#####036" level="INFO"] Successfully registered TRANSPORT_NODE_STATUS_VERTICAL:PNIC_BONDS with StatusSource

 

Environment

VMware NSX

Cause

This issue occurs when multiple threads process GetPnicBondsStatus at the same time. Because nsx_uplink_map_ is not protected against concurrent access, this leads to a race condition that crashes nsx-exporter.
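
The general pattern for avoiding this kind of race is to guard the shared map with a lock. The sketch below illustrates that pattern in Python only; it is not the nsx-exporter source or the actual fix, and the names UplinkMap, set_uplink, snapshot, and get_pnic_bonds_status are hypothetical stand-ins.

```python
import threading


class UplinkMap:
    """Hypothetical stand-in for a shared uplink map guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._uplinks = {}

    def set_uplink(self, name, status):
        # Writers take the lock before modifying the map.
        with self._lock:
            self._uplinks[name] = status

    def snapshot(self):
        # Readers copy the map under the lock, so they never iterate
        # over a map that another thread is modifying.
        with self._lock:
            return dict(self._uplinks)


def get_pnic_bonds_status(uplink_map):
    """Illustrative status query: works on a consistent copy of the map."""
    return sorted(uplink_map.snapshot().items())
```

Without the lock, one thread could read the map while another modifies it, which is the kind of unprotected access described above.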

Resolution

This issue is resolved in NSX 4.2.1.

Additional Information

Fixed Issue 3405911: In rare scenarios, nsx-exporter on ESX host may crash.
The crash causes a brief service interruption, which may delay data updates on the Management Plane (MP). The nsx-exporter process restarts automatically and resumes operation.