nsx-exporter process crash on ESXi host causes the NSX alarm "Application on NSX node <hostname> has crashed"

Article ID: 312614

Products

VMware NSX

Issue/Introduction

Symptoms:

  • Alarms reported for application crashes on ESXi transport nodes.
  • Logs similar to the following are observed on the ESXi host in /var/run/log/vobd.log:
2024-04-02T18:52:43.225Z In(14) vobd[2098026]:  [UserWorldCorrelator] 213546302089us: [vob.uw.core.dumped] /usr/lib64/vmware/nsx-exporter/nsx-exporter(2101503) /var/core/nsx-exporter-zdump.000
2024-04-02T18:52:43.225Z In(14) vobd[2098026]:  [UserWorldCorrelator] 213543013185us: [esx.problem.application.core.dumped] An application (/usr/lib64/vmware/nsx-exporter/nsx-exporter) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/nsx-exporter-zdump.000.

 
  • Logs similar to the following are observed on the ESXi host in /var/run/log/vmkernel.log:
2024-04-02T18:52:41.442Z In(182) vmkernel: cpu76:2101643)User: 3238: nsx-exporter: wantCoreDump:nsx-exporter signal:11 exitCode:0 coredump:enabled
2024-04-02T18:52:41.545Z In(182) vmkernel: cpu76:2101643)UserDump: 3072: nsx-exporter: Dumping cartel 2101503 (from world 2101643) to file /var/core/nsx-exporter-zdump.000 ...
2024-04-02T18:52:43.225Z In(182) vmkernel: cpu67:2101643)UserDump: 3367: nsx-exporter: Userworld(nsx-exporter) coredump complete.
 
  • Logs similar to the following, indicating that the flow count exceeded the buffer, are observed on the ESXi host in /var/run/log/nsx-syslog.log:
2024-04-02T18:52:41.373Z In(182) nsx-exporter[2101503]: NSX 2101503 - [nsx@6876 comp="nsx-esx" subcomp="agg-service" tid="2101643" level="INFO"] Flow count diffs in actual (27368) vs in FLOW_GET_NUMRECORDS (27367)
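
The log signatures above can be checked for with a small script. The following is a minimal sketch, not a VMware-provided tool: the function name scan_lines and both regular expressions are illustrative assumptions derived only from the sample log lines in this article, and the exact log wording may differ between releases.

```python
import re

# Assumption: patterns are modelled on the sample lines in this article only.
# Matches the vobd.log "core dumped" message and captures the core file path.
CORE_RE = re.compile(r"A core file may have been created at (\S+?)\.?\s*$")

# Matches the nsx-syslog "Flow count diffs" message and captures both counts.
FLOW_RE = re.compile(
    r"Flow count diffs in actual \((\d+)\) vs in FLOW_GET_NUMRECORDS \((\d+)\)"
)


def scan_lines(lines):
    """Collect nsx-exporter crash indicators from an iterable of log lines.

    Hypothetical helper: returns a dict with the core file paths reported
    by vobd and the (actual, reported) flow count pairs from nsx-syslog.
    """
    findings = {"core_files": [], "flow_mismatches": []}
    for line in lines:
        # Skip anything that does not mention the nsx-exporter process.
        if "nsx-exporter" not in line:
            continue
        core = CORE_RE.search(line)
        if core and "esx.problem.application.core.dumped" in line:
            findings["core_files"].append(core.group(1))
        flow = FLOW_RE.search(line)
        if flow:
            actual, reported = int(flow.group(1)), int(flow.group(2))
            findings["flow_mismatches"].append((actual, reported))
    return findings
```

For example, the lines of vobd.log and the nsx-syslog file could be read with open(...) and passed to scan_lines together. A non-empty core_files list alongside a flow_mismatches entry whose first value is larger than the second is consistent with the symptoms described in this article.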

Environment

VMware NSX
VMware NSX-T Data Center

Cause

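This issue occurs when the actual flow count exceeds the buffer sized for the flow records, as indicated by the "Flow count diffs" log entry (27368 actual vs. 27367 reported by FLOW_GET_NUMRECORDS in the example above). The nsx-exporter process then terminates with signal 11 (segmentation fault) and generates a core dump.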

Resolution

This issue is resolved in VMware NSX 4.2.

Additional Information

Impact/Risks:
This issue is very rare. When it occurs, the nsx-exporter application restarts, so significant impact is unlikely.