Application on NSX node has crashed and created multiple nestdb-server-zdump core dumps on ESXi hosts.

Article ID: 367512

Products

VMware NSX

Issue/Introduction

  • You are running VMware NSX 4.1.x.
  • In the NSX Manager UI, the following alarm is raised:
    "Application on NSX node <Node name> has crashed. The number of core files found is <X>. Collect the Support Bundle including core dump files and contact VMware Support team."
  • On the ESXi host, the log file /var/run/log/vobd.log contains entries similar to the following:
    [esx.problem.application.core.dumped] An application (/opt/vmware/nsx-nestdb/bin/nestdb-server) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/nestdb-server-zdump.000. 
  • On the ESXi host, one or more core dump files like the following are generated:

    /var/core/nestdb-server-zdump.xxx
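
To confirm that a vobd.log entry matches this issue, the application path, crash count, and core file path can be pulled out of the message. The following is a minimal sketch in Python; the regular expression is modelled only on the sample entry quoted above, and the function name parse_core_dumped is hypothetical. Other ESXi builds may word the message differently.

```python
import re

# Pattern modelled on the single vobd.log sample quoted in this article.
# It is an assumption that other entries follow the same wording.
CORE_DUMPED = re.compile(
    r"\[esx\.problem\.application\.core\.dumped\] "
    r"An application \((?P<app>[^)]+)\) running on ESXi host has crashed "
    r"\((?P<count>\d+) time\(s\) so far\)\. "
    r"A core file may have been created at (?P<core>\S+?)\.?\s*$"
)


def parse_core_dumped(line):
    """Return (application path, crash count, core file path), or None."""
    match = CORE_DUMPED.search(line)
    if match is None:
        return None
    return match.group("app"), int(match.group("count")), match.group("core")
```

An entry whose application path is /opt/vmware/nsx-nestdb/bin/nestdb-server and whose core file name starts with nestdb-server-zdump corresponds to the symptom described here.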

Cause

  • When metrics are written to the scratch location, a Remote Procedure Call (RPC) accesses the file system to check whether a specific file exists there.
  • During this check, the file system may return the error code "busy" for the file if multiple hosts are accessing it at the same time. This can cause the nsx-nestDB service to crash. The service is configured to restart automatically after a crash.
  • The crash does not impact the data plane or the performance of the host.
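
The failure mode described above (an existence check that receives a "busy" error) can be illustrated generically. The following Python sketch is not NSX code and does not represent the fix shipped by VMware; it only shows, under the assumption that "busy" corresponds to the EBUSY error number, how an existence check could treat that error as transient instead of failing on it. The function name and its retries, delay, and stat parameters are hypothetical.

```python
import errno
import os
import time


def file_exists_tolerating_busy(path, retries=3, delay=0.1, stat=os.stat):
    """Check whether path exists, retrying while the file system reports EBUSY.

    Illustrative only: the stat parameter exists so the check can be
    exercised without a real shared file system.
    """
    for attempt in range(retries + 1):
        try:
            stat(path)
            return True
        except FileNotFoundError:
            # The file is genuinely absent; this is a normal answer.
            return False
        except OSError as exc:
            # Re-raise anything that is not EBUSY, or EBUSY on the last try.
            if exc.errno != errno.EBUSY or attempt == retries:
                raise
            time.sleep(delay)
```

The point of the sketch is that a "busy" result says nothing about whether the file exists, so a caller that does not handle it explicitly can fail in the way the alarm reports.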

Resolution

This issue is resolved in NSX 4.2 and later.

Additional Information

To clear the alarm, manually delete the core dump file(s) from the ESXi host.
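
The cleanup can be sketched as a small Python helper. This is an illustration rather than a supported tool: the function name remove_nestdb_core_dumps and its parameters are hypothetical, and the file name pattern is taken from the /var/core/nestdb-server-zdump.xxx path shown in this article. It defaults to a dry run so the matching files can be reviewed first. If the core dumps are needed for a support bundle, as the alarm text suggests, collect the bundle before deleting anything.

```python
import glob
import os


def remove_nestdb_core_dumps(core_dir="/var/core", dry_run=True):
    """List nestdb-server-zdump.* files in core_dir and optionally delete them.

    Returns the matching paths. Nothing is deleted unless dry_run is False.
    """
    pattern = os.path.join(core_dir, "nestdb-server-zdump.*")
    matched = []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # skip directories or other non-regular entries
        if not dry_run:
            os.remove(path)
        matched.append(path)
    return matched
```

Calling remove_nestdb_core_dumps() with no arguments only reports what would be removed; passing dry_run=False performs the deletion.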