Application on NSX node has crashed and created multiple nestdb-server-zdump core dumps on ESXi hosts


Article ID: 367512


Updated On:

Products

VMware NSX Networking

Issue/Introduction

  • You are running VMware NSX 4.1.x.
  • In the NSX Manager UI, the following alarm is raised:
    "Application on NSX node <Node name> has crashed. The number of core files found is <X>. Collect the Support Bundle including core dump files and contact VMware Support team."
  • On the ESXi host, the log file /var/run/log/vobd.log contains entries such as:
    [esx.problem.application.core.dumped] An application (/opt/vmware/nsx-nestdb/bin/nestdb-server) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/nestdb-server-zdump.000. 
  • On the ESXi host, core dump files such as the following are generated:

    /var/core/nestdb-server-zdump.xxx
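
To check a host for this symptom, the vobd.log entries can be matched with a small script. The following is a minimal Python sketch, assuming the log line has exactly the shape quoted above; the pattern and the function name find_core_dump_events are illustrative and not part of any VMware tooling.

```python
import re

# Pattern derived from the vobd.log entry quoted in this article.
# Assumption: the message text is identical on the ESXi build being checked.
CORE_DUMP_RE = re.compile(
    r"\[esx\.problem\.application\.core\.dumped\] An application \((?P<app>[^)]+)\) "
    r"running on ESXi host has crashed \((?P<count>\d+) time\(s\) so far\)\. "
    r"A core file may have been created at (?P<core>\S+?)\.?\s*$"
)


def find_core_dump_events(lines):
    """Return (application, crash_count, core_path) for each matching log line."""
    events = []
    for line in lines:
        match = CORE_DUMP_RE.search(line)
        if match:
            events.append((match["app"], int(match["count"]), match["core"]))
    return events
```

On a host, the function could be fed the open log file, for example find_core_dump_events(open("/var/run/log/vobd.log")), and the results filtered for the nestdb-server application path.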

Cause

  • When metrics are written to the scratch location, a Remote Procedure Call (RPC) accesses the file system to check whether a specific file exists there.
  • During this check, the file system may return the error code "busy" for the file when multiple hosts try to access it at the same time. This can crash the nsx-nestDB service. The service is configured to restart automatically after a crash.
  • This does not impact the data plane or the performance of the host.
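
The failure mode described above can be illustrated with a generic sketch. This is not the nsx-nestDB code and does not represent the planned fix; it only shows, in Python, how an existence check might treat a "busy" (EBUSY) result as a retryable condition rather than a fatal one. The function name, retry count, and delay are assumptions made for illustration.

```python
import errno
import os
import time


def file_exists_tolerating_busy(path, retries=3, delay=0.1, stat=os.stat):
    """Check whether path exists, retrying when the file system reports EBUSY.

    The stat parameter is injectable only so the behavior can be exercised
    without a real contended file; by default it is os.stat.
    """
    for attempt in range(retries + 1):
        try:
            stat(path)
            return True
        except FileNotFoundError:
            return False
        except OSError as exc:
            # Re-raise anything that is not "busy", or "busy" on the final attempt.
            if exc.errno != errno.EBUSY or attempt == retries:
                raise
            time.sleep(delay)
```

The design point is that a busy file on shared storage is an expected, transient state when several hosts touch the same path, so the caller waits and retries instead of treating the error as unrecoverable.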

Resolution

This is a known issue that will be fixed in a future release of NSX.

Additional Information

To clear the alarm, manually delete the core dump file(s) from the ESXi host.
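
The cleanup can be scripted if several hosts are affected. The following is a minimal Python sketch, assuming the core files sit in /var/core and follow the nestdb-server-zdump.xxx naming shown above; the function name and the dry_run parameter are illustrative. If VMware Support needs the dumps, collect the Support Bundle before deleting them.

```python
import glob
import os


def delete_nestdb_core_dumps(core_dir="/var/core", dry_run=True):
    """Delete nestdb-server-zdump.* files in core_dir and return the matched paths.

    With dry_run=True (the default) nothing is deleted; the matches are only listed.
    """
    pattern = os.path.join(core_dir, "nestdb-server-zdump.*")
    paths = sorted(glob.glob(pattern))
    for path in paths:
        if not dry_run:
            os.remove(path)
    return paths
```

Running it first with the default dry_run=True lists what would be removed; calling delete_nestdb_core_dumps(dry_run=False) then performs the deletion. Other core files in the directory are left untouched.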