Application on NSX node has crashed and created multiple nestdb-server-zdump core dumps on ESXi hosts.
Article ID: 367512
Products
VMware NSX
Issue/Introduction
You are running VMware NSX 4.1.x.
In the NSX Manager UI, the following alarm is generated: "Application on NSX node <Node name> has crashed. The number of core files found is <X>. Collect the Support Bundle including core dump files and contact VMware Support team."
On the ESXi host, the log file /var/run/log/vobd.log contains entries similar to: [esx.problem.application.core.dumped] An application (/opt/vmware/nsx-nestdb/bin/nestdb-server) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/nestdb-server-zdump.000.
On the ESXi host, the following core dump is generated:
/var/core/nestdb-server-zdump.xxx
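To confirm that a host matches these symptoms, the vobd.log entries can be scanned for nestdb-server crashes. The following is a minimal Python sketch, not a VMware-provided tool: the regular expression is built only from the log entry quoted above, and the function name `find_nestdb_crashes` is hypothetical. Entries in other builds may be formatted differently.

```python
import re

# Pattern derived from the vobd.log entry quoted in this article (assumption:
# other builds may word the message differently).
CORE_DUMP_RE = re.compile(
    r"\[esx\.problem\.application\.core\.dumped\] An application \((?P<app>\S+)\) "
    r"running on ESXi host has crashed \((?P<count>\d+) time\(s\) so far\)\. "
    r"A core file may have been created at (?P<core>\S+?)\.?$"
)


def find_nestdb_crashes(lines):
    """Return (crash_count, core_file_path) for each nestdb-server crash entry."""
    crashes = []
    for line in lines:
        match = CORE_DUMP_RE.search(line.strip())
        # Keep only entries reported for the nestdb-server binary.
        if match and match.group("app").endswith("/nestdb-server"):
            crashes.append((int(match.group("count")), match.group("core")))
    return crashes


if __name__ == "__main__":
    # Log path as given in this article.
    with open("/var/run/log/vobd.log", errors="replace") as log:
        for count, core in find_nestdb_crashes(log):
            print(f"crash #{count}: {core}")
```

The sketch only reads the log file and does not modify anything on the host.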
Cause
When metrics are written to the scratch location, a Remote Procedure Call (RPC) accesses the file system to check whether a specific file exists there.
During this check, nsx-nestDB may receive the error code "busy" for the file when multiple hosts try to access the same file at the same time. This can cause the nsx-nestDB service to crash. The service is configured to restart automatically after a crash.
This issue does not impact the data plane or the performance of the host.
Resolution
This issue is fixed in NSX 4.2 and later.
Additional Information
To clear the alarm, manually delete the core dump file(s) from the ESXi host.
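The manual cleanup can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not an official procedure: the function name `remove_nestdb_core_dumps` and its parameters are hypothetical, and the file pattern is taken from the core dump path shown in this article. It defaults to a dry run that only lists the matching files. If the core dumps are still needed for analysis, collect the Support Bundle before deleting them.

```python
import glob
import os


def remove_nestdb_core_dumps(core_dir="/var/core", dry_run=True):
    """List (and optionally delete) nestdb-server core dumps in core_dir.

    Returns the sorted list of matching paths. With dry_run=True (the default)
    nothing is deleted.
    """
    # File name pattern taken from the path quoted in this article.
    pattern = os.path.join(core_dir, "nestdb-server-zdump.*")
    matched = sorted(glob.glob(pattern))
    for path in matched:
        if not dry_run:
            os.remove(path)
    return matched


if __name__ == "__main__":
    # Dry run first: print what would be deleted without removing anything.
    for path in remove_nestdb_core_dumps(dry_run=True):
        print("would delete:", path)
```

Calling `remove_nestdb_core_dumps(dry_run=False)` deletes the listed files. Other files in the directory are left untouched.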