Event ID: infrastructure_service.application_crashed
Alarm Description:
Purpose: This alarm notifies the user that a node has reported an application crash. The alarm description identifies the node by its hostname or ID.
Impact: Services have crashed and the appliance generated their respective core or heap dump files.
Application on NSX node <node> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.
Recommended Action: Collect Support Bundle for NSX node <nsx manager> using NSX Manager UI or API.
/var/log/syslog.log on NSX appliance node (Unified Appliance, Edge, etc.), similar to:
2023-05-19T02:50:34.898Z local-manager NSX 85581 MONITORING [nsx@6876 alarmId="e44e47ae-####-####-####-7a1#####d7ee" alarmState="OPEN" comp="nsx-manager" entId="####-####-####-####-####" errorCode="MP701099" eventFeatureName="infrastructure_service" eventSev="CRITICAL" eventState="On" eventType="application_crashed" level="FATAL" nodeId="####-####-####-####-d#####b" subcomp="monitoring"] Application on NSX node local-manager has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.
/var/log/nsx-syslog.log:
2023-05-18T10:07:31Z nsx-sha: NSX 268653 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="CRITICAL" eventFeatureName="infrastructure_service" eventType="application_crashed" eventSev="critical" eventState="On" entId="####-####-####-####-####"] Application on NSX node has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.
Validation:
nsxcli> get core-dumps
Directory: /var/log/core
20762624 May 18 2023 11:44:13 UTC core.nginx.1559278043.gz
nsxcli> get service <service-name>
or
nsxcli> get services
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on the environment.
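When searching collected logs for this alarm, it can help to match on the structured fields rather than on the free-text message. The following is a minimal Python sketch, assuming the log lines follow the bracketed key="value" layout shown in the excerpts above. The function names parse_nsx_event and is_application_crash are hypothetical helpers written for this illustration and are not part of any NSX tooling.

```python
import re

# Captures the content of the [nsx@6876 ...] block seen in the sample log lines.
SD_RE = re.compile(r'\[nsx@6876\s+([^\]]*)\]')
# Captures each key="value" pair inside that block.
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_nsx_event(line):
    """Return the bracketed fields of one log line as a dict, or None if absent."""
    match = SD_RE.search(line)
    if match is None:
        return None
    return dict(PAIR_RE.findall(match.group(1)))


def is_application_crash(line):
    """True when the line carries the application_crashed event of this alarm."""
    fields = parse_nsx_event(line)
    return (
        fields is not None
        and fields.get("eventFeatureName") == "infrastructure_service"
        and fields.get("eventType") == "application_crashed"
    )


# Sample taken from the nsx-syslog.log excerpt in this article (message shortened).
sample = (
    '2023-05-18T10:07:31Z nsx-sha: NSX 268653 - [nsx@6876 comp="nsx-esx" '
    'subcomp="nsx-sha" username="root" level="CRITICAL" '
    'eventFeatureName="infrastructure_service" eventType="application_crashed" '
    'eventSev="critical" eventState="On" entId="####-####-####-####-####"] '
    'Application on NSX node has crashed.'
)

if is_application_crash(sample):
    print(parse_nsx_event(sample)["comp"])
```

Matching on eventFeatureName and eventType together mirrors the Event ID at the top of this article and works for both the appliance and the ESXi excerpt, whose other fields differ.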
VMware NSX 4.x
Services have crashed, and the system generated the respective core dump files. All NSX services are configured to restart automatically after a crash. Depending on which application crashed, services that depend on it may not function correctly, so verify that the crashed services are running again.
In many cases, the alarm is first noticed after upgrading the NSX environment and did not appear before the upgrade. In these cases, a core dump may have been present for a long time without any issue being noticed or any intervention taken.
On an NSX Manager, core files can be generated in either /var/log/core/ or /image/core/.
Crashes can also have external causes, such as network cable failures that contribute to network redundancy issues and vSAN connectivity problems.
This alarm is no longer present in NSX 4.2.1 and above.
To resolve the alarm, delete the core dump files from the respective nodes. This activity has no impact on production.
NSX Appliance Manager and Edge
All core files can be deleted with one command from the admin shell:
admin> del core-dump all
Alternatively, it is possible to delete files one by one:
admin> get core-dumps
Directory: /var/log/core
20762624 May 18 2023 11:44:13 UTC core.nginx.1559278043.gz
admin> del core-dump /var/log/core/core.nginx.1559278043.gz
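When several core files are present, the per-file commands can be prepared from saved get core-dumps output. The following is a minimal Python sketch, assuming the output keeps the layout shown in this article: a Directory: line followed by one line per file, with the size first and the file name last. The helper names parse_core_dumps and delete_commands are hypothetical. The sketch only builds command strings and does not connect to any node.

```python
def parse_core_dumps(output):
    """Return (full_path, size_in_bytes) tuples from `get core-dumps` text."""
    entries = []
    directory = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Directory:"):
            # Remember the directory; the file lines that follow belong to it.
            directory = line.split(":", 1)[1].strip()
            continue
        parts = line.split()
        # Skip anything that does not look like "<size> ... <file name>".
        if directory is None or len(parts) < 2 or not parts[0].isdigit():
            continue
        entries.append((f"{directory}/{parts[-1]}", int(parts[0])))
    return entries


def delete_commands(entries):
    """Build one `del core-dump <full path>` command per core file."""
    return [f"del core-dump {path}" for path, _size in entries]


# Sample output copied from this article.
sample_output = """Directory: /var/log/core
20762624 May 18 2023 11:44:13 UTC core.nginx.1559278043.gz
"""

for command in delete_commands(parse_core_dumps(sample_output)):
    print(command)
```

Tracking the Directory: line matters because core files on an NSX Manager can be in either /var/log/core/ or /image/core/, and the delete command takes the full path. Collect a support bundle with core files before running any of the generated commands.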
ESXi host
Run the commands below in the root shell of the ESXi host:
For NSX version 4.1 or below:
root# rm -f /var/core/*
For NSX version 4.1.1 or above:
root# nsxcli -c del core-dump all
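The version split above can be expressed as a small selection rule, which helps when hosts run a mix of releases. The following is a minimal Python sketch, assuming that "4.1 or below" means any version lower than 4.1.1 (so a build such as 4.1.0.2 falls in the first group) and that versions are plain dot-separated numbers. Both are assumptions of this illustration, and esxi_cleanup_command is a hypothetical helper name.

```python
def parse_version(text):
    """Turn a dotted version such as '4.1.0.2' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def esxi_cleanup_command(nsx_version):
    """Pick the ESXi root-shell command from this article for an NSX version."""
    # Tuples compare element by element, so (4, 1, 0, 2) < (4, 1, 1) < (4, 2).
    if parse_version(nsx_version) >= (4, 1, 1):
        return "nsxcli -c del core-dump all"
    return "rm -f /var/core/*"


for version in ("4.0.1", "4.1.0.2", "4.1.1", "4.2"):
    print(version, "->", esxi_cleanup_command(version))
```

Comparing integer tuples avoids the pitfalls of comparing version strings directly, where "4.10" would sort before "4.2".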
If contacting Broadcom Support for this issue, provide the text of the alarm(s) from the NSX UI as well as the log files and core dump(s). Before deleting any core files, collect the latest support-bundle, adding the option for core dump/sensitive information from the nodes where the application crashed alarm is observed. Please refer to Collect Support Bundles for details on how to collect the support bundle with core and audit logs.
In NSX version 4.1.1 or above, the core dump files can also be removed as part of the collection of a support bundle, with the command: get support-bundle
nsxcli> get support-bundle file support-bundle.tgz all remove-core-files
If needed, individual core dump files can be copied to a remote location from NSX appliance nodes with the admin CLI command: copy core-dump
Give the full path of the core file, as shown in the output of the admin CLI command: get core-dumps
Replace the path and filename with your values.
nsxcli> get core-dumps
Directory: /var/log/core
20762624 May 18 2023 11:44:13 UTC core.nginx.1559278043.gz
nsxcli> copy core-dump /var/log/core/core.nginx.1559278043.gz url scp://root@<Remote location IP address>/tmp/
root@<Remote location IP address>'s password:
The following articles detail known application crash issues: