Application on NSX node has crashed alarm

Article ID: 345792

Products

VMware NSX

Issue/Introduction

Title: Alarm for application crashed on NSX node
Event ID: infrastructure_service.application_crashed
Alarm Description

  • Purpose: This alarm notifies the user that a node has reported an application crash. The alarm description identifies the node by hostname or ID.
  • Impact: Services have crashed and the appliance generated their respective core or heap dump files.
  • Symptoms
    • Alarms similar to the following appear in the NSX UI:

      Application on NSX node <node> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.
      Recommended Action: Collect Support Bundle for NSX node <nsx manager> using NSX Manager UI or API.

    • In /var/log/syslog.log on an NSX appliance node (Unified Appliance, Edge, and so on), messages similar to the following appear:

      2023-05-19T02:50:34.898Z local-manager NSX 85581 MONITORING [nsx@6876 alarmId="e44e47ae-8c4c-47aa-85a9-7a159b72d7ee" alarmState="OPEN" comp="nsx-manager" entId="340cd33e-fec7-46cd-91d5-ff3b6fc90faf" errorCode="MP701099" eventFeatureName="infrastructure_service" eventSev="CRITICAL" eventState="On" eventType="application_crashed" level="FATAL" nodeId="d1be0142-b001-01f5-8bdb-d5ae7b37180b" subcomp="monitoring"] Application on NSX node local-manager has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.

    • If the node is an ESXi host transport node, the same message appears in /var/log/nsx-syslog.log:

      2023-05-18T10:07:31Z nsx-sha: NSX 268653 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="CRITICAL" eventFeatureName="infrastructure_service" eventType="application_crashed" eventSev="critical" eventState="On" entId="76a85727-30ab-4ff5-bb7c-a064668252f0"] Application on NSX node has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.

    Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values vary depending on your environment.
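
To find these events in a collected log file, the key="value" fields inside the [nsx@6876 ...] block can be parsed and matched on eventFeatureName and eventType. The following is a minimal Python sketch, not an official tool. The function names are hypothetical, and it assumes field values never contain a closing square bracket, as in the excerpts above.

```python
import re

# Captures the text inside the [nsx@6876 ...] structured-data block.
SD_RE = re.compile(r'\[nsx@6876 ([^\]]*)\]')
# Captures each key="value" pair inside that block.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_nsx_event(line):
    """Return the structured-data fields of one log line as a dict, or None."""
    match = SD_RE.search(line)
    if match is None:
        return None
    return dict(FIELD_RE.findall(match.group(1)))


def is_application_crash(line):
    """True if the line carries the infrastructure_service.application_crashed event."""
    fields = parse_nsx_event(line)
    if not fields:
        return False
    return (fields.get("eventFeatureName") == "infrastructure_service"
            and fields.get("eventType") == "application_crashed")


# Sample taken from the ESXi excerpt above (message text shortened).
sample = ('2023-05-18T10:07:31Z nsx-sha: NSX 268653 - [nsx@6876 comp="nsx-esx" '
          'subcomp="nsx-sha" username="root" level="CRITICAL" '
          'eventFeatureName="infrastructure_service" '
          'eventType="application_crashed" eventSev="critical" eventState="On" '
          'entId="76a85727-30ab-4ff5-bb7c-a064668252f0"] '
          'Application on NSX node has crashed.')

if is_application_crash(sample):
    print(parse_nsx_event(sample)["entId"])
```

The same function works on the appliance excerpt, because both formats carry eventFeatureName and eventType in the same block.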

Environment

VMware NSX

Cause

Services have crashed and the system generated their respective core dump files. All NSX services are configured to restart automatically after a crash. Depending on which application crashed, services that depend on it might not function correctly. Verify the status of the crashed services to confirm that they are running again.

  • On an NSX appliance node, verify the service status from the CLI:

    nsxcli> get service <service-name>
    or
    nsxcli> get services

  • The application crash should have generated a core or heap dump on the NSX node. Verify this from the CLI:

    nsxcli> get core-dumps
    Directory: /var/log/core
    20762624     May 18 2023 11:44:13 UTC  core.nginx.1559278043.gz

Note: In the above example output, the service nginx crashed and the system generated a core dump file.
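
When several core files are present, the crashed services can be read off the file names. The sketch below parses saved get core-dumps output in Python. It assumes the file name pattern core.<service>.<number>.gz, inferred only from the single example above, and treats the first column as the file size. The names CoreDump and parse_core_dumps are hypothetical.

```python
import re
from typing import NamedTuple


class CoreDump(NamedTuple):
    size: int       # first column of the listing, assumed to be the file size
    filename: str   # last whitespace-separated token on the line
    service: str    # second dot-separated part of the name, or "unknown"


# A listing line starts with digits and ends with the file name.
LINE_RE = re.compile(r'^\s*(\d+)\s+.*\s(\S+)\s*$')


def parse_core_dumps(output):
    """Extract core dump entries from saved 'get core-dumps' output."""
    dumps = []
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skips the prompt line and the "Directory:" line
        size, name = int(match.group(1)), match.group(2)
        parts = name.split(".")
        # Assumed pattern: core.<service>.<number>.gz
        service = parts[1] if len(parts) >= 3 and parts[0] == "core" else "unknown"
        dumps.append(CoreDump(size, name, service))
    return dumps


sample_output = """nsxcli> get core-dumps
Directory: /var/log/core
20762624     May 18 2023 11:44:13 UTC  core.nginx.1559278043.gz
"""

for dump in parse_core_dumps(sample_output):
    print(dump.service, dump.filename)
```

Files that do not follow the assumed pattern are reported with the service "unknown" rather than guessed.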

Resolution

Steps to resolve
For 4.0.0 and above

Recommended Action:

  • NSX services are configured to restart automatically after a crash. However, a crashed application can leave dependent services not functioning correctly, so verify the service status to confirm that all related services are running.
  • Report the application crash to the VMware support team so that engineering can investigate it and make NSX services more robust in future releases.

The application crashed alarm is raised when an NSX service, or an environmental factor affecting it, hits a fatal or unhandled exception that generates a core or heap dump.

In order to report application crashed issues, use the steps below:

    1. Collect the latest support bundle, including core dump and audit logs, from the nodes where the application crashed alarm is observed. Refer to Collect Support Bundles for details on how to collect the support bundle with core and audit logs.
    2. Individual core dump files can be copied from NSX appliance nodes to a remote location with the command: copy core-dump
      Give the full path of the core file, as shown in the output of the command: get core-dumps
      Replace the path and filename with your values.

      nsxcli> get core-dumps
      Directory: /var/log/core
      20762624     May 18 2023 11:44:13 UTC  core.nginx.1559278043.gz
      nsxcli> copy core-dump /var/log/core/core.nginx.1559278043.gz url scp://<username>@<remote-host>/tmp/
      <username>@<remote-host>'s password:

    3. Contact Broadcom Support and provide the collected support bundle.

    4. After collecting the support bundle, resolve the application crashed alarm by removing the core dump files from the respective nodes.
      1. On NSX appliance nodes, remove core and heap dump files with the command: del core-dump
        Give the full path of the core file, as shown in the output of the command: get core-dumps
        Replace the path and filename with your values.

        nsxcli> get core-dumps
        Directory: /var/log/core
        20762624     May 18 2023 11:44:13 UTC  core.nginx.1559278043.gz
        nsxcli> del core-dump /var/log/core/core.nginx.1559278043.gz

        or

        nsxcli> del core-dump all

        In NSX version 4.1.1 or above, the core dump files can also be removed as part of the collection of a support bundle, with the command: get support-bundle

        nsxcli> get support-bundle file support-bundle.tgz all remove-core-files

      2. On ESXi host transport nodes, use the command that matches the NSX version to remove core dump files:
        1. For NSX versions earlier than 4.1.1:

          Run the following command in the ESXi host shell:

          root> rm -rf /var/core/*

        2. For NSX version 4.1.1 or above:

          Run the following command in the NSX CLI on the ESXi host:

          nsxcli> del core-dump all

          or

          nsxcli> del core-dump <core-dump-file>
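
The copy and delete steps above can be prepared as command text for review before anything is run. The following Python sketch only builds strings and does not connect to a node. The helper names are hypothetical, the command syntax is copied from the examples in this article, and the version check reflects the 4.1.1 boundary stated above.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def supports_nsxcli_cleanup(nsx_version):
    """True for NSX 4.1.1 or above, where this article allows 'del core-dump'
    on ESXi hosts and 'remove-core-files' during support bundle collection."""
    return parse_version(nsx_version) >= (4, 1, 1)


def copy_command(core_path, scp_url):
    """Build the NSX CLI command that copies one core file to a remote URL."""
    return f"copy core-dump {core_path} url {scp_url}"


def delete_command(core_path=None):
    """Build the NSX CLI command that deletes one core file, or all of them."""
    return f"del core-dump {core_path}" if core_path else "del core-dump all"


core_file = "/var/log/core/core.nginx.1559278043.gz"
# The SCP target is a placeholder; replace it with your own user, host, and path.
print(copy_command(core_file, "scp://<username>@<remote-host>/tmp/"))
print(delete_command(core_file))
print(supports_nsxcli_cleanup("4.1.0"))
```

Copy first and delete afterwards, so that the core file is preserved for Broadcom Support before the alarm is cleared.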

Maintenance window required for remediation? No

Additional Information

The following articles detail known core dump issues:

API Guide
Admin Guide 
CLI Guide

Note: Article 322533 (legacy id 92493), "NSX-T UI alarms are generated: Application on NSX node <node> has crashed" was archived in favor of this article.