Application on NSX node has crashed alarm

Article ID: 345792

Products

VMware NSX Networking

Issue/Introduction

Title: Alarm for application crashed on NSX node
Event ID: infrastructure_service.application_crashed
Added in release: 4.0
Alarm Description
  • Purpose: This alarm notifies the user that a node has reported an application crash. The alarm description identifies the node by its hostname or ID.
  • Impact: Services have crashed and the appliance has generated their respective core or heap dump files.
  • Resolution:
    • NSX services are configured to restart automatically after a crash.
      In some cases a crashed application can prevent dependent services from functioning correctly, so it is recommended to check the service status and confirm that all related services are running.
    • Report application crashes to the VMware support team so that engineering can investigate the failure and make NSX services more robust in coming releases.
 
  1. Symptoms
    • The following alarm is displayed in the NSX UI:

      Application on NSX node <node> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team. Recommended Action Collect Support Bundle for NSX node <nsx manager> using NSX Manager UI or API.

    • In /var/log/syslog.log on the NSX appliance node (Unified Appliance, Edge, etc.), you see messages similar to the following:

      2023-05-19T02:50:34.898Z local-manager NSX 85581 MONITORING [nsx@6876 alarmId="e44e47ae-8c4c-47aa-85a9-7a159b72d7ee" alarmState="OPEN" comp="nsx-manager" entId="340cd33e-fec7-46cd-91d5-ff3b6fc90faf" errorCode="MP701099" eventFeatureName="infrastructure_service" eventSev="CRITICAL" eventState="On" eventType="application_crashed" level="FATAL" nodeId="d1be0142-b001-01f5-8bdb-d5ae7b37180b" subcomp="monitoring"] Application on NSX node local-manager has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.

    • If the node is an ESXi host transport node, the same message appears in /var/log/nsx-syslog.log:

      2023-05-18T10:07:31Z nsx-sha: NSX 268653 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="CRITICAL" eventFeatureName="infrastructure_service" eventType="application_crashed" eventSev="critical" eventState="On" entId="76a85727-30ab-4ff5-bb7c-a064668252f0"] Application on NSX node sc2-10-185-106-230.esxi.host.com has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.

    Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values vary depending on your environment.
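When scanning a large log file for this alarm, the structured fields shown in the excerpts above can be matched programmatically. The following is a minimal Python sketch, not a VMware tool: the function name `parse_crash_event` and the returned dictionary keys are hypothetical, and the patterns assume the log layout matches the two sample lines above.

```python
import re

# Matches key="value" pairs inside the [nsx@6876 ...] structured-data block.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')

# Matches the alarm text that follows the block (wording taken from the samples).
MESSAGE_RE = re.compile(
    r"Application on NSX node (?P<node>\S+) has crashed\. "
    r"The number of core files found is (?P<count>\d+)\."
)


def parse_crash_event(line):
    """Return a dict for an application_crashed log line, or None otherwise.

    Hypothetical helper: assumes the layout seen in the sample excerpts.
    """
    start = line.find("[nsx@6876")
    end = line.find("]", start)
    if start == -1 or end == -1:
        return None
    fields = dict(FIELD_RE.findall(line[start:end]))
    if fields.get("eventType") != "application_crashed":
        return None
    match = MESSAGE_RE.search(line[end:])
    if match is None:
        return None
    return {
        "node": match.group("node"),
        "core_files": int(match.group("count")),
        "component": fields.get("comp"),
        # The samples use both "CRITICAL" and "critical", so normalize the case.
        "severity": fields.get("eventSev", "").upper(),
        "state": fields.get("eventState"),
    }
```

Calling `parse_crash_event` on each line of the relevant log file and keeping the non-None results gives one record per reported crash, which can help when matching alarms to nodes before collecting support bundles.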

  2. Cause

     

    • On an NSX appliance node, verify the service status from the CLI:

      nsxcli> get service <service-name>
      or
      nsxcli> get services

    • An application crash should generate a core or heap dump on the NSX node, which you can verify from the CLI:

      nsxcli> get core-dumps
      Directory: /var/core
      20762624     May 18 2023 11:44:13 UTC  nsx-exporter-zdump.000
      26832896     May 18 2023 10:04:59 UTC  opsAgent-zdump.000

    Note: In the example output above, two services (nsx-exporter and opsAgent) crashed and the system generated their respective core dump files.
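To work out which services crashed from a saved copy of the `get core-dumps` output, the listing can be parsed as text. This is a rough Python sketch under stated assumptions: the function name `parse_core_dumps` is hypothetical, the `<service>-zdump.<NNN>` naming pattern is inferred only from the example output above, and the first column is assumed to be the file size. Files that follow a different naming scheme are reported with no service name.

```python
import re

# Assumed naming pattern, inferred from the example output: <service>-zdump.<NNN>
NAME_RE = re.compile(r"^(?P<service>.+)-zdump\.\d+$")


def parse_core_dumps(output):
    """Parse saved 'get core-dumps' output into a list of dicts.

    Hypothetical helper: rows are recognized by a leading numeric column,
    so the prompt line and the 'Directory:' header are skipped.
    """
    dumps = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        name = parts[-1]
        named = NAME_RE.match(name)
        dumps.append({
            "file": name,
            "size": int(parts[0]),  # first column, assumed to be the file size
            "service": named.group("service") if named else None,
        })
    return dumps


if __name__ == "__main__":
    sample = """nsxcli> get core-dumps
Directory: /var/core
20762624     May 18 2023 11:44:13 UTC  nsx-exporter-zdump.000
26832896     May 18 2023 10:04:59 UTC  opsAgent-zdump.000
"""
    for dump in parse_core_dumps(sample):
        print(dump["service"], dump["file"])
```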

  3. Resolution

    The application crashed alarm is raised when an NSX service, or an environmental factor affecting it, hits a fatal or unhandled exception that results in a core or heap dump. Report these issues to the VMware support team so that NSX services can be made more robust in coming releases.

    To report an application crash, follow the steps below:

    1. Collect the latest support bundle, including core dumps and audit logs, from the nodes where the application crashed alarm is observed. For details on how to collect the support bundle with core and audit logs, refer to this document.
    2. To copy individual core dump files from NSX appliance nodes to a remote location, use the following command:

      nsxcli> copy core-dump core.nginx.1559278043.gz url scp://<user>@<remote-host>/tmp/
      <user>@<remote-host>'s password:

    3. Contact VMware Support and file a Support Request, attaching the support bundle (refer to KB article "How to file a Support Request in Customer Connect" https://kb.vmware.com/s/article/2006985).

    4. After collecting the support bundle, resolve the application crashed alarm by removing the core dump files from the affected nodes.
      1. On NSX appliance nodes, use the command for your NSX version to remove core and heap dump files:
        1. For NSX version 4.1 or below:

          nsxcli> del core-dump all

          or

          nsxcli> del core-dump <core-dump-file>

        2. For NSX version 4.1.1 or above:

          Use the command below to collect the support bundle with core dumps and audit logs and delete the core dump files at the same time:

          nsxcli> get support-bundle file support-bundle.tgz all remove-core-files

      2. On ESXi host transport nodes, use the command for your NSX version to remove core dump files:
        1. For NSX version 4.1 or below:

          Run the command below in the shell console of the ESXi host:

          root> rm -rf /var/core

        2. For NSX version 4.1.1 or above:

          Run the command below in the NSX CLI of the ESXi host:

          nsxcli> del core-dump all

          or

          nsxcli> del core-dump <core-dump-file>
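The version-dependent cleanup choices in step 4 can be summarized as a small lookup. The Python sketch below only restates the commands listed above; the function names and the node type labels "appliance" and "esxi" are hypothetical, and it assumes that "4.1 or below" means any version earlier than 4.1.1.

```python
def version_tuple(version):
    """Turn a version string such as '4.1' or '4.1.1' into a 3-part tuple."""
    parts = [int(p) for p in version.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def cleanup_command(node_type, version):
    """Return the core dump cleanup command listed in this article.

    node_type: 'appliance' or 'esxi' (labels chosen for this sketch only).
    version:   NSX version string; anything below 4.1.1 is treated as
               '4.1 or below' (an assumption of this sketch).
    """
    newer = version_tuple(version) >= (4, 1, 1)
    if node_type == "appliance":
        if newer:
            # Collects the support bundle and removes core files in one step.
            return "nsxcli> get support-bundle file support-bundle.tgz all remove-core-files"
        return "nsxcli> del core-dump all"
    if node_type == "esxi":
        if newer:
            return "nsxcli> del core-dump all"
        # Run from the ESXi shell console, not the NSX CLI.
        return "root> rm -rf /var/core"
    raise ValueError("unknown node type: " + node_type)


if __name__ == "__main__":
    for node_type, version in [("appliance", "4.1"), ("appliance", "4.1.1"),
                               ("esxi", "4.0.1"), ("esxi", "4.2")]:
        print(node_type, version, "->", cleanup_command(node_type, version))
```

The sketch returns the command as text for an operator to review instead of executing it, since removing core dumps before the support bundle is collected would discard the data that VMware Support needs.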



Environment

VMware NSX-T Data Center

Additional Information

API Guide: https://developer.vmware.com/apis/1583/nsx-t#api
Admin Guide: https://docs.vmware.com/en/VMware-NSX/4.1/administration/GUID-FBFD577B-745C-4658-B713-A3016D18CB9A.html