VMware NSX 4.1 sudo or nvpapi application on NSX node has crashed

Article ID: 322432

Updated On:

Products

VMware NSX

Issue/Introduction

  • You are using VMware NSX 4.1
  • You are alerted to an application crash on an NSX component.
  • There is minimal noticeable production impact.
  • In /var/log/syslog of the NSX manager you can confirm which NSX component (Manager or Edge) has the error (in this case edge##):
2023-06-12T01:28:45.566Z <manager_name> NSX 121272 - [nsx@6876 comp="nsx-manager" subcomp="nsx-sha" username="nsx-sha" level="CRITICAL" eventFeatureName="infrastructure_service" eventType="application_crashed" eventSev="critical" eventState="On" entId="cd59fec3-####-####-####-##########6d"] Application on NSX node <edge_name> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.
  • In /var/log/dumpcore.log on the NSX node named in the alarm, you see that a core file has been generated for sudo, nvpapi, or api_roothelper:
2023-06-12T01:27:45.480Z nsxmanager# NSX 4187495 - [nsx@6876 comp="nsx-manager" subcomp="node-mgmt" username="root" level="INFO"] Core dump generation received by process: None [nvpapi]
2023-06-12T01:27:45.488Z nsxmanager# NSX 4187495 - [nsx@6876 comp="nsx-manager" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.nvpapi.py.1679583589.5354.33.11.gz
  • This can be further confirmed by checking /var/log/syslog on the same node for core dump entries:
2023-03-23T14:59:49.659Z edge##.#######.#### NSX 1366408 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.nvpapi.py.1679583589.5354.33.11.gz
or
2023-03-23T14:59:49.659Z edge##.#######.#### NSX 1366408 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.sudo.1686533265.4187488.0.11.gz
  • Checking in /var/log/core/ on the node you see the core files:
root@edge##:/var/log/core# ls
total 454M
-rw-rw-rw- 1 root root 321M Jun 26 12:39 core.nvpapi.py.1679583589.5354.33.11.gz
  • In /var/log/kern.log you see lines similar to the following:

2025-01-24T17:43:55.934Z <manager_name> kernel - - - [17318523.570124] api_roothelper.[1097264] bad frame in rt_sigreturn frame:000##########1b8 ip:72a######30b sp:757######ec0 orax:fff##########fff in libc-2.31.so[72a#########+####00]
2025-01-24T17:43:55.935Z <manager_name> kernel - - - [17318523.570144] grsec: Segmentation fault occurred at 0000000000000000 in /opt/vmware/nsx-node-api/bin/api_roothelper.sh[api_roothelper.:1097264] uid/euid:0/0 gid/egid:0/0, parent /usr/bin/sudo[sudo:1097263] uid/euid:0/0 gid/egid:0/0

NOTE: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on your environment.
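
When correlating a core file with the log entries above, note that the long numeric field in the file name appears to be a Unix epoch timestamp: 1679583589 decodes to 2023-03-23T14:59:49Z, which matches the syslog entry that reports that file. The following is a minimal sketch, assuming a Linux shell with sed, awk, and GNU date. The function names are hypothetical, and the file name layout is inferred from the examples in this article rather than taken from NSX documentation.

```shell
#!/bin/sh
# Hypothetical helpers. The assumed name layout, inferred from the
# examples in this article, is:
#   core.<process>.<epoch>.<number>.<number>.<number>.gz

# Print the path of every core file reported in the given log file.
list_core_paths() {
    sed -n 's/.*Core file generated: \(.*\)$/\1/p' "$1"
}

# Print the UTC time encoded in a core file name (requires GNU date).
core_time() {
    name=$(basename "$1")
    # Counting dot-separated fields from the right: gz, three numeric
    # fields, then the epoch, so the epoch is field NF-4.
    epoch=$(printf '%s\n' "$name" | awk -F. '{ print $(NF-4) }')
    date -u -d "@$epoch" +%Y-%m-%dT%H:%M:%SZ
}

# Example usage (not run automatically):
#   list_core_paths /var/log/syslog | while read -r p; do
#       printf '%s  %s\n' "$(core_time "$p")" "$p"
#   done
```

Counting fields from the right keeps the sketch working for process names that themselves contain a dot, such as nvpapi.py.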

Environment

VMware NSX

Cause

This is a known issue caused by the kernel version of the base appliance in NSX 4.1.0.0.
 
The affected services crashed and the system generated core dump files for them. All NSX services are configured to restart automatically after a crash. Depending on which application crashed, other services that depend on it may not function correctly, so verify that each crashed service is running again. On an NSX appliance node, service status can be checked in nsxcli as follows:
 
nsxcli> get service <service-name>
or
nsxcli> get services

 

Resolution

This issue is resolved in VMware NSX 4.1.1, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.


Workaround:
After you collect the support bundle, the "application crashed" alarm can be resolved by removing the core dump files from the affected nodes.

  • On NSX appliance nodes, the following nsxcli command can be used to remove core and heap dump files:
nsxcli> del core-dump all
or
nsxcli> del core-dump <core-dump-file>

 

  • On ESXi host transport nodes, the following command can be used when logged in to the host as root:

root> rm -rf /var/core
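
Because deletion cannot be undone, it can help to record what is in the core directory and confirm it is covered by the support bundle before removing anything. The following is a minimal sketch in POSIX shell. The function name summarize_cores is hypothetical, and the directory is passed as an argument (for example /var/log/core on an appliance or /var/core on an ESXi host, the paths named in this article). It only lists files and deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: list each regular file in a directory and
# print how many were found. Nothing is deleted.
summarize_cores() {
    dir=$1
    count=0
    for f in "$dir"/*; do
        # Skip the literal pattern when the directory is empty,
        # and skip anything that is not a regular file.
        [ -f "$f" ] || continue
        ls -l "$f"
        count=$((count + 1))
    done
    echo "core files found: $count"
}

# Example usage (not run automatically):
#   summarize_cores /var/log/core
```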