opsAgent process crash on an ESXi host causes the NSX alarm "Application on NSX node <hostname> has crashed"
Article ID: 319783
Products
VMware NSX
Issue/Introduction
You are running ESXi 7.0U3, 8.0, or 8.0U1.
In the NSX-T Manager UI, the following alarm is generated:
Application on NSX node <hostname> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team. Recommended Action Collect Support Bundle for NSX node <hostname> using NSX Manager UI or API.
On the ESXi host, the log file /var/run/log/hostd.log contains entries similar to:
Event 2061759 : An application (/usr/lib64/vmware/nsx-opsagent/bin/opsAgent) running on ESXi host has crashed (1 time(s) so far). A core file might have been created at /var/core/opsAgent-zdump.000.
On the same ESXi host, the following core dump is generated:
/var/core/opsAgent-zdump.xxx
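To confirm that a host shows this symptom, the hostd.log entries can be filtered for the opsAgent crash event. The following is a minimal sketch, not a supported tool: the pattern is modeled only on the log entry quoted above, the function name find_opsagent_crashes is hypothetical, and the exact message wording may differ between builds.

```python
import re

# Pattern modeled on the hostd.log entry quoted in this article.
# Assumption: other ESXi builds use the same wording.
CRASH_RE = re.compile(
    r"An application \((?P<app>\S*opsAgent)\) running on ESXi host has crashed "
    r"\((?P<count>\d+) time\(s\) so far\)\. "
    r"A core file might have been created at (?P<core>\S+)"
)


def find_opsagent_crashes(lines):
    """Return (application path, crash count, core file path) for each matching line."""
    hits = []
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            # The logged sentence ends with a period; drop it from the path.
            core_path = match.group("core").rstrip(".")
            hits.append((match.group("app"), int(match.group("count")), core_path))
    return hits


if __name__ == "__main__":
    # Log location taken from this article; run against a copy of the file
    # if the script is not executed on the host itself.
    with open("/var/run/log/hostd.log", errors="replace") as log:
        for app, count, core in find_opsagent_crashes(log):
            print(f"{app} crashed ({count} so far), core file: {core}")
```

Each reported core file path can then be compared with the contents of /var/core on the host.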
Environment
VMware NSX-T Data Center
Cause
The opsAgent crash is caused by a feature newly introduced in ESXi 7.0U3, which was added to change free random memory during switch vDS operations.
Resolution
This issue is resolved in ESXi version 8.0U2.
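When many hosts need to be triaged, the release list in this article can be encoded as a small lookup. This is an illustrative sketch only: the label format ("7.0U3", "8.0 U1", and so on) and the helper names parse_release and is_listed_as_affected are assumptions, and the check reflects only the releases named in this article (7.0U3, 8.0, 8.0U1), not build numbers.

```python
import re

# (major, minor, update) tuples for the releases this article lists as affected.
AFFECTED = {(7, 0, 3), (8, 0, 0), (8, 0, 1)}


def parse_release(label):
    """Parse a label such as '7.0U3', '8.0' or '8.0 U1' into (major, minor, update)."""
    compact = label.replace(" ", "").upper()
    match = re.match(r"(\d+)\.(\d+)(?:U(\d+))?", compact)
    if not match:
        raise ValueError(f"unrecognized ESXi release label: {label!r}")
    # A label without an update suffix is treated as update 0.
    return int(match.group(1)), int(match.group(2)), int(match.group(3) or 0)


def is_listed_as_affected(label):
    """True if the release is one of those named as affected in this article."""
    return parse_release(label) in AFFECTED
```

For example, is_listed_as_affected("8.0U1") is true, while is_listed_as_affected("8.0U2") is false because 8.0U2 is the release in which the issue is resolved.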
Workaround: To prevent nsx-opsagent from crashing, disable the health check in NSX by following these steps:
In the NSX-T UI, go to System > Fabric > Transport Zones.
Click "Health Configuration" on the right side of the page.
Click "Edit" to the right of "Automatic Health Check".
Disable "Automatic Health Check".
Note:
After upgrading to a fixed version, re-enable the health check in NSX.
The health check verifies that the VLAN and MTU settings match between two ports. While it is disabled, VLAN and MTU mismatches are not checked.
While the health check remains enabled, new core dumps may continue to be generated even after you delete the existing ones from the ESXi host.