opsAgent process crash on ESXi host causes an NSX alarm, Application on NSX node <hostname> has crashed

Article ID: 319783

Products

VMware NSX

Issue/Introduction

  • You are running ESXi version 7.0U3, 8.0, or 8.0U1.
  • In the NSX-T Manager UI, the following alarm is generated:
Application on NSX node <hostname> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team. Recommended Action Collect Support Bundle for NSX node <hostname> using NSX Manager UI or API.
  • On the ESXi host, the log file /var/run/log/hostd.log contains entries similar to:
Event 2061759 : An application (/usr/lib64/vmware/nsx-opsagent/bin/opsAgent) running on ESXi host has crashed (1 time(s) so far). A core file might have been created at /var/core/opsAgent-zdump.000.
  • On the same ESXi host, the following core dump has been generated:
/var/core/opsAgent-zdump.xxx
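
To confirm the symptom across many log lines, the hostd.log check above can be scripted. The following is a minimal sketch, not a VMware-provided tool: the function name find_opsagent_crashes and the regular expression are illustrative assumptions modeled only on the single log entry quoted above, so the wording in other ESXi builds may differ and the pattern may need adjusting.

```python
import re

# Pattern modeled on the hostd.log entry quoted in this article.
# Assumption: other ESXi builds use the same wording.
CRASH_RE = re.compile(
    r"An application \((?P<app>[^)]*opsAgent)\) running on ESXi host has crashed "
    r"\((?P<count>\d+) time\(s\) so far\)\."
    r"(?: A core file might have been created at (?P<core>\S+))?"
)


def find_opsagent_crashes(log_text):
    """Return one dict per opsAgent crash entry found in the given log text."""
    crashes = []
    for line in log_text.splitlines():
        match = CRASH_RE.search(line)
        if match is None:
            continue
        core = match.group("core")
        crashes.append({
            "app": match.group("app"),
            "count": int(match.group("count")),
            # The log sentence ends with a period; strip it from the path.
            "core_file": core.rstrip(".") if core else None,
        })
    return crashes


if __name__ == "__main__":
    sample = (
        "Event 2061759 : An application "
        "(/usr/lib64/vmware/nsx-opsagent/bin/opsAgent) running on ESXi host "
        "has crashed (1 time(s) so far). A core file might have been created "
        "at /var/core/opsAgent-zdump.000."
    )
    for crash in find_opsagent_crashes(sample):
        print(crash["count"], crash["core_file"])
```

To run it against a real host, pass the contents of /var/run/log/hostd.log (for example, a copy taken from a support bundle) to find_opsagent_crashes instead of the sample string.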


Environment

VMware NSX-T Data Center

Cause

The opsAgent crash is caused by a feature newly introduced in ESXi 7.0U3, which was added to change free random memory during vDS switch operations.

Resolution

This issue is resolved in ESXi version 8.0U2.

Workaround:
To prevent nsx-opsagent from crashing, disable the health check in NSX by following these steps:

  1. In the NSX-T UI, go to System, then Fabric, then Transport Zones.
  2. Click "Health Configuration" on the right side of the page.
  3. Click "Edit" to the right of "Automatic Health Check".
  4. Disable the automatic health check.

Note:

  • After upgrading to a fixed version, enable the health check in NSX again.
  • The health check verifies that the VLAN and MTU settings match between two ports. While it is disabled, VLAN and MTU mismatches are not checked.
  • While the health check remains enabled, new core dumps may continue to be generated even after you delete the existing ones from the ESXi host.