A crash of the cfgAgent process on an ESXi host causes an NSX alarm, "Application on NSX node <hostname> has crashed"
Article ID: 322495
Products
VMware NSX
VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention
Issue/Introduction
This issue is specific to NSX versions 4.x.
Auto-updates are enabled for the IP reputation feature.
In the NSX Manager UI, one or more Critical alarms are generated with the following details:
Application on NSX node <hostname> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.
Recommended Action: Collect Support Bundle for NSX node <hostname> using NSX Manager UI or API.
The following core dump is present on the ESXi host:
/var/core/nsx-cfgagent-zdump.xxx
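To check a host for these core dumps from a script, a minimal Python sketch could glob the core directory. It assumes only the naming shown above (`nsx-cfgagent-zdump.` followed by a suffix that this article does not specify); the function name `find_cfgagent_cores` is hypothetical.

```python
import glob
import os

# Assumption: cores follow the pattern shown above, nsx-cfgagent-zdump.<suffix>.
CORE_PATTERN = "nsx-cfgagent-zdump.*"


def find_cfgagent_cores(core_dir="/var/core"):
    """Return the sorted paths of nsx-cfgagent core dumps found in core_dir."""
    return sorted(glob.glob(os.path.join(core_dir, CORE_PATTERN)))


if __name__ == "__main__":
    cores = find_cfgagent_cores()
    if cores:
        for path in cores:
            print(f"cfgAgent core dump: {path}")
    else:
        print("No nsx-cfgagent core dumps found")
```

Other applications' core files in the same directory are ignored because only the `nsx-cfgagent-zdump.` prefix is matched.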
The log file /var/run/log/vobd.log contains entries similar to the following:
[esx.problem.application.core.dumped] An application (/usr/lib/vmware/nsx-cfgagent/bin/nsx-cfgagent) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/nsx-cfgagent-zdump.xxx
The log file /var/run/log/nsx-syslog.log contains entries similar to the following:
CFGAGENT_ALLOC_FAIL : CfgAgent error: no memory!
The cfgAgent log files contain output similar to the following, confirming that auto-update is enabled for the IP reputation feature:
[nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="9810D700" level="info"] dfw: DfwMsgCache: ip reputation chunks to update: chunk(86, 0)
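The log entries above can be checked together with a small Python sketch that counts lines matching each quoted signature. The patterns are taken from the excerpts in this article. The signature labels and function names are hypothetical, and because this article does not name the cfgAgent log file, its path has to be supplied by the caller.

```python
import re

# Patterns built from the log excerpts quoted in this article.
SIGNATURES = {
    "vobd_core_dumped": re.compile(
        r"\[esx\.problem\.application\.core\.dumped\].*nsx-cfgagent"),
    "cfgagent_alloc_fail": re.compile(r"CFGAGENT_ALLOC_FAIL"),
    "ip_reputation_update": re.compile(r"ip reputation chunks to update"),
}


def match_signatures(lines):
    """Return {label: count} for every signature seen in the given lines."""
    counts = {}
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] = counts.get(label, 0) + 1
    return counts


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced instead of raising."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return match_signatures(fh)
```

For example, `scan_log("/var/run/log/vobd.log")` and `scan_log("/var/run/log/nsx-syslog.log")` cover the first two signatures. A non-empty `ip_reputation_update` count only indicates that auto-update is active. It does not by itself prove the crash.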
The following symptoms may also be observed:
DFW rules are not realized on the ESXi host.
Adding VMs to the Exclusion List does not remove the DFW rules from those VMs.
Newly deployed VMs do not connect to the network.
Environment
VMware NSX
Cause
The NSX agent running on the host, cfgAgent, runs out of memory while the IP reputation feature is enabled and performing auto-updates.
Resolution
This issue is resolved in VMware NSX 4.1.2.3 and VMware NSX 4.2.0.
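As a rough sketch, a Python helper can classify an NSX version string against the fixed releases named above. It assumes that later 4.1.x releases after 4.1.2.3, and every release from 4.2.0 onward, include the fix. It also assumes dotted numeric version strings, where any trailing build components are simply compared as extra numbers. Versions outside 4.x fall outside this article's scope. The function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '4.1.2.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(version, width=4):
    """Pad short versions with zeros so '4.2.0' compares like '4.2.0.0'."""
    return version + (0,) * (width - len(version))


def fix_status(version_text):
    """Classify a version as 'fixed', 'affected', or 'outside 4.x'."""
    v = _pad(parse_version(version_text))
    if v[0] != 4:
        return "outside 4.x"  # the article only covers NSX 4.x
    # Assumption: 4.1.2.3 and every later 4.x release (including 4.2.0+) has the fix.
    if v >= (4, 1, 2, 3):
        return "fixed"
    return "affected"
```

Because 4.1.2.3 and 4.2.0 are consecutive on the version line, a single `>= (4, 1, 2, 3)` comparison covers both fixed ranges.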
Workaround:
If the IP Reputation feature is not used, you can turn off its auto-updates to prevent this issue from recurring. In the NSX Manager UI, go to Security > Distributed Firewall and click Actions.
Under Settings > Malicious IP Feeds, change Auto-update Malicious IPs from On to Off.
If the uptime of nsx-cfgagent is approaching 90 days, restart the service manually on the ESXi host with:
/etc/init.d/nsx-cfgagent restart
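The 90-day guidance can be turned into a simple check, sketched here in Python. It assumes you already know when nsx-cfgagent was last started (this article does not describe how to read that from the host). The 7-day warning margin and the function name `restart_due` are hypothetical choices to adjust as needed.

```python
from datetime import datetime, timedelta

RESTART_THRESHOLD = timedelta(days=90)  # from the workaround above
WARNING_MARGIN = timedelta(days=7)      # hypothetical lead time before the threshold


def restart_due(started_at, now, threshold=RESTART_THRESHOLD, margin=WARNING_MARGIN):
    """True when uptime is within `margin` of `threshold`, or already past it."""
    uptime = now - started_at
    return uptime >= threshold - margin


if __name__ == "__main__":
    started = datetime(2024, 1, 1)
    print(restart_due(started, started + timedelta(days=85)))
```

With these defaults the check starts returning True at 83 days of uptime, leaving a week to schedule the restart before the 90-day mark.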