vdpi process crash on ESXi host causes an NSX alarm, Application on NSX node <hostname> has crashed
Article ID: 323542
Products
VMware NSX
Issue/Introduction
Symptoms:
You are running VMware NSX 4.x.
In the NSX Manager UI, the following alarm is generated:
Application on NSX node <hostname> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.
On the ESXi host, the log file /var/run/log/vobd.log contains entries similar to:
[esx.problem.application.core.dumped] An application (/usr/lib/vmware/nsx-vdpi/bin/vdpi) running on ESXi host has crashed (2 time(s) so far). A core file may have been created at /var/core/vdpi-zdump.001.
On the ESXi host, a core dump file like the following is generated:
/var/core/vdpi-zdump.xxx
On the ESXi host, /var/run/log/nsx-syslog.log contains the following entry more than 20 times:
Revalidating domains to generation number <x>
Note: The generation number 'x' stays the same across these repeated revalidation entries.
Environment
VMware NSX-T Data Center
Cause
Under normal circumstances, the log entry 'Revalidating domains to generation number <x>' appears between 10 and 20 times during an FQDN change. When this issue occurs, the entry appears more than 20 times.
The VDPI crash occurs when the FQDN used in a context profile firewall rule is changed while traffic that matches the existing FQDN is flowing through that rule.
The process gets caught in a loop, which leads to memory issues and causes the VDPI process to crash.
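To check a host against the threshold described above, the revalidation entries can be counted per generation number. The following is a minimal Python sketch, not a VMware-provided tool. The function names and the regular expression are illustrative assumptions based only on the log message quoted in this article.

```python
import re
from collections import Counter

# Pattern built from the log message quoted in this article (assumed format);
# the captured group is the generation number <x>.
REVALIDATE_RE = re.compile(r"Revalidating domains to generation number (\d+)")

# Threshold from the Cause section: more than 20 entries indicates the issue.
THRESHOLD = 20


def count_revalidations(lines):
    """Count 'Revalidating domains' entries per generation number."""
    counts = Counter()
    for line in lines:
        match = REVALIDATE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def suspicious_generations(counts, threshold=THRESHOLD):
    """Return the generation numbers logged more than `threshold` times."""
    return sorted(gen for gen, n in counts.items() if n > threshold)
```

For example, passing an open handle on /var/run/log/nsx-syslog.log to count_revalidations and then calling suspicious_generations on the result lists any generation number that was revalidated more than 20 times.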
Workaround: To prevent this issue, do not change the FQDN used in a context profile firewall rule while traffic for that FQDN is flowing. Instead, disable the rule, make the change, and then enable the rule again. Because disabling the rule may impact traffic, perform this procedure during a maintenance window.
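If the disable and re-enable steps are scripted rather than done in the UI, the sequence might look like the Python sketch below. It only builds the two requests and does not send them. The endpoint path and the "disabled" field are assumptions based on the NSX Policy API rule resource and should be verified against the API guide for your NSX version. The manager hostname, policy ID, and rule ID are hypothetical placeholders, and authentication is left out.

```python
import json
import urllib.request


def build_rule_toggle_request(manager, policy_id, rule_id, disabled, domain="default"):
    """Build (but do not send) a PATCH request that sets a rule's 'disabled' flag.

    The URL layout and the 'disabled' field are assumptions; confirm them
    in the NSX API guide before use.
    """
    url = (
        f"https://{manager}/policy/api/v1/infra/domains/{domain}"
        f"/security-policies/{policy_id}/rules/{rule_id}"
    )
    body = json.dumps({"disabled": disabled}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


def workaround_requests(manager, policy_id, rule_id):
    """Return the disable and re-enable requests that bracket the FQDN change."""
    disable = build_rule_toggle_request(manager, policy_id, rule_id, True)
    enable = build_rule_toggle_request(manager, policy_id, rule_id, False)
    return disable, enable
```

The FQDN change itself would be made between sending the first and second request, so that no traffic is evaluated against the rule while its FQDN is being modified.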