ESXi host in "not responding" state

Products

VMware vSphere ESX 7.x
VMware vSphere ESX 8.x

Issue/Introduction

Symptoms:

An ESXi host shows as "not responding" in vCenter Server.
Cannot connect the ESXi host to vCenter Server.
Virtual machines running on the host are still accessible.
The ESXi host enters the "not responding" state because the etc RAM disk is full. The vmkernel.log file contains entries similar to the following:

YYYY-MM-DDTHH-MM-SS In vmkernel: cpu50:2101467) Activating Jumpstart plugin nicd.
YYYY-MM-DDTHH-MM-SS In vmkernel: cpu63:2101481) Activating Jumpstart plugin vmfstraced.
YYYY-MM-DDTHH-MM-SS In vmkernel: cpu37:2101485) Activating Jumpstart plugin lbtd.
YYYY-MM-DDTHH-MM-SS In vmkernel: cpu22:2101882) Admission failure in path: host/system/visorfs/ramdisks/etc:etc
YYYY-MM-DDTHH-MM-SS In vmkernel: cpu22:2101882) etc (270) requires 4 KB, asked 4 KB from etc (269) which has 28672 KB occupied and 0 KB available.
YYYY-MM-DDTHH-MM-SS In vmkernel: cpu22:2101882) Admission failure in path: host/system/visorfs/ramdisks/etc:etc
YYYY-MM-DDTHH-MM-SS In vmkernel: cpu22:2101882) etc (270) requires 4 KB, asked 4 KB from etc (269) which has 28672 KB occupied and 0 KB available.
YYYY-MM-DDTHH-MM-SS Wa vmkwarning: cpu22:2101882) WARNING: VisorFSRam: 220: Cannot extend visorfs file /etc/vmsyslog.conf.d/portlldpd.conf because its ramdisk (etc) is full.

RAM disk usage for /etc is at 100%.
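To confirm this symptom from the ESXi shell, the log can be searched for the admission failures shown above. The following is a minimal sketch: count_etc_admission_failures is an illustrative helper name, and /var/log/vmkernel.log is assumed to be the default log location on the host.

```shell
#!/bin/sh
# Count admission failures for the etc RAM disk in a vmkernel log.
# count_etc_admission_failures is an illustrative helper, not an ESXi command.
count_etc_admission_failures() {
    log_file="$1"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'Admission failure in path: host/system/visorfs/ramdisks/etc' "$log_file"
}

# Example (on the host; the log path is assumed to be the default):
#   count_etc_admission_failures /var/log/vmkernel.log
#   vdf -h
```

A non-zero count together with 100% usage for etc in the vdf -h output matches the symptoms described here.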

Compared to the other configuration files in the NSX directory (/etc/vmware/nsx), the controller-info.xml file alone is unusually large. The expected size of the file is around 8 to 10 MB.
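One way to make this comparison is to list the files in the NSX directory by size. The sketch below is an illustration only: list_files_by_size is a hypothetical helper name, and it relies on the standard wc and sort utilities being available in the ESXi shell.

```shell
#!/bin/sh
# Print "<bytes> <path>" for each regular file in a directory, largest first.
# list_files_by_size is an illustrative helper, not an ESXi command.
list_files_by_size() {
    dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # $(( ... )) strips any padding that wc may add around the count.
        printf '%s %s\n' "$(( $(wc -c < "$f") ))" "$f"
    done | sort -rn
}

# Example (on the host):
#   list_files_by_size /etc/vmware/nsx
```

If controller-info.xml appears at the top of the list and is far larger than its neighbours, that is consistent with the cause described in this article.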

Environment

VMware vSphere ESXi 7.0.x
VMware vSphere ESXi 8.0.x

Cause

The issue is caused by a corrupted controller-info.xml file located in /etc/vmware/nsx. The corruption may include unwanted whitespace or additional invalid inputs, resulting in excessive file size (often >10 MB).

Resolution

To resolve the issue, follow the steps below:

Step 1: Clear Old Log Files

Free up space on the /etc RAM disk by removing old log files, as described in the KB article:

ESXi host RAM disk is full

Step 2: Validate the controller-info.xml File

If the issue persists after clearing log files, check the size of the NSX configuration file:
ls -lh /etc/vmware/nsx/controller-info.xml
If the controller-info.xml file is larger than 10 MB, proceed with Step 3.
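The size check can also be scripted rather than read from the ls output. This is a minimal sketch under assumptions: file_exceeds_mb is a hypothetical helper name, and the 10 MB limit is the threshold stated in this step.

```shell
#!/bin/sh
# Return 0 if the file is larger than the given limit in MB, 1 if it is not,
# and 2 if the file does not exist.
# file_exceeds_mb is an illustrative helper, not an ESXi command.
file_exceeds_mb() {
    file="$1"
    limit_mb="$2"
    [ -f "$file" ] || return 2     # missing file: nothing to compare
    size_bytes=$(( $(wc -c < "$file") ))
    [ "$size_bytes" -gt $(( limit_mb * 1024 * 1024 )) ]
}

# Example (on the host):
#   if file_exceeds_mb /etc/vmware/nsx/controller-info.xml 10; then
#       echo "controller-info.xml is larger than 10 MB"
#   fi
```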

Step 3: Remove and Recreate controller-info.xml

Stop NSX Proxy Service:
/etc/init.d/nsx-proxy stop
Remove the Corrupted File:
rm /etc/vmware/nsx/controller-info.xml
Restart Critical Services:
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
Start NSX Proxy Again:
/etc/init.d/nsx-proxy start
Verify File Re-Creation:

Ensure that the controller-info.xml file is automatically regenerated:
ls -lh /etc/vmware/nsx/controller-info.xml

Check NSX Controller Connectivity:

Confirm that the host has re-established communication with the NSX Controller.
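The command sequence in Step 3 can be sketched as a small script. This is an illustration under assumptions, not a supported tool: DRY_RUN, NSX_FILE, run, and recreate_controller_info are hypothetical names introduced here, and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the Step 3 sequence. All names below are illustrative.
NSX_FILE="${NSX_FILE:-/etc/vmware/nsx/controller-info.xml}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_controller_info() {
    run /etc/init.d/nsx-proxy stop || return 1   # stop NSX proxy first
    run rm "$NSX_FILE" || return 1               # remove the corrupted file
    run /etc/init.d/hostd restart                # restart management agents
    run /etc/init.d/vpxa restart
    run /etc/init.d/nsx-proxy start              # start NSX proxy again
    run ls -lh "$NSX_FILE"                       # verify the file was re-created
}
```

Calling recreate_controller_info with the default settings prints the six commands for review. Setting DRY_RUN=0 before the call executes them on the host.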

Step 4: Reboot the Host

Perform a clean reboot of the ESXi host.

Step 5: Post-Reboot Validation

After the reboot, verify the following:
All services are running properly.
RAM disk usage is within normal limits (check with vdf -h).
The controller-info.xml file is reset and correctly populated.
NSX connectivity is restored.
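The RAM disk check can be reduced to a single number by filtering the vdf -h output. The sketch below assumes that the Ramdisk table in that output uses the columns Ramdisk, Size, Used, Available, Use%, and Mounted on; etc_ramdisk_usage is a hypothetical helper name, and the 90% threshold in the example is an arbitrary illustration, not a documented limit.

```shell
#!/bin/sh
# Read `vdf -h` output on stdin and print the Use% value of the etc RAM disk.
# Assumes the Ramdisk table has the columns:
#   Ramdisk  Size  Used  Available  Use%  Mounted on
# etc_ramdisk_usage is an illustrative helper, not an ESXi command.
etc_ramdisk_usage() {
    awk '$1 == "etc" && $5 ~ /%$/ { sub(/%$/, "", $5); print $5; exit }'
}

# Example (on the host; 90 is an arbitrary threshold for illustration):
#   usage=$(vdf -h | etc_ramdisk_usage)
#   [ "$usage" -lt 90 ] && echo "etc RAM disk usage is ${usage}%"
```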