In an NSXv environment an ESXi host is in a Not Responding state in vCenter

Article ID: 324196

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
  • NSX Data Center for vSphere environment with a large inventory of DFW rules
  • The ESXi host may be in a Not Responding state in vCenter
  • The hostd management service on the ESXi host restarts unexpectedly or may be stopped
# /etc/init.d/hostd status
  hostd is not running.
  • vdf -h shows that the etc ramdisk is full:
Ramdisk                   Size      Used Available Use% Mounted on
root                       32M        5M       26M  17% --
etc                        28M       28M        0M 100% --
opt                        32M        0B       32M   0% --
var                        48M        1M       46M   3% --
tmp                       256M       32K      255M   0% --
iofilters                  32M        0B       32M   0% --
hostdstats                553M       14M      538M   2% --
snmptraps                   1M        0B        1M   0% --
 
  • /var/log/hostd.log contains entries similar to:
2020-10-01T16:51:46.622Z info hostd[2101300] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 45312 : The ramdisk 'etc' is full.  As a result, the file /etc/vmware/hostd/vmInventory.xml.tmp could not be written.
2020-10-01T16:51:46.623Z info hostd[2100833] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 45313 : The ramdisk 'etc' is full.  As a result, the file /etc/vmware/hostd/pools.xml.tmp could not be written.
2020-10-01T16:52:58.496Z - time the service was last started, Section for VMware ESX, pid=17924909, version=6.7.0, build=15216127, option=Release
  • /var/log/vsfwd.log contains entries similar to:
2020-10-01T14:03:03Z vsfwd: [WARN] failed to write temporary file /etc/vmware/vsfwd/vsipfw_ruleset.dat.tmp
2020-09-30T14:40:04Z vsfwd: [ERROR] ioctl cmd 30 on device /dev/vsip failed: No such file or directory
2020-10-01T16:00:33Z vsfwd: [INFO] Config data of 19942924 bytes was not compressed.
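
The ramdisk symptom above can be checked with a small script rather than by eye. The following is a minimal sketch, assuming the vdf -h ramdisk rows have the column layout shown above (name, Size, Used, Available, Use%). The function names ramdisk_use_pct and ramdisk_is_full and the default threshold of 95 are hypothetical, chosen for illustration only.

```shell
# Print the Use% figure (without the % sign) for the named ramdisk.
# Reads `vdf -h`-style output on stdin; assumes Use% is the fifth column.
ramdisk_use_pct() {
    awk -v name="$1" '$1 == name && $5 ~ /%$/ { sub(/%/, "", $5); print $5; exit }'
}

# Succeed (exit status 0) when the named ramdisk is at or above the
# threshold given as the second argument (hypothetical default: 95).
ramdisk_is_full() {
    pct=$(ramdisk_use_pct "$1")
    [ -n "$pct" ] && [ "$pct" -ge "${2:-95}" ]
}
```

On a host this might be used as: vdf -h | ramdisk_is_full etc && echo "etc ramdisk is full"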


Cause

The NSX Manager pushes the DFW configuration to the ESXi host, where it is stored in /etc/vmware/vsfwd/vsipfw_ruleset.dat.
This mechanism compresses the DFW configuration before writing it to the file.
In rare cases, if the memory usage of the vsfwd process is very high, the compression may fail and the uncompressed configuration is written to disk.
If the DFW configuration is large, the uncompressed file fills the /etc ramdisk and the ESXi management service, hostd, fails. As a result, the host is in a Not Responding state in vCenter.
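
To get a rough sense of the sizes involved, the uncompressed size reported in vsfwd.log can be compared with the size of the etc ramdisk. The sketch below rests on two assumptions that the article does not state: that the "28M" shown by vdf -h means 28 MiB, and that more than one copy of the ruleset can exist at once (the log and workaround mention .dat, .dat.old and .dat.tmp files). The function names uncompressed_sizes and copies_fit are hypothetical.

```shell
# Print the byte count from every "was not compressed" line read on stdin.
uncompressed_sizes() {
    sed -n 's/.*Config data of \([0-9][0-9]*\) bytes was not compressed.*/\1/p'
}

# Succeed when $2 copies of a $1-byte file fit in a ramdisk of $3 MiB.
copies_fit() {
    [ $(( $1 * $2 )) -le $(( $3 * 1024 * 1024 )) ]
}
```

On a host, uncompressed_sizes < /var/log/vsfwd.log would list the reported sizes. With the figures from the symptoms, copies_fit 19942924 1 28 succeeds but copies_fit 19942924 2 28 fails, because 2 x 19942924 = 39885848 bytes is more than 28 x 1048576 = 29360128 bytes.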

Resolution

This is a known issue affecting NSX Data Center for vSphere. There is currently no resolution.

Workaround:
1.) Stop the vsfwd service:
/etc/init.d/vShield-Stateful-Firewall stop

2.) Clear the existing uncompressed DFW config files:
rm /etc/vmware/vsfwd/vsipfw_ruleset.dat
rm /etc/vmware/vsfwd/vsipfw_ruleset.dat.old

3.) Start hostd if it is stopped:
/etc/init.d/hostd start

4.) Start the vsfwd service:
/etc/init.d/vShield-Stateful-Firewall start

The NSX Manager will push these config files down again. Verify that they are written in a compressed form small enough that it does not fill /etc.
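
The four workaround steps can be collected into one function, as a sketch only. The paths and init scripts are those given above. The DRY_RUN switch, the function names run and vsfwd_workaround, and the use of rm -f (so that a missing .old file is not treated as an error) are additions for illustration. The hostd check assumes the status command prints "not running" as in the symptoms section.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

vsfwd_workaround() {
    # 1.) Stop the vsfwd service
    run /etc/init.d/vShield-Stateful-Firewall stop
    # 2.) Clear the existing uncompressed DFW config files
    run rm -f /etc/vmware/vsfwd/vsipfw_ruleset.dat
    run rm -f /etc/vmware/vsfwd/vsipfw_ruleset.dat.old
    # 3.) Start hostd if it is stopped (in dry-run mode the command is always printed)
    if [ "${DRY_RUN:-0}" = "1" ] || /etc/init.d/hostd status 2>&1 | grep -q "not running"; then
        run /etc/init.d/hostd start
    fi
    # 4.) Start the vsfwd service
    run /etc/init.d/vShield-Stateful-Firewall start
}
```

Setting DRY_RUN=1 before calling vsfwd_workaround prints the commands without executing them, which allows the sequence to be reviewed before it is run on a host.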