The ESXi host enters a "Not Responding" state and the host UI becomes inaccessible. This issue occurs when the hostd daemon crashes and creates a hostd dump file (for example, /var/core/hostd-zdump.000).
In the hostd.log file, prior to the crash dump, you see firewall rule set operations for adding and removing IPs for the NFS client.
In /var/run/log/hostd-probe.log, entries similar to the following are seen:
[YYYY-MM-DDTHH:MM:SS] Wa(164) hostd-probe[2100200]: [Originator@6876 sub=Default] hostd was not detected to be running

In /var/run/log/hostd.log, entries similar to the following are seen:
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016277]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 20286 : Firewall configuration has changed. Operation 'addIP4' for rule set nfsClient succeeded.
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016303]: [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> key = 124,
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> chainId = -1,
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> createdTime = "1970-01-01T00:00:00Z",
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> userName = "",
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> host = (vim.event.HostEventArgument) {
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> name = "Hostname",
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> host = 'vim.HostSystem:ha-host'
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> },
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> eventTypeId = "esx.audit.net.firewall.config.changed",
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> arguments = (vmodl.KeyAnyValue) [
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> (vmodl.KeyAnyValue) {
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> key = "1",
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> value = "removeIP4"
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> },
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> (vmodl.KeyAnyValue) {
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> key = "2",
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> value = "nfsClient"
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> }
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> ],
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> objectId = "ha-host",
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> objectType = "vim.HostSystem",
[YYYY-MM-DDTHH:MM:SS] In(166) Hostd[4016267]: --> }

In /var/run/log/vmkernel.log, entries similar to the following are seen:
[YYYY-MM-DDTHH:MM:SS] In(182) vmkernel: cpu46:20###31 opID=75####6c)NFS: 366: NFS mount succeeded for server.example.com:/directory/TASK####### volume TASK#######.
[YYYY-MM-DDTHH:MM:SS] In(182) vmkernel: cpu88:20###20)UserDump: 3157: hostd-worker: Dumping cartel 209###89 (from world 20###20) to file /var/core/hostd-zdump.000
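
When reviewing a log bundle offline, the three signatures above can be checked with a small script. The following is a minimal sketch, not a VMware tool: the function names (find_signatures, matches_issue) and the choice of substrings are assumptions made for illustration, taken from the sample entries in this article. Matching all three only suggests this issue and does not confirm it.

```python
# Substrings taken from the sample log entries in this article.
# The key names are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "hostd_not_running": "hostd was not detected to be running",  # hostd-probe.log
    "nfs_firewall_change": "for rule set nfsClient",              # hostd.log
    "hostd_dump": "hostd-zdump",                                  # vmkernel.log
}


def find_signatures(lines):
    """Return a dict mapping each signature label to the lines that contain it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                hits[name].append(line.rstrip("\n"))
    return hits


def matches_issue(hits):
    """True only if every signature was seen at least once."""
    return all(hits[name] for name in SIGNATURES)


if __name__ == "__main__":
    sample = [
        "Wa(164) hostd-probe[2100200]: [Originator@6876 sub=Default] "
        "hostd was not detected to be running",
        "In(166) Hostd[4016277]: Event 20286 : Firewall configuration has changed. "
        "Operation 'addIP4' for rule set nfsClient succeeded.",
        "In(182) vmkernel: UserDump: 3157: hostd-worker: Dumping cartel "
        "to file /var/core/hostd-zdump.000",
    ]
    print(matches_issue(find_signatures(sample)))
```

To use it against real files, pass the combined lines of hostd-probe.log, hostd.log, and vmkernel.log from the host or from an extracted support bundle to find_signatures.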
There is a race condition between multiple threads attempting to remove the NFS server IP from the ESXi firewall list.
This race occurs when an ESXi host attempts concurrent NFS mounts using a Fully Qualified Domain Name (FQDN) that resolves to the same unreachable IP address. Because the IP address is unreachable, the mounts fail, and the threads that attempted them try to remove the NFS server IP from the firewall rule set at the same time. This contention leads to an unhandled exception and causes the hostd service to crash.
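
The general pattern behind this kind of failure can be illustrated with a short sketch. This is an analogy only and is not hostd source code: the list allowed_ips, the function names, and the address 192.0.2.10 are hypothetical. Two threads both see the IP in a shared list, then both try to remove it, and the second removal raises an exception. The sketch catches and counts that exception so the result can be inspected; in the scenario described above, the exception is unhandled. A barrier is used purely to force the unlucky interleaving on every run.

```python
import threading


def remove_unsafely(allowed_ips, ip, barrier, errors):
    # Check-then-act without a lock: both threads can pass the check.
    if ip in allowed_ips:
        barrier.wait(timeout=5)  # force both threads past the check first
        try:
            allowed_ips.remove(ip)
        except ValueError as exc:  # the IP was already removed by the other thread
            errors.append(exc)


def remove_safely(allowed_ips, ip, lock, errors):
    # The check and the removal happen as one step under the lock.
    with lock:
        if ip in allowed_ips:
            try:
                allowed_ips.remove(ip)
            except ValueError as exc:
                errors.append(exc)


def run(worker, sync):
    """Run two threads removing the same IP; return the number of failed removals."""
    allowed_ips = ["192.0.2.10"]  # hypothetical NFS server IP (documentation range)
    errors = []
    threads = [
        threading.Thread(target=worker, args=(allowed_ips, "192.0.2.10", sync, errors))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(errors)


if __name__ == "__main__":
    print("unsafe failures:", run(remove_unsafely, threading.Barrier(2)))
    print("safe failures:", run(remove_safely, threading.Lock()))
```

The locked variant shows one common way such races are avoided: the membership check and the removal are serialized, so the second thread finds the IP already gone and does nothing.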
Workaround