Intermittent ESXi Host Not Responding Caused by NFS Datastore Refresh

Article ID: 424771

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

ESXi hosts intermittently enter a Not Responding state and disconnect from vCenter Server. During the event, multiple ESXi services (including hostd, python, localcli, and esxcfg) may crash or become unresponsive. The affected hosts typically reconnect to vCenter Server automatically after some time.

Environment

vSphere 8.0

Cause

This issue occurs when NFS datastores are configured using FQDN-based NFS server entries that resolve to multiple IP addresses, some of which are unreachable.

When an ESXi host resolves the NFS server FQDN, it sequentially attempts connectivity to each resolved IP address. For unreachable IPs, the host repeatedly:

Adds the IP address to the NFS client firewall ruleset
Tests connectivity
Removes the IP address if connectivity fails
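The sequential behavior described above can be approximated from any machine that shares the hosts' DNS view and network path. The following is a minimal Python sketch, not ESXi's actual implementation: the function names are illustrative, and a TCP connect to port 2049 stands in for whatever connectivity test the host performs. It shows why each unreachable address adds a full timeout to the total resolution time.

```python
import socket
import time

NFS_PORT = 2049  # assumed NFS TCP port; adjust if your server differs


def resolve_ips(fqdn):
    """Return the unique IP addresses the FQDN resolves to, in resolver order."""
    infos = socket.getaddrinfo(fqdn, NFS_PORT, proto=socket.IPPROTO_TCP)
    seen = []
    for info in infos:
        ip = info[4][0]  # sockaddr tuple starts with the address string
        if ip not in seen:
            seen.append(ip)
    return seen


def tcp_reachable(ip, port=NFS_PORT, timeout=3.0):
    """Try a TCP connection; True if it succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_sequentially(ips, check=tcp_reachable):
    """Probe each IP in turn, returning (reachable, unreachable, seconds)."""
    reachable, unreachable = [], []
    start = time.monotonic()
    for ip in ips:
        (reachable if check(ip) else unreachable).append(ip)
    return reachable, unreachable, time.monotonic() - start
```

For example, `probe_sequentially(resolve_ips("nfs.example.com"))` (a placeholder hostname) returns the addresses that answered, the addresses that did not, and the elapsed time. With a 3-second timeout, ten unreachable addresses cost roughly 30 seconds per pass.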

If the FQDN resolves to a large number of unreachable IP addresses, the ResolveHostname process becomes time-consuming. Multiple ESXi components may simultaneously trigger NFS datastore refresh operations, resulting in excessive firewall rule updates. This can lead to a race condition in which multiple services attempt to add or remove the same firewall rules concurrently, causing the service that loses the race to fail and crash.
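To make the race concrete, the toy model below replays two orderings of the same add/remove steps against a shared rule set. It is a sketch under stated assumptions: `ToyRuleset` and `run_schedule` are hypothetical names, the error conditions (duplicate add, missing remove) are guesses at how a losing update might fail, and none of this reflects the real ESXi firewall code.

```python
class RulesetError(Exception):
    """Raised when a rule update conflicts with the current ruleset state."""


class ToyRuleset:
    """Hypothetical stand-in for a firewall allowed-IP list (not an ESXi API)."""

    def __init__(self):
        self.allowed = set()

    def add(self, ip):
        if ip in self.allowed:
            raise RulesetError(f"{ip} already present")
        self.allowed.add(ip)

    def remove(self, ip):
        if ip not in self.allowed:
            raise RulesetError(f"{ip} not present")
        self.allowed.remove(ip)


def run_schedule(schedule):
    """Apply (service, action, ip) steps in order; collect the steps that fail."""
    ruleset = ToyRuleset()
    failures = []
    for service, action, ip in schedule:
        try:
            getattr(ruleset, action)(ip)
        except RulesetError as exc:
            failures.append((service, action, str(exc)))
    return failures


IP = "192.0.2.10"  # documentation-range address used as a placeholder

# Each service finishes its add/remove pair before the other starts.
serialized = [
    ("hostd", "add", IP), ("hostd", "remove", IP),
    ("localcli", "add", IP), ("localcli", "remove", IP),
]

# The two refreshes overlap, so the second service collides on both steps.
interleaved = [
    ("hostd", "add", IP), ("localcli", "add", IP),
    ("hostd", "remove", IP), ("localcli", "remove", IP),
]
```

Running `run_schedule(serialized)` reports no failures, while `run_schedule(interleaved)` reports two, both attributed to the service that arrived second. The more unreachable addresses the FQDN resolves to, the more add/remove pairs each refresh performs and the more chances there are for such an overlap.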

This behavior can cause critical services such as hostd to hang or crash, leading the host to enter a Not Responding state.

Resolution

This issue will be addressed in a future release of ESXi 8.0. In the meantime, apply the following workarounds:
Ensure NFS server DNS records resolve only to reachable IPs from the ESXi hosts.
Identify and remove unused or stale NFS datastores that are no longer required in the datacenter.
Validate NFS client firewall entries to confirm they correspond to active and expected NFS servers.
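The firewall validation step can be reduced to a set comparison once the allowed IP list has been collected from a host. The sketch below assumes both inputs are plain lists of single IP address strings (no CIDR ranges) gathered by whatever means is available; `audit_allowed_ips` is an illustrative helper, not a VMware tool.

```python
import ipaddress


def normalize(addresses):
    """Parse address strings into ip_address objects so equal IPs compare equal."""
    return {ipaddress.ip_address(a.strip()) for a in addresses}


def audit_allowed_ips(allowed, expected):
    """Compare firewall entries against the expected NFS server addresses."""
    allowed_set, expected_set = normalize(allowed), normalize(expected)
    return {
        # In the firewall list but not an expected NFS server: candidate stale entry.
        "stale": sorted(str(ip) for ip in allowed_set - expected_set),
        # Expected NFS server with no matching firewall entry.
        "missing": sorted(str(ip) for ip in expected_set - allowed_set),
        # Present in both lists.
        "ok": sorted(str(ip) for ip in allowed_set & expected_set),
    }
```

Entries reported under `stale` are worth checking against the DNS records and the datastore inventory before any cleanup, since they may belong to an NFS server that is still in use but was left off the expected list.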
