ESXi Host Client Inaccessible with 'No Healthy Upstream'

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms

The host appears "Disconnected" or "Not Responding" in vCenter Server.
Attempting to add the ESXi host to a cluster fails with the error: "Cannot contact host"
Accessing the ESXi Host Client (UI) returns: "No healthy upstream".
ESXCLI commands fail to execute.
Frequent qedf driver aborts and SCSI command failures are logged in /var/run/log/vmkernel.log:

YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu56:###)qedf:vmhba0:qedfc_eh_abort:3061:Info: IO not found. Returning Success, cmdSN=####, worldId=0
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu7:###)ScsiDeviceIO: 4656: Cmd(0x######) 0x16, cmdId.initiator=0x###### CmdSN 0x#### from world 0 to dev "naa.###########" failed H:0x5 D:0x0 P:0x0 . Cmd count Active:0 Queued:0
YYYY-MM-DDTHH:MM:SSZ Wa(#) vmkwarning: cpu45:###)WARNING: HBX: 2468: Failed to initialize VMFS distributed locking on volume ########: Timeout
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu45:###)Vol3: 4768: Failed to get object 28 type 1 uuid ######## FD 0 gen 0 :Timeout
YYYY-MM-DDTHH:MM:SSZ Wa(#) vmkwarning: cpu45:###)WARNING: Fil3: 1638: Failed to reserve volume f532 28 1 ######## 0 0 0 0 0 0 0
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu45:###)Fil3: 1600: Exhausted retries trying to get object of type 2 on volume ######## at <FD c0 r0>: Timeout
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu45:###)Vol3: 4768: Failed to get object 28 type 2 uuid ######## FD 4 gen 1 :Busy
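The H:/D:/P: triplet in the ScsiDeviceIO line above reports the host, device, and plugin status of the failed command. As a rough triage aid, the sketch below decodes that triplet from a log line. The lookup tables are a partial subset of commonly documented values, not an exhaustive list, and the function name and sample line are made up for illustration.

```python
import re

# Partial lookup tables (assumption: a subset of commonly documented values).
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY",
               0x3: "TIMEOUT", 0x5: "ABORT", 0x7: "ERROR", 0x8: "RESET"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}

STATUS_RE = re.compile(r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)")

def decode_scsi_status(line):
    """Return decoded host/device/plugin status, or None if the line has none."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(value, 16) for value in match.groups())
    return {
        # Unknown codes fall back to their hex form rather than a guess.
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(device, hex(device)),
        "plugin": hex(plugin),
    }

# Hypothetical sample modeled on the sanitized excerpt above.
sample = ('ScsiDeviceIO: 4656: Cmd(0x0) 0x16, CmdSN 0x1 from world 0 '
          'to dev "naa.x" failed H:0x5 D:0x0 P:0x0 .')
print(decode_scsi_status(sample))
```

A host status of ABORT with a device status of GOOD suggests the command was aborted on the initiator side, which is consistent with the qedf abort messages logged alongside it.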

The following log snippets from /var/run/log/hostd.log indicate a storage bottleneck, characterized by severe latency during file system operations:

YYYY-MM-DDTHH:MM:SSZ Wa(#) Hostd[###]: [Originator@6876 sub=IoTracker] In thread 2099332, open("/vmfs/volumes/datastore_name") took over #### sec.
YYYY-MM-DDTHH:MM:SSZ Wa(#) Hostd[###]: [Originator@6876 sub=IoTracker] In thread 2099332, open("/vmfs/volumes/datastore_name") took over #### sec.
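To see which datastores are stalling hostd, the IoTracker warnings can be summarized from a copy of hostd.log. The following is a minimal sketch that assumes the message format shown above; the function name, the threshold parameter, and the 45-second value in the sample are illustrative and do not come from a real log.

```python
import re

# Matches: <operation>("<path>") took over <N> sec, on IoTracker lines only.
IO_RE = re.compile(r'sub=IoTracker\].*?(\w+)\("([^"]+)"\) took over (\d+) sec')

def slow_io_events(lines, threshold_sec=0):
    """Return (operation, path, seconds) for IoTracker warnings at or above the threshold."""
    events = []
    for line in lines:
        match = IO_RE.search(line)
        if match and int(match.group(3)) >= threshold_sec:
            events.append((match.group(1), match.group(2), int(match.group(3))))
    return events

# Hypothetical sample line; the duration is invented for the example.
sample_lines = [
    'Wa(164) Hostd[1]: [Originator@6876 sub=IoTracker] In thread 2099332, '
    'open("/vmfs/volumes/datastore_name") took over 45 sec.',
    'In(166) Hostd[1]: [Originator@6876 sub=Default] unrelated message',
]
for operation, path, seconds in slow_io_events(sample_lines, threshold_sec=10):
    print(f"{operation} on {path} took over {seconds} sec")
```

Repeated entries for the same path point to the datastore, and therefore the backing LUN, that should be raised with the storage vendor.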

The following messages may also be observed in /var/run/log/vobd.log, indicating a Permanent Device Loss (PDL) condition:

YYYY-MM-DDTHH:MM:SSZ Wa(#) vmkwarning: cpu5:###)WARNING: NMP: nmp_PathDetermineFailure:3536: Cmd (0x16) PDL error (0x5/0x25/0x0) - path vmhba#:C#:T#:L# device naa.########### - triggering path evaluation
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu5:###)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x16 (0x######, 0) to dev "naa.###########" on path "vmhba#:C#:T#:L#" Failed:
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu5:###)NMP: nmp_ThrottleLogForDevice:3898: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0. Act:EVAL. cmdId.initiator=0x###### CmdSN 0x####
YYYY-MM-DDTHH:MM:SSZ Wa(#) vmkwarning: cpu5:###)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.###########" state in doubt; requested fast path state update...
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu5:###)ScsiDeviceIO: 4672: Cmd(0x######) 0x16, CmdSN 0x#### from world 0 to dev "naa.###########" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0
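In the excerpt above, the sense data 0x5 0x25 0x0 is sense key ILLEGAL REQUEST with additional sense code LOGICAL UNIT NOT SUPPORTED, which the NMP line labels as a PDL error. The sketch below flags log lines that carry this sense triplet and extracts the affected device. It only checks the one triplet shown in this article; other sense codes may also be treated as PDL and would need to be added. The function name and sample lines are hypothetical.

```python
import re

SENSE_RE = re.compile(
    r"Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)")
DEVICE_RE = re.compile(r'dev "([^"]+)"')

# Only the triplet seen in this article (sense key, ASC, ASCQ).
PDL_SENSE = {(0x5, 0x25, 0x0)}

def pdl_device(line):
    """Return the device name if the line reports the PDL sense triplet, else None."""
    sense = SENSE_RE.search(line)
    if sense is None:
        return None
    triplet = tuple(int(value, 16) for value in sense.groups())
    if triplet not in PDL_SENSE:
        return None
    device = DEVICE_RE.search(line)
    # Lines without a dev "..." field are still reported, with a placeholder.
    return device.group(1) if device else "<unknown device>"

# Hypothetical samples modeled on the sanitized excerpt above.
pdl_line = ('ScsiDeviceIO: 4672: Cmd(0x0) 0x16, CmdSN 0x1 from world 0 to dev '
            '"naa.x" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0')
other_line = ('ScsiDeviceIO: 4672: Cmd(0x0) 0x28, CmdSN 0x2 from world 0 to dev '
              '"naa.y" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0')
print(pdl_device(pdl_line), pdl_device(other_line))
```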

Environment

VMware ESXi 8.x

Cause

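The ESXi host becomes unresponsive because of storage-side contention or Permanent Device Loss (PDL). The host keeps threads active while they wait on a non-responsive LUN, which leads to hostd thread exhaustion. Once hostd can no longer service requests, the Host Client, ESXCLI, and the vCenter Server connection to the host fail.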

Resolution

Resolve the storage-layer issue: Engage the storage vendor to investigate and resolve the underlying hardware issues.
Reconnect the host: After storage access is restored, in the vSphere Client right-click the existing host entry and select Connection > Connect. If the host is no longer in the inventory, right-click the cluster and select Add Host.