The host appears "Disconnected" or "Not Responding" in vCenter Server.
Attempting to add the ESXi host to a cluster fails with the error "Cannot contact host".
Accessing the ESXi Host Client (UI) returns: "No healthy upstream".
ESXCLI commands fail to execute.
Frequent qedf driver aborts and SCSI command failures are logged in /var/run/log/vmkernel.log:
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu56:###)qedf:vmhba0:qedfc_eh_abort:3061:Info: IO not found. Returning Success, cmdSN=####, worldId=0
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu7:###)ScsiDeviceIO: 4656: Cmd(0x######) 0x16, cmdId.initiator=0x###### CmdSN 0x#### from world 0 to dev "naa.###########" failed H:0x5 D:0x0 P:0x0 . Cmd count Active:0 Queued:0
YYYY-MM-DDTHH:MM:SSZ Wa(#) vmkwarning: cpu45:###)WARNING: HBX: 2468: Failed to initialize VMFS distributed locking on volume ########: Timeout
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu45:###)Vol3: 4768: Failed to get object 28 type 1 uuid ######## FD 0 gen 0 :Timeout
YYYY-MM-DDTHH:MM:SSZ Wa(#) vmkwarning: cpu45:###)WARNING: Fil3: 1638: Failed to reserve volume f532 28 1 ######## 0 0 0 0 0 0 0
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu45:###)Fil3: 1600: Exhausted retries trying to get object of type 2 on volume ######## at <FD c0 r0>: Timeout
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu45:###)Vol3: 4768: Failed to get object 28 type 2 uuid ######## FD 4 gen 1 :Busy
Log entries in /var/run/log/hostd.log indicate a storage bottleneck, with severe latency during file system operations:
YYYY-MM-DDTHH:MM:SSZ Wa(#) Hostd[###]: [Originator@6876 sub=IoTracker] In thread 2099332, open("/vmfs/volumes/datastore_name") took over #### sec.
YYYY-MM-DDTHH:MM:SSZ Wa(#) Hostd[###]: [Originator@6876 sub=IoTracker] In thread 2099332, open("/vmfs/volumes/datastore_name") took over #### sec.
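To gauge how widespread the latency is, the IoTracker warnings can be tallied per datastore path. The following is a minimal sketch, assuming hostd.log has been copied off the host and that the message layout matches the entries shown above. The function name slow_operations and the numeric durations in the sample are hypothetical, since the entries above mask the real values.

```python
import re
from collections import Counter

# Matches the IoTracker warning layout shown in the hostd.log entries above.
IOTRACKER_RE = re.compile(
    r'sub=IoTracker\] In thread (\d+), (\w+)\("([^"]+)"\) took over (\d+) sec'
)

def slow_operations(lines, min_seconds=0):
    """Count slow hostd file operations, keyed by (operation, path)."""
    counts = Counter()
    for line in lines:
        match = IOTRACKER_RE.search(line)
        if match and int(match.group(4)) >= min_seconds:
            counts[(match.group(2), match.group(3))] += 1
    return counts

# Hypothetical sample lines; the durations (30 and 60) are invented.
sample = [
    'YYYY-MM-DDTHH:MM:SSZ Wa(#) Hostd[###]: [Originator@6876 sub=IoTracker] '
    'In thread 2099332, open("/vmfs/volumes/datastore_name") took over 30 sec.',
    'YYYY-MM-DDTHH:MM:SSZ Wa(#) Hostd[###]: [Originator@6876 sub=IoTracker] '
    'In thread 2099332, open("/vmfs/volumes/datastore_name") took over 60 sec.',
]

for (operation, path), count in slow_operations(sample).items():
    print(f"{operation} {path}: {count} slow call(s)")
```

A datastore that dominates the tally is a reasonable first candidate for the non-responsive LUN.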
The following messages may also appear in /var/run/log/vobd.log, indicating a PDL condition:
YYYY-MM-DDTHH:MM:SSZ Wa(#) vmkwarning: cpu5:)WARNING: NMP: nmp_PathDetermineFailure:3536: Cmd (0x16) PDL error (0x5/0x25/0x0) - path vmhba#:C#:T#:L# device ###naa.########### - triggering path evaluation
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu5:)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x16 (0x#########, 0) to dev "naa.###########" on path "vmhba#:C#:T#:L#" Failed:
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu5:)NMP: nmp_ThrottleLogForDevice:3898: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0. Act:EVAL. cmdId.initiator=0x######### CmdSN 0x####
YYYY-MM-DDTHH:MM:SSZ Wa(#) vmkwarning: cpu5:)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "###naa.###########" state in doubt; requested fast path state update...
YYYY-MM-DDTHH:MM:SSZ In(#) vmkernel: cpu5:)ScsiDeviceIO: 4672: Cmd(0x#########) 0x16, CmdSN 0x#### from world 0 to dev "naa.###########" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0
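The H:/D:/P: fields in these entries are the host, device, and plugin status of the failed SCSI command, optionally followed by the sense key and ASC/ASCQ. As an illustration, the sketch below decodes them from a log line. The function name decode_scsi_status is hypothetical, and the lookup tables are deliberately partial: they cover only a few common values, and the only triple flagged as PDL is 0x5/0x25/0x0, the one the log above labels "PDL error". Other sense codes can also indicate PDL and are not handled here.

```python
import re

# Partial lookup tables; values not listed are reported as raw hex.
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY",
               0x3: "TIMEOUT", 0x5: "ABORT", 0x7: "ERROR", 0x8: "RESET"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
                 0x18: "RESERVATION CONFLICT"}
SENSE_KEY = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
             0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED"}

HEX = r"(0x[0-9a-fA-F]+)"
STATUS_RE = re.compile(
    rf"H:{HEX} D:{HEX} P:{HEX}(?: Valid sense data: {HEX} {HEX} {HEX})?"
)

def decode_scsi_status(line):
    """Return decoded status fields from a log line, or None if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(match.group(i), 16) for i in (1, 2, 3))
    result = {
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(device, hex(device)),
        "plugin": hex(plugin),
        "sense": None,
        "pdl": False,
    }
    if match.group(4) is not None:
        key, asc, ascq = (int(match.group(i), 16) for i in (4, 5, 6))
        result["sense"] = (
            SENSE_KEY.get(key, hex(key)),
            ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
        )
        # Only the triple shown in the log above is treated as PDL here.
        result["pdl"] = (key, asc, ascq) == (0x5, 0x25, 0x0)
    return result

print(decode_scsi_status(
    "failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0"
))
print(decode_scsi_status("failed H:0x5 D:0x0 P:0x0 . Cmd count Active:0 Queued:0"))
```

Read this way, the vmkernel.log entry with H:0x5 is a command aborted on the host side, while the entries with D:0x2 and sense data 0x5 0x25 0x0 are the array reporting that the logical unit is not supported.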
VMware ESXi 8.x
The ESXi host becomes unresponsive due to storage-side contention or Permanent Device Loss (PDL). The host keeps active threads attempting to access a non-responsive LUN, which leads to hostd thread exhaustion.