Host goes into non-responsive state and VMs on specific datastores freeze

Article ID: 382284

Products

VMware vSphere ESXi

Issue/Introduction

  • The hostd service is not responding.
  • One or more datastores are unresponsive.
    Logs report D:0x28 (TASK_SET_FULL) for the LUNs backing the unresponsive datastores.
  • VMs on the affected datastores freeze.

Environment

  • ESXi 8.0.1
  • ESXi 8.0.2

Cause

D:0x28 (TASK_SET_FULL) is the SCSI device status that a storage array returns when it fails commands from initiators for lack of resources, specifically because the queue depth on the array is exhausted.
When an array returns D:0x28 consistently over a long period, adaptive queuing, if configured, repeatedly reduces the queue depth for the affected device in response.
See Controlling LUN queue depth throttling in VMware ESXi.

In extreme circumstances, the queue depth is reduced to 1. Because a minimum of 2 queue slots is required, the hostd service and the datastore may then become unresponsive. The vmkernel log shows entries similar to the following:

vmkernel: cpu8:2098348)ScsiDeviceIO: 4619: Cmd(0x45bb02709380) 0x28, CmdSN 0xd6c9b0 from world 2099255 to dev "naa.xxxxx" failed H:0x0 D:0x28 P:0x0
vmkernel: cpu32:2097286)ScsiSched: 2104: Reduced the queue depth for device naa.xxxxx to 1, due to queue full/busy conditions. The queue depth could be reduced further if the condition persists.
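
To check a host for this condition, the two log signatures above can be searched for programmatically. The following is a minimal sketch, not a supported tool. It assumes the log lines follow exactly the format quoted above; the function names scan_vmkernel and devices_at_risk are hypothetical and chosen only for illustration.

```python
import re

# Patterns derived from the two vmkernel log lines quoted in this article.
# Assumption: other builds may word these messages differently.
TASK_SET_FULL = re.compile(
    r'to dev "(?P<dev>[^"]+)" failed H:0x[0-9a-fA-F]+ D:0x28\b'
)
QDEPTH_REDUCED = re.compile(
    r"Reduced the queue depth for device (?P<dev>\S+) to (?P<depth>\d+),"
)


def scan_vmkernel(lines):
    """Return ({device: count of D:0x28 failures}, {device: lowest queue depth logged})."""
    full_counts = {}
    lowest_depth = {}
    for line in lines:
        match = TASK_SET_FULL.search(line)
        if match:
            dev = match.group("dev")
            full_counts[dev] = full_counts.get(dev, 0) + 1
        match = QDEPTH_REDUCED.search(line)
        if match:
            dev = match.group("dev")
            depth = int(match.group("depth"))
            lowest_depth[dev] = min(depth, lowest_depth.get(dev, depth))
    return full_counts, lowest_depth


def devices_at_risk(lowest_depth, minimum=2):
    """List devices whose logged queue depth fell below the required minimum of 2."""
    return sorted(dev for dev, depth in lowest_depth.items() if depth < minimum)
```

Passing the lines of a vmkernel log file to scan_vmkernel and the second return value to devices_at_risk yields the devices whose queue depth was throttled to 1.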

Resolution

Upgrade to ESXi 8.0.3.

In ESXi 8.0.3, SCSISchedThrottleQDepth does not reduce the queue depth below 2.
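
The effect of the lower bound can be illustrated with a toy model. This is a sketch under stated assumptions, not the ESXi implementation: it assumes each throttling step halves the queue depth, and the starting depth of 64 and the function name throttle_steps are hypothetical. Only the floor values (1 before the fix, 2 in ESXi 8.0.3) come from this article.

```python
def throttle_steps(start_depth, events, floor):
    """Model repeated queue depth throttling with a lower bound.

    Assumption: each congestion event halves the depth (integer division),
    and the result is never allowed to drop below `floor`.
    """
    depth = start_depth
    history = [depth]
    for _ in range(events):
        depth = max(floor, depth // 2)
        history.append(depth)
    return history


# Floor of 1: the depth can reach 1, below the 2 slots that are required.
before_fix = throttle_steps(64, 7, floor=1)   # [64, 32, 16, 8, 4, 2, 1, 1]

# Floor of 2: the depth stops at 2 even if the array keeps returning D:0x28.
after_fix = throttle_steps(64, 7, floor=2)    # [64, 32, 16, 8, 4, 2, 2, 2]
```

In this model, sustained TASK_SET_FULL responses still throttle the device heavily after the upgrade, but the queue depth never falls below the 2 slots that are required.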