Local datastore offline with "pqisrc_taskMgmt:1769: (abort) DONE" errors

Article ID: 410051

Products

  • VMware vSphere ESXi
  • VMware vSphere ESXi 8.0
  • VMware vSphere ESX 7.x

Issue/Introduction

  • Local VMFS datastore is inaccessible due to All Paths Down (APD)
  • Datastore is shown as unavailable in the VCSA/ESXi Storage view

Environment

  • VMware vSphere ESXi 7.x
  • VMware vSphere ESXi 8.x

Cause

  • The local device backing the datastore is marked as APD and later enters Permanent Device Loss status (APD timeout) because the controller aborts I/O and goes offline.

 

  • The relevant messages are logged in vmkernel.log (/var/log/vmkernel.log).

 

  • Before the datastore becomes inaccessible, the controller reports the following errors:

vmkernel: cpu2:2097405)smartpqi01: pqisrc_taskMgmt:1736: TMF abort Issued for CMD : 2a B:T:L 1:0:1 with tmf req tag : 0x264 cmd rcb tag 0x518
vmkwarning: cpu2:2097405)WARNING: smartpqi: pqisrc_taskMgmt:1756 :[201:0.0][B:T:L 1:0:1]:TMF abort is failed with status : -1
vmkernel: cpu2:2097405)smartpqi01: pqisrc_taskMgmt:1769: (abort)       DONE
vmkernel: cpu19:2097789)smartpqi01: pqisrc_taskMgmt:1816: TMF virt reset Issued for CMD : 8a B:T:L 1:0:1 with req tag : 0x56 cmd tag 0x743
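
To check whether a host shows the same controller pattern, the task-management (TMF) messages above can be matched in a copy of vmkernel.log. The following is a minimal Python sketch, not a VMware or vendor tool. The helper name `find_tmf_events` and both regular expressions are assumptions derived only from the sample lines in this article, so other driver versions may log different wording or source line numbers.

```python
import re

# Assumption: patterns are built from the sample messages in this article only.
TMF_ISSUED = re.compile(
    r"pqisrc_taskMgmt:\d+: TMF (?P<kind>abort|virt reset) Issued for CMD : "
    r"(?P<opcode>[0-9a-fA-F]+) B:T:L (?P<btl>\d+:\d+:\d+)"
)
TMF_FAILED = re.compile(
    r"pqisrc_taskMgmt:\d+\s*:.*\[B:T:L (?P<btl>\d+:\d+:\d+)\]\s*:"
    r"TMF abort is failed with status : (?P<status>-?\d+)"
)


def find_tmf_events(lines):
    """Return smartpqi TMF messages found in `lines` (hypothetical helper)."""
    events = []
    for line in lines:
        issued = TMF_ISSUED.search(line)
        if issued:
            events.append({
                "event": "issued",
                "kind": issued.group("kind"),      # "abort" or "virt reset"
                "opcode": issued.group("opcode"),  # SCSI opcode as logged (hex)
                "btl": issued.group("btl"),        # Bus:Target:LUN
            })
            continue
        failed = TMF_FAILED.search(line)
        if failed:
            events.append({
                "event": "failed",
                "btl": failed.group("btl"),
                "status": int(failed.group("status")),
            })
    return events
```

A failed abort followed by a virt reset for the same B:T:L, as in the excerpt above, is the sequence to look for. The function accepts any iterable of lines, for example an open file handle on a copied log.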

 

  • The APD timeout is reached and the disk is taken offline:

 vmkalert: cpu3:2097383)ALERT: smartpqi01: pqisrc_reject_if_device_unavailable:1474: Controller [201 : 0 : 0] Offline
 vmkernel: cpu15:2097381)LVM: 6279: Received APD EventType: APD_TIMEOUT (5) for device <naa.################################:1> (gen 2)
 vmkernel: cpu15:2097381)LVM: 5865: Handling APD EventType: APD_TIMEOUT (5) for device <naa.################################:1> (unlocked, gen 2, cur apd state APD_START, cur dev state 1)
 vmkernel: cpu15:2097381)HBX: 6854: APD EventType: APD_TIMEOUT (5) for vol 'Local_Datastore Name'
 vmkernel: cpu15:2097381)HBX: 6862: Aborting IOs on APD EventType: APD_TIMEOUT (5) for vol 'Local_Datastore Name'
 vmkernel: cpu15:2097381)StorageApdHandler: 606: APD timeout event for 0x430637f40f90 [naa.################################]
 vmkernel: cpu15:2097381)StorageApdHandlerEv: 120: Device or filesystem with identifier [naa.################################] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will $
 vmkernel: cpu1:2097387)ScsiDeviceIO: 4633: Cmd(0x45b991818100) 0x9e, CmdSN 0x142a2 from world 0 to dev "naa.################################" failed H:0x5 D:0x0 P:0x0

 

  • The controller is then marked as offline:

 vmkalert: cpu3:2097383)ALERT: smartpqi01: pqisrc_reject_if_device_unavailable:1474: Controller [201 : 0 : 0] Offline
 vmkwarning: cpu2:9347785)WARNING: ScsiDeviceIO: 13044: READ CAPACITY on device "naa.################################" from Plugin "HPP" failed. I/O error
 vmkalert: cpu2:9347785)ALERT: smartpqi01: pqi_IoctlGetPciInfo:0100: Controller is Offline
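
The APD timeout and controller-offline messages above can be tallied in the same way to confirm that a log matches this article. This is a minimal Python sketch under stated assumptions: the `summarize` helper, the labels, and the patterns are hypothetical and are taken only from the sample messages shown here, so they may not match every driver or ESXi build.

```python
import re
from collections import Counter

# Assumption: wording copied from the sample messages in this article.
SIGNATURES = [
    ("controller_offline", re.compile(r"smartpqi\d*: \S+: Controller .*Offline")),
    ("apd_timeout", re.compile(r"APD EventType: APD_TIMEOUT")),
    ("apd_timeout_state", re.compile(r"has entered the All Paths Down Timeout state")),
]
# Captures how long the device was in APD before the timeout was declared.
APD_SECONDS = re.compile(r"All Paths Down\s+state for (\d+) seconds")


def summarize(lines):
    """Count known signatures and return (counts, apd_seconds or None)."""
    counts = Counter()
    apd_seconds = None
    for line in lines:
        for label, pattern in SIGNATURES:
            if pattern.search(line):
                counts[label] += 1
        match = APD_SECONDS.search(line)
        if match:
            apd_seconds = int(match.group(1))
    return counts, apd_seconds
```

One way to use it is on a copy of the log taken from the host, for example `summarize(open("vmkernel.log", errors="replace"))`, where the file name is a placeholder. A non-zero `controller_offline` count together with `apd_timeout` entries is consistent with the sequence described in this article, but it does not by itself identify the root cause of the controller hang.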

 

 

Resolution

  • To bring the datastore back online, the ESXi host needs to be rebooted (preferably a cold reboot or power cycle).
  • To investigate controller hang/crash further, or if reboot does not resolve the issue, open a support request with the hardware vendor.