Disk device enters All-Paths-Down (APD) after an LSI RAID volume goes OFFLINE when using the "megaraid_sas" driver


Article ID: 344739


Updated On:

Products

VMware

Issue/Introduction

  • This article describes an All-Paths-Down (APD) condition that occurs when a volume goes OFFLINE on an LSI RAID adapter, and provides a resolution and a workaround for this scenario.


Symptoms:
  • The volume shows as OFFLINE on the LSI RAID adapter, and the ESXi disk device enters All-Paths-Down (APD) while using the "megaraid_sas" driver.
  • The host appears as "not-responding" in vCenter because of the All-Paths-Down (APD) condition.
  • vmkernel.log shows entries similar to:
    2019-01-31T18:08:49.937Z cpu8:32927)WARNING: NMP: nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43b7b8340980) to dev "naa.600605b00cbc8cc020b70ac9d53e0eb1" failed on path "vmhba3:C2:T1:L0" H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. 
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
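
To check a log for this symptom, the relevant fields can be pulled out of each warning line. The following is a minimal sketch in Python, assuming the line layout matches the single sample above; the function name `parse_nmp_retry` and the regular expression are illustrative and not part of any VMware tool.

```python
import re

# Regex derived from the one sample line in this article (an assumption:
# other ESXi builds may format the message differently).
NMP_RETRY = re.compile(
    r'to dev "(?P<device>[^"]+)" failed on path "(?P<path>[^"]+)" '
    r'H:(?P<host>0x[0-9a-fA-F]+) '
    r'D:(?P<dev>0x[0-9a-fA-F]+) '
    r'P:(?P<plugin>0x[0-9a-fA-F]+)'
)

def parse_nmp_retry(line):
    """Return device, path and H/D/P status values, or None if no match."""
    match = NMP_RETRY.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "path": match.group("path"),
        "host_status": int(match.group("host"), 16),
        "device_status": int(match.group("dev"), 16),
        "plugin_status": int(match.group("plugin"), 16),
    }

# The sample entry from this article.
SAMPLE = (
    '2019-01-31T18:08:49.937Z cpu8:32927)WARNING: NMP: '
    'nmpCompleteRetryForPath:352: Retry cmd 0x28 (0x43b7b8340980) to dev '
    '"naa.600605b00cbc8cc020b70ac9d53e0eb1" failed on path "vmhba3:C2:T1:L0" '
    'H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.'
)

if __name__ == "__main__":
    print(parse_nmp_retry(SAMPLE))
```

A line whose `host_status` is 1 with all-zero sense data matches the pattern described in this article.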


Cause

  • This issue can occur in configurations where ESXi uses an LSI RAID volume, for example as a local datastore or as a RAID0 volume in a vSAN environment.
  • When the volume goes OFFLINE for some reason, such as a disk fault, the HBA driver "megaraid_sas" only returns the "NO_CONNECT" host status to ESXi. For example, ESXi receives the following status: "H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0". The host status "H:0x1" means "NO_CONNECT".
Because no Permanent Device Loss (PDL) sense code is reported, the disk device enters All-Paths-Down (APD).
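
The reasoning above can be sketched as a small classification rule. This is an illustration only, not how ESXi itself decides between APD and PDL. The only value taken from this article is host status 0x1 (NO_CONNECT); the PDL example, device status 0x2 with sense data 0x5 0x25 0x0, is an assumption based on general VMware PDL documentation, and the function name `classify` is hypothetical.

```python
# Host status 0x1 = NO_CONNECT is the value described in this article.
HOST_NO_CONNECT = 0x1

# Assumption (not from this article): device status 0x2 (CHECK CONDITION)
# with sense data 0x5 0x25 0x0 is one commonly documented PDL indication.
DEVICE_CHECK_CONDITION = 0x2
PDL_SENSE = {(0x5, 0x25, 0x0)}

def classify(host_status, device_status, sense):
    """Classify one failed command from its H/D status and sense triple."""
    if device_status == DEVICE_CHECK_CONDITION and tuple(sense) in PDL_SENSE:
        return "PDL"
    if host_status == HOST_NO_CONNECT:
        # No PDL sense code was returned, so the device is treated as APD.
        return "APD"
    return "other"

if __name__ == "__main__":
    # Values from the log entry in this article: H:0x1 D:0x0, sense 0x0 0x0 0x0
    print(classify(0x1, 0x0, (0x0, 0x0, 0x0)))
```

With the values from the log entry in this article, no PDL sense code is present, so the rule falls through to the APD branch.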

Resolution

  • It is recommended to use the "lsi_mr3" driver. Many LSI HBA/RAID adapters support both "megaraid_sas" and "lsi_mr3". The "lsi_mr3" driver handles this scenario better: the PDL SCSI sense code is detected when "lsi_mr3" is in use.


Workaround:
  • If an LSI RAID adapter is listed in the HCL (Hardware Compatibility List) with support for "megaraid_sas" only, it is recommended to detach the disk device that is in APD in order to isolate it.
# esxcli storage core device set --state=off -d NAA_ID
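
When several devices need to be detached, the command above can be generated per device. The following Python sketch only builds the argument list and does not run anything. The helper name `build_detach_command` and the NAA identifier check are assumptions added for illustration; the esxcli arguments are exactly those shown in the workaround.

```python
import re

# Assumed shape of an NAA identifier: "naa." followed by hex digits,
# as in the device name from the log entry in this article.
NAA_PATTERN = re.compile(r"^naa\.[0-9a-fA-F]+$")

def build_detach_command(naa_id):
    """Return the esxcli argument list that sets the device state to off."""
    if not NAA_PATTERN.match(naa_id):
        raise ValueError("not a valid NAA identifier: %r" % naa_id)
    return [
        "esxcli", "storage", "core", "device", "set",
        "--state=off", "-d", naa_id,
    ]

if __name__ == "__main__":
    # Device name taken from the sample log entry in this article.
    cmd = build_detach_command("naa.600605b00cbc8cc020b70ac9d53e0eb1")
    print(" ".join(cmd))
```

Validating the identifier before building the command helps avoid passing a mistyped device name to esxcli.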