Device cancels reported after a disk is hot-inserted into vSAN

Article ID: 319919

Products

VMware vSAN

Issue/Introduction

Symptoms:
  • In the /var/log/vmkernel.log file, you see entries similar to:

2019-09-04T19:40:23.746Z cpu41:2097989)StorageApdHandler: 977: APD Handle  Created with lock[StorageApd-0x4305cbb3d620]
2019-09-04T19:40:23.746Z cpu41:2097989)ScsiEvents: 501: Event Subsystem: Device Events, Created!
2019-09-04T19:40:23.746Z cpu41:2097989)VMWARE SCSI Id: Id for vmhba0:C2:T2:L0
0x50 0x00 0x0f 0x0b 0x49 0x10 0x60 0x30 0x56 0x4b 0x30 0x30 0x33 0x38
2019-09-04T19:40:23.747Z cpu41:2097989)ScsiDeviceIO: 8478: QErr is correctly set to 0x0 for device naa.50000f0b49106030.
2019-09-04T19:40:23.747Z cpu41:2097989)ScsiDeviceIO: 8975: Could not detect setting of sitpua for device naa.50000f0b49106030. Error Not supported.
2019-09-04T19:40:23.751Z cpu41:2097989)PLOG: PLOG_InitDevice:278: Initialized device M naa.50000f0b49106030:1 0x43138ddf8930 quiesceTask 0x45a32fcc3940 on SSD ########-####-####-####-########47bf deviceUUID ########-####-####-####-########0000
2019-09-04T19:40:23.751Z cpu41:2097989)PLOG: PLOGProbeDevice:6391: Probed partition naa.50000f0b49106030:1 of disk naa.50000f0b49106030 hot added
2019-09-04T19:40:23.752Z cpu41:2097989)PLOG: PLOG_InitDevice:278: Initialized device D naa.50000f0b49106030:2 0x43138ddf9c20 quiesceTask 0x45a32fcb3d40 on SSD ########-####-####-####-########47bf deviceUUID ########-####-####-####-########0000
2019-09-04T19:40:23.752Z cpu41:2097989)LSOMCommon: LSOMVA_OpenDiskGroup:868: Initiating VA space for disk group ########-####-####-####-########47bf
2019-09-04T19:40:23.752Z cpu41:2097989)LSOMCommon: LSOMVA_OpenDiskGroup:974: VA space handle 0 to be used for disk group ########-####-####-####-########47bf


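  • In the vSAN log files, you see SCSI commands to the same device failing with host status H:0x5 (the command was aborted), similar to: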

2019-09-03T16:00:59.995Z cpu0:65617)ScsiDeviceIO: 3015: Cmd(0x4395418feac0) 0x25, CmdSN 0x11e58 from world 0 to dev "naa.50000f0b49106030" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2019-09-03T16:01:12.049Z cpu0:65617)ScsiDeviceIO: 3015: Cmd(0x4395503fd800) 0x2a, CmdSN 0x47e13 from world 0 to dev "naa.50000f0b49106030" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2019-09-03T16:01:12.049Z cpu0:66033)WARNING: ScsiCore: 1806: Invalid sense buffer: error=0x0, valid=0x0
2019-09-03T16:01:12.049Z cpu0:65617)ScsiDeviceIO: 3015: Cmd(0x439d413546c0) 0x2a, CmdSN 0x47e0b from world 0 to dev "naa.50000f0b49106030" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2019-09-03T16:01:12.049Z cpu0:65617)ScsiDeviceIO: 3015: Cmd(0x439d484071c0) 0x2a, CmdSN 0x47e0d from world 0 to dev "naa.50000f0b49106030" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2019-09-03T16:01:12.049Z cpu0:65617)ScsiDeviceIO: 3015: Cmd(0x439d413386c0) 0x2a, CmdSN 0x47e11 from world 0 to dev "naa.50000f0b49106030" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2019-09-03T16:01:12.049Z cpu0:66033)WARNING: ScsiCore: 1806: Invalid sense buffer: error=0x0, valid=0x0
2019-09-03T16:01:12.049Z cpu0:65617)ScsiDeviceIO: 3015: Cmd(0x43954f7d97c0) 0x2a, CmdSN 0x47e14 from world 0 to dev "naa.50000f0b49106030" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2019-09-03T16:01:12.049Z cpu0:66033)WARNING: ScsiCore: 1806: Invalid sense buffer: error=0x0, valid=0x0
2019-09-03T16:01:12.049Z cpu0:65617)ScsiDeviceIO: 3015: Cmd(0x43954f7023c0) 0x2a, CmdSN 0x47e15 from world 0 to dev "naa.50000f0b49106030" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2019-09-03T16:01:12.049Z cpu0:66033)WARNING: ScsiCore: 1806: Invalid sense buffer: error=0x0, valid=0x0
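
The two symptom patterns above can be checked for with a short script. This is a minimal Python sketch, assuming the messages match the format of the sample lines in this article exactly; the regular expressions and the function names (find_hot_added_disks, summarize_aborted_commands, scan_log) are hypothetical helpers, not a VMware tool. It treats H:0x5 as the ABORT host status and decodes only the two SCSI opcodes that appear in the samples (0x25 READ CAPACITY(10) and 0x2a WRITE(10)).

```python
import re
from collections import Counter

# Hypothetical patterns derived only from the sample vmkernel lines in this
# article; real message formats may differ between ESXi builds.
HOT_ADD_RE = re.compile(
    r"^(?P<ts>\S+Z) cpu\d+:\d+\)PLOG: PLOGProbeDevice:\d+: "
    r"Probed partition (?P<part>\S+) of disk (?P<disk>\S+) hot added"
)
FAILED_CMD_RE = re.compile(
    r'ScsiDeviceIO: \d+: Cmd\(0x[0-9a-f]+\) (?P<opcode>0x[0-9a-f]+), '
    r'CmdSN 0x[0-9a-f]+ from world \d+ to dev "(?P<dev>[^"]+)" '
    r"failed H:(?P<host>0x[0-9a-f]+) D:0x[0-9a-f]+ P:0x[0-9a-f]+",
    re.IGNORECASE,
)

HOST_STATUS_ABORT = 0x5  # H:0x5 is the ABORT host status
# SCSI opcode names for the commands that appear in the samples above.
SCSI_OPCODES = {0x25: "READ CAPACITY(10)", 0x2A: "WRITE(10)"}


def find_hot_added_disks(lines):
    """Return (timestamp, disk, partition) for each 'hot added' probe message."""
    events = []
    for line in lines:
        match = HOT_ADD_RE.match(line.strip())
        if match:
            events.append((match["ts"], match["disk"], match["part"]))
    return events


def summarize_aborted_commands(lines):
    """Count commands that failed with H:0x5, keyed by (device, opcode name)."""
    summary = Counter()
    for line in lines:
        match = FAILED_CMD_RE.search(line)
        if not match or int(match["host"], 16) != HOST_STATUS_ABORT:
            continue  # not a failed-command line, or not an abort
        opcode = int(match["opcode"], 16)
        name = SCSI_OPCODES.get(opcode, f"opcode 0x{opcode:02x}")
        summary[(match["dev"], name)] += 1
    return summary


def scan_log(path):
    """Read a vmkernel log file once and run both checks on its lines."""
    with open(path, errors="replace") as log:
        lines = log.readlines()
    return find_hot_added_disks(lines), summarize_aborted_commands(lines)
```

For example, running scan_log against a copy of the host's vmkernel log returns the hot-added disks and the number of aborted commands per device. A large number of aborts on a recently hot-added disk is consistent with the symptoms described here, but it does not by itself confirm this issue.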

Environment

VMware vSAN 7.0.x

Cause

When a device is registered with the ESXi storage stack, vSAN can issue more I/O than usual before the registration is complete. In such cases, the I/O can be blocked and then cancelled until registration completes, which can result in failed disks in vSAN.

The issue may be triggered by an unplanned disk removal and reinsertion. On rare occasions, a host reboot may also trigger this issue.

Resolution

To resolve this issue, reboot the host reporting the errors.