Symptoms:
YYYY-MM-DD:HH:MM:SS.178Z cpu68:2098702)WARNING: ScsiDeviceIO: 1780: Device naa.############################# performance has deteriorated. I/O latency increased from average value of 1358 microseconds to 340625 microseconds.
YYYY-MM-DD:HH:MM:SS.445Z cpu109:2098707)WARNING: ScsiDeviceIO: 1780: Device naa.############################# performance has deteriorated. I/O latency increased from average value of 1367 microseconds to 98813 microseconds.
YYYY-MM-DD:HH:MM:SS.552Z cpu4:2098708)WARNING: ScsiDeviceIO: 1780: Device naa.############################# performance has deteriorated. I/O latency increased from average value of 1367 microseconds to 51917 microseconds.
YYYY-MM-DD:HH:MM:SS.547Z cpu71:2098702)WARNING: ScsiDeviceIO: 1780: Device naa.############################# performance has deteriorated. I/O latency increased from average value of 1363 microseconds to 569679 microseconds.
YYYY-MM-DD:HH:MM:SS.505Z In(182) vmkernel: cpu58:2097306)ScsiDeviceIO: 4670: Cmd(0x45db44f71d40) 0x42, cmdId.initiator=0x430bb5912d40 CmdSN 0x829d301 from world 2786901 to dev "naa.#############################" failed H:0x5 D:0x0 P:0x0 Cancelled from device layer
YYYY-MM-DD:HH:MM:SS.505Z In(182) vmkernel: cpu71:2097366)ScsiDeviceIO: 4670: Cmd(0x45db44ebef40) 0x42, cmdId.initiator=0x430bb5912d40 CmdSN 0x829d300 from world 2786901 to dev "naa.#############################" failed H:0x5 D:0x0 P:0x0 Cancelled from device layer
YYYY-MM-DD:HH:MM:SS.225Z In(182) vmkernel: cpu95:2097365)ScsiDeviceIO: 4605: Cmd(0x45bb5dfb9d80) 0x42, cmdId.initiator=0x430bb5912d40 CmdSN 0x829d3fc from world 2841660 to dev "naa.#############################" failed H:0x8 D:0x0 P:0x0 Cancelled from device layer
YYYY-MM-DD:HH:MM:SS.322Z In(182) vmkernel: cpu95:2097365)ScsiDeviceIO: 4605: Cmd(0x45db374084c0) 0x42, cmdId.initiator=0x430bb5912d40 CmdSN 0x829d4b9 from world 2790618 to dev "naa.#############################" failed H:0x8 D:0x0 P:0x0 Cancelled from device layer
YYYY-MM-DD:HH:MM:SS.971Z cpu63:6728669)0x453a0e01be70:[0x420017f6e587]MCSLockWait@vmkernel#nover+0x10f stack: 0x45ba59ba9540, 0x420017f6eb6e, 0x45ba59ba9540, 0x4200182e07ef, 0x100453a0e01bf00
YYYY-MM-DD:HH:MM:SS.971Z cpu63:6728669)0x453a0e01be90:[0x420017f6eb6d]MCSLockWork@vmkernel#nover+0x2a stack: 0x100453a0e01bf00, 0x420000000000, 0x26700000000, 0x188eb43c9d3f56, 0x430a754d5be8
YYYY-MM-DD:HH:MM:SS.971Z cpu63:6728669)0x453a0e01bea0:[0x4200182e07ee]PsaScsiDeviceTimeoutHandlerFn@vmkernel#nover+0x56f stack: 0x26700000000, 0x188eb43c9d3f56, 0x430a754d5be8, 0x40, 0x42004fc016c0
YYYY-MM-DD:HH:MM:SS.971Z cpu63:6728669)0x453a0e01bf60:[0x42001831fcc8]PsaStorDeviceTimeoutHandlerFn@vmkernel#nover+0x59 stack: 0x0, 0x420000000cd7, 0x430a754d5b40, 0x10, 0x209a1
YYYY-MM-DD:HH:MM:SS.971Z cpu63:6728669)0x453a0e01bfa0:[0x4200183c5fff]PsaStorTaskMgmtWorldFunc@vmkernel#nover+0x8c stack: 0x453a10a9f100, 0x453a0e01f100, 0x0, 0x0, 0x0
YYYY-MM-DD:HH:MM:SS.971Z cpu63:6728669)0x453a0e01bfe0:[0x4200184dc88e]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0, 0x420017f44fb0, 0x0, 0x0, 0x0
YYYY-MM-DD:HH:MM:SS.971Z cpu63:6728669)0x453a0e01c000:[0x420017f44faf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
PSOD backtrace is similar to:
#PF Exception 14
MCSLockWait@vmkernel
MCSLockWork@vmkernel
PsaScsiDeviceTimeoutHandlerFn@vmkernel
PsaStorDeviceTimeoutHandlerFn@vmkernel
PsaStorTaskMgmtWorldFunc@vmkernel
CpuSched_StartWorld@vmkernel
Debug_IsInitialized@vmkernel
Environment: VMware vSphere ESXi 8.0
Cause: This issue can occur when the vSphere environment generates UNMAP commands (SCSI opcode 0x42) faster than the storage array can process them. The result is degraded device performance, a backlog of pending UNMAP I/O, and slow processing of aborts for the delayed I/O on the device.
Resolution: Investigate why the UNMAP (0x42) command rate is elevated. Check whether the storage array is overloaded, and review contributing factors such as VM UNMAP granularity; firmware, design, or driver issues; and device-level anomalies.
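As a starting point for that investigation, the log signatures listed under Symptoms can be tallied per device. The following Python sketch is illustrative only: the function name `summarize` and the regular expressions are assumptions modeled on the sample log lines in this article, not a VMware tool, and the patterns may need adjusting for other log formats or builds.

```python
import re
from collections import Counter

# Pattern for the "performance has deteriorated" warning (modeled on the samples above).
LATENCY_RE = re.compile(
    r"Device (?P<dev>naa\.\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg>\d+) microseconds "
    r"to (?P<cur>\d+) microseconds"
)

# Pattern for a failed SCSI command; "op" is the opcode (0x42 is UNMAP).
FAILED_CMD_RE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), .*? to dev "(?P<dev>naa\.[^"]+)" '
    r"failed H:(?P<host>0x[0-9a-f]+)"
)


def summarize(lines):
    """Return (worst latency in microseconds per device,
    count of failed UNMAP commands per (device, host status))."""
    worst_latency = {}
    failed_unmaps = Counter()
    for line in lines:
        # finditer also handles several log entries fused onto one line.
        for m in LATENCY_RE.finditer(line):
            dev, cur = m.group("dev"), int(m.group("cur"))
            worst_latency[dev] = max(cur, worst_latency.get(dev, 0))
        for m in FAILED_CMD_RE.finditer(line):
            if m.group("op") == "0x42":  # count UNMAP failures only
                failed_unmaps[(m.group("dev"), m.group("host"))] += 1
    return worst_latency, failed_unmaps


if __name__ == "__main__":
    import sys

    # Usage (path is an example): python summarize_unmap.py /var/run/log/vmkernel.log
    with open(sys.argv[1], errors="replace") as fh:
        latency, failures = summarize(fh)
    for dev, usec in sorted(latency.items()):
        print(f"{dev}: worst reported latency {usec} microseconds")
    for (dev, host), count in sorted(failures.items()):
        print(f"{dev}: {count} failed UNMAP command(s) with H:{host}")
```

A device that shows both large latency spikes and many failed 0x42 commands in the same time window is a reasonable candidate for closer review on the array side.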
Workaround: As a temporary measure, disable space reclamation or lower its rate. Before making this change, refer to the following VMware Knowledge Base article for instructions and important considerations:
How to throttle the unmap requests on Datastore ( Space Reclamation )
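For orientation only, the sketch below builds the kind of `esxcli storage vmfs reclaim config` command lines typically used to view or change the space reclamation priority of a VMFS6 datastore. The helper names and the datastore label `datastore1` are hypothetical, the sketch only prints the commands and does not run them, and the Knowledge Base article above remains the authoritative source for the exact procedure and supported values.

```python
import shlex

# Base of the esxcli namespace for VMFS space reclamation settings.
_BASE = ["esxcli", "storage", "vmfs", "reclaim", "config"]


def reclaim_config_get(volume_label):
    """Command that shows the current reclamation settings of a datastore."""
    return _BASE + ["get", f"--volume-label={volume_label}"]


def reclaim_config_set(volume_label, priority="none"):
    """Command that sets the reclamation priority.

    Only "none" (disable automatic reclamation) and "low" are accepted
    here; this restriction is a choice made for the sketch.
    """
    if priority not in ("none", "low"):
        raise ValueError(f"unsupported priority for this sketch: {priority!r}")
    return _BASE + [
        "set",
        f"--volume-label={volume_label}",
        f"--reclaim-priority={priority}",
    ]


if __name__ == "__main__":
    # "datastore1" is a placeholder label; substitute the affected datastore.
    print(shlex.join(reclaim_config_get("datastore1")))
    print(shlex.join(reclaim_config_set("datastore1", "none")))
```

Printing the commands rather than executing them keeps the change a deliberate manual step on the ESXi host, which suits a temporary workaround that should be reverted once the array-side cause is addressed.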