Frequent 'Power-on Reset occurred on' events on SATA Disks on vSAN 6.7 hosts

Article ID: 303644

Products

VMware vSAN

Issue/Introduction

Symptoms:

On vSAN 6.7 hosts, SATA disks experience Power-on Reset and Abort events, which are reported in the vmkernel log. These events may or may not be accompanied by heartbeat (HB) timeouts against vSAN namespaces or other objects, descriptor errors, or extreme VM latency. Batches of these events begin approximately every 20 minutes.

Example vmkernel log output from a host:

2018-10-03T21:23:52.061Z cpu43:2098193)lsi_mr3: megasas_hotplug_work:353: event code: 0x10b.
2018-10-03T21:23:52.111Z cpu43:2098193)lsi_mr3: megasas_hotplug_work:353: event code: 0x10c.
2018-10-03T21:23:52.617Z cpu11:2098404)NMP: nmp_ThrottleLogForDevice:3689: Cmd 0x28 (0x459dbc74bac0, 0) to dev "naa.55cd2e414d80a7fa" on path "vmhba0:C0:T3:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0. Act:NONE
2018-10-03T21:23:52.617Z cpu11:2098404)ScsiDeviceIO: 2994: Cmd(0x459dbc74bac0) 0x28, CmdSN 0x14278fd4 from world 0 to dev "naa.55cd2e414d80a7fa" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
2018-10-03T21:23:52.617Z cpu11:2098404)ScsiCore: 1714: Power-on Reset occurred on naa.55cd2e414d80a7fa
2018-10-03T21:23:53.115Z cpu11:2098404)ScsiDeviceIO: 2994: Cmd(0x459dbc710d80) 0x28, CmdSN 0x14278fd9 from world 0 to dev "naa.55cd2e414d80a7fa" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
2018-10-03T21:23:53.115Z cpu11:2098404)ScsiCore: 1714: Power-on Reset occurred on naa.55cd2e414d80a7fa
2018-10-03T21:23:53.124Z cpu13:2097403)lsi_mr3: MR_PopulateDrvRaidMap:334: ldCount 0
2018-10-03T21:23:53.124Z cpu13:2097403)lsi_mr3: MR_PopulateDrvRaidMap:335: Max. VD supported 256
2018-10-03T21:24:33.062Z cpu43:2098193)lsi_mr3: megasas_hotplug_work:353: event code: 0x10b.
2018-10-03T21:24:33.111Z cpu43:2098193)lsi_mr3: megasas_hotplug_work:353: event code: 0x10c.
2018-10-03T21:24:33.139Z cpu44:2203934)lsi_mr3: mfi_TaskMgmt:672: Processing taskMgmt abort for device: vmhba0:C0:T3:L0
2018-10-03T21:24:33.139Z cpu44:2203934)lsi_mr3: mfi_TaskMgmt:691: ABORT
2018-10-03T21:24:33.139Z cpu44:2203934)WARNING: lsi_mr3: mfi_TaskMgmt:700: TM not supported C0:T3:L0
2018-10-03T21:24:33.495Z cpu20:2098404)NMP: nmp_ThrottleLogForDevice:3689: Cmd 0x28 (0x459dbc752780, 0) to dev "naa.5000cca04fb7d654" on path "vmhba0:C0:T0:L0" Failed: H:0x2 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL
2018-10-03T21:24:33.495Z cpu20:2098404)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.5000cca04fb7d654" state in doubt; requested fast path state update...
2018-10-03T21:24:33.495Z cpu20:2098404)ScsiDeviceIO: 2994: Cmd(0x459dbc752780) 0x28, CmdSN 0x12b06922 from world 0 to dev "naa.5000cca04fb7d654" failed H:0x2 D:0x0 P:0x0 Invalid sense data: 0x0 0x6e 0x73.
2018-10-03T21:24:33.538Z cpu20:2098404)ScsiDeviceIO: 2994: Cmd(0x459dbc62ba00) 0x28, CmdSN 0x12b06923 from world 0 to dev "naa.5000cca04fb7d654" failed H:0x2 D:0x0 P:0x0 Invalid sense data: 0xce 0x6 0x43.
2018-10-03T21:24:33.582Z cpu20:2098404)NMP: nmp_ThrottleLogForDevice:3618: last error status from device naa.5000cca04fb7d654 repeated 10 times
2018-10-03T21:24:33.724Z cpu20:2098404)NMP: nmp_ThrottleLogForDevice:3618: last error status from device naa.5000cca04fb7d654 repeated 20 times
2018-10-03T21:24:33.865Z cpu20:2098404)ScsiDeviceIO: 2994: Cmd(0x459d2181fe80) 0x28, CmdSN 0x142790be from world 0 to dev "naa.55cd2e414d80a7fa" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
2018-10-03T21:24:33.865Z cpu20:2098404)ScsiCore: 1714: Power-on Reset occurred on naa.55cd2e414d80a7fa
2018-10-03T21:24:33.865Z cpu20:2098404)NMP: nmp_ThrottleLogForDevice:3635: last error status from device naa.55cd2e414d80a7fa repeated 2 times
2018-10-03T21:24:33.865Z cpu20:2098404)NMP: nmp_ThrottleLogForDevice:3689: Cmd 0x28 (0x459dbc7a0fc0, 0) to dev "naa.55cd2e414d80a7fa" on path "vmhba0:C0:T3:L0" Failed: H:0x2 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL
2018-10-03T21:24:33.865Z cpu20:2098404)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.55cd2e414d80a7fa" state in doubt; requested fast path state update...
2018-10-03T21:24:33.865Z cpu20:2098404)ScsiDeviceIO: 2994: Cmd(0x459dbc7a0fc0) 0x28, CmdSN 0x142790c2 from world 0 to dev "naa.55cd2e414d80a7fa" failed H:0x2 D:0x0 P:0x0 Invalid sense data: 0x65 0x6e 0x73.
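To see which devices are affected, the "Power-on Reset occurred on" lines can be pulled out of a vmkernel log and counted per device. The following is a minimal sketch, not a supported tool. It assumes the log lines follow the format shown in the example above, and the function names `find_resets` and `resets_per_device` are illustrative only.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# 2018-10-03T21:23:52.617Z cpu11:2098404)ScsiCore: 1714: Power-on Reset occurred on naa.55cd2e414d80a7fa
RESET_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z\s.*"
    r"ScsiCore: \d+: Power-on Reset occurred on (?P<dev>\S+)"
)


def find_resets(lines):
    """Return a list of (timestamp, device) tuples, one per reset line."""
    events = []
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts, match.group("dev")))
    return events


def resets_per_device(lines):
    """Count Power-on Reset events for each device identifier."""
    return Counter(dev for _, dev in find_resets(lines))
```

A typical use would be to open a copy of the vmkernel log, pass the file object to `resets_per_device`, and print the resulting counts. Devices that appear repeatedly are candidates for the behavior described in this article.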

Environment

VMware vSAN 6.7.x

Cause

These symptoms are caused by certain SATA disks inappropriately handling specific commands that were introduced in the vSAN 6.7 DDH process.

The disks stop responding to I/O after the DDH process issues these commands, and the condition persists until the DDH process has exhausted all of its attempts. DDH issues these commands every 20 minutes to gather details from the disks, which is why the events recur at that interval.
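The 20-minute cadence can help distinguish this issue from unrelated resets. The sketch below groups reset timestamps into batches and reports the time between batch starts. It is a rough check under stated assumptions: the 5-minute gap that separates one batch from the next is an arbitrary choice, not a value from VMware, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def batch_starts(timestamps, gap=timedelta(minutes=5)):
    """Return the first timestamp of each batch.

    A new batch begins whenever an event follows the previous one by
    more than `gap` (an assumed threshold, adjust as needed).
    """
    starts = []
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > gap:
            starts.append(ts)
        previous = ts
    return starts


def batch_intervals_minutes(timestamps, gap=timedelta(minutes=5)):
    """Return the minutes elapsed between consecutive batch starts."""
    starts = batch_starts(timestamps, gap)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(starts, starts[1:])]
```

Intervals that cluster around 20 minutes are consistent with the DDH polling described above. Intervals that vary widely suggest a different cause.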

Resolution

This issue is fixed in ESXi 6.7 EP5 (Build # 10764712).
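Whether a host already has the fix can be checked by comparing its build number with the one above. The sketch below assumes the version string has the form printed by `vmware -v` on an ESXi host, for example `VMware ESXi 6.7.0 build-8169922`. The function name is illustrative, and the comparison only applies within the 6.7 release line.

```python
import re

# Build number of ESXi 6.7 EP5, as stated in this article.
FIXED_BUILD = 10764712


def is_67_build_before_fix(version_string):
    """Return True if the string names an ESXi 6.7 build older than EP5.

    Hosts on other release lines return False, because build numbers
    are only compared here within ESXi 6.7.
    """
    match = re.search(r"ESXi (\d+\.\d+)\.\d+ build-(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    release, build = match.group(1), int(match.group(2))
    return release == "6.7" and build < FIXED_BUILD
```

A result of True means the host is on 6.7 and predates the fix. It does not by itself confirm that the host's disks are affected.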

If the issue is observed on a different version of ESXi and matches the symptoms above, or if you need further assistance, contact Broadcom Support to investigate.

Additional Information

Impact/Risks:

This issue frequently results in VMs hanging or crashing as I/O is queued and not processed by disks.