STORAGE -- LSI -- HBA330 -- Frequent PowerOn Reset Unit Attentions are occurring on device

Article ID: 397960

Updated On:

Products

VMware vSAN
VMware vSphere ESX 7.x
VMware vSphere ESX 8.x

Issue/Introduction

Host logs are flooded with errors for many drives connected to the same storage controller (vmhba), even though the controller, including its driver and firmware, is certified for the installed vSphere build.
The issue came back suddenly after the host had been running without problems.
No hardware errors or issues are reported in the KVM/out-of-band server management.
 
 
Examples:
 
Power-on resets initiated by the storage controller for more than one drive connected to the same controller:
YYYY-MM-DDTHH:MM:SS.ZZ Wa(180) vmkwarning: cpu74:2098409)WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0xc D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. hppAction = 1
YYYY-MM-DDTHH:MM:SS.ZZ In(14) vobd[2098052]:  [scsiCorrelator] 5994440993us: [vob.scsi.scsipath.por] Power-on Reset occurred on #################
YYYY-MM-DDTHH:MM:SS.ZZ Wa(180) vmkwarning: cpu70:2098419)WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x2. hppAction = 1
 
 
Aborts initiated by the storage controller:
YYYY-MM-DDTHH:MM:SS.ZZ Wa(180) vmkwarning: cpu85:2098403)WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0xc D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. hppAction = 1
YYYY-MM-DDTHH:MM:SS.ZZ In(182) vmkernel: cpu3:2183988)lsi_msgpt3: _scsih_task_mgmt:915: lsi_msgpt3_0:C0:T9:L0 handle(0x0012): TM(abort) request start
YYYY-MM-DDTHH:MM:SS.ZZ In(182) vmkernel: cpu3:2183988)lsi_msgpt3: _scsih_abort:728: lsi_msgpt3_0:C0:T9:L0, handle(0x0012), smid(98), io_in_time_ms(86675822), abort_in_time_ms(86705824), delta(30002 ms)
YYYY-MM-DDTHH:MM:SS.ZZ In(182) vmkernel: cpu105:2098405)ScsiDeviceIO: 4656: Cmd(0x45deb0d4ac80) 0x2a, cmdId.initiator=0x430b292d5000 CmdSN 0x377ebde from world 0 to dev "#################" failed H:0x5 D:0x0 P:0x0 Cancelled from driver layer. Cmd count Active:1
YYYY-MM-DDTHH:MM:SS.ZZ In(182) vmkernel: cpu3:2183988)lsi_msgpt3: _scsih_task_mgmt:954: lsi_msgpt3_0:C0:T9:L0 handle(0x0012): TM(abort) request end: status=Success
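When many such aborts are logged, it can help to extract the affected target and the reported delta from each `_scsih_abort` line. The following is a minimal Python sketch, not a supported tool. The regular expression is modeled only on the single example line above, and the function and variable names (`parse_abort`, `ABORT_SAMPLE`) are hypothetical. Judging by the field names, the delta appears to be the difference between `abort_in_time_ms` and `io_in_time_ms` (here 30002 ms), which the sketch lets you verify.

```python
import re

# Pattern modeled on the lsi_msgpt3 "_scsih_abort" line shown in the example above.
ABORT_RE = re.compile(
    r"_scsih_abort:\d+: (?P<adapter>\w+):C(?P<channel>\d+):T(?P<target>\d+):L(?P<lun>\d+), "
    r"handle\((?P<handle>0x[0-9a-f]+)\), smid\((?P<smid>\d+)\), "
    r"io_in_time_ms\((?P<io_in>\d+)\), abort_in_time_ms\((?P<abort_in>\d+)\), "
    r"delta\((?P<delta>\d+) ms\)"
)


def parse_abort(line):
    """Return the abort details from one log line, or None if the line does not match."""
    m = ABORT_RE.search(line)
    if m is None:
        return None
    return {
        "path": f"{m['adapter']}:C{m['channel']}:T{m['target']}:L{m['lun']}",
        "handle": m["handle"],
        "io_in_ms": int(m["io_in"]),
        "abort_in_ms": int(m["abort_in"]),
        "delta_ms": int(m["delta"]),
    }


# Sample line copied from the example above.
ABORT_SAMPLE = (
    "YYYY-MM-DDTHH:MM:SS.ZZ In(182) vmkernel: cpu3:2183988)lsi_msgpt3: _scsih_abort:728: "
    "lsi_msgpt3_0:C0:T9:L0, handle(0x0012), smid(98), io_in_time_ms(86675822), "
    "abort_in_time_ms(86705824), delta(30002 ms)"
)

record = parse_abort(ABORT_SAMPLE)
print(record["path"], record["delta_ms"], record["abort_in_ms"] - record["io_in_ms"])
```

Grouping the parsed records by `path` shows whether the aborts are spread across many targets on the same adapter, which is the pattern this article describes.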
 
 
Additional SCSI Errors seen in relation to Controller Disk Handling:
YYYY-MM-DDTHH:MM:SS.ZZ Wa(180) vmkwarning: cpu58:2098393)WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0xc D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. hppAction = 1
YYYY-MM-DDTHH:MM:SS.ZZ Wa(180) vmkwarning: cpu58:2098393)WARNING: HPP: HppScsiThrottleLogForDevice:585: Cmd 0x28 (0x45dec9fcc380, 0) to dev "#################" on path "vmhba5:C0:T1:L0" Failed:
YYYY-MM-DDTHH:MM:SS.ZZ Wa(180) vmkwarning: cpu58:2098393)WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x4b 0x3. hppAction = 1
YYYY-MM-DDTHH:MM:SS.ZZ Wa(180) vmkwarning: cpu58:2098393)WARNING: HPP: HppScsiThrottleLogForDevice:585: Cmd 0x28 (0x45dec9fcc380, 0) to dev "#################" on path "vmhba5:C0:T1:L0" Failed:
YYYY-MM-DDTHH:MM:SS.ZZ Wa(180) vmkwarning: cpu58:2098393)WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x4b 0x4. hppAction = 1
YYYY-MM-DDTHH:MM:SS.ZZ In(182) vmkernel: cpu58:2098393)lsi_msgpt3: _scsih_print_command:248: lsi_msgpt3_0: SCSI CDB dump
YYYY-MM-DDTHH:MM:SS.ZZ In(182) vmkernel: cpu82:2098183)lsi_msgpt3: _scsih_ublock_io_device_wait:1229: lsi_msgpt3_0: _scsih_ublock_io_device_wait - device_unblocked, handle(0x000d), sas_addr(0x5002538b01b65422)
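To see which status combinations dominate a flooded log, the `Error status` lines can be tallied. The sketch below is an illustration under assumptions: the pattern is derived only from the `HppScsiThrottleLogForDevice` lines quoted in this article, and the names `count_scsi_statuses` and `STATUS_SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Matches the status portion of the HppScsiThrottleLogForDevice lines quoted above.
STATUS_RE = re.compile(
    r"Error status H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"(Valid|Invalid) sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)"
)


def count_scsi_statuses(lines):
    """Count each (H, D, P, validity, sense key, ASC, ASCQ) combination in the given lines."""
    counts = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            counts[m.groups()] += 1
    return counts


# Status lines copied from the examples above (timestamps and prefixes shortened).
STATUS_SAMPLE = [
    "WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0xc D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. hppAction = 1",
    "WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x2. hppAction = 1",
    "WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x4b 0x3. hppAction = 1",
    "WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x4b 0x4. hppAction = 1",
    "WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0xc D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. hppAction = 1",
]

for combo, n in count_scsi_statuses(STATUS_SAMPLE).most_common():
    print(n, combo)
```

On a host, the same function could be fed the lines of the vmkwarning log (assumed here to be readable as a text file, for example from a support bundle) instead of the sample list.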
 
 
On the affected hosts, the device statistics show a high number of failed operations for many disks connected to the same storage controller (vmhba).
To retrieve the statistics, log in to the host via SSH and run the following command:
 
esxcli storage core device stats get
 
Example:
Device: ##################
Successful Commands: 10075127
Blocks Read: 51002678
Blocks Written: 67618888
Read Operations: 4911007
Write Operations: 5161826
Reserve Operations: 1
Reservation Conflicts: 0
Failed Commands: 206625
Failed Blocks Read: 415066648
Failed Blocks Written: 16155664
Failed Read Operations: 202608
Failed Write Operations: 4013
Failed Reserve Operations: 0
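To compare devices, the output can be reduced to a failure ratio per device. The sketch below is a rough illustration, assuming the output consists of `Key: value` lines grouped under a `Device:` line as in the example above. The function names and the device name `naa.EXAMPLE0001` are hypothetical placeholders.

```python
def parse_device_stats(text):
    """Parse `esxcli storage core device stats get` output into {device: {field: int}}."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines and any lines that are not "Key: value"
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Device":
            current = devices.setdefault(value, {})
        elif current is not None and value.isdigit():
            current[key] = int(value)
    return devices


def failed_ratio(stats):
    """Share of failed commands among all commands (successful plus failed)."""
    failed = stats.get("Failed Commands", 0)
    total = failed + stats.get("Successful Commands", 0)
    return failed / total if total else 0.0


# Counters copied from the example above; the device name is a placeholder.
STATS_SAMPLE = """\
Device: naa.EXAMPLE0001
Successful Commands: 10075127
Failed Commands: 206625
Failed Read Operations: 202608
Failed Write Operations: 4013
"""

for device, stats in parse_device_stats(STATS_SAMPLE).items():
    print(f"{device}: {failed_ratio(stats):.2%} of commands failed")
```

For the example counters this works out to roughly 2% failed commands. Many devices behind the same vmhba showing a comparable ratio is consistent with the symptom described here.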
 
 

Environment

ESXi 7.x
ESXi 8.x

Cause

Because the controller, its driver, and its firmware are certified, the symptoms point to a potential hardware issue with the controller and/or the backplane.

Resolution

Engage the hardware vendor to investigate the storage hardware of the ESXi host (e.g. controller, disks, backplane, cabling).

Additional Information

 
 
Regarding the SCSI errors shown in the examples above (Reference):

H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x4b 0x4
0xB        :  ABORTED COMMAND
0x4b 0x4: NAK RECEIVED

H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x4b 0x3
0xB        :  ABORTED COMMAND
0x4b 0x3: ACK/NAK TIMEOUT

H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x2
0x6         :  UNIT ATTENTION
0x29 0x2: SCSI BUS RESET OCCURRED
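The decoding above can be expressed as a small lookup, which is handy when scanning many log lines. This is a minimal sketch covering only the three sense triples that appear in this article, not a complete decoder. The names follow the T10 ASC/ASCQ assignments as the editor understands them, and the function names are hypothetical.

```python
# Sense keys and ASC/ASCQ pairs limited to those seen in the examples above.
SENSE_KEYS = {
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}
ASC_ASCQ = {
    (0x29, 0x02): "SCSI BUS RESET OCCURRED",
    (0x4B, 0x03): "ACK/NAK TIMEOUT",
    (0x4B, 0x04): "NAK RECEIVED",
}


def decode_sense(key, asc, ascq):
    """Return (sense key name, additional sense name) for the given numeric codes."""
    return (
        SENSE_KEYS.get(key, f"unknown sense key {key:#x}"),
        ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ {asc:#04x}/{ascq:#04x}"),
    )


def decode_sense_text(text):
    """Decode a triple as printed in the log, for example '0xb 0x4b 0x3'."""
    key, asc, ascq = (int(part, 16) for part in text.split())
    return decode_sense(key, asc, ascq)


for triple in ("0x6 0x29 0x2", "0xb 0x4b 0x3", "0xb 0x4b 0x4"):
    print(triple, "->", decode_sense_text(triple))
```

Codes outside this small table are reported as unknown rather than guessed.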