Hosts becoming unresponsive, datastores becoming inaccessible and storage performance degradation and VM migration getting stucked.
Devices observing high latency.
Here are some log excerpts from hosts having this issue.
hostd.log:
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100788] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 3984 : Lost access to volume (Datastore) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4233 : Lost access to volume (Datastore) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4233 : Lost access to volume (Datastore) due to connectivity issues.
Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
--> key = 6,
--> chainId = 0,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> host = (vim.event.HostEventArgument) {
--> name = "ES#i-1.vmware.local",
--> host = 'vim.HostSystem:ha-host'
--> },
--> eventTypeId = "es#.problem.vmfs.heartbeat.recovered",
--> arguments = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = ""
--> },
--> (vmodl.KeyAnyValue) {
--> key = "2",
--> value = "Datastore"
--> }
--> ],
--> objectId = "ha-host",
--> objectType = "vim.HostSystem",
--> }
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4234 : Successfully restored access to volume (Datastore) following connectivity issues.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101035] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4209 : Path redundancy to storage device naa.############## degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: Datastore-1.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101034] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4210 : Path redundancy to storage device naa.############## degraded. Path vmhba0:C0:T100:L7 is down. Affected datastores: Datastore-1.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101035] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4209 : Path redundancy to storage device naa.############## degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: .
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101034] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
--> key = 6,
--> chainId = -1,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> host = (vim.event.HostEventArgument) {
--> name = "ES#i-2.vmware.local",
--> host = 'vim.HostSystem:ha-host'
--> },
--> ds = (vim.event.DatastoreEventArgument) {
--> name = "Datastore-1",
--> datastore = 'vim.Datastore:'
--> },
--> eventTypeId = "es#.problem.storage.redundancy.degraded",
--> arguments = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = "naa.############## "
--> },
--> (vmodl.KeyAnyValue) {
--> key = "2",
--> value = "vmhba0:C0:T100:L7"
--> },
--> (vmodl.KeyAnyValue) {
--> key = "3",
--> value = (vim.event.DatastoreEventArgument) {
--> name = "Datastore-1",
--> datastore = 'vim.Datastore:'
--> }
--> }
--> ],
--> objectId = " ",
--> objectType = "vim.Datastore",
--> objectName = "Datastore",
VMkernel.log:
YYYY-MM-DDTHH:MM:SS.sssZ cpu0:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T23:L1) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $
YYYY-MM-DDTHH:MM:SS.sssZ cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T22:L1) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds$
YYYY-MM-DDTHH:MM:SS.sssZ cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T1:L12) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $
YYYY-MM-DDTHH:MM:SS.sssZ cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T2:L12) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $
YYYY-MM-DDTHH:MM:SS.sssZ cpu0:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T83:L31
YYYY-MM-DDTHH:MM:SS.sssZ cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L31
YYYY-MM-DDTHH:MM:SS.sssZ cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L32
YYYY-MM-DDTHH:MM:SS.sssZ cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L32
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x6fd in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x6fd ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0xa4 issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12 sc:0x45b9423a1e48 flags: 0x3 lun: 6 target: 0x1f0300
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0#52a in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_icmnd_cmpl_handler: 1809: io_req: 0#45b953c07040 sc: 0#45b96c1d3888 tag: 0#52a CMD_FLAGS: 0x53 CMD_STATE: FNIC_IOREQ_ABTS_PENDING ABTS pending hdr status: FCPIO_ABORTED scsi_status: 0#0$
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2328: fcpio hdr status: FCPIO_ITMF_REJECTED
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)WARNING: nfnic: <1>: fnic_fcpio_itmf_cmpl_handler: 2360: abort reject received id: 0#52a
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2405: io_req: 0#45b953c07040 sc: 0#45b96c1d3888 id: 0#52a CMD_FLAGS: 0x73 CMD_STATE: FNIC_IOREQ_ABTS_PENDINGhdr status: FCPIO_ITMF_REJECTED ABTS cmpl received
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0#52a ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0#52a in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0#52a ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0x6fd issued time: 120002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x28 sc:0x45d96143ad88 flags: 0x3 lun: 13 target: 0x1f1900
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x6fd in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x6fd ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0xa4 issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12 sc:0x45b9423a1e48 flags: 0x3 lun: 6 target: 0x1f0300
YYYY-MM-DDTHH:MM:SS.sssZ cpu5:2098564)WARNING: NMP: nmpCompleteRetryForPath:394: Logical device "naa.##########################": awaiting fast path state update before retrying failed command again...
YYYY-MM-DDTHH:MM:SS.sssZ cpu42:2098564)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.##########################": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
YYYY-MM-DDTHH:MM:SS.sssZ cpu46:2098564)WARNING: NMP: nmpCompleteRetryForPath:394: Logical device "naa.##########################": awaiting fast path state update before retrying failed command again...
vobd.log:
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232600491us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.##############
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232603661us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.##############
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232606294us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.##############
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232610858us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.##############
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180510538us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L12
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180593263us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L13
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180296296us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L14
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180529601us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L10
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348926us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T102:L7 changed state from on
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348928us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T103:L7 changed state from on
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348930us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T101:L7 changed state from on
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348934us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T100:L7 changed state from on
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880205647us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L12 changed state from dead
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880205760us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L1 changed state from dead
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880206243us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L8 changed state from dead
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880206365us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L5 changed state from dead
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821589032us: [es#.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T102:L7 is down. Affected datastores: "Datastore".
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821592030us: [es#.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T103:L7 is down. Affected datastores: "Datastore".
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821595008us: [es#.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: "Datastore".
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821597978us: [es#.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T100:L7 is down. Affected datastores: "Datastore".
VMware ESXi 7.x
VMware ESXi 8.x
The logging can be found if the following conditions are present in your storage infrastructure
Below is a list of possible points to investigate on the storage infrastructure
NOTE: HBA driver and firmware must in the Broadcom Compatibility Guide. Reach out to your server manufacturer or their website to get more information /best practice about the driver and firmware compatible with your HBA hardware.