Hosts becoming unresponsive, datastores becoming inaccessible and storage performance degradation, and VM migration hanging.
VMware ESXi 7.x
VMware ESXi 8.x
This issue can arise if the following conditions are present in your storage infrastructure
In this context, typical logging may include:
/var/log/hostd.log:
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100788] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 3984 : Lost access to volume (Datastore) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4233 : Lost access to volume (Datastore) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4233 : Lost access to volume (Datastore) due to connectivity issues.
Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
--> key = 6,
--> chainId = 0,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> host = (vim.event.HostEventArgument) {
--> name = "####-#.####.#####",
--> host = 'vim.HostSystem:ha-host'
--> },
--> eventTypeId = "esx.problem.vmfs.heartbeat.recovered",
--> arguments = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = ""
--> },
--> (vmodl.KeyAnyValue) {
--> key = "2",
--> value = "Datastore"
--> }
--> ],
--> objectId = "ha-host",
--> objectType = "vim.HostSystem",
--> }
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4234 : Successfully restored access to volume (Datastore) following connectivity issues.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101035] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4209 : Path redundancy to storage device naa.############## degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: #########.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101034] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4210 : Path redundancy to storage device naa.############## degraded. Path vmhba0:C0:T100:L7 is down. Affected datastores: #########.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101035] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4209 : Path redundancy to storage device naa.############## degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: .
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101034] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
--> key = 6,
--> chainId = -1,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> host = (vim.event.HostEventArgument) {
--> name = "####.#####.####",
--> host = 'vim.HostSystem:ha-host'
--> },
--> ds = (vim.event.DatastoreEventArgument) {
--> name = "##########",
--> datastore = 'vim.Datastore:'
--> },
--> eventTypeId = "esx.problem.storage.redundancy.degraded",
--> arguments = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = "naa.############## "
--> },
--> (vmodl.KeyAnyValue) {
--> key = "2",
--> value = "vmhba0:C0:T100:L7"
--> },
--> (vmodl.KeyAnyValue) {
--> key = "3",
--> value = (vim.event.DatastoreEventArgument) {
--> name = "#########",
--> datastore = 'vim.Datastore:'
--> }
--> }
--> ],
--> objectId = " ",
--> objectType = "vim.Datastore",
--> objectName = "Datastore",/var/log/vmkernel.log:
YYYY-MM-DDTHH:MM:SS.sssZ cpu0:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T23:L1) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $
YYYY-MM-DDTHH:MM:SS.sssZ cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T22:L1) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds$
YYYY-MM-DDTHH:MM:SS.sssZ cpu0:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T83:L31
YYYY-MM-DDTHH:MM:SS.sssZ cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L31
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x6fd in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x6fd ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0xa4 issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12 sc:0x45b9423a1e48 flags: 0x3 lun: 6 target: 0x1f0300
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0#52a in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_icmnd_cmpl_handler: 1809: io_req: 0#45b953c07040 sc: 0#45b96c1d3888 tag: 0#52a CMD_FLAGS: 0x53 CMD_STATE: FNIC_IOREQ_ABTS_PENDING ABTS pending hdr status: FCPIO_ABORTED scsi_status: 0#0$
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2328: fcpio hdr status: FCPIO_ITMF_REJECTED
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)WARNING: nfnic: <1>: fnic_fcpio_itmf_cmpl_handler: 2360: abort reject received id: 0#52a
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2405: io_req: 0#45b953c07040 sc: 0#45b96c1d3888 id: 0#52a CMD_FLAGS: 0x73 CMD_STATE: FNIC_IOREQ_ABTS_PENDINGhdr status: FCPIO_ITMF_REJECTED ABTS cmpl received
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0#52a ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0#52a in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0#52a ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0x6fd issued time: 120002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x28 sc:0x45d96143ad88 flags: 0x3 lun: 13 target: 0x1f1900
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x6fd in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x6fd ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0xa4 issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12 sc:0x45b9423a1e48 flags: 0x3 lun: 6 target: 0x1f0300
YYYY-MM-DDTHH:MM:SS.sssZ cpu5:2098564)WARNING: NMP: nmpCompleteRetryForPath:394: Logical device "naa.##########################": awaiting fast path state update before retrying failed command again...
YYYY-MM-DDTHH:MM:SS.sssZ cpu42:2098564)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.##########################": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
/var/log/vobd.log:
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232600491us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.##############
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232603661us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.##############
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180510538us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L12
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180593263us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L13
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348926us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T102:L7 changed state from on
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348928us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T103:L7 changed state from on
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880205647us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L12 changed state from dead
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880205760us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L1 changed state from dead
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821589032us: [esx.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T102:L7 is down. Affected datastores: "#######".
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821592030us: [esx.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T103:L7 is down. Affected datastores: "#######".
Below is a list of possible points to investigate on the storage infrastructure
NOTE: HBA driver and firmware must in the Broadcom Compatibility Guide. Reach out to your server manufacturer or their website to get more information /best practice about the driver and firmware compatible with your HBA hardware.