Lost access to volume due to connectivity issues OR Path redundancy to storage device degraded
search cancel

Lost access to volume due to connectivity issues OR Path redundancy to storage device degraded

book

Article ID: 376147

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Hosts becoming unresponsive, datastores becoming inaccessible and storage performance degradation and VM migration getting stucked.
Devices observing high latency.

 

Here are some log excerpts from hosts having this issue. 

hostd.log:

 

YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100788] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 3984 : Lost access to volume  (Datastore) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4233 : Lost access to volume (Datastore) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4233 : Lost access to volume (Datastore) due to connectivity issues.
Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
-->    key = 6,
-->    chainId = 0,
-->    createdTime = "1970-01-01T00:00:00Z",
-->    userName = "",
-->    host = (vim.event.HostEventArgument) {
-->       name = "ES#i-1.vmware.local",
-->       host = 'vim.HostSystem:ha-host'
-->    },
-->    eventTypeId = "es#.problem.vmfs.heartbeat.recovered",
-->    arguments = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          key = "1",
-->          value = ""
-->       },
-->       (vmodl.KeyAnyValue) {
-->          key = "2",
-->          value = "Datastore"
-->       }
-->    ],
-->    objectId = "ha-host",
-->    objectType = "vim.HostSystem",
--> }
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4234 : Successfully restored access to volume (Datastore) following connectivity issues.

YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101035] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4209 : Path redundancy to storage device naa.############## degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: Datastore-1.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101034] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4210 : Path redundancy to storage device naa.############## degraded. Path vmhba0:C0:T100:L7 is down. Affected datastores: Datastore-1.
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101035] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4209 : Path redundancy to storage device naa.############## degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: .
YYYY-MM-DDTHH:MM:SS.sssZ info hostd[2101034] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
-->    key = 6,
-->    chainId = -1,
-->    createdTime = "1970-01-01T00:00:00Z",
-->    userName = "",
-->    host = (vim.event.HostEventArgument) {
-->       name = "ES#i-2.vmware.local",
-->       host = 'vim.HostSystem:ha-host'
-->    },
-->    ds = (vim.event.DatastoreEventArgument) {
-->       name = "Datastore-1",
-->       datastore = 'vim.Datastore:'
-->    },
-->    eventTypeId = "es#.problem.storage.redundancy.degraded",
-->    arguments = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          key = "1",
-->          value = "naa.############## "
-->       },
-->       (vmodl.KeyAnyValue) {
-->          key = "2",
-->          value = "vmhba0:C0:T100:L7"
-->       },
-->       (vmodl.KeyAnyValue) {
-->          key = "3",
-->          value = (vim.event.DatastoreEventArgument) {
-->             name = "Datastore-1",
-->             datastore = 'vim.Datastore:'
-->          }
-->       }
-->    ],
-->    objectId = " ",
-->    objectType = "vim.Datastore",
-->    objectName = "Datastore",


VMkernel.log:

YYYY-MM-DDTHH:MM:SS.sssZ cpu0:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T23:L1) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $
YYYY-MM-DDTHH:MM:SS.sssZ cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T22:L1) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds$
YYYY-MM-DDTHH:MM:SS.sssZ cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T1:L12) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $
YYYY-MM-DDTHH:MM:SS.sssZ cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T2:L12) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $

YYYY-MM-DDTHH:MM:SS.sssZ cpu0:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T83:L31
YYYY-MM-DDTHH:MM:SS.sssZ cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L31
YYYY-MM-DDTHH:MM:SS.sssZ cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L32
YYYY-MM-DDTHH:MM:SS.sssZ cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L32

YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x6fd in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x6fd ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0xa4  issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12  sc:0x45b9423a1e48 flags: 0x3 lun: 6 target: 0x1f0300

YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0#52a in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_icmnd_cmpl_handler: 1809: io_req: 0#45b953c07040 sc: 0#45b96c1d3888 tag: 0#52a CMD_FLAGS: 0x53 CMD_STATE: FNIC_IOREQ_ABTS_PENDING ABTS pending hdr status: FCPIO_ABORTED scsi_status: 0#0$
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2328: fcpio hdr status: FCPIO_ITMF_REJECTED
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)WARNING: nfnic: <1>: fnic_fcpio_itmf_cmpl_handler: 2360: abort reject received id: 0#52a
YYYY-MM-DDTHH:MM:SS.sssZ cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2405: io_req: 0#45b953c07040 sc: 0#45b96c1d3888 id: 0#52a CMD_FLAGS: 0x73 CMD_STATE: FNIC_IOREQ_ABTS_PENDINGhdr status: FCPIO_ITMF_REJECTED ABTS cmpl received
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0#52a ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3

YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0#52a in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0#52a ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0x6fd  issued time: 120002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x28  sc:0x45d96143ad88 flags: 0x3 lun: 13 target: 0x1f1900
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x6fd in pending state
YYYY-MM-DDTHH:MM:SS.sssZ cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x6fd ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
YYYY-MM-DDTHH:MM:SS.sssZ cpu11:2136878)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0xa4  issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12  sc:0x45b9423a1e48 flags: 0x3 lun: 6 target: 0x1f0300

YYYY-MM-DDTHH:MM:SS.sssZ cpu5:2098564)WARNING: NMP: nmpCompleteRetryForPath:394: Logical device "naa.##########################": awaiting fast path state update before retrying failed command again...
YYYY-MM-DDTHH:MM:SS.sssZ cpu42:2098564)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.##########################": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
YYYY-MM-DDTHH:MM:SS.sssZ cpu46:2098564)WARNING: NMP: nmpCompleteRetryForPath:394: Logical device "naa.##########################": awaiting fast path state update before retrying failed command again...

vobd.log:

 

YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232600491us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.############## 
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232603661us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.############## 
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232606294us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.############## 
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10232610858us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.############## 

YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180510538us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L12
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180593263us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L13
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180296296us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L14
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 10180529601us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L10

YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348926us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T102:L7 changed state from on
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348928us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T103:L7 changed state from on
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348930us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T101:L7 changed state from on
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9820348934us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T100:L7 changed state from on

YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880205647us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L12 changed state from dead
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880205760us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L1 changed state from dead
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880206243us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L8 changed state from dead
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9880206365us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L5 changed state from dead

YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821589032us: [es#.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T102:L7 is down. Affected datastores: "Datastore".
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821592030us: [es#.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T103:L7 is down. Affected datastores: "Datastore".
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821595008us: [es#.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: "Datastore".
YYYY-MM-DDTHH:MM:SS.sssZ: [scsiCorrelator] 9821597978us: [es#.problem.storage.redundancy.degraded] Path redundancy to storage device eui.############## degraded. Path vmhba0:C0:T100:L7 is down. Affected datastores: "Datastore".

 

 

Environment

VMware ESXi 7.x
VMware ESXi 8.x

Cause

The logging can be found if the following conditions are present in your storage infrastructure

  • Issues with the iSCSI network connectivity 
  • Issues with the fabric 
  • LUN is unpresented from the host or cluster from the array 
  • Faulty SFP module 
  • Faulty cables 

Resolution

Below is a list of possible points to investigate on the storage infrastructure

  • Check and replace any faulty SFP/SFP+ (Small Form-factor Pluggable) modules. 
  • Check and replace any faulty cables.
  • Check the fabric switch logs to see if you find any anomalies or errors. 
  • Reboot the fabric switches. Check if you find any errors when it's coming back online. 
  • Check if the SFPs are negotiating at the right speed. For example - If you find a 16 Gb SFP negotiating at 8 Gb, this must be investigated further or it indicates something wrong with that SFP. 

NOTE: HBA driver and firmware must in the Broadcom Compatibility Guide. Reach out to your server manufacturer or their website to get more information /best practice about the  driver and firmware compatible with your HBA hardware. 

Additional Information