Lost access to volume due to connectivity issues OR Path redundancy to storage device degraded

search cancel

Lost access to volume due to connectivity issues OR Path redundancy to storage device degraded

book

Article ID: 376147

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

These errors can lead to hosts becoming unresponsive, datastores becoming inaccessible and storage performance degradation.

Here are some log excerpts from hosts having this issue.

hostd.log:

2024-08-19T18:23:09.391Z info hostd[2100788] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 3984 : Lost access to volume (Datastore) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
2024-08-19T18:29:09.431Z info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4233 : Lost access to volume (Datastore) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

2024-08-19T18:29:09.431Z info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4233 : Lost access to volume (Datastore) due to connectivity issues.
Recovery attempt is in progress and outcome will be reported shortly.
2024-08-19T18:29:09.431Z info hostd[2100744] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
--> key = 6,
--> chainId = 0,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> host = (vim.event.HostEventArgument) {
--> name = "ESXi-1.vmware.local",
--> host = 'vim.HostSystem:ha-host'
--> },
--> eventTypeId = "esx.problem.vmfs.heartbeat.recovered",
--> arguments = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = ""
--> },
--> (vmodl.KeyAnyValue) {
--> key = "2",
--> value = "Datastore"
--> }
--> ],
--> objectId = "ha-host",
--> objectType = "vim.HostSystem",
--> }
2024-08-19T18:29:09.431Z info hostd[2100744] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4234 : Successfully restored access to volume (Datastore) following connectivity issues.

2024-08-19T18:23:30.769Z info hostd[2101035] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4209 : Path redundancy to storage device naa.xxxxxxxxxxxxxx degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: Datastore-1.
2024-08-19T18:23:30.772Z info hostd[2101034] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4210 : Path redundancy to storage device naa.xxxxxxxxxxxxxx degraded. Path vmhba0:C0:T100:L7 is down. Affected datastores: Datastore-1.
2024-08-19T18:23:30.769Z info hostd[2101035] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4209 : Path redundancy to storage device naa.xxxxxxxxxxxxxx degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: .
2024-08-19T18:23:30.772Z info hostd[2101034] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
--> key = 6,
--> chainId = -1,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> host = (vim.event.HostEventArgument) {
--> name = "ESXi-2.vmware.local",
--> host = 'vim.HostSystem:ha-host'
--> },
--> ds = (vim.event.DatastoreEventArgument) {
--> name = "Datastore-1",
--> datastore = 'vim.Datastore:'
--> },
--> eventTypeId = "esx.problem.storage.redundancy.degraded",
--> arguments = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> key = "1",
--> value = "naa.xxxxxxxxxxxxxx "
--> },
--> (vmodl.KeyAnyValue) {
--> key = "2",
--> value = "vmhba0:C0:T100:L7"
--> },
--> (vmodl.KeyAnyValue) {
--> key = "3",
--> value = (vim.event.DatastoreEventArgument) {
--> name = "Datastore-1",
--> datastore = 'vim.Datastore:'
--> }
--> }
--> ],
--> objectId = " ",
--> objectType = "vim.Datastore",
--> objectName = "Datastore",
--> }

VMkernel.log:

2024-08-19T18:32:29.194Z cpu0:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T23:L1) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $
2024-08-19T18:32:29.194Z cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T22:L1) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds$
2024-08-19T18:32:29.195Z cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T1:L12) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $
2024-08-19T18:32:29.195Z cpu21:2118328)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:843: Path (vmhba1:C0:T2:L12) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x29 0x3. Waiting for 20 seconds $

2024-08-19T18:11:46.301Z cpu0:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T83:L31
2024-08-19T18:11:46.441Z cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L31
2024-08-19T18:11:46.453Z cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L32
2024-08-19T18:11:46.463Z cpu2:2097996)ScsiCore: 1823: Power-on Reset occurred on vmhba0:C0:T84:L32

2024-08-19T18:32:29.135Z cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x6fd in pending state
2024-08-19T18:32:29.135Z cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x6fd ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
2024-08-19T18:33:40.056Z cpu11:2136878)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0xa4 issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12 sc:0x45b9423a1e48 flags: 0x3 lun: 6 target: 0x1f0300

2024-08-19T18:32:11.897Z cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x52a in pending state
2024-08-19T18:32:11.897Z cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_icmnd_cmpl_handler: 1809: io_req: 0x45b953c07040 sc: 0x45b96c1d3888 tag: 0x52a CMD_FLAGS: 0x53 CMD_STATE: FNIC_IOREQ_ABTS_PENDING ABTS pending hdr status: FCPIO_ABORTED scsi_status: 0x0$
2024-08-19T18:32:11.897Z cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2328: fcpio hdr status: FCPIO_ITMF_REJECTED
2024-08-19T18:32:11.897Z cpu35:2098248)WARNING: nfnic: <1>: fnic_fcpio_itmf_cmpl_handler: 2360: abort reject received id: 0x52a
2024-08-19T18:32:11.897Z cpu35:2098248)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2405: io_req: 0x45b953c07040 sc: 0x45b96c1d3888 id: 0x52a CMD_FLAGS: 0x73 CMD_STATE: FNIC_IOREQ_ABTS_PENDINGhdr status: FCPIO_ITMF_REJECTED ABTS cmpl received
2024-08-19T18:32:11.897Z cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x52a ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3

24-08-19T18:32:11.897Z cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x52a in pending state
2024-08-19T18:32:11.897Z cpu11:2136878)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x52a ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
2024-08-19T18:32:29.135Z cpu32:2097604)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0x6fd issued time: 120002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x28 sc:0x45d96143ad88 flags: 0x3 lun: 13 target: 0x1f1900
2024-08-19T18:32:29.135Z cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3819: Abort for cmd tag: 0x6fd in pending state
2024-08-19T18:32:29.135Z cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3957: Tag: 0x6fd ABTS_STATUS: 0x47, FLAGS: 0x273, STATE: 0x3
2024-08-19T18:33:40.056Z cpu11:2136878)nfnic: <1>: INFO: fnic_abort_cmd: 3805: Abort cmd called for Tag: 0xa4 issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12 sc:0x45b9423a1e48 flags: 0x3 lun: 6 target: 0x1f0300

vobd.log:

2024-08-19T18:30:22.779Z: [scsiCorrelator] 10232600491us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.xxxxxxxxxxxxxx
2024-08-19T18:30:22.782Z: [scsiCorrelator] 10232603661us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.xxxxxxxxxxxxxx
2024-08-19T18:30:22.784Z: [scsiCorrelator] 10232606294us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.xxxxxxxxxxxxxx
2024-08-19T18:30:22.789Z: [scsiCorrelator] 10232610858us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.xxxxxxxxxxxxxx

2024-08-19T18:29:30.763Z: [scsiCorrelator] 10180510538us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L12
2024-08-19T18:29:30.766Z: [scsiCorrelator] 10180593263us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L13
2024-08-19T18:29:30.769Z: [scsiCorrelator] 10180296296us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L14
2024-08-19T18:29:30.772Z: [scsiCorrelator] 10180529601us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T104:L10

2024-08-19T18:23:30.760Z: [scsiCorrelator] 9820348926us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T102:L7 changed state from on
2024-08-19T18:23:30.763Z: [scsiCorrelator] 9820348928us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T103:L7 changed state from on
2024-08-19T18:23:30.766Z: [scsiCorrelator] 9820348930us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T101:L7 changed state from on
2024-08-19T18:23:30.769Z: [scsiCorrelator] 9820348934us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba0:C0:T100:L7 changed state from on

2024-08-19T18:24:30.349Z: [scsiCorrelator] 9880205647us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L12 changed state from dead
2024-08-19T18:24:30.350Z: [scsiCorrelator] 9880205760us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L1 changed state from dead
2024-08-19T18:24:30.350Z: [scsiCorrelator] 9880206243us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L8 changed state from dead
2024-08-19T18:24:30.351Z: [scsiCorrelator] 9880206365us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T113:L5 changed state from dead

2024-08-19T18:23:30.763Z: [scsiCorrelator] 9821589032us: [esx.problem.storage.redundancy.degraded] Path redundancy to storage device eui.xxxxxxxxxxxxxx degraded. Path vmhba0:C0:T102:L7 is down. Affected datastores: "Datastore".
2024-08-19T18:23:30.766Z: [scsiCorrelator] 9821592030us: [esx.problem.storage.redundancy.degraded] Path redundancy to storage device eui.xxxxxxxxxxxxxx degraded. Path vmhba0:C0:T103:L7 is down. Affected datastores: "Datastore".
2024-08-19T18:23:30.769Z: [scsiCorrelator] 9821595008us: [esx.problem.storage.redundancy.degraded] Path redundancy to storage device eui.xxxxxxxxxxxxxx degraded. Path vmhba0:C0:T101:L7 is down. Affected datastores: "Datastore".
2024-08-19T18:23:30.772Z: [scsiCorrelator] 9821597978us: [esx.problem.storage.redundancy.degraded] Path redundancy to storage device eui.xxxxxxxxxxxxxx degraded. Path vmhba0:C0:T100:L7 is down. Affected datastores: "Datastore".

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

1. Issues with the iSCSI network connectivity

2. Issues with the fabric

3. LUN is unpresented from the host or cluster from the array

4. Faulty SFP module

5. Faulty cables

Resolution

1. Check and replace any faulty SFP/SFP+ (Small Form-factor Pluggable) modules

2. Check and replace any faulty cables

3. Check the fabric switch logs to see if you find any anomalies or errors

4. Reboot the fabric switches. Check if you find any errors when it's coming back online.

5. Check if the SFPs are negotiating at the right speed. For example - If you find a 16 Gb SFP negotiating at 8 Gb, this must be investigated further or it indicates something wrong with that SFP.

NOTE: HBA driver and firmware must match the VMware Compatibility Guide. But, its not necessary to update them if they are just a version or two behind the latest version. Please reach out to your server manufacturer or their website to get more information about the latest driver and firmware compatible with your HBA hardware and also find out from the release notes if any known issues have been fixed.

If you are going to do this, then it would be ideal to update the driver & firmware on a test host or environment or a single host in your production cluster and monitor it for 24 Hrs. If this update resolves the problem, you can then proceed to update the whole cluster.

Additional Information

Understanding lost access to volume messages in ESXi

Check list before opening a case with support -

1. Is a single host losing access to the volume ?

2. Are multiple hosts losing access to the volume ?

3. Are multiple hosts losing access to the volume across multiple clusters ?

4. Are the hosts losing access to a single or multiple volumes ?

5. Are the hosts losing access to the volume belong to the same or different arrays ?

Upload 2 or more ESXi host logs to the case. A Storage Support Engineer will contact you to perform initial scoping of the issue and then share a log report. Based on what's found in the logs, you may have to do one or more of the things listed in this article or even contact your fabric or storage vendor, if needed.

Feedback

thumb_up Yes

thumb_down No