Multiple VMs not accessible over iscsi network.
search cancel

Multiple VMs not accessible over iscsi network.

book

Article ID: 395895

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms : 

  • Multiple VMs detect inaccessible in vcenter.

Validation Step:

  • In the var/run/log/hostd.log file, similar entries are seen:
    YYYY-MM-DDTHH:MM.SSSZ info hostd[2102794] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/VMNAME/VMNAME.vmx] Setting heartbeat to red; Heartbeat (in 184s): expected=184 (yellow<=80%, red<=40%), actual=0 (0%)
    YYYY-MM-DDTHH:MM.SSSZ verbose hostd[2102794] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/VMNAME/VMNAME.vmx] Updating current heartbeatStatus: green -> red
    YYYY-MM-DDTHH:MM.SSSZ verbose hostd[2102794] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/VMNAME/VMNAME.vmx] Updating the tools properties cache
    YYYY-MM-DDTHH:MM.SSSZ warning hostd[2102794] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/VMNAME/VMNAME.vmx] Error deserializing the tools properties: N5Vmomi5Fault15InvalidArgument9ExceptionE(Fault cause: vmodl.fault.InvalidArgument
    YYYY-MM-DDTHH:MM.SSSZ info hostd[2101582] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/VMNAME/VMNAME.vmx] Setting heartbeat to red; Heartbeat (in 30s): expected=30 (yellow<=80%, red<=40%), actual=0 (0%)
    YYYY-MM-DDTHH:MM.SSSZ info hostd[2102794] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/VMNAME/VMNAME.vmx] Setting heartbeat to red; Heartbeat (in 30s): expected=30 (yellow<=80%, red<=40%), actual=0 (0%)

Environment

VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

  • NOP-Out timeout errors recorded.
  • Datastore entered an "All Paths Down" (APD) state, which caused the VM to lose connectivity to its storage.

Cause validation:

  • var/run/log/syslog.log file indicates NOP-Out timeout errors.
    YYYY-MM-DDTHH:MM.SSSZ iscsid: connection 9:0 (iqn.1992-08.com.netapp:sn.################################:vs.6 if=default addr=##.###.##.##:#### (TPGT:1027 ISID:0x2)  (T6 C1)) Nop-out timeout after 10 sec in state (3).
    YYYY-MM-DDTHH:MM.SSSZ iscsid: connection 8:0 (iqn.1992-08.com.netapp:sn.################################:vs.6 if=default addr=##.###.##.##:#### (TPGT:1028 ISID:0x1)  (T6 C0)) has recovered (2 attempts)
    YYYY-MM-DDTHH:MM.SSSZ iscsid: connection 9:0 (iqn.1992-08.com.netapp:sn.################################:vs.6 if=default addr=##.###.##.##:#### (TPGT:1027 ISID:0x2)  (T6 C1)) Nop-out timeout after 10 sec in state (3).
    YYYY-MM-DDTHH:MM.SSSZ iscsid: connection 9:0 (iqn.1992-08.com.netapp:sn.################################:vs.6 if=default addr=##.###.##.##:#### (TPGT:1027 ISID:0x2)  (T6 C1)) has recovered (2 attempts)
    YYYY-MM-DDTHH:MM.SSSZ iscsid: connection 9:0 (iqn.1992-08.com.netapp:sn.################################:vs.6 if=default addr=##.###.##.##:#### (TPGT:1027 ISID:0x2)  (T6 C1)) Nop-out timeout after 10 sec in state (3).
    YYYY-MM-DDTHH:MM.SSSZ iscsid: connection 9:0 (iqn.1992-08.com.netapp:sn.################################:vs.6 if=default addr=##.###.##.##:#### (TPGT:1027 ISID:0x2)  (T6 C1)) has recovered (2 attempts)
    YYYY-MM-DDTHH:MM.SSSZ iscsid: connection 8:0 (iqn.1992-08.com.netapp:sn.################################:vs.6 if=default addr=##.###.##.##:#### (TPGT:1028 ISID:0x1)  (T6 C0)) Nop-out timeout after 10 sec in state (3).
    YYYY-MM-DDTHH:MM.SSSZ iscsid: connection 8:0 (iqn.1992-08.com.netapp:sn.################################:vs.6 if=default addr=##.###.##.##:#### (TPGT:1028 ISID:0x1)  (T6 C0)) has recovered (2 attempts)

  • var/run/log/vmkernel.log file shows intermittent iSCSI connection drops.
    YYYY-MM-DDTHH:MM.SSSZ cpu11:2098168)WARNING: iscsi_vmk: iscsivmk_StopConnection:738: vmhba64:CH:1 T:6 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
    YYYY-MM-DDTHH:MM.SSSZ cpu19:2098168)WARNING: iscsi_vmk: iscsivmk_StopConnection:738: vmhba64:CH:0 T:6 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
    YYYY-MM-DDTHH:MM.SSSZ cpu0:2098168)WARNING: iscsi_vmk: iscsivmk_StartConnection:919: vmhba64:CH:0 T:6 CN:0: iSCSI connection is being marked "ONLINE"
    YYYY-MM-DDTHH:MM.SSSZ cpu18:2098168)WARNING: iscsi_vmk: iscsivmk_StartConnection:919: vmhba64:CH:1 T:6 CN:0: iSCSI connection is being marked "ONLINE"
    YYYY-MM-DDTHH:MM.SSSZ cpu0:2098168)WARNING: iscsi_vmk: iscsivmk_StopConnection:738: vmhba64:CH:1 T:6 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
    YYYY-MM-DDTHH:MM.SSSZ cpu1:2098168)WARNING: iscsi_vmk: iscsivmk_StopConnection:738: vmhba64:CH:0 T:6 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
    YYYY-MM-DDTHH:MM.SSSZ cpu1:2098168)WARNING: iscsi_vmk: iscsivmk_StartConnection:919: vmhba64:CH:0 T:6 CN:0: iSCSI connection is being marked "ONLINE"
    YYYY-MM-DDTHH:MM.SSSZ cpu0:2098168)WARNING: iscsi_vmk: iscsivmk_StartConnection:919: vmhba64:CH:1 T:6 CN:0: iSCSI connection is being marked "ONLINE"

  • var/run/log/vmkernel.log file also confirms that the datastore entered an All Paths Down (APD) state, followed by the error code H:0x1.
    YYYY-MM-DDTHH:MM.SSSZ cpu14:2097313)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:1937: Could not select path for device "naa.################################".
    YYYY-MM-DDTHH:MM.SSSZ cpu14:2097313)LVM: 6021: Received APD EventType: APD_START (3) for device <naa.################################:1> (gen 80)
    YYYY-MM-DDTHH:MM.SSSZ cpu14:2097313)LVM: 5609: Handling APD EventType: APD_START (3) for device <naa.################################:1> (unlocked, gen 80, cur apd state UNKNOWN, cur dev state 1)
    YYYY-MM-DDTHH:MM.SSSZ cpu14:2097313)HBX: 6735: APD EventType: APD_START (3) for vol 'VolumeName'
    YYYY-MM-DDTHH:MM.SSSZ cpu14:2097313)ScsiDevice: 7031: Device state of naa.################################set to APD_START; token num:1
    YYYY-MM-DDTHH:MM.SSSZ cpu14:2097313)StorageApdHandler: 1191: APD start for 0x43079aab5170 [naa.################################]
    YYYY-MM-DDTHH:MM.SSSZ cpu22:2097458)StorageApdHandler: 408: APD start event for 0x43079aab5170 [naa.################################]
    YYYY-MM-DDTHH:MM.SSSZ cpu22:2097458)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.################################] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM.SSSZ cpu8:2097838)NMP: nmp_ThrottleLogForDevice:3813: last error status from device naa.################################ repeated 3 times
    YYYY-MM-DDTHH:MM.SSSZ cpu8:2097838)NMP: nmp_ThrottleLogForDevice:3861: Cmd 0xa3 (0x45b95beedf00, 0) to dev "naa.################################" on path "vmhba64:C0:T6:L6" Failed:
    YYYY-MM-DDTHH:MM.SSSZ cpu8:2097838)NMP: nmp_ThrottleLogForDevice:3869: H:0x1 D:0x0 P:0x0 . Act:NONE. cmdId.initiator=0x45390501bb18 CmdSN 0x0

    SCSi code H:0x1 : This status is returned if the connection is lost to the LUN. This can occur if the LUN is no longer visible to the host from the array side or if the physical connection to the array has been removed.

  • Power-on reset errors are recorded in var/run/log/vobd.log file.
    2025-04-18T19:26:40.321Z: [scsiCorrelator] 20558845452339us: [vob.scsi.scsipath.por] Power-on Reset occurred on naa.################################
    2025-04-18T19:26:40.321Z: [scsiCorrelator] 20558889295254us: [vob.scsi.scsipath.por] Power-on Reset occurred on naa.################################

  • var/run/log/vmkernel.log |grep -i "All Paths Down" file confirms that multiple datastores entered into All paths down state.
    YYYY-MM-DDTHH:MM.SSSZ cpu22:2097458)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.##############################01] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM.SSSZ cpu22:2097458)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.##############################02] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM.SSSZ cpu22:2097458)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.##############################03] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM.SSSZ cpu22:2097458)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.##############################04] has entered the All Paths Down state.
    YYYY-MM-DDTHH:MM.SSSZ cpu22:2097458)StorageApdHandlerEv: 110: Device or filesystem with identifier [naa.##############################05] has entered the All Paths Down state.

Resolution

Involve storage and internal Network vendors to investigate and resolve the Nop-out timeouts and all paths down situation.

Additional Information

NOP-Out timeouts : A NOP-Out timeout occurs if NOP-Outs do not receive a response from the ISCSI target for a period of 10 seconds

NOP-Out is used by the initiator as kind of "ping request" at the iSCSI layer to verify that a connection/sessions are still active and all its components are operational and active. 

This operation (NOP OUT) is performed by the initiator to the target.  

All Paths Down (APD) state : All-Paths-Down (APD) situation occurs when all paths to a device are down. As there is no indication of whether this is a permanent or temporary device loss, the ESXi host keeps reattempting to establish connectivity. APD-style situations commonly occur when the LUN is incorrectly un-presented from the ESXi host.

The timeout period begins when the storage device becomes unavailable to the ESXi host and enters the APD state. By default, the APD timeout is set to 140 seconds. While the timeout lasts, the host continues its attempts to reestablish connectivity with the device. When the timeout ends and the device does not recover, the host stops its attempts to retry any I/O that is not coming from virtual machines.

Power on reset :
 "Power on Reset" messages are caused by activity on the array side which is outside of the ESX host's scope.