ESXi host goes into a "Not responding" state after creating or powering off/on VMs on a vVol/VMFS datastore hosted on Pure Storage iSCSI storage

Article ID: 406807

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • During the initial stages, VM power-on and power-off tasks completed successfully and within expected timeframes.

  • However, as time progressed, these tasks became increasingly slow, and eventually some operations timed out.

  • The ESXi hosts then hung and became unresponsive.

  • A manual reboot of the affected hosts was required to restore normal functionality.

  • /var/run/log/vmkernel.log shows I/O aborts, with INQUIRY commands (0x12) failing with status Timeout and host status H:0x5 (host abort):

2025-07-25T18:18:47.764Z In(182) vmkernel: cpu37:5225544)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba64:C3:T0:L252" (UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0 .
2025-07-25T18:18:48.764Z In(182) vmkernel: cpu34:5225541)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba64:C3:T0:L253" (UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0 .
2025-07-25T18:18:48.774Z In(182) vmkernel: cpu36:2097624)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba64:C2:T0:L254" (UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0 .
2025-07-25T18:18:51.778Z In(182) vmkernel: cpu9:5225543)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba64:C2:T0:L250" (UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0 .
2025-07-25T18:18:52.772Z In(182) vmkernel: cpu27:5247562)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba64:C2:T0:L249" (UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0 .
2025-07-25T18:18:54.770Z In(182) vmkernel: cpu53:5225542)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba64:C2:T0:L248" (UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0 .

  • The packet capture (PCAP) shows a cmdWindowClosed condition on the iSCSI target. cmdWindowClosed means the target has stopped accepting new SCSI commands because its command window size is zero.

    ExpCmdSN is greater than MaxCmdSN on the initiator side.

    Initiator side:

    ExpCmdSN = 213680834
    MaxCmdSN = 213680833

    Window size = MaxCmdSN – ExpCmdSN + 1 = 0 → command window closed.

    Example from the packet capture:

    Packet comments
    Frame 12: 114 bytes on wire (912 bits), 114 bytes captured (912 bits) on interface unknown, id 0
    Ethernet II, Src: Cisco_XX:bX:fX (3X:XX:7X:XX:bX:fX), Dst: VMware_6X:5X:bX (00:XX:XX:6X:5X:bX)
    Internet Protocol Version 4, Src: 17X.XX.XX.4, Dst: 17X.XX.X.1X
    Transmission Control Protocol, Src Port: 3260, Dst Port: 58324, Seq: 1, Ack: 1, Len: 48
    iSCSI (NOP In) 
        ..10 0000 = Opcode: NOP In (0x20)
        TotalAHSLength: 0 (0x00)
        DataSegmentLength: 0 (0x00000000)
        LUN
            00.. .... = Address Mode: Simple logical unit addressing (0x00)
            ..00 0000  0000 0000 = LUN: 0x0000
        InitiatorTaskTag: 0xffffffff
        TargetTransferTag: 0x0047dd1c
        StatSN: 3283165611 (0xc3b121ab)
        ExpCmdSN: 213680834 (0x0cbc82c2)
        MaxCmdSN: 213680833 (0x0cbc82c1) 
  • Between the initiator and the target address, only NOP-Out and NOP-In PDUs are exchanged; no SCSI command PDUs are sent over the iSCSI session.

The relevant definitions from the iSCSI specification (RFC 7143):

11.4.10.  ExpCmdSN - Next Expected CmdSN from This Initiator

The ExpCmdSN is a sequence number that the target iSCSI returns to the initiator to acknowledge command reception.  It is used to update a local variable with the same name.  An ExpCmdSN equal to MaxCmdSN + 1 indicates that the target cannot accept new commands.

11.4.11.  MaxCmdSN - Maximum CmdSN from This Initiator

The MaxCmdSN is a sequence number that the target iSCSI returns to the initiator to indicate the maximum CmdSN the initiator can send. It is used to update a local variable with the same name.  If the MaxCmdSN is equal to ExpCmdSN - 1, this indicates to the initiator that the target cannot receive any additional commands.  When the MaxCmdSN changes at the target while the target has no pending PDUs to convey this information to the initiator, it MUST generate a NOP-In to carry the new MaxCmdSN.

Note: In the initiator-side capture, MaxCmdSN (213680833) is equal to ExpCmdSN - 1 (213680834 - 1), which indicates to the initiator that the target cannot receive any additional commands.

  • In the array-side capture, ExpCmdSN and MaxCmdSN are equal (ExpCmdSN == MaxCmdSN):

    ExpCmdSN: 6272183 (0x005fb4b7)
    MaxCmdSN: 6272183 (0x005fb4b7) 
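The window arithmetic above can be reproduced from raw PDU bytes. The following is a minimal Python sketch, assuming the RFC 7143 NOP-In Basic Header Segment layout (StatSN, ExpCmdSN and MaxCmdSN as 32-bit big-endian fields at byte offsets 24, 28 and 32) and 32-bit wrap-around for sequence numbers. The helper names `decode_nop_in`, `command_window` and `build_sample_bhs` are hypothetical and are not part of any VMware or Pure Storage tool; the sample header is rebuilt from the field values in the initiator-side packet capture.

```python
import struct

# Byte offsets of the sequence-number fields in the 48-byte Basic Header
# Segment (BHS) of a NOP-In PDU (RFC 7143). Each field is 32-bit big-endian.
STATSN_OFFSET = 24  # followed by ExpCmdSN at 28 and MaxCmdSN at 32
NOP_IN_OPCODE = 0x20
SERIAL_MOD = 2 ** 32  # sequence numbers wrap around at 2**32


def decode_nop_in(bhs: bytes) -> dict:
    """Return StatSN, ExpCmdSN and MaxCmdSN from a NOP-In BHS."""
    if len(bhs) < 48:
        raise ValueError("a BHS is 48 bytes long")
    opcode = bhs[0] & 0x3F  # low 6 bits of byte 0 hold the opcode
    if opcode != NOP_IN_OPCODE:
        raise ValueError(f"not a NOP-In PDU (opcode 0x{opcode:02x})")
    stat_sn, exp_cmd_sn, max_cmd_sn = struct.unpack_from(">III", bhs, STATSN_OFFSET)
    return {"StatSN": stat_sn, "ExpCmdSN": exp_cmd_sn, "MaxCmdSN": max_cmd_sn}


def command_window(exp_cmd_sn: int, max_cmd_sn: int) -> int:
    """Window size = MaxCmdSN - ExpCmdSN + 1, taken modulo 2**32.

    A result of 0 means MaxCmdSN == ExpCmdSN - 1: the window is closed.
    """
    return (max_cmd_sn - exp_cmd_sn + 1) % SERIAL_MOD


def build_sample_bhs() -> bytes:
    """Rebuild the NOP-In header from the field values in the capture."""
    bhs = bytearray(48)
    bhs[0] = NOP_IN_OPCODE
    bhs[1] = 0x80  # Final bit, always set on NOP-In
    struct.pack_into(">I", bhs, 16, 0xFFFFFFFF)  # InitiatorTaskTag
    struct.pack_into(">I", bhs, 20, 0x0047DD1C)  # TargetTransferTag
    struct.pack_into(">III", bhs, STATSN_OFFSET, 3283165611, 213680834, 213680833)
    return bytes(bhs)


fields = decode_nop_in(build_sample_bhs())
print(fields)
print("initiator-side window:", command_window(fields["ExpCmdSN"], fields["MaxCmdSN"]))  # 0
print("array-side window:", command_window(6272183, 6272183))  # 1
```

With these values the initiator-side window evaluates to 0 (closed), while the array-side values (ExpCmdSN == MaxCmdSN) correspond to a window of a single command under the same formula.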

Environment

VMware ESXi 8.x 
VMware ESXi 7.x 

Pure Storage - iSCSI 

Cause

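When the storage array successfully aborts commands, it sends no response to the initiator for those commands. The array increments MaxCmdSN only in its response handlers, which is how it preserves the delta between ExpCmdSN and MaxCmdSN (effectively the iSCSI queue depth of the session). Because an aborted command has already advanced ExpCmdSN but never reaches a response handler, each successful abort reduces the available queue depth on the session by 1. The array starts with a window of 128, and each abort shrinks it until it reaches 0, at which point the command window is closed and the initiator can no longer send SCSI commands.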
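The mechanism above can be illustrated with a toy model. This is a minimal Python sketch, not Pure Storage's implementation: it assumes, as described above, that MaxCmdSN is advanced only when a response is sent, and it uses the 128-command starting window. `TargetSession` and its methods are hypothetical names, and sequence-number wrap-around is ignored for simplicity.

```python
from dataclasses import dataclass

INITIAL_WINDOW = 128  # starting window described in the Cause section


@dataclass
class TargetSession:
    """Toy model of the target side of one iSCSI session (hypothetical)."""
    exp_cmd_sn: int = 1
    max_cmd_sn: int = INITIAL_WINDOW  # window = 128 - 1 + 1 = 128

    def window(self) -> int:
        return self.max_cmd_sn - self.exp_cmd_sn + 1

    def receive_command(self) -> None:
        if self.window() <= 0:
            raise RuntimeError("command window closed")
        # ExpCmdSN advances as soon as the target accepts the command.
        self.exp_cmd_sn += 1

    def complete_with_response(self) -> None:
        # Assumed behavior: MaxCmdSN is advanced only in the response path,
        # which gives back the slot the command consumed.
        self.max_cmd_sn += 1

    def abort_without_response(self) -> None:
        # Successful abort: no response is sent, so MaxCmdSN is not
        # advanced and the slot the command consumed is never returned.
        pass


session = TargetSession()
for _ in range(1000):
    session.receive_command()
    session.complete_with_response()
print(session.window())  # 128: normal completions keep the window steady

aborts = 0
while session.window() > 0:
    session.receive_command()
    session.abort_without_response()
    aborts += 1
print(aborts, session.window())  # 128 0
print(session.max_cmd_sn == session.exp_cmd_sn - 1)  # True: window closed
```

The final state matches the condition seen in the initiator-side capture: MaxCmdSN equals ExpCmdSN - 1, so any further `receive_command` call is refused.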

Resolution

Engage Pure Storage Support for further investigation in this case.
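When gathering data for Pure Storage Support, it may help to summarize which paths report the timeout messages shown under Symptoms. The following is a minimal Python sketch, assuming the exact message format of that vmkernel.log excerpt (other ESXi builds may word these messages differently). `summarize_timeouts` and `TIMEOUT_RE` are hypothetical names, and the script is meant to be run against a copy of the log, for example one taken from a support bundle.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern for the VMW_SATP_ALUA timeout messages quoted under Symptoms.
# The wording is assumed from that excerpt and may vary between ESXi builds.
TIMEOUT_RE = re.compile(
    r'satp_alua_issueCommandOnPath:\d+: Path "(?P<path>vmhba\d+:C\d+:T\d+:L\d+)" '
    r"\(\w+\) command 0x[0-9a-fA-F]+ failed with status Timeout\. "
    r"H:0x[0-9a-fA-F]+"
)


def summarize_timeouts(lines: Iterable[str]) -> Counter:
    """Count timeout messages per runtime path name (vmhbaN:Cx:Ty:Lz)."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


# Usage against a copy of the log:
# with open("vmkernel.log", encoding="utf-8", errors="replace") as log:
#     for path, count in summarize_timeouts(log).most_common():
#         print(f"{count:6d}  {path}")
```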

Additional Information