Software iSCSI LUN paths do not recover after going offline, or iSCSI reconnect randomly

Article ID: 311035

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • You are using the software iSCSI initiator in VMware ESXi and experience iSCSI LUN connectivity issues.
  • You have multiple VMkernel portgroups in the same subnet, accessing the same iSCSI target.
  • iSCSI connections are frequently being marked as offline, but not all connections come back online again.
  • Multiple dead paths accumulate over time.
  • No actual network traffic loss is experienced.
  • The iSCSI initiator reconnects at random after a reboot.
  • Latency issues are observed, with high KAVG/cmd and QAVG/cmd values.
  • The ESXi server's /var/log/vmkernel.log file frequently displays warnings similar to:

    vmkernel: 57:14:42:01.498 cpu5:4321)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba34:CH:0 T:6 CN:0: Failed to receive data: Connection closed by peer
    vmkernel: 57:14:42:01.498 cpu5:4321)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba34:CH:0 T:6 CN:0: Connection rx notifying failure: Failed to Receive. State=Online
    vmkernel: 57:14:42:01.498 cpu5:4321)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:0 T:6 CN:0: Processing CLEANUP event
    vmkernel: 57:14:42:01.748 cpu4:4321)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:0 T:6 CN:0: iSCSI connection is being marked "OFFLINE"
    [...]
    vmkernel: 57:14:42:07.835 cpu1:4321)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba34:CH:0 T:6 CN:0: iSCSI connection is being marked "ONLINE"
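
To see which connections were marked offline and never came back, the state messages in the excerpt above can be tracked per connection. The following is a minimal Python sketch, not a supported tool: the regular expression is inferred only from the sample lines shown here, and the function names `last_states` and `stuck_offline` are hypothetical. Real logs may use variations that this pattern does not cover.

```python
import re

# Captures the connection id (e.g. "vmhba34:CH:0 T:6 CN:0") and the quoted
# state. The pattern is an assumption based solely on the sample log lines.
STATE_RE = re.compile(
    r'(?P<conn>vmhba\d+:CH:\d+ T:\d+ CN:\d+): '
    r'iSCSI connection is being marked "(?P<state>OFFLINE|ONLINE)"'
)


def last_states(lines):
    """Return {connection id: most recently logged state} for the lines."""
    states = {}
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last state wins.
            states[match.group("conn")] = match.group("state")
    return states


def stuck_offline(lines):
    """Return sorted connection ids whose last logged state is OFFLINE."""
    return sorted(
        conn for conn, state in last_states(lines).items()
        if state == "OFFLINE"
    )
```

Both functions accept any iterable of lines, so an open file object for a copy of vmkernel.log can be passed directly. A connection listed by `stuck_offline` is only a candidate dead path and should be confirmed against the host's actual path state.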


Environment

VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.0.x
VMware vSphere ESXi 8.0.x

Cause

  • The iSCSI connection is closed by the iSCSI target. "Connection closed by peer" refers to a TCP session reset or closure sent from the target storage to the ESXi host.
  • A network error occurred while the client was receiving data from the server.
  • This issue can result from improper storage array configuration, host networking configuration, or the VMware ESXi product.
  • The server accepts the connection, processes the request, and sends a reply to the client. When the server then closes the socket, the client treats the connection as abnormally terminated, because the socket implementation sends a TCP reset segment that tells the client to discard the data and report an error.
  • Over-saturation of the SAN or SAN array, which results in loss of communication, or in storage tasks completing only after the ESXi host has already stopped them due to timeout (5000 ms).
  • Duplicate SAN target IP addresses, which result in intermittent connection loss and other anomalous behavior.
  • SAN target connection load balancing. Disable connection load balancing when using VMware ESXi software iSCSI initiators. You can use the Round-Robin multipathing policy to configure load balancing instead.
  • VMkernel networking misconfiguration:
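
Because the symptoms include multiple VMkernel portgroups in the same subnet, one quick check is to group the host's VMkernel interfaces by network. The following is a minimal Python sketch using only the standard library. The interface names and addresses in the usage example are hypothetical, and the input must be collected from the host separately. A shared subnet is not necessarily a misconfiguration by itself; the sketch only points out where the layout deserves a closer look.

```python
import ipaddress
from collections import defaultdict


def vmks_by_subnet(interfaces):
    """Group interface names by the IP network each one belongs to.

    `interfaces` maps a name (e.g. "vmk1") to an "address/prefix" or
    "address/netmask" string.
    """
    groups = defaultdict(list)
    for name, cidr in interfaces.items():
        network = ipaddress.ip_interface(cidr).network
        groups[str(network)].append(name)
    return dict(groups)


def shared_subnets(interfaces):
    """Return only the subnets that contain more than one interface."""
    return {
        net: sorted(names)
        for net, names in vmks_by_subnet(interfaces).items()
        if len(names) > 1
    }


# Hypothetical example data, not taken from a real host.
example = {
    "vmk1": "10.0.0.11/24",
    "vmk2": "10.0.0.12/255.255.255.0",
    "vmk3": "192.168.5.10/24",
}
print(shared_subnets(example))
```

With the example data, `vmk1` and `vmk2` are reported together because both fall in 10.0.0.0/24, while `vmk3` sits alone in its network and is omitted.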

Resolution

To resolve this issue, collect a TCP dump (packet capture) while these messages are being logged, and engage the storage OEM to identify why the target is closing the connection.

Additional Information