You are using the software iSCSI initiator in VMware ESXi and experience iSCSI LUN connectivity issues on ESX/ESXi.
You have multiple VMkernel portgroups in the same subnet, accessing the same iSCSI target.
iSCSI connections are frequently being marked as offline, but not all connections come back online again.
Multiple dead paths accumulate over time.
No actual network traffic loss is experienced.
The iSCSI initiator reconnects randomly after a reboot.
Latency issues are observed, with high KAVG/cmd and QAVG/cmd values.
The ESXi server's /var/log/vmkernel.log file frequently displays warnings similar to:
vmkernel: 57:14:42:01.498 cpu5:4321)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba34:CH:0 T:6 CN:0: Failed to receive data: Connection closed by peer
vmkernel: 57:14:42:01.498 cpu5:4321)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba34:CH:0 T:6 CN:0: Connection rx notifying failure: Failed to Receive. State=Online
vmkernel: 57:14:42:01.498 cpu5:4321)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:0 T:6 CN:0: Processing CLEANUP event
vmkernel: 57:14:42:01.748 cpu4:4321)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:0 T:6 CN:0: iSCSI connection is being marked "OFFLINE"
[...]
vmkernel: 57:14:42:07.835 cpu1:4321)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba34:CH:0 T:6 CN:0: iSCSI connection is being marked "ONLINE"
Because of the number of iSCSI flapping messages recorded in the logs, host resources get tied up. This can cause the ESXi host, along with the VMs running on it, to become unresponsive or hang.
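To gauge how often each connection flaps, and whether any connection stays offline, the "OFFLINE" and "ONLINE" messages can be tallied per connection. The following Python sketch is illustrative only: it assumes the log lines follow the format shown in the excerpt above, and the names summarize_flaps, stuck_offline, and SAMPLE are hypothetical helpers, not part of any VMware tool.

```python
import re
from collections import defaultdict

# Captures the connection identifier (for example "vmhba34:CH:0 T:6 CN:0")
# and the state from 'iSCSI connection is being marked "OFFLINE"' messages.
# Assumption: the message format matches the excerpt in this article.
STATE_RE = re.compile(
    r'(?P<conn>vmhba\d+:CH:\d+ T:\d+ CN:\d+): '
    r'iSCSI connection is being marked "(?P<state>OFFLINE|ONLINE)"'
)


def summarize_flaps(lines):
    """Count OFFLINE/ONLINE transitions per connection and keep the last state."""
    summary = defaultdict(lambda: {"offline": 0, "online": 0, "last": None})
    for line in lines:
        match = STATE_RE.search(line)
        if match is None:
            continue  # not a state-change message
        entry = summary[match.group("conn")]
        state = match.group("state")
        entry[state.lower()] += 1
        entry["last"] = state
    return dict(summary)


def stuck_offline(summary):
    """Return connections whose most recent logged state is OFFLINE."""
    return sorted(conn for conn, e in summary.items() if e["last"] == "OFFLINE")


# Two lines taken from the log excerpt in this article.
SAMPLE = [
    'vmkernel: 57:14:42:01.748 cpu4:4321)WARNING: iscsi_vmk: '
    'iscsivmk_StopConnection: vmhba34:CH:0 T:6 CN:0: '
    'iSCSI connection is being marked "OFFLINE"',
    'vmkernel: 57:14:42:07.835 cpu1:4321)WARNING: iscsi_vmk: '
    'iscsivmk_StartConnection: vmhba34:CH:0 T:6 CN:0: '
    'iSCSI connection is being marked "ONLINE"',
]

if __name__ == "__main__":
    result = summarize_flaps(SAMPLE)
    for conn, counts in result.items():
        print(conn, counts)
    print("Last seen offline:", stuck_offline(result))
```

In practice, the lines would be read from a copy of /var/log/vmkernel.log instead of SAMPLE. A connection whose last logged state is "OFFLINE" is a candidate for one of the dead paths described above.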
The iSCSI connection is closed by the iSCSI target. The "Connection closed by peer" message refers to a TCP session reset or closure that is sent from the target storage to the ESXi host.
From the host's point of view, a network error occurred while the client (here, the ESXi initiator) was receiving data from the server (the iSCSI target).
The server accepts the connection, processes the request, and sends a reply to the client. When the server then closes the socket, the client believes that the connection has been terminated abnormally, because the socket implementation sends a TCP reset segment telling the client to discard the data and report an error.
This issue occurs due to improper storage array configuration, improper host networking configuration, or the VMware ESXi product itself, including the MTU size set across the environment.
Over-saturation of the SAN or SAN array, resulting in loss of communication, or in storage tasks completing only after the ESXi host has already stopped them due to timeout (5000 ms).
Duplicate SAN target IP addresses, resulting in intermittent connection loss and other anomalous behavior.
SAN target connection load balancing. Disable connection load balancing on the target when using VMware ESXi software iSCSI initiators. Use the Round Robin multipathing policy to configure load balancing instead.
Fix the VMkernel networking misconfiguration:
When using multiple VMkernel ports for software iSCSI, ensure that the number of VMkernel ports is less than or equal to the number of physical network interfaces.
Check the MTU size across your environment and make it uniform end to end (either standard frames or jumbo frames, not a mix).
Ensure that you follow the Best Practices for Configuring Networking with Software iSCSI.
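The first two checks above can be expressed as a short Python sketch. It is a minimal illustration under stated assumptions: the inventory (the vmk and vmnic names and the MTU values) is hypothetical sample data that would need to be gathered from your own hosts, switches, and array, and check_iscsi_networking and max_ping_payload are illustrative helper names. The payload calculation assumes IPv4 with a 20-byte header and no IP options.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8


def check_iscsi_networking(vmk_ports, physical_nics, mtus):
    """Return a list of findings; an empty list means neither check failed.

    vmk_ports     -- VMkernel ports bound to software iSCSI
    physical_nics -- physical network interfaces backing those ports
    mtus          -- mapping of component name to configured MTU
    """
    findings = []
    if len(vmk_ports) > len(physical_nics):
        findings.append(
            f"{len(vmk_ports)} VMkernel ports but only "
            f"{len(physical_nics)} physical NICs"
        )
    distinct = sorted(set(mtus.values()))
    if len(distinct) > 1:
        findings.append(f"MTU is not uniform: {distinct}")
    return findings


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    findings = check_iscsi_networking(
        vmk_ports=["vmk1", "vmk2", "vmk3"],
        physical_nics=["vmnic2", "vmnic3"],
        mtus={"vmk1": 9000, "vSwitch1": 9000, "array-port": 1500},
    )
    for finding in findings:
        print("WARNING:", finding)
    print("Ping payload for MTU 9000:", max_ping_payload(9000))
```

The payload size is useful when testing the path with a ping that has the don't-fragment bit set: for an MTU of 9000 the largest payload is 9000 - 28 = 8972 bytes, and for an MTU of 1500 it is 1472 bytes. If a ping of that size fails while a smaller one succeeds, some component on the path is likely using a smaller MTU.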