Software iSCSI logins fail to complete in time causing datastore unavailability on EqualLogic arrays

Article ID: 323124

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • A software iSCSI-based EqualLogic array cannot fail over
  • The EqualLogic array event log reports failed logins due to an initiator disconnect during the iSCSI login


Environment

VMware ESX 4.1.x
VMware vSphere ESXi 5.1
VMware ESXi 4.1.x Embedded
VMware ESXi 4.1.x Installable
VMware vSphere ESXi 5.0
VMware vSphere ESXi 5.5

Cause

This issue can have several causes:
  • EqualLogic iSCSI load balancing:
    The EqualLogic array uses a load balancing algorithm to spread out iSCSI sessions between the initiator and the array iSCSI Ethernet ports. Prior to moving or creating a new iSCSI session to the Ethernet port, the array uses an ICMP ping to verify that the initiator's network port can connect to that Ethernet port.

    When multiple VMkernel ports are on the same subnet, the VMkernel TCP/IP stack uses the first VMkernel port as the route to the local subnet. For example, if four VMkernel ports are configured on the same subnet, as would be the case when configuring iSCSI for Round Robin MPIO, and a ping is sent to the first, second, third, or fourth VMkernel port, the reply to that ping is always sent from the first VMkernel port.

    The configuration requirements for iSCSI multipathing stipulate that each VMkernel port uses a single physical NIC as an uplink. As a result, if the physical uplink for the first VMkernel port goes down, no failover is possible at the network level. Any pings to the other three VMkernel ports fail, even though they are still up and reachable.

    The failed ping replies cause an additional delay to the iSCSI login processing, and can potentially delay the iSCSI login beyond the default VMware iSCSI login timeouts of 5 seconds for vSphere 5.0 and 15 seconds for vSphere 4.1. This causes the ESXi/ESX host to mark the datastore as unavailable.
     
    Note: This behavior is changed in ESXi 5.1 and later hosts. For more information, see Change to ICMP ping response behavior in ESXi 5.1 (2042189).
  • Delayed ACK timing issue:
    Delayed ACK is enabled by default in the ESXi kernel. However, the congestion handling algorithm of the EqualLogic array expects no delayed ACK handling. This leads to inefficient recovery from lost packets and possible subsequent timeouts.
     
  • Insufficient iSCSI timeout in ESXi kernel:
    When an iSCSI connection times out due to congestion in the network or the array, the affected session is terminated and the iSCSI stack immediately tries to log in again. This can impact the storage array and cause more congestion in the network and in the command queue of the storage array.
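The ping behavior described in the first cause can be illustrated with a toy model. This is a minimal sketch and not ESXi code: the `VmkPort` class, the `ping_succeeds` function, and the port names `vmk1` to `vmk4` are hypothetical. The `reply_via_first_port=False` branch is an assumed alternative in which the reply leaves through the port that received the ping; this article states only that the behavior changed in ESXi 5.1.

```python
from dataclasses import dataclass


@dataclass
class VmkPort:
    name: str        # hypothetical VMkernel port name, e.g. "vmk1"
    uplink_up: bool  # state of the single physical NIC bound to this port


def ping_succeeds(ports, target, reply_via_first_port=True):
    """Return True if a ping from the array to `target` gets a reply.

    reply_via_first_port=True models the behavior described above: the
    reply always leaves through the first VMkernel port on the subnet.
    False is an assumed alternative: the reply leaves through the port
    that received the ping.
    """
    by_name = {port.name: port for port in ports}
    receiver = by_name[target]
    if not receiver.uplink_up:
        return False  # the ping request never reaches the host
    sender = ports[0] if reply_via_first_port else receiver
    return sender.uplink_up


# Four ports on one subnet; the uplink of the first port has failed.
ports = [
    VmkPort("vmk1", uplink_up=False),
    VmkPort("vmk2", uplink_up=True),
    VmkPort("vmk3", uplink_up=True),
    VmkPort("vmk4", uplink_up=True),
]

results = {port.name: ping_succeeds(ports, port.name) for port in ports}
print(results)
```

In this model every ping fails once the uplink of the first port is down, even though the other three uplinks are healthy, which is the condition that delays the iSCSI login.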

Resolution

These methods address each of the issues described in the previous section:

Create a highly available VMkernel port

Dell recommends creating the highly available VMkernel port as described in the Dell EqualLogic technical report TR1075, Configuring iSCSI Connectivity with VMware vSphere 6 and Dell PS Series Storage.

If you are using the Dell EqualLogic multipathing extension module, see Dell EqualLogic technical report TR1074, Configuring and Installing the PS Series Multipathing Extension Module for VMware vSphere and PS Series.

Note: The preceding links were correct as of October 11, 2020. If you find the links are broken, provide feedback and a VMware employee will update the link.

These technical reports explain in detail how to create the highly available VMkernel port, which ensures that ping replies can be transmitted to the array.
 
For more information on the above articles, contact the storage vendor, Dell.

Disable Delayed ACKs

Disabling the delayed ACK parameter has been reported to improve performance on EqualLogic arrays. For more information, see ESX/ESXi hosts might experience read or write performance issues with certain storage arrays (1002598).

To disable delayed ACKs from the vSphere Client:
  1. Click the ESXi/ESX host you want to modify.
  2. Navigate to the Configuration tab and click Storage Adapters.
  3. Click the iSCSI VMHBA to be modified and click Properties.
  4. Modify the delayed ACK setting at the level that best matches your needs, using one of these procedures:
     
    • To modify the delayed ACK setting on a discovery address (recommended):
      1. Click the Dynamic Discovery tab.
      2. Click the Server Address tab.
      3. Click Settings > Advanced.
    • To modify the delayed ACK setting on a specific target:
      1. Click the Static Discovery tab and select the target.
      2. Click Settings > Advanced.
    • To modify the delayed ACK setting globally:
      1. Click the General tab.
      2. Click Advanced.
  5. In the Advanced Settings dialog box, scroll to the delayed ACK setting.
  6. Deselect Inherit from parent.
  7. Deselect DelayedAck.
  8. Reboot the host.
  9. Repeat for all hosts that have access to the iSCSI storage.

Adjust the iSCSI login timeout on ESXi 5.0

In ESXi 5.x, the iSCSI login timeout is set to 5 seconds by default. This means that if there is no response after 5 seconds, the ESXi host stops the iSCSI session and immediately tries to log in again. This places additional load on the storage array and can result in a "login storm".

The ability to change this setting from the vSphere Client was added in VMware ESXi 5.0 Patch Release ESXi500-201112001 (2007680). To help alleviate this problem, extend the login timeout to 15 seconds, or 30 seconds if necessary.
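The effect of a longer timeout on login load can be estimated with simple arithmetic. The sketch below is a simplified model, not a measurement: it assumes that no login gets a response during the window, that every attempt runs for the full timeout, and that the next attempt starts immediately afterwards. The function name, the 60-second window, and the session count of 8 are hypothetical.

```python
import math


def login_attempts(window_s, timeout_s, sessions=1):
    """Count login attempts started during `window_s` seconds.

    Assumes each attempt runs for the full `timeout_s` and the next one
    starts immediately afterwards (no backoff), for every session.
    """
    if window_s < 0 or timeout_s <= 0 or sessions < 0:
        raise ValueError("window_s and sessions must be >= 0, timeout_s > 0")
    return sessions * math.ceil(window_s / timeout_s)


# Compare the 5-second default with the suggested 15 and 30 seconds.
attempts = {t: login_attempts(60, t, sessions=8) for t in (5, 15, 30)}
print(attempts)
```

Under these assumptions, raising the timeout from 5 to 15 seconds cuts the number of login attempts reaching the array in the same window to one third.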

To change the login timeout from the vSphere Client:
  1. Go to Storage Adapters > iSCSI Software Adapter > Properties.
  2. Click Advanced and scroll down to LoginTimeout.
  3. Change the value from 5 seconds to a larger value, such as 15 or 30 seconds.

To adjust the iSCSI login timeout in ESXi 5.0 from the command line, run this command:

esxcli iscsi adapter param set -A adapter_name -k LoginTimeout -v value_in_sec

For example:

esxcli iscsi adapter param set -A vmhba33 -k LoginTimeout -v 60

Notes:
  • This option is grayed out if you are not running ESXi 5.0 patch 2 (build 515841).
  • You must reboot the host for the new timeout setting to take effect.
  • As per the VMware ESXi 4.1 Update 3 Release Notes, it is now possible to set the timeout value settings for iSCSI initiator login.
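
When the change has to be repeated on several hosts, the command-line syntax shown above can be generated by a small helper. This is a minimal sketch: the function name is hypothetical, the only validation is an assumed check for a positive whole number of seconds, and the helper only builds the command text without running it. The adapter name `vmhba33` is the example used in this article.

```python
import shlex


def login_timeout_command(adapter, seconds):
    """Build the esxcli argument list that sets LoginTimeout on an adapter.

    Only the syntax given in this article is used. The positive-integer
    check is an assumption; the valid range is not stated here.
    """
    if not isinstance(seconds, int) or isinstance(seconds, bool) or seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    if not adapter:
        raise ValueError("adapter name must not be empty")
    return [
        "esxcli", "iscsi", "adapter", "param", "set",
        "-A", adapter,
        "-k", "LoginTimeout",
        "-v", str(seconds),
    ]


cmd = login_timeout_command("vmhba33", 15)
command_line = shlex.join(cmd)  # quoted form for pasting into a host shell
print(command_line)
```

Keeping the arguments as a list avoids quoting mistakes if the command is later passed to a remote execution tool.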


Additional Information