ESX/ESXi hosts might experience read or write performance issues with certain iSCSI storage arrays

Article ID: 313543


Products

VMware vSphere ESXi

Issue/Introduction

Some iSCSI storage arrays from different array vendors do not behave appropriately during periods of network congestion. This problem is related to the TCP/IP implementation of these arrays and can severely impact the read performance of storage attached to the ESXi/ESX software through the iSCSI initiator. The problem has also been reported in native (non-virtualized) environments involving the Microsoft iSCSI initiator.

Background Concepts

To understand this problem, you should be familiar with several TCP concepts:
  • Delayed ACK
  • Slow start
  • Congestion avoidance

Delayed ACK

A central precept of the TCP network protocol is that data sent through TCP be acknowledged by the recipient. According to RFC 813, "Very simply, when data arrives at the recipient, the protocol requires that it send back an acknowledgement of this data. The protocol specifies that the bytes of data are sequentially numbered, so that the recipient can acknowledge data by naming the highest numbered byte of data it has received, which also acknowledges the previous bytes." The TCP packet that carries the acknowledgement is known as an ACK.

A host receiving a stream of TCP data segments can increase efficiency in both the network and the hosts by sending fewer than one ACK segment per data segment received. This is known as a delayed ACK. The common practice is to send an ACK for every other full-sized data segment and not to delay the ACK for a segment by more than a specified threshold. This threshold varies between 100 ms and 500 ms. ESXi/ESX uses delayed ACK because of its benefits, as do most other servers.
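
The delayed ACK policy described above can be sketched in a few lines of Python. This is a minimal illustration, not the ESXi/ESX implementation: the function name `delayed_ack_times` and the 200 ms default threshold are assumptions chosen for the example, and every segment is treated as full-sized.

```python
def delayed_ack_times(arrivals_ms, delay_ms=200):
    """Return the times (in ms) at which a receiver sends ACKs.

    arrivals_ms: sorted arrival times of full-sized data segments.
    delay_ms: assumed delayed ACK threshold (hypothetical default).
    """
    acks = []
    pending_since = None  # arrival time of a segment not yet acknowledged
    for t in arrivals_ms:
        # If the timer for a pending segment expired before this arrival,
        # the ACK was sent when the timer fired.
        if pending_since is not None and t >= pending_since + delay_ms:
            acks.append(pending_since + delay_ms)
            pending_since = None
        if pending_since is None:
            pending_since = t  # first unacknowledged segment: start the timer
        else:
            acks.append(t)     # second segment: acknowledge immediately
            pending_since = None
    if pending_since is not None:
        acks.append(pending_since + delay_ms)  # timer fires for the last segment
    return acks


print(delayed_ack_times([0, 1, 2, 3]))  # steady stream: one ACK per two segments
print(delayed_ack_times([0]))           # lone segment: ACK waits for the timer
```

In a steady stream the timer never fires, so the delay costs nothing. A single isolated segment, however, is acknowledged only after the full threshold has elapsed.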

Slow start and Congestion avoidance

Congestion can occur when there is a mismatch of data processing capabilities between two elements of the network leading from the source to the destination. Congestion manifests itself as a delay, timeout, or packet loss. To avoid and recover from congestion, TCP uses two algorithms: congestion avoidance and slow start. Although the mechanisms of these two algorithms differ, the underlying concept is the same: when congestion occurs, the TCP sender must slow down its transmission rate and then increase the rate as retransmitted data segments are acknowledged.

When congestion occurs, the typical recovery sequence for TCP/IP networks that use delayed ACK and slow start is:
  1. The sender detects congestion because it did not receive an ACK within the retransmission timeout period.
  2. The sender retransmits the first data segment and waits for the ACK before sequencing the remaining segments for retransmission.
  3. The receiver receives the retransmitted data segment and starts the delayed ACK timer.
  4. The receiver transmits the ACK when the delayed ACK timer times out. During this waiting period, there are no other transmissions between the sender and receiver.
  5. Having received the ACK, the sender then retransmits the next two data segments back to back.
  6. The receiver, upon receiving the second data segment, promptly transmits an ACK.
  7. The sender, upon receiving the ACK, retransmits the next four data segments back to back.
  8. This sequence continues until the congestion period passes and the network returns to a normal traffic rate.
In this recovery sequence, the longest lag time comes from the initial delayed ACK timer in step 3.
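
A rough timing model of this recovery sequence, as a Python sketch. The round-trip time, the delayed ACK threshold, and the function name `slow_start_recovery_ms` are illustrative assumptions, and the model ignores everything except burst doubling and the delayed ACK timer.

```python
def slow_start_recovery_ms(lost_segments, rtt_ms=1.0, delayed_ack_ms=200.0):
    """Estimate the time to retransmit and acknowledge lost segments.

    Simplified model: the sender doubles its burst each round (1, 2, 4, ...),
    each round costs one round trip, and a burst with an odd number of
    segments leaves one segment waiting on the receiver's delayed ACK timer.
    """
    elapsed = 0.0
    window = 1
    remaining = lost_segments
    while remaining > 0:
        burst = min(window, remaining)
        elapsed += rtt_ms
        if burst % 2 == 1:
            # An unpaired segment is acknowledged only when the timer fires.
            elapsed += delayed_ack_ms
        remaining -= burst
        window *= 2
    return elapsed


print(slow_start_recovery_ms(7))   # bursts of 1, 2, 4
print(slow_start_recovery_ms(63))  # bursts of 1, 2, 4, 8, 16, 32
```

Under these assumptions the delayed ACK penalty is paid once, on the first retransmitted segment, and the remaining rounds each cost only a round trip.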
 

Problem Description

The affected iSCSI arrays take a slightly different approach to handling congestion. Instead of implementing the slow start algorithm, the congestion avoidance algorithm, or both, these arrays take the very conservative approach of retransmitting only one lost data segment at a time and waiting for the host's ACK before retransmitting the next one. This process continues until all lost data segments have been recovered.

Coupled with the delayed ACK implemented on the ESXi/ESX host, this approach can bring read performance to a near standstill in a congested network, because every retransmitted segment waits for the host's delayed ACK timer to expire. Consequently, frequent timeouts are reported in the kernel log on hosts that use this type of array. Most notably, the VMFS heartbeat experiences a large volume of timeouts because VMFS uses a short timeout value. This configuration also experiences excessively large maximum read-response times (on the order of several tens of seconds) reported by the guest. This problem is exacerbated when reading data in large block sizes. In this case, the higher bandwidth contributes to network congestion, and each I/O consists of many more data segments, requiring longer recovery times.
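
To see why the two behaviors combine so badly, consider this back-of-the-envelope sketch in Python. The 1 ms round trip, the 200 ms delayed ACK threshold, the count of 64 lost segments, and the function name `one_at_a_time_recovery_ms` are all assumptions made for illustration, not measured values.

```python
def one_at_a_time_recovery_ms(lost_segments, rtt_ms=1.0,
                              delayed_ack_ms=200.0, delayed_ack=True):
    """Estimate recovery time when the array retransmits one segment per ACK.

    Every retransmitted segment arrives alone, so with delayed ACK enabled
    the host waits for the full timer before acknowledging each one.
    """
    per_segment = rtt_ms + (delayed_ack_ms if delayed_ack else 0.0)
    return lost_segments * per_segment


lost = 64  # hypothetical number of segments lost during a congestion event
print(one_at_a_time_recovery_ms(lost, delayed_ack=True))   # delayed ACK enabled
print(one_at_a_time_recovery_ms(lost, delayed_ack=False))  # delayed ACK disabled
```

With these numbers, recovery takes roughly 12.9 seconds with delayed ACK enabled and 64 ms with it disabled, which is the motivation for the workaround in the Resolution section.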

In the vmkernel.log file, you may see entries similar to:

-09-17T15:07:19Z iscsid: discovery_sendtargets::Running discovery on IFACE iscsi_vmk@vmk2(iscsi_vmk) (drec.transport=iscsi_vmk)
-09-17T15:07:19Z iscsid: cannot make connection to 10.0.68.2:3260 (111)
-09-17T15:07:19Z iscsid: connection to discovery address 10.0.68.2 failed
-09-17T15:07:19Z iscsid: connection login retries (reopen_max) 5 exceeded
-09-17T15:07:19Z iscsid: Login Target Skipped: iqn.2003-10.com.temp:hvtiscsi:1808:vmwds01 if=iscsi_vmk@vmk1 addr=10.0.68.2:3260 (TPGT:1 ISID:0x2) (Already Running)
-09-17T15:07:19Z iscsid: Login Target Skipped: iqn.2003-10.com.temp:hvtiscsi:1808:vmwds01 if=iscsi_vmk@vmk2 addr=10.0.68.2:3260 (TPGT:1 ISID:0x3) (Already Running)
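
When reviewing a large log, it can help to count how often each target address appears in such failure messages. The following Python sketch assumes only the message format shown in the sample entries above. The function name `count_connection_failures` is hypothetical, and the timestamp prefix is ignored.

```python
import re
from collections import Counter

# Matches messages such as:
#   iscsid: cannot make connection to 10.0.68.2:3260 (111)
FAILED = re.compile(r"iscsid: cannot make connection to (\S+):(\d+) \((\d+)\)")


def count_connection_failures(lines):
    """Count failed connection attempts per (address, port) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


sample = [
    "iscsid: cannot make connection to 10.0.68.2:3260 (111)",
    "iscsid: connection to discovery address 10.0.68.2 failed",
    "iscsid: cannot make connection to 10.0.68.2:3260 (111)",
]
print(count_connection_failures(sample))
```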




Environment

VMware vSphere ESXi 5.0
VMware ESXi 3.5.x Embedded
VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.5
VMware ESXi 4.0.x Installable
VMware vSphere ESXi 6.7
VMware vSphere ESXi 5.5
VMware vSphere ESXi 5.1
VMware vSphere ESXi 7.0.x
VMware ESX 4.1.x
VMware vSphere ESXi 7.0.0
VMware ESXi 4.1.x Embedded
VMware ESXi 4.1.x Installable
VMware ESX Server 3.5.x
VMware ESXi 3.5.x Installable
VMware ESX 4.0.x
VMware ESXi 4.0.x Embedded

Resolution

Until the iSCSI array vendors address this issue on their end, consider designing your IP storage network with enough capacity to account for the peak usage and lower the risk of congestion. If you experience the problem described in this article and are unable to alter your network configuration or find ways to guarantee a congestion-free environment, you can experiment with the following workaround.

This workaround involves disabling delayed ACK on your ESX/ESXi host through a configuration option.

Note: Please contact your storage vendor to confirm if disabling Delayed ACK is a workaround that they recommend for their storage array.
 

Differences in dealing with delayed ACK between ESX/ESXi 3.5 and 4.x/5.x/6.0

The way you disable delayed ACK varies slightly between ESX/ESXi 3.5 and 4.x/5.x/6.0. The main difference is in how the delayed ACK setting works and how it is set:
  • In ESX/ESXi 3.5, the delayed ACK setting is global to the ESX/ESXi system.

    By setting this option, you completely disable delayed ACK regardless of whether TCP is attempting congestion recovery or operating normally. This option affects the entire TCP/IP stack and, therefore, all TCP/IP applications are impacted if you implement this change. As a consequence, setting this option might degrade the performance of iSCSI and other applications (for example, NFS and vMotion), as well as contribute to the congestion. Nonetheless, the option allows a speedy recovery from congestion and reasonable performance during other transactions with the affected iSCSI arrays.
     
  • In ESX/ESXi 4.x/5.x/6.0, the recommended method is to configure the delayed ACK setting per discovery address. You can also configure the setting per individual target. Furthermore, though not recommended, you can configure the delayed ACK setting globally for ALL iSCSI targets.
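
The three scopes can be thought of as a chain of overrides. The Python sketch below is a conceptual model only: it assumes that a target inherits from its discovery address, which in turn inherits from the adapter-wide (global) setting, and the function name `effective_delayed_ack` is hypothetical.

```python
from typing import Optional


def effective_delayed_ack(adapter: bool,
                          discovery: Optional[bool] = None,
                          target: Optional[bool] = None) -> bool:
    """Resolve the delayed ACK setting that applies to one target.

    None models a level where Inherit from parent is left selected.
    Assumed hierarchy: target -> discovery address -> adapter (global).
    """
    if target is not None:
        return target
    if discovery is not None:
        return discovery
    return adapter


# Delayed ACK stays enabled globally but is disabled on one discovery address.
print(effective_delayed_ack(adapter=True, discovery=False))
# A different discovery address that still inherits the global setting.
print(effective_delayed_ack(adapter=True))
```

Under this model, disabling delayed ACK on a single discovery address limits the change to the affected array and leaves other iSCSI targets on the default.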


Configuring Delayed ACK in ESXi

To implement this workaround in ESXi, use the vSphere Client to disable delayed ACK.

Disabling Delayed ACK in ESXi 6.5, ESXi 6.7

  1. Log in to the vSphere Web Client and select the host.
  2. Right-click the host and click Maintenance Mode.
  3. Wait for the enter maintenance mode task to complete.
  4. Navigate to the Configure tab.
  5. Click Storage Adapters.
  6. Click the iSCSI vmhba that you want to modify.
  7. Modify the delayed ACK setting on a discovery address:
    1. Click the Targets tab on Adapter Details.
    2. Click Dynamic Discovery.
    3. Click Advanced Settings.
    4. In the Delayed ACK Advanced Parameter options, uncheck Inherit from parent, and then uncheck DelayedAck.
    5. Click OK.
  8. Reboot the host.
 

Disabling Delayed ACK in ESX/ESXi 4.x, ESXi 5.x and ESXi 6.0

  1. Log in to the vSphere Client and select the host.
  2. Right-click the host and click Maintenance Mode.
  3. Wait for the enter maintenance mode task to complete.
  4. Navigate to the Configuration tab.
  5. Click Storage Adapters.
  6. Click the iSCSI vmhba that you want to modify.
  7. Click Properties.
  8. Modify the delayed ACK setting, using the option that best matches your site's needs:
     
    • Modify the delayed ACK setting on a discovery address (recommended):
       
      1. On a discovery address, click the Dynamic Discovery tab.
      2. Click the Server Address tab.
      3. Click Settings > Advanced.
      4. In the Delayed ACK Advanced Parameter options, uncheck Inherit from parent, and then uncheck DelayedAck.
      5. Click OK.
         
    • Modify the delayed ACK setting on a specific set of targets:
       
      1. On a discovery address, click the Static Discovery tab.
      2. Select all targets that reside on the array you are working with.
      3. Click Remove to delete the selected entries.
      4. Click the Dynamic Discovery tab.
      5. Remove all entries that reside on the array you are working with.
      6. Enter the discovery address but do not rescan the adapter yet.
      7. Select the discovery address to modify and click Settings > Advanced.
      8. In the Delayed ACK Advanced Parameter options, uncheck Inherit from parent, and then uncheck DelayedAck.
      9. Click OK.
      10. Repeat steps 6-9 for each discovery address that needs to be modified.
         
    • Modify the delayed ACK setting globally:
       
      1. Select the General tab.
      2. Click Advanced.
      3. In the Delayed ACK Advanced Parameter options, uncheck Inherit from parent, and then uncheck DelayedAck.
      4. Click OK.
         
  9. Reboot the host.

Re-enabling Delayed ACK in ESX/ESXi 4.x, ESXi 5.x and ESXi 6.0

  1. Log in to the vSphere Client and select the host.
  2. Navigate to the Advanced Settings page, as described in the preceding task Disabling Delayed ACK in ESX/ESXi 4.x, ESXi 5.x and ESXi 6.0.
  3. Click Inherit From parent > DelayedAck.
  4. Reboot the host.

Checking the Current Setting of Delayed ACK in ESX/ESXi 4.x, ESXi 5.x and ESXi 6.0

  1. Log in to the vSphere Client and select the host.
  2. Navigate to the Advanced Settings page, as described in the preceding task Disabling Delayed ACK in ESX/ESXi 4.x, ESXi 5.x and ESXi 6.0.
  3. Observe the setting for DelayedAck.

    If the DelayedAck setting is checked, this option is enabled. If you perform this check after you change the delayed ACK setting, but before you reboot the host, the result shows the new setting rather than the setting currently in effect.

    Notes:
    • To disable delayed_ack, run this command from the command line:

      vmkiscsi-tool -W -a delayed_ack=0 -j vmhbaXX

      To enable delayed_ack, run this command:

      vmkiscsi-tool -W -a delayed_ack=1 -j vmhbaXX
       
    • To check this parameter, run this command:

      vmkiscsi-tool -W vmhbaXX
       

Configuring Delayed ACK in ESX/ESXi 3.5

To implement this workaround in ESX/ESXi 3.5, use the VI Client to alter the Net.TcpipDelayedAck advanced parameter setting. By default, this option is set to 1, which enables delayed ACK. To disable delayed ACK, perform the following steps.

Disabling Delayed ACK in ESX/ESXi 3.5

  1. Log in to the VI Client and select the host.
  2. Click the Configuration tab and click Advanced Settings.
  3. Click Net and scroll through the advanced parameter list until you locate the Net.TcpipDelayedAck parameter.
  4. Set the value for the parameter to 0.
  5. Click OK.
  6. Reboot the host.

Re-enabling Delayed ACK in ESX/ESXi 3.5

  1. Log in to the VI Client and select the host.
  2. Click the Configuration tab and choose Advanced Settings.
  3. Choose Net and scroll through the advanced parameter list until you locate the Net.TcpipDelayedAck parameter.
  4. Set the value for the parameter to 1.
  5. Click OK.
  6. Reboot the host.

Checking the Current Setting of Delayed ACK in ESX/ESXi 3.5

  1. Log in to the VI Client and select the host.
  2. Click the Configuration tab and choose Advanced Settings.
  3. Choose Net and scroll through the advanced parameter list until you locate the Net.TcpipDelayedAck parameter.
  4. Check the setting. A setting of 1 means that delayed ACK is enabled, and a setting of 0 means that it is disabled.
  5. Click OK.

    Note: A reboot is required for delayed ACK setting changes to take effect. If the host has not been rebooted after making a change, the result of this check shows the new setting rather than the setting currently in effect.