ESX/ESXi hosts might experience read or write performance issues with certain iSCSI storage arrays

Article ID: 313543

Products

VMware vSphere ESXi

Issue/Introduction

Some iSCSI storage arrays from different array vendors do not behave appropriately during periods of network congestion. This problem is related to the TCP/IP implementation of these arrays and can severely impact the read performance of storage attached to the ESXi/ESX software through the iSCSI initiator. The problem has also been reported in native (non-virtualized) environments involving the Microsoft iSCSI initiator.

Background Concepts

To understand this problem, you should be familiar with several TCP concepts:
  • Delayed ACK
  • Slow start
  • Congestion avoidance

Delayed ACK

A central precept of the TCP network protocol is that data sent through TCP must be acknowledged by the recipient. According to RFC 813, "Very simply, when data arrives at the recipient, the protocol requires that it send back an acknowledgement of this data. The protocol specifies that the bytes of data are sequentially numbered, so that the recipient can acknowledge data by naming the highest numbered byte of data it has received, which also acknowledges the previous bytes." The TCP packet that carries the acknowledgement is known as an ACK.

A host receiving a stream of TCP data segments can increase efficiency in both the network and the hosts by sending fewer than one ACK segment per data segment received. This is known as a delayed ACK. The common practice is to send an ACK for every other full-sized data segment and not to delay the ACK for a segment by more than a specified threshold. This threshold varies between 100 ms and 500 ms. ESXi/ESX uses delayed ACK because of these benefits, as do most other servers.
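The ACK policy described above can be sketched as a small model. This is an illustration only, not ESXi's actual TCP stack; the 200 ms timer is an assumed value within the 100-500 ms range mentioned above.

```python
# Minimal model of a delayed-ACK receiver (a sketch, not ESXi's actual
# TCP implementation): ACK every second full-sized segment immediately,
# otherwise wait for the delayed-ACK timer (assumed 200 ms) to expire.

ACK_EVERY_N_SEGMENTS = 2      # common practice: ACK every other segment
DELAYED_ACK_TIMER_MS = 200    # assumed; typical thresholds are 100-500 ms

def ack_times(segment_arrival_times_ms):
    """Return the times (ms) at which ACKs are sent for a burst of segments."""
    acks = []
    unacked = 0
    for t in segment_arrival_times_ms:
        unacked += 1
        if unacked == ACK_EVERY_N_SEGMENTS:
            acks.append(t)      # ACK immediately on every 2nd segment
            unacked = 0
    if unacked:                 # a lone trailing segment waits for the timer
        acks.append(segment_arrival_times_ms[-1] + DELAYED_ACK_TIMER_MS)
    return acks

# Three segments arriving 1 ms apart: the 2nd is ACKed at once,
# the 3rd only after the 200 ms delayed-ACK timer expires.
print(ack_times([0, 1, 2]))   # [1, 202]
```

Note how the last, unpaired segment pays the full timer penalty; this is the effect that dominates the recovery sequence discussed below.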

Slow start and Congestion avoidance

Congestion can occur when there is a mismatch of data processing capabilities between two elements of the network leading from the source to the destination. Congestion manifests itself as a delay, timeout, or packet loss. To avoid and recover from congestion, TCP uses two algorithms—the congestion avoidance algorithm and the slow start algorithm. Although the underlying mechanisms of these two algorithms differ, the underlying concept is that when congestion occurs, the TCP sender must slow down its transmission rate and then increase the rate as retransmitted data segments are acknowledged.
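The interaction of the two algorithms can be sketched as a simplified congestion-window simulation (whole segments, one update per round trip; the ssthresh value of 16 is an arbitrary example, not a vSphere setting):

```python
# Simplified sketch of TCP congestion-window growth: slow start doubles
# the window each round trip until it reaches ssthresh, then congestion
# avoidance grows it by one segment per round trip.

def cwnd_after(rtts, ssthresh=16, initial_cwnd=1):
    """Congestion window (in segments) after the given number of RTTs."""
    cwnd = initial_cwnd
    for _ in range(rtts):
        if cwnd < ssthresh:
            cwnd *= 2          # slow start: exponential growth
        else:
            cwnd += 1          # congestion avoidance: linear growth
    return cwnd

# 1 -> 2 -> 4 -> 8 -> 16 (slow start), then 17, 18 (congestion avoidance)
print([cwnd_after(n) for n in range(7)])   # [1, 2, 4, 8, 16, 17, 18]
```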

When congestion occurs, the typical recovery sequence for TCP/IP networks that use delayed ACK and slow start is:
  1. The sender detects congestion because it did not receive an ACK within the retransmission timeout period.
  2. The sender retransmits the first data segment and waits for the ACK before sequencing the remaining segments for retransmission.
  3. The receiver receives the retransmitted data segment and starts the delayed ACK timer.
  4. The receiver transmits the ACK when the delayed ACK timer times out. During this waiting period, there are no other transmissions between the sender and receiver.
  5. Upon receiving the ACK, the sender retransmits the next two data segments back to back.
  6. The receiver, upon receiving the second of the two data segments, promptly transmits an ACK.
  7. The sender, upon receiving the ACK, retransmits the next four data segments back to back.
  8. This sequence continues until the congestion period passes and the network returns to a normal traffic rate.
In this recovery sequence, the longest lag time comes from the initial delayed ACK timer in step 3.
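The recovery sequence above can be put into rough numbers. This back-of-the-envelope sketch assumes a 200 ms delayed-ACK timer and a 1 ms LAN round-trip time, with the retransmission burst doubling (1, 2, 4, ...) as ACKs arrive:

```python
# Back-of-the-envelope timing of the recovery sequence: the lone first
# retransmission waits out the delayed-ACK timer (step 3), later bursts
# of two or more segments are ACKed promptly (every other segment).
# Assumed numbers: 200 ms delayed-ACK timer, 1 ms round-trip time.

DELAYED_ACK_TIMER_MS = 200   # assumed; real values vary 100-500 ms
RTT_MS = 1                   # assumed LAN round-trip time

def recovery_time_ms(lost_segments):
    """Estimated time to recover all lost segments, in milliseconds."""
    total = 0
    burst = 1
    remaining = lost_segments
    first = True
    while remaining > 0:
        sent = min(burst, remaining)
        remaining -= sent
        # Only the initial single-segment retransmission hits the timer.
        total += RTT_MS + (DELAYED_ACK_TIMER_MS if first and sent == 1 else 0)
        first = False
        burst *= 2
    return total

# 15 lost segments recover in bursts of 1+2+4+8: one 200 ms timer hit,
# then three fast round trips.
print(recovery_time_ms(15))   # 204
```

Even in this model, a single delayed-ACK timer expiry dwarfs the cumulative round-trip cost, which is the point made above about step 3.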
 

Problem Description

The affected iSCSI arrays in question take a slightly different approach to handling congestion. Instead of implementing either the slow start algorithm or congestion avoidance algorithm, or both, these arrays take the very conservative approach of retransmitting only one lost data segment at a time and waiting for the host's ACK before retransmitting the next one. This process continues until all lost data segments have been recovered.

Coupled with the delayed ACK implemented on the ESXi/ESX host, this approach slows read performance to a halt in a congested network. Consequently, frequent timeouts are reported in the kernel log on hosts that use this type of array. Most notably, the VMFS heartbeat experiences a large volume of timeouts because VMFS uses a short timeout value. This configuration also experiences excessively large maximum read-response times (on the order of several tens of seconds) reported by the guest. This problem is exacerbated when reading data in large block sizes: the higher bandwidth contributes to network congestion, and each I/O comprises many more data segments, requiring longer recovery times.
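The cost of the one-segment-at-a-time behavior described above can be sketched with the same assumed numbers (200 ms delayed-ACK timer, 1 ms RTT, 1460-byte segments; all values are illustrative assumptions, not measurements from any particular array):

```python
# Sketch of the conservative arrays described above: they retransmit one
# lost segment at a time and wait for the host's ACK before sending the
# next. Because each retransmitted segment is unpaired, every ACK from a
# delayed-ACK host waits out the timer.
# Assumed numbers: 200 ms delayed-ACK timer, 1 ms RTT, 1460-byte segments.

DELAYED_ACK_TIMER_MS = 200
RTT_MS = 1
SEGMENT_BYTES = 1460

def one_at_a_time_recovery_ms(lost_segments):
    """Estimated recovery time when every lone segment's ACK is delayed."""
    return lost_segments * (RTT_MS + DELAYED_ACK_TIMER_MS)

# A 1 MB read whose segments are all lost: ceil(1_000_000 / 1460) = 685
# segments at 201 ms each, i.e. over two minutes to recover one I/O.
segments = -(-1_000_000 // SEGMENT_BYTES)   # ceiling division -> 685
print(one_at_a_time_recovery_ms(segments))  # 137685
```

Contrast this with the doubling-burst recovery modeled earlier, where the delayed-ACK timer is paid only once; this gap is why large-block reads suffer the most.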

In the vmkernel.log file, you may see entries similar to:

-09-17T15:07:19Z iscsid: discovery_sendtargets::Running discovery on IFACE iscsi_vmk@vmk2(iscsi_vmk) (drec.transport=iscsi_vmk)
-09-17T15:07:19Z iscsid: cannot make connection to 10.0.68.2:3260 (111)
-09-17T15:07:19Z iscsid: connection to discovery address 10.0.68.2 failed
-09-17T15:07:19Z iscsid: connection login retries (reopen_max) 5 exceeded
-09-17T15:07:19Z iscsid: Login Target Skipped: iqn.2003-10.com.temp:hvtiscsi:1808:vmwds01 if=iscsi_vmk@vmk1 addr=10.0.68.2:3260 (TPGT:1 ISID:0x2) (Already Running)
-09-17T15:07:19Z iscsid: Login Target Skipped: iqn.2003-10.com.temp:hvtiscsi:1808:vmwds01 if=iscsi_vmk@vmk2 addr=10.0.68.2:3260 (TPGT:1 ISID:0x3) (Already Running)

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Resolution

Until the iSCSI array vendors address this issue on their end, consider designing your IP storage network with enough capacity to account for the peak usage and lower the risk of congestion. If you experience the problem described in this article and are unable to alter your network configuration or find ways to guarantee a congestion-free environment, you can experiment with the following workaround.

This workaround involves disabling delayed ACK on your ESX/ESXi host through a configuration option.

Note: Please contact your storage vendor to confirm if disabling Delayed ACK is a workaround that they recommend for their storage array.

Disabling Delayed ACK in ESXi 7.x and later

  1. Log in to the Web Client and select the host.
  2. Right-click the host and select Enter Maintenance Mode.
  3. Wait for the Enter Maintenance Mode task to complete.
  4. Navigate to the Configure tab.
  5. Click Storage Adapters.
  6. Click the iSCSI vmhba that you want to modify.
  7. Click the Static Discovery tab and remove the iSCSI targets for which you want to disable delayed ACK.
  8. Click the Advanced Options tab and set the DelayedAck attribute to false.
  9. To modify the delayed ACK setting on a discovery address:
    1. Click the Targets tab under Adapter Details.
    2. Click Dynamic Discovery.
    3. Click Advanced Settings.
    4. In the Delayed ACK advanced parameter options, uncheck Inherit from parent and uncheck DelayedAck.
    5. Click OK.
  10. Rescan the adapter.
  11. Reboot the host.

Note: A reboot is required for delayed ACK setting changes to take effect. 

To verify that the setting is applied, run the command:

vmkiscsid --dump-db | grep Delayed

If delayed ACK is disabled, the value is 0.