Some iSCSI storage arrays from various vendors do not behave appropriately during periods of network congestion. This problem is related to the TCP/IP implementation of these arrays and can severely impact the read performance of storage attached to the ESXi/ESX host through the software iSCSI initiator. The problem has also been reported in native (non-virtualized) environments that use the Microsoft iSCSI initiator.
Background Concepts
To understand this problem, you should be familiar with several TCP concepts:
- Delayed ACK
- Slow start
- Congestion avoidance
Delayed ACK
A central precept of the TCP network protocol is that data sent through TCP be acknowledged by the recipient. According to RFC 813, "Very simply, when data arrives at the recipient, the protocol requires that it send back an acknowledgement of this data. The protocol specifies that the bytes of data are sequentially numbered, so that the recipient can acknowledge data by naming the highest numbered byte of data it has received, which also acknowledges the previous bytes." The TCP packet that carries the acknowledgement is known as an ACK.
A host receiving a stream of TCP data segments can increase efficiency in both the network and the hosts by sending fewer than one acknowledgement (ACK) segment per data segment received. This is known as a delayed ACK. The common practice is to send an ACK for every other full-sized data segment and not to delay the ACK for a segment by more than a specified threshold, typically between 100ms and 500ms. ESXi/ESX uses delayed ACK because of these benefits, as do most other servers.
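As a concrete illustration of the delayed ACK concept at the operating-system level (not of how ESXi/ESX or an iSCSI array tunes its TCP stack), the following Python sketch shows how a Linux application can ask the kernel to acknowledge received data immediately instead of delaying, using the TCP_QUICKACK socket option. The function name and buffer size are illustrative assumptions.

```python
import socket

# Minimal sketch (Linux only): TCP_QUICKACK asks the kernel to send ACKs
# immediately rather than delaying them. The flag is advisory and the kernel
# resets it after a while, so it is typically re-armed before each recv().
# This only illustrates the delayed-ACK concept; it is not how ESXi/ESX
# configures its software iSCSI initiator.
def recv_with_quick_ack(sock: socket.socket, nbytes: int) -> bytes:
    data = bytearray()
    while len(data) < nbytes:
        # Re-arm quick ACK so the ACK for the next segment is not delayed.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
        chunk = sock.recv(min(65536, nbytes - len(data)))
        if not chunk:
            break
        data.extend(chunk)
    return bytes(data)
```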
Slow Start and Congestion Avoidance
Congestion can occur when there is a mismatch of data processing capabilities between two elements of the network leading from the source to the destination. Congestion manifests itself as a delay, timeout, or packet loss. To avoid and recover from congestion, TCP uses two algorithms: slow start and congestion avoidance. Although the mechanisms of the two algorithms differ, the underlying concept is the same: when congestion occurs, the TCP sender must slow its transmission rate and then increase it again as retransmitted data segments are acknowledged.
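The following minimal Python sketch illustrates the interplay of the two algorithms: after a timeout the congestion window restarts at one segment, doubles each round trip during slow start, and grows by one segment per round trip once it reaches an assumed slow-start threshold. The constants are illustrative and are not taken from any particular TCP implementation.

```python
# Sketch of congestion-window (cwnd) growth after a loss: exponential
# doubling during slow start, then additive increase during congestion
# avoidance. The slow-start threshold value is an illustrative assumption.
def cwnd_growth(rounds: int, ssthresh: int = 16) -> list[int]:
    cwnd = 1                      # restart from one segment after a timeout
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2             # slow start: double each round trip
        else:
            cwnd += 1             # congestion avoidance: add one per round trip
    return history

print(cwnd_growth(10))            # [1, 2, 4, 8, 16, 17, 18, 19, 20, 21]
```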
When congestion occurs, the typical recovery sequence for TCP/IP networks that use delayed ACK and slow start is:
1. The sender detects congestion because it did not receive an ACK within the retransmission timeout period.
2. The sender retransmits the first data segment and waits for the ACK before sequencing the remaining segments for retransmission.
3. The receiver receives the retransmitted data segment and starts the delayed ACK timer.
4. The receiver transmits the ACK when the delayed ACK timer expires. During this waiting period, there are no other transmissions between the sender and receiver.
5. Upon receiving the ACK, the sender retransmits the next two data segments back to back.
6. Upon receiving the second data segment, the receiver promptly transmits an ACK.
7. Upon receiving that ACK, the sender retransmits the next four data segments back to back.
8. This sequence continues until the congestion period passes and the network returns to its normal traffic rate.
In this recovery sequence, the longest lag time comes from the initial delayed ACK timer in step 3.
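To put rough numbers on this sequence, the sketch below models the recovery time for an assumed number of lost segments: one delayed ACK wait for the first retransmission, then roughly one round trip for each subsequent doubling round. The round-trip time and delayed ACK values are assumptions chosen only to show the order of magnitude.

```python
# Rough model of the recovery sequence described above: round 1 resends one
# segment and waits for the delayed ACK timer; later rounds (2, 4, 8, ...
# segments) are acknowledged promptly and cost about one round trip each.
# All timing constants are illustrative assumptions.
def slow_start_recovery_time(lost_segments: int,
                             rtt_s: float = 0.001,
                             delayed_ack_s: float = 0.2) -> float:
    time_s = rtt_s + delayed_ack_s      # round 1: one segment + delayed ACK
    sent = 1
    batch = 2
    while sent < lost_segments:
        time_s += rtt_s                 # later rounds are ACKed promptly
        sent += batch
        batch *= 2
    return time_s

# Example: with the assumed values, 64 lost segments recover in a handful of
# round trips plus one delayed-ACK wait -- roughly 0.2 s in total.
print(round(slow_start_recovery_time(64), 3))
```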
Problem Description
The affected iSCSI arrays take a different approach to handling congestion. Instead of implementing the slow start algorithm, the congestion avoidance algorithm, or both, these arrays take the very conservative approach of retransmitting only one lost data segment at a time and waiting for the host's ACK before retransmitting the next one. This process continues until all lost data segments have been recovered.
Coupled with the delayed ACK implemented on the ESXi/ESX host, this approach slows read performance to a crawl in a congested network. Consequently, frequent timeouts are reported in the kernel log on hosts that use this type of array. Most notably, the VMFS heartbeat experiences a large number of timeouts because VMFS uses a short timeout value. This configuration also produces excessively large maximum read-response times (on the order of tens of seconds) as reported by the guest. The problem is exacerbated when reading data in large block sizes: the higher bandwidth contributes to network congestion, and each I/O consists of many more data segments, requiring longer recovery times.
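The sketch below contrasts this behavior with the slow start recovery modeled earlier: when every lost segment is retransmitted individually and each retransmission waits out the host's delayed ACK timer, total recovery time grows linearly with the number of lost segments. The timing constants are the same illustrative assumptions as before, not measured values.

```python
# Sketch of the recovery behavior described for the affected arrays: one
# lost segment is resent at a time, and each resend waits for the host's
# delayed ACK before the next segment goes out.
def one_segment_at_a_time_recovery(lost_segments: int,
                                   rtt_s: float = 0.001,
                                   delayed_ack_s: float = 0.2) -> float:
    return lost_segments * (rtt_s + delayed_ack_s)

# With the same assumed values as the earlier sketch, 64 lost segments take
# about 12.9 s instead of about 0.2 s -- on the order of the multi-second
# read-response times mentioned above.
print(round(one_segment_at_a_time_recovery(64), 1))
```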
In the vmkernel.log file, you may see entries similar to:
-09-17T15:07:19Z iscsid: discovery_sendtargets::Running discovery on IFACE iscsi_vmk@vmk2(iscsi_vmk) (drec.transport=iscsi_vmk)
-09-17T15:07:19Z iscsid: cannot make connection to 10.0.68.2:3260 (111)
-09-17T15:07:19Z iscsid: connection to discovery address 10.0.68.2 failed
-09-17T15:07:19Z iscsid: connection login retries (reopen_max) 5 exceeded
-09-17T15:07:19Z iscsid: Login Target Skipped: iqn.2003-10.com.temp:hvtiscsi:1808:vmwds01 if=iscsi_vmk@vmk1 addr=10.0.68.2:3260 (TPGT:1 ISID:0x2) (Already Running)
-09-17T15:07:19Z iscsid: Login Target Skipped: iqn.2003-10.com.temp:hvtiscsi:1808:vmwds01 if=iscsi_vmk@vmk2 addr=10.0.68.2:3260 (TPGT:1 ISID:0x3) (Already Running)