High I/O latency is reported for VMs residing on the vSAN datastore.
In some cases, high vSAN latency is observed after upgrading the VxRail version.
vSAN debug controller and vmnics used for vSAN traffic are confirmed to be compatible. Driver and firmware versions for network and storage components are verified to be up to date.
Refer: Determining Network/Storage firmware and driver version in ESXi
The vSAN Skyline Health reports no critical errors or warnings.
No resynchronization of vSAN VMs is in progress; however, high latency is still observed on vSAN VMs.
Refer: Monitor the Resynchronization Tasks in the vSAN Cluster
vSAN 7.X
vSAN 8.X
Analysis of the vSAN traces reveals “Wait for RDT” events. RDT (Reliable Datagram Transport) is the protocol vSAN hosts use to exchange data over the vSAN network, so operations that stall in this phase indicate ongoing network-related issues within the vSAN environment. Additionally, TCP/IP errors and a high volume of network retransmissions are observed on the vSAN network, pointing to a transient network condition. If there is network congestion or intermittent connectivity between vSAN nodes, the replication of data can be delayed, and the I/O operation then waits for the data transfer to complete.
Example:
/var/log/vsantraces/vsantracesUrgent log on ESXi host:
2025-03-10T00:16:45.439638 [3612038] [cpu22] [c826378b OWNER readWithBlkAttr5 VMDISK] DOMTraceOpTookTooLong:10304: {'op': 0x45bad8d63c80, 'objUuid': 'xxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxx', 'offset-39': 138322706432, 'length-25': 65536, 'totalTimeMS': 10014, 'timeInThisPhaseMS': 10014, 'opPhase': 'Wait for RDT'}
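To gauge how often such entries occur, the trace output can be filtered for slow operations whose phase is “Wait for RDT”. The following is a minimal Python sketch, not a supported tool. It assumes the trace has already been decoded to text lines in the same format as the example above; the function names `find_rdt_waits` and `count_per_minute` are hypothetical helpers introduced here for illustration.

```python
import re
from collections import Counter

# Pulls the timestamp, object UUID, total time and phase out of a
# DOMTraceOpTookTooLong line formatted like the example above (assumed format).
SLOW_OP = re.compile(
    r"^(?P<ts>\S+).*?DOMTraceOpTookTooLong.*?"
    r"'objUuid': '(?P<uuid>[^']*)'.*?"
    r"'totalTimeMS': (?P<total_ms>\d+).*?"
    r"'opPhase': '(?P<phase>[^']*)'"
)


def find_rdt_waits(lines, phase="Wait for RDT"):
    """Return (timestamp, objUuid, totalTimeMS) for slow ops in the given phase."""
    hits = []
    for line in lines:
        match = SLOW_OP.search(line.strip())
        if match and match.group("phase") == phase:
            hits.append(
                (match.group("ts"), match.group("uuid"), int(match.group("total_ms")))
            )
    return hits


def count_per_minute(hits):
    """Count matching entries per minute to show whether the stalls come in bursts."""
    # The first 16 characters of the timestamp are YYYY-MM-DDTHH:MM.
    return Counter(ts[:16] for ts, _uuid, _total_ms in hits)
```

Passing the lines of a decoded trace file to `find_rdt_waits` and the result to `count_per_minute` gives a per-minute tally. Bursts that line up with the times users reported latency would be consistent with a transient network condition, and the timestamps can be handed to the networking team for correlation with switch logs.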
Engage your networking team to investigate and resolve the cause of the TCP/IP errors and network retransmissions observed on the vSAN network.