This article is intended to inform about this issue, provide a workaround, and inform of the version of ESXi where this issue is resolved.
Symptoms:
In versions of vSAN 6.7 prior to 6.7 P04 (17167734) and 7.0 prior to 7.0 U1C/P02 (17325551), a latency spike is seen on vSAN objects when a vsanmgmtd process issued storage rescan is performed. This can also occur when collecting ESXi support bundles, as well as during log collection, when a storage scan is performed to collect storage information.
- This latency spike may reach multiple seconds of latency and persist for several minutes. This potentially impacts guest VM operations. The latency spike is random and may not occur on a regular basis, or may occur frequently with each vSAN rescan.
You will see this in the vSAN trace:
YYYY-MM-DDTHH:MM:SS [########] [cpu##] [e3e06a84 COMP scrubComponent VMDISK] DOMTraceOpTookTooLong:####: {'op': 0x45a46151d000, 'objUuid': '########-####-####-####-############', 'offset-39': ##########, 'length-25': 65536, 'totalTimeMS': 2259, 'timeInThisPhaseMS': 8, 'opPhase': 'Wait for LSOM'}
YYYY-MM-DDTHH:MM:SS [########] [cpu##] [8ce96737 COMP scrubComponent VMDISK] DOMTraceOpTookTooLong:####: {'op': 0x45a46145dc00, 'objUuid': '########-####-####-####-############', 'offset-39': ###########, 'length-25': 65536, 'totalTimeMS': 2212, 'timeInThisPhaseMS': 8, 'opPhase': 'Wait for LSOM'}
BA rescan, vSAN will also rescan the disk group. In situations where the rescan is issued against the entire cluster, latency can reach multiple seconds and intermittently persist for hours.
Example of a vSAN rescan thread running in vmkernel.log (your dates, times, and disk details will differ):
YYYY-MM-DDTHH:MM:SSZ cpu70:2101085)PLOG: PLOGInitAndAnnounceMD:8473: Successfully announced VSAN MD (naa.0000000000000000:1) with UUID: ########-####-####-####-############. kt 1, en 0, enC 0.
YYYY-MM-DDTHH:MM:SSZ cpu36:2101085)PLOG: PLOGOpenDevice:4464: Disk handle open failure for device naa.58ce38ee203eb7e9:2, status:Busy
YYYY-MM-DDTHH:MM:SSZ cpu36:2101085)PLOG: PLOGInitAndAnnounceMD:8473: Successfully announced VSAN MD (naa.0000000000000000:1) with UUID: ########-####-####-####-############ kt 1, en 0, enC 0.
YYYY-MM-DDTHH:MM:SSZ cpu32:71487898)VC: 4616: Device rescan time 23353 msec (total number of devices 5)
YYYY-MM-DDTHH:MM:SSZ cpu32:71487898)VC: 4619: Filesystem probe time 14 msec (devices probed 3 of 5)
YYYY-MM-DDTHH:MM:SSZ cpu32:71487898)VC: 4621: Refresh open volume time 26644 msec