This article describes this issue, provides a workaround, and identifies the version of ESXi in which the issue is resolved.
Symptoms:
In vSAN 6.7 versions prior to 6.7 P04 (build 17167734) and vSAN 7.0 versions prior to 7.0 U1c/P02 (build 17325551), a latency spike occurs on vSAN objects when the vsanmgmtd process issues a storage rescan. The spike can also occur during ESXi support bundle or other log collection, because a storage rescan is performed to gather storage information.
The latency may reach several seconds and persist for several minutes, potentially impacting guest VM operations. The spike is unpredictable: it may occur only occasionally, or it may occur with every vSAN rescan.
The issue can be exacerbated by an HBA rescan on a host, because vSAN also rescans the disk group after an HBA rescan. When the rescan is issued against the entire cluster, latency can reach several seconds and intermittently persist for hours.
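The affected ranges above can be checked against a host's build number. The following is a minimal sketch, not a VMware tool. It assumes the version string looks like the output of `vmware -v` (for example `VMware ESXi 7.0.1 build-17325551`), and it assumes that, within each release line, any build at or above the fixed build listed in this article includes the fix. The function name `is_affected` and the sample build `17000000` are hypothetical. Confirm the result against the official release notes.

```python
import re

# Fixed builds per release line, taken from this article.
# Assumption: any build >= these values within the same line contains the fix.
FIXED_BUILDS = {
    "6.7": 17167734,  # 6.7 P04
    "7.0": 17325551,  # 7.0 U1c / P02
}

# Assumed version-string format, e.g. "VMware ESXi 7.0.1 build-17325551".
VERSION_RE = re.compile(r"ESXi (\d+)\.(\d+)(?:\.\d+)* build-(\d+)")


def is_affected(version_string):
    """Return True if the build predates the fix for its release line."""
    match = VERSION_RE.search(version_string)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_string)
    release_line = "%s.%s" % (match.group(1), match.group(2))
    build = int(match.group(3))
    fixed = FIXED_BUILDS.get(release_line)
    if fixed is None:
        # This article only covers the 6.7 and 7.0 release lines.
        raise ValueError("release line %s is not covered" % release_line)
    return build < fixed


if __name__ == "__main__":
    samples = (
        "VMware ESXi 6.7.0 build-17000000",  # hypothetical pre-fix build
        "VMware ESXi 7.0.1 build-17325551",  # the 7.0 U1c build named above
    )
    for s in samples:
        print(s, "->", "affected" if is_affected(s) else "fixed")
```

The comparison is strictly "less than" because the builds named in this article are the first ones that contain the fix.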
Example vmkernel.log entries from a vSAN rescan thread (dates, times, and disk details in your environment will differ):
2021-04-07T17:08:53.220Z cpu70:2101085)PLOG: PLOGInitAndAnnounceMD:8473: Successfully announced VSAN MD (naa.58ce38ee203eb815:2) with UUID: 526e60e3-f6d0-9865-8a45-e257879581af. kt 1, en 0, enC 0.
2021-04-07T17:08:54.734Z cpu36:2101085)PLOG: PLOGOpenDevice:4464: Disk handle open failure for device naa.58ce38ee203eb7e9:2, status:Busy
2021-04-07T17:08:54.734Z cpu36:2101085)PLOG: PLOGInitAndAnnounceMD:8473: Successfully announced VSAN MD (naa.58ce38ee203eb7e9:2) with UUID: 52878efb-2331-9abb-9f61-2b88050ac353. kt 1, en 0, enC 0.
2021-04-07T17:08:55.648Z cpu32:71487898)VC: 4616: Device rescan time 23353 msec (total number of devices 5)
2021-04-07T17:08:55.648Z cpu32:71487898)VC: 4619: Filesystem probe time 14 msec (devices probed 3 of 5)
2021-04-07T17:08:55.648Z cpu32:71487898)VC: 4621: Refresh open volume time 26644 msec
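To find rescan events like these in a larger log, the messages above can be matched by pattern. The following is a minimal sketch that assumes the message wording shown in this excerpt. Other builds may word these messages differently. The 5000 ms threshold and the `summarize` function are hypothetical choices for illustration only. The sketch is intended to run against a copy of vmkernel.log on a workstation, not on the host itself.

```python
import re
import sys

# Patterns modeled on the vmkernel.log excerpt above (assumed wording).
TIMESTAMP_RE = re.compile(r"^(\S+Z) ")
RESCAN_RE = re.compile(r"\)VC: \d+: Device rescan time (\d+) msec")
REFRESH_RE = re.compile(r"\)VC: \d+: Refresh open volume time (\d+) msec")
BUSY_RE = re.compile(r"Disk handle open failure for device ([^,\s]+), status:Busy")


def summarize(lines, threshold_ms=5000):
    """Return (timestamp, event, detail) tuples for notable rescan entries."""
    findings = []
    for line in lines:
        ts_match = TIMESTAMP_RE.match(line)
        ts = ts_match.group(1) if ts_match else "?"
        # Flag slow device rescans and volume refreshes.
        for label, pattern in (("device rescan", RESCAN_RE),
                               ("refresh open volume", REFRESH_RE)):
            m = pattern.search(line)
            if m and int(m.group(1)) >= threshold_ms:
                findings.append((ts, label, int(m.group(1))))
        # Flag disks that were busy when the rescan tried to open them.
        m = BUSY_RE.search(line)
        if m:
            findings.append((ts, "disk open busy", m.group(1)))
    return findings


SAMPLE = [
    "2021-04-07T17:08:54.734Z cpu36:2101085)PLOG: PLOGOpenDevice:4464: Disk handle open failure for device naa.58ce38ee203eb7e9:2, status:Busy",
    "2021-04-07T17:08:55.648Z cpu32:71487898)VC: 4616: Device rescan time 23353 msec (total number of devices 5)",
    "2021-04-07T17:08:55.648Z cpu32:71487898)VC: 4619: Filesystem probe time 14 msec (devices probed 3 of 5)",
    "2021-04-07T17:08:55.648Z cpu32:71487898)VC: 4621: Refresh open volume time 26644 msec",
]

if __name__ == "__main__":
    # Usage: python rescan_scan.py [path-to-copied-vmkernel.log]
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            results = summarize(f)
    else:
        results = summarize(SAMPLE)
    for ts, event, detail in results:
        print(ts, event, detail)
```

Run against the sample lines, the sketch reports the busy disk, the 23353 ms device rescan, and the 26644 ms volume refresh. It skips the 14 ms filesystem probe because no pattern targets that message.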