vSAN latency experienced during vSAN storage rescan on host


Article ID: 315528


Updated On:

Products

VMware vSAN

Issue/Introduction

This article describes the issue, provides a workaround, and identifies the ESXi versions in which the issue is resolved.

Symptoms:
In vSAN 6.7 builds prior to 6.7 P04 (17167734) and vSAN 7.0 builds prior to 7.0 U1C/P02 (17325551), a latency spike is seen on vSAN objects when the vsanmgmtd process issues a storage rescan. The spike can also occur while collecting ESXi support bundles or logs, because a storage scan is performed to gather storage information.

The latency spike may reach multiple seconds and persist for several minutes, potentially impacting guest VM operations. The spike is intermittent: it may occur only occasionally, or it may recur with each vSAN rescan.

The issue can be exacerbated by a rescan of HBAs on a host, because vSAN also rescans the disk group after an HBA rescan. When the rescan is issued against the entire cluster, latency can reach multiple seconds and persist intermittently for hours.

Example of a vSAN rescan thread running in vmkernel.log (your dates, times, and disk details will differ):
2021-04-07T17:08:53.220Z cpu70:2101085)PLOG: PLOGInitAndAnnounceMD:8473: Successfully announced VSAN MD (naa.58ce38ee203eb815:2) with UUID: 526e60e3-f6d0-9865-8a45-e257879581af. kt 1, en 0, enC 0.
2021-04-07T17:08:54.734Z cpu36:2101085)PLOG: PLOGOpenDevice:4464: Disk handle open failure for device naa.58ce38ee203eb7e9:2, status:Busy
2021-04-07T17:08:54.734Z cpu36:2101085)PLOG: PLOGInitAndAnnounceMD:8473: Successfully announced VSAN MD (naa.58ce38ee203eb7e9:2) with UUID: 52878efb-2331-9abb-9f61-2b88050ac353. kt 1, en 0, enC 0.
2021-04-07T17:08:55.648Z cpu32:71487898)VC: 4616: Device rescan time 23353 msec (total number of devices 5)
2021-04-07T17:08:55.648Z cpu32:71487898)VC: 4619: Filesystem probe time 14 msec (devices probed 3 of 5)
2021-04-07T17:08:55.648Z cpu32:71487898)VC: 4621: Refresh open volume time 26644 msec
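To locate rescans in a longer vmkernel.log, the timing messages in the excerpt above can be extracted with a short script. The following is a minimal sketch, not a supported tool: the function name rescan_timings is hypothetical, and the pattern assumes the message format matches the sample lines exactly.

```python
import re

# Matches the "Device rescan time" and "Refresh open volume time" messages
# as they appear in the sample excerpt. The format is assumed from that sample.
TIMING = re.compile(
    r"^(?P<ts>\S+) .*?VC: \d+: "
    r"(?P<metric>Device rescan time|Refresh open volume time) "
    r"(?P<msec>\d+) msec"
)


def rescan_timings(lines):
    """Return (timestamp, metric, milliseconds) for each rescan timing line."""
    hits = []
    for line in lines:
        m = TIMING.search(line)
        if m:
            hits.append((m.group("ts"), m.group("metric"), int(m.group("msec"))))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python rescan_timings.py < vmkernel.log
    for ts, metric, msec in rescan_timings(sys.stdin):
        print(f"{ts}  {metric}: {msec} msec")
```

The timestamps returned can then be compared against the times at which guest latency was observed.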



Environment

VMware vSAN 6.7.x
VMware vSAN 7.0.x

Cause

A vSAN process runs into lock contention, which interferes with I/O until the process holding the lock completes.

Resolution


For 6.7 builds, this issue is resolved in ESXi/vSAN 6.7 P04 (Build # 17167734) and above.
For 7.0 builds, this issue is resolved in ESXi/vSAN 7.0 U1C/P02 (Build # 17325551) and above.
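The build thresholds above can be checked programmatically when auditing several hosts. The following is a minimal sketch under stated assumptions: the function name is_affected and the FIXED_BUILDS table are hypothetical, the version and build number are supplied by the caller (for example, as read from esxcli system version get on each host), and build numbers are assumed to increase within a release line.

```python
# First fixed build per release line, taken from the Resolution section.
FIXED_BUILDS = {
    "6.7": 17167734,  # ESXi/vSAN 6.7 P04
    "7.0": 17325551,  # ESXi/vSAN 7.0 U1C/P02
}


def is_affected(version, build):
    """Return True if the build predates the fix, False if it includes it.

    Returns None for release lines this article does not cover.
    """
    release_line = ".".join(version.split(".")[:2])
    fixed = FIXED_BUILDS.get(release_line)
    if fixed is None:
        return None
    return build < fixed


if __name__ == "__main__":
    # Example inputs are illustrative, not taken from a real host.
    for version, build in [("6.7.0", 16075168), ("7.0.1", 17325551)]:
        print(version, build, is_affected(version, build))
```

A result of None means the article makes no statement about that release line, so it should not be read as "not affected".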

Workaround:
Do not issue manual HBA rescans, as these can trigger the problem. Patch to ESXi/vSAN 6.7 P04 (Build # 17167734) or above, or to ESXi/vSAN 7.0 U1C/P02 (Build # 17325551) or above, as soon as possible.

Additional Information

Impact/Risks:
This issue can have a severe impact on guest operations, depending on how high the latency reaches and how long the spike persists. Patch to vSAN 6.7 P04 or above, or to vSAN 7.0 U1C/P02 or above, as soon as possible.