This article describes the issue, its workaround, and its resolution.
Impact/Risks:
This problem may cause a host to fail with a PSOD. VMs on that host crash and may be unable to restart. A restore from backup may be required.
Symptoms:
This problem is specific to vSAN ESA in 8.0.
Following the PSOD, one or more VMs may crash, and object resync may become stuck and unable to progress. The associated VMs may be unable to recover and must then be restored from backup.
An example PSOD is shown below. Details such as times, UUIDs, worlds, and system names will vary.
Panic Details: Crash at 2023-04-03T19:40:53.279Z on CPU 88 running world 2099105 - VSAN_0x4329bf0b3d40_Owner. VMK Uptime:6:19:58:40.098
Panic Message: @BlueScreen: ########-####-####-####-########046c: Failed to wait for object exit.
Backtrace:
0x453a2f59b920:[0x420020710c59]PanicvPanicInt@vmkernel#nover+0x1f9 stack: 0x420020766998, 0x420020710c59, 0x0, 0x420000000001, 0x420020710c59
0x453a2f59b9d0:[0x420020711544]Panic_vPanic@vmkernel#nover+0x25 stack: 0x0, 0x420020728e29, 0x3, 0x420000000010, 0x453a2f59ba50
0x453a2f59b9f0:[0x420020728e28]vmk_PanicWithModuleID@vmkernel#nover+0x41 stack: 0x453a2f59ba50, 0x453a2f59ba10, 0x0, 0x0, 0xc4
0x453a2f59ba50:[0x42002269cd24][email protected]#0.0.0.1+0x785 stack: 0xfa, 0x7f, 0x42, 0x90, 0xe8
0x453a2f59beb0:[0x4200226838f7][email protected]#0.0.0.1+0x10 stack: 0x45dad7700f00, 0x420021fec25c, 0x0, 0x4329bf0b3e00, 0x0
0x453a2f59bed0:[0x420021fec25b][email protected]#0.0.0.1+0x330 stack: 0x0, 0x4329bf0b3ed8, 0x570a653a7f4d6, 0x8, 0x1
0x453a2f59bf90:[0x420020730baf]vmkWorldFunc@vmkernel#nover+0x40 stack: 0x420020730bab, 0x0, 0x453a12f1f100, 0x453a2f59f000, 0x453a12f1f100
0x453a2f59bfe0:[0x420020a14f9e]CpuSched_StartWorld@vmkernel#nover+0x7b stack: 0x0, 0x4200206d40d0, 0x0, 0x0, 0x0
0x453a2f59c000:[0x4200206d40cf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
Cause:
This problem is caused by a deadlock condition in zDOM snapshots, which are unrelated to customer VM snapshots.
Resolution:
This problem is resolved in 8.0 Update 1 P02 (8.0 U1c) and later. Update to 8.0 U1c (build 22088125) or later as soon as possible.
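To check whether a host already runs a fixed build, its build number can be compared against 22088125. The following is a minimal sketch, not an official tool: the function name `build_has_fix` and the variable `FIXED_BUILD` are hypothetical, and it assumes the build number has already been read from the host (for example from the output of `vmware -v`). The comparison is only meaningful for builds in the 8.0 release line.

```shell
# Fixed build stated in this article (8.0 U1c).
FIXED_BUILD=22088125

# Hypothetical helper: returns 0 when the given 8.0 build number is at or
# above the fixed build, 1 when it is older, 2 when the input is not numeric.
build_has_fix() {
    case "$1" in
        ''|*[!0-9]*) echo "not a build number: $1" >&2; return 2 ;;
    esac
    [ "$1" -ge "$FIXED_BUILD" ]
}
```

Example usage: `build_has_fix 22088125 && echo "fix present" || echo "update required"`.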
Workaround:
The problem may be mitigated by disabling zDOM snapshots on all hosts in the cluster. The setting persists and should be reverted to 1 after upgrading to 8.0 Update 1 or later. Run the following command on each host:
esxcfg-advcfg -s 0 /VSAN/zDOMSnapshotMode
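The command above can be paired with a read-back of the option and, after the upgrade, a revert to 1. The following is a minimal sketch under stated assumptions: the helper names `zdom_snapshot_get` and `zdom_snapshot_set` and the `ESXCFG_ADVCFG` override variable are hypothetical and exist only for illustration. The only commands actually run on the host are `esxcfg-advcfg -g` (get) and `esxcfg-advcfg -s` (set).

```shell
# Advanced option named in this article.
ZDOM_OPT=/VSAN/zDOMSnapshotMode

# Hypothetical helper: print the current value of the option.
# ESXCFG_ADVCFG is an illustrative override for dry runs; it defaults
# to the real esxcfg-advcfg command.
zdom_snapshot_get() {
    "${ESXCFG_ADVCFG:-esxcfg-advcfg}" -g "$ZDOM_OPT"
}

# Hypothetical helper: set the option.
# 0 = workaround applied (zDOM snapshots disabled), 1 = reverted after upgrade.
zdom_snapshot_set() {
    case "$1" in
        0|1) "${ESXCFG_ADVCFG:-esxcfg-advcfg}" -s "$1" "$ZDOM_OPT" ;;
        *)   echo "usage: zdom_snapshot_set 0|1" >&2; return 2 ;;
    esac
}
```

On each host, `zdom_snapshot_set 0 && zdom_snapshot_get` applies the workaround and reads the value back. After the upgrade, `zdom_snapshot_set 1` reverts it.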