After upgrading an ESXi host from 8.0 U3b to 8.0 U3g, the host experiences a Purple Screen of Death (PSOD).
Rolling back to the previous ESXi version and rebooting results in the same behavior.
The ESXi host boots successfully only after the vSAN modules are disabled.
Validation Steps :
PSOD crash report indicates failure traces in the LSOM (Log-Structured Object Manager) layer, suggesting a failure related to vSAN devices.
#0 LSOMLsnTable_Next (table=table@entry=0x450406382368, entry=entry@entry=0x0) at bora/modules/vmkernel/lsom/lsom_lsn_table.c:545
545 bora/modules/vmkernel/lsom/lsom_lsn_table.c: No such file or directory.
[Current thread is 1 (LWP 2099687)]
(gdb) bt
#0 LSOMLsnTable_Next (table=table@entry=0x450406382368, entry=entry@entry=0x0) at bora/modules/vmkernel/lsom/lsom_lsn_table.c:545
#1 0x000042000d611312 in LSOM_GetNextCommit (entry=0x0, component=0x#####) at bora/modules/vmkernel/lsom/lsom_int.h:1804
#2 LSOMProcessGCP (task=task@entry=0x45ebbc31f100) at bora/modules/vmkernel/lsom/lsom_recovery.c:2063
#3 0x000042000d613cf8 in LSOMCompleteRecoveryDispatch (sm=<optimized out>) at bora/modules/vmkernel/lsom/lsom_recovery.c:3003
#4 0x000042000d5b04f5 in LSOMReadSNSetSeal (task=<optimized out>) at bora/modules/vmkernel/lsom/lsom.c:1385
#5 LSOM_SrvDispatch (operation=0x45ebbc31f100, flags=<optimized out>) at bora/modules/vmkernel/lsom/lsom.c:3728
#6 0x000042000d1bca47 in VSANServerExecuteOperation (currentTimeTC=0x##### operation=0x45ebbc31f100, serverHandle=0x#####0) at bora/modules/vmkernel/vsanutil/vsan_server.c:3516
#7 VSANServerDispatchOperations (currentTimeTC=0x45####1c0, serverHandle=<optimized out>) at bora/modules/vmkernel/vsanutil/vsan_server.c:3643
#8 VSANServerMainLoop (serverHandleOpaque=serverHandleOpaque@entry=0x450401018040) at bora/modules/vmkernel/vsanutil/vsan_server.c:3755
#9 0x000042000ad9f7f9 in vmkWorldFunc (data=<optimized out>) at bora/vmkernel/main/vmkapi_world.####
#10 0x000042000b2d67b3 in CpuSched_StartWorld (destWorld=<optimized out>, previous=<optimized out>) at bora/vmkernel/sched/cpusched.c:15324
#11 0x000042000ad44cf0 in ?? () at bora/vmkernel/main/debug.c:4125
#12 0x0000000000000000 in ?? ()
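When comparing a crash from another host against this signature, it can help to split the backtrace into frames and check whether the top frames are in LSOM. The following is a minimal Python sketch, not a supported tool: the helper names (split_frames, function_name, involves_lsom) and the depth of 5 frames are assumptions made for illustration, and the sample input is an excerpt of the trace above with the frames run together on one line, as they often appear when pasted from a console.

```python
import re

# A gdb frame marker such as "#0 " or "#10 ", followed by either an
# address ("0x... in ") or the start of a symbol name.
FRAME_START = re.compile(r"#(\d+) +(?=0x[0-9a-f]+ in |[A-Za-z_?])")


def split_frames(text):
    """Return (frame_number, frame_text) pairs from a gdb backtrace."""
    # Only look at what follows the last "(gdb) bt" prompt, if present.
    _, _, tail = text.rpartition("(gdb) bt")
    starts = list(FRAME_START.finditer(tail))
    frames = []
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(tail)
        frames.append((int(match.group(1)), tail[match.end():end].strip()))
    return frames


def function_name(frame_text):
    """Drop a leading '0x... in ' address, then take the first token."""
    body = re.sub(r"^0x[0-9a-f]+ in ", "", frame_text)
    return body.split(" ", 1)[0]


def involves_lsom(frames, depth=5):
    """True if any of the first `depth` frames is an LSOM function."""
    return any(function_name(t).startswith("LSOM") for _, t in frames[:depth])


# Excerpt of the backtrace above, flattened onto one line.
SAMPLE = (
    "(gdb) bt#0 LSOMLsnTable_Next (table=table@entry=0x450406382368, "
    "entry=entry@entry=0x0) at bora/modules/vmkernel/lsom/lsom_lsn_table.c:545"
    "#1 0x000042000d611312 in LSOM_GetNextCommit (entry=0x0, component=0x#####) "
    "at bora/modules/vmkernel/lsom/lsom_int.h:1804"
    "#2 LSOMProcessGCP (task=task@entry=0x45ebbc31f100) "
    "at bora/modules/vmkernel/lsom/lsom_recovery.c:2063"
)

if __name__ == "__main__":
    for number, frame in split_frames(SAMPLE):
        print(number, function_name(frame))
    print("LSOM in top frames:", involves_lsom(split_frames(SAMPLE)))
```

A match on LSOM frames alone does not confirm this specific issue. It only indicates that the crash occurred in the same layer, so the full trace should still be compared.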
VMware vSAN 8.x (OSA)
This is a known issue with vSAN 8.0.3 in which unmap-related operations are affected.
Broadcom engineering is working on the issue. A fix is planned for 9.0 U1 or 9.0 U2.
Workaround :
1. Reboot ESXi host with vSAN Modules Disabled
At the ESXi boot loader, press Shift+O and append the following boot option:
jumpstart.disable=vsan,lsom,plog,virsto,cmmds
2. Validate Cluster Health
3. Remove Disk Groups
Once the host is online, place it into Maintenance Mode.
Remove all vSAN disk groups associated with the affected host (make a note of the disk group details).
Related KB : Remove Disk group
4. Clear Disk Partitions (If Disk Group Removal Triggers PSOD)
5. Recreate Disk Groups
Create a disk group using the CLI : KB 2150567
Create a disk group using the UI : Create a Disk Group on a vSAN Host
6. Reboot the ESXi host
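Steps 3 and 5 depend on having an accurate record of each disk group before it is removed. The following is a minimal Python sketch of one way to keep that record and print the matching CLI commands for review. It is an illustration under stated assumptions, not the procedure from the linked KB articles: the DiskGroup class and helper names are hypothetical, the device names are placeholders, and the commands are based on the esxcli vsan storage namespace (remove with -s for the cache device, add with -s and repeated -d for capacity devices). The script only builds strings and does not run anything on the host.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiskGroup:
    """Recorded layout of one OSA disk group (hypothetical helper type)."""
    cache_device: str
    capacity_devices: List[str] = field(default_factory=list)


def remove_command(group):
    """Command text to remove the disk group by its cache device."""
    return f"esxcli vsan storage remove -s {group.cache_device}"


def add_command(group):
    """Command text to recreate the disk group from the recorded layout."""
    if not group.capacity_devices:
        raise ValueError("a disk group needs at least one capacity device")
    disks = " ".join(f"-d {dev}" for dev in group.capacity_devices)
    return f"esxcli vsan storage add -s {group.cache_device} {disks}"


if __name__ == "__main__":
    # Placeholder device names; replace with the values noted in step 3.
    recorded = [
        DiskGroup("naa.CACHE_EXAMPLE", ["naa.CAP_EXAMPLE_1", "naa.CAP_EXAMPLE_2"]),
    ]
    for group in recorded:
        print(remove_command(group))
        print(add_command(group))
```

Printing the commands rather than executing them leaves room to check each device name against the noted details before anything is changed on the host.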