PSOD after updating vSAN host to 8.0 U3g

Article ID: 420008


Products

VMware vSAN

Issue/Introduction

Symptoms:

  • After upgrading an ESXi host from 8.0 U3b to 8.0 U3g, the host experiences a Purple Screen of Death (PSOD).

  • Rolling back to the previous ESXi version and rebooting results in the same behavior.

  • The ESXi host boots successfully only after the vSAN modules are disabled.

 

Validation Steps:

The PSOD crash report shows failure traces in the LSOM (Log-Structured Object Manager) layer, suggesting a failure related to vSAN devices.

#0  LSOMLsnTable_Next (table=table@entry=0x450406382368, entry=entry@entry=0x0) at bora/modules/vmkernel/lsom/lsom_lsn_table.c:545
545     bora/modules/vmkernel/lsom/lsom_lsn_table.c: No such file or directory.
[Current thread is 1 (LWP 2099687)]
(gdb) bt
#0  LSOMLsnTable_Next (table=table@entry=0x450406382368, entry=entry@entry=0x0) at bora/modules/vmkernel/lsom/lsom_lsn_table.c:545
#1  0x000042000d611312 in LSOM_GetNextCommit (entry=0x0, component=0x#####) at bora/modules/vmkernel/lsom/lsom_int.h:1804
#2  LSOMProcessGCP (task=task@entry=0x45ebbc31f100) at bora/modules/vmkernel/lsom/lsom_recovery.c:2063
#3  0x000042000d613cf8 in LSOMCompleteRecoveryDispatch (sm=<optimized out>) at bora/modules/vmkernel/lsom/lsom_recovery.c:3003
#4  0x000042000d5b04f5 in LSOMReadSNSetSeal (task=<optimized out>) at bora/modules/vmkernel/lsom/lsom.c:1385
#5  LSOM_SrvDispatch (operation=0x45ebbc31f100, flags=<optimized out>) at bora/modules/vmkernel/lsom/lsom.c:3728
#6  0x000042000d1bca47 in VSANServerExecuteOperation (currentTimeTC=0x##### operation=0x45ebbc31f100, serverHandle=0x#####0) at bora/modules/vmkernel/vsanutil/vsan_server.c:3516
#7  VSANServerDispatchOperations (currentTimeTC=0x45####1c0, serverHandle=<optimized out>) at bora/modules/vmkernel/vsanutil/vsan_server.c:3643
#8  VSANServerMainLoop (serverHandleOpaque=serverHandleOpaque@entry=0x450401018040) at bora/modules/vmkernel/vsanutil/vsan_server.c:3755
#9  0x000042000ad9f7f9 in vmkWorldFunc (data=<optimized out>) at bora/vmkernel/main/vmkapi_world.####
#10 0x000042000b2d67b3 in CpuSched_StartWorld (destWorld=<optimized out>, previous=<optimized out>) at bora/vmkernel/sched/cpusched.c:15324
#11 0x000042000ad44cf0 in ?? () at bora/vmkernel/main/debug.c:4125
#12 0x0000000000000000 in ?? ()
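
To compare a backtrace from another host against the one above, a small script can extract the function names from the `bt` frames and check whether the top frames match. The following is a minimal sketch, not a Broadcom tool: the names `FRAME_RE`, `SIGNATURE`, `frame_functions`, and `matches_signature` are hypothetical, and treating the top three frames as the signature is an assumption made for illustration.

```python
import re

# Matches gdb "bt" frame lines in both forms seen in this article:
#   #2  LSOMProcessGCP (task=...) at ...
#   #1  0x000042000d611312 in LSOM_GetNextCommit (...) at ...
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s+\(")

# Top three functions from the backtrace in this article.
# Using exactly these three as the "signature" is an assumption.
SIGNATURE = ["LSOMLsnTable_Next", "LSOM_GetNextCommit", "LSOMProcessGCP"]


def frame_functions(backtrace: str) -> list[str]:
    """Return function names in the order the frames appear.

    gdb prints frame #0 once before the 'bt' output, so a frame number
    that was already seen is skipped.
    """
    seen = set()
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            names.append(match.group(2))
    return names


def matches_signature(backtrace: str) -> bool:
    """True if the top frames equal the LSOM signature listed above."""
    return frame_functions(backtrace)[: len(SIGNATURE)] == SIGNATURE
```

A match only indicates that the crash passed through the same LSOM recovery path; it does not by itself confirm the same root cause.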

Environment

VMware vSAN 8.x (OSA)

Cause

This is a known issue with vSAN 8.0.3 in which unmap-related operations are affected.

 

Resolution

This is a known issue, and Broadcom engineering is working on a fix. The fix is planned for 9.0 U1 or 9.0 U2.

Workaround

1. Reboot ESXi Host with vSAN Modules Disabled

  • Reboot the ESXi host.
  • During the pre-boot splash screen, press SHIFT + O to modify the boot options.
  • At the end of the boot line, add a space and append:

        jumpstart.disable=vsan,lsom,plog,virsto,cmmds

  • Press Enter to continue booting.
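
The edit to the boot line can be illustrated with a short sketch that appends the option after a single space and leaves everything already on the line untouched. The function name `with_vsan_disabled` and the sample boot line are hypothetical; the actual contents of the boot line vary by host.

```python
# Module list taken from the jumpstart.disable option in this article.
VSAN_MODULES = ["vsan", "lsom", "plog", "virsto", "cmmds"]


def with_vsan_disabled(boot_line: str) -> str:
    """Append the jumpstart.disable option to an existing boot line."""
    option = "jumpstart.disable=" + ",".join(VSAN_MODULES)
    if option in boot_line.split():
        return boot_line  # option already present, nothing to add
    # Exactly one space separates the existing options from the new one.
    return f"{boot_line.rstrip()} {option}".lstrip()


# Hypothetical existing boot line, for illustration only.
print(with_vsan_disabled("existingOption=value"))
```

The module names are comma-separated with no spaces, so the whole option stays a single token on the boot line.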

2. Validate Cluster Health

  • Ensure all vSAN objects and resync operations are healthy and stable on the other cluster hosts.
  • Do not continue if objects are still resyncing or displaying errors.

3. Remove Disk Groups

  • Once the host is online, place it into Maintenance Mode.
  • Remove all vSAN disk groups associated with the affected host (make a note of the disk group details).

    Related KB: Remove Disk group
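
One way to keep the disk group details for step 5 is to record which devices belong to which disk group before removing anything. The sketch below assumes a text listing in indented `Key: Value` blocks separated by blank lines, similar in layout to `esxcli vsan storage list` output. The field names `Device`, `VSAN Disk Group UUID`, and `Is Capacity Tier` are assumptions about that listing, and `parse_records` and `disk_groups` are hypothetical helper names. The listing may not be available while the vSAN modules are disabled, in which case the details have to be noted from another source.

```python
from collections import defaultdict


def parse_records(text: str) -> list[dict[str, str]]:
    """Split 'Key: Value' blocks (separated by blank lines) into dicts."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def disk_groups(records: list[dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Map disk group UUID to its cache and capacity device names."""
    groups = defaultdict(lambda: {"cache": [], "capacity": []})
    for rec in records:
        # Assumed field: "Is Capacity Tier" is "true" for capacity disks.
        role = "capacity" if rec.get("Is Capacity Tier") == "true" else "cache"
        groups[rec["VSAN Disk Group UUID"]][role].append(rec["Device"])
    return dict(groups)
```

Saving the resulting mapping off the host makes it possible to recreate each disk group with the same cache and capacity devices later.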

4. Clear Disk Partitions (If Disk Group Removal Triggers PSOD)

  • Navigate to Host > Storage Devices, select the cache disk, and click Erase Partitions.
  • Repeat the process for all capacity disks.

5. Recreate Disk Groups

  • Create a disk group:

         Create a disk group using the CLI: 2150567

         Create a disk group using the UI: Create a Disk Group on a vSAN Host

6. Reboot the ESXi host