Upgrading vSAN stretched cluster to versions 7.0U2 or 7.0U2 ep1 may cause multiple ESXi Hosts to PSOD

Article ID: 326684

Products

VMware vSAN

Issue/Introduction

Symptoms:
  • After upgrading a vSAN stretched cluster to 7.0 U2 or 7.0 U2 EP1, one or more ESXi hosts may fail with a purple diagnostic screen (PSOD).
  • You see a backtrace similar to:
Version Details: VMware ESXi 7.0.2 build-17867351
Panic Details: Crash at 2021-07-06T10:10:17.871Z on CPU 17 running world 2098593 - VSAN_0x431b35e70440_Owner. VMK Uptime:0:22:21:24.798
Panic Message: @BlueScreen: #PF Exception 14 in world 2098593:VSAN_0x431b3 IP 0x420011e3f8fb addr 0x18
Backtrace:
  0x45391fb9be50:[0x420011e3f8fb][email protected]#0.0.0.1+0x4b stack: 0xaf44e1e870e6, 0x420010103fa7, 0x45b9b1006840, 0x420011e66452, 0x0
  0x45391fb9bec0:[0x420011df03bc][email protected]#0.0.0.1+0xbd stack: 0x45391fb9bf40, 0x420011923bfc, 0x0, 0x0, 0x431b35e705d8
  0x45391fb9bee0:[0x420011923bfb][email protected]#0.0.0.1+0x570 stack: 0x431b35e705d8, 0x8, 0x0, 0x0, 0xaf44e1e7e49a
  0x45391fb9bf90:[0x420010119158]vmkWorldFunc@vmkernel#nover+0x49 stack: 0x420010119154, 0x0, 0x45390a121140, 0x45391fba1000, 0x45390a121140
  0x45391fb9bfe0:[0x420010381ead]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0, 0x4200100c2c24, 0x0, 0x0, 0x0
  0x45391fb9c000:[0x4200100c2c23]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
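
When triaging several crashed hosts, it can help to check each panic message against the pattern shown above. The following Python is a minimal sketch, not a VMware tool: the regular expression is derived only from the single sample panic line in this article, and the function name `matches_signature` and the 4 KiB "near-null" threshold are assumptions made for illustration. A match only indicates that a panic resembles this issue; it does not confirm the root cause.

```python
import re

# Field layout assumed from the one sample panic line quoted in this article;
# it is not taken from a documented log format.
PANIC_RE = re.compile(
    r"#PF Exception (?P<vector>\d+) in world (?P<world_id>\d+):(?P<world_name>\S+)"
    r" IP (?P<ip>0x[0-9a-fA-F]+) addr (?P<addr>0x[0-9a-fA-F]+)"
)

# Assumption: a faulting address below 4 KiB is treated as a null-pointer style fault.
NULL_PAGE_LIMIT = 0x1000


def matches_signature(panic_message: str) -> bool:
    """Return True if the panic message resembles the one in this article:
    a page fault (exception 14) in a VSAN_* world at a near-null address."""
    m = PANIC_RE.search(panic_message)
    if m is None:
        return False
    return (
        m.group("vector") == "14"
        and m.group("world_name").startswith("VSAN_")
        and int(m.group("addr"), 16) < NULL_PAGE_LIMIT
    )


if __name__ == "__main__":
    sample = (
        "@BlueScreen: #PF Exception 14 in world 2098593:VSAN_0x431b3 "
        "IP 0x420011e3f8fb addr 0x18"
    )
    print(matches_signature(sample))
```

The check deliberately ignores the backtrace frames, because only the panic message line is stable enough in the sample to match on.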



Environment

VMware vSAN 7.0.x

Cause

In rare cases, multiple ESXi hosts can crash while an on-disk format (DFC) upgrade is running on a stretched cluster. The PSOD is most likely to occur if the witness ESXi host was replaced during the upgrade process and DFC was run shortly afterward, which can trigger a race between DFC and the witness replacement workflow.

The race can cause an unexpected code path to execute, resulting in a null pointer exception that leads to the PSOD.

Resolution

Upgrade to vSAN 7.0 P03 / U2c or a later vSAN release instead of 7.0 U2 or 7.0 U2 EP1.

Workaround:
  1. Reboot the impacted ESXi Hosts.
  2. Under Cluster > Configure > vSAN > Disk Management, a warning message is displayed ("All 14 disks on version 14.0 but with older vSAN objects.") along with an upgrade option in the UI.
  3. Click Upgrade. The DFC upgrade completes and all object versions are upgraded to v14.