ESXi Hosts Experience PSOD Due to VMFS Kernel Sorting Routine with Excessive Elements.

Article ID: 409758

Products

VMware vSphere ESXi

Issue/Introduction

  • Multiple ESXi hosts experienced a Purple Screen of Death (PSOD), impacting the production environment.
  • No recent configuration changes were reported.
  • BIOS upgrades were attempted on a subset of hosts, but the issue persisted.
  • The hardware vendor confirmed that there were no hardware-related issues.
    PSOD backtrace:

    #0  0x0000420018b7c621 in Panic_WithBacktrace (sbt=sbt@entry=0x430352205858, fmt=fmt@entry=0x42001916d328 "NMI IPI: Panic requested by another PCPU. PC %#lx, SP %#lx (Src %#x, CPU%u)") at bora/vmkernel/main/panic.c:141
    #1  0x0000420018b78562 in NMIHandleBtOrHaltRequest (source=NMI_SRC_SP_SPINCOUNT, fullFrame=0x4529401f2f40) at bora/vmkernel/main/nmi.c:732
    #2  NMIHandleIPISource (fullFrame=0x4529401f2f40) at bora/vmkernel/main/nmi.c:551
    #3  NMI_Interrupt (fullFrame=fullFrame@entry=0x4529401f2f40) at bora/vmkernel/main/nmi.c:781
    #4  0x00004200190a6405 in IDTNMIWork (fullFrame=fullFrame@entry=0x4529401f2f40) at bora/vmkernel/main/x86/idt.c:1773
    #5  0x00004200190a786d in Int2_NMI (fullFrame=0x4529401f2f40) at bora/vmkernel/main/x86/idt.c:1015
    #6  0x00004200190a10c7 in gate_entry () at bora/vmkernel/main/x86/gates64.S:175
    #7  0x000042001a0b0dfa in insertSort (elemSize=8, compar=, temporary=, nmemb=139264, vbase=0x431eae527e70) at bora/modules/vmkernel/vmfs/fs3Misc.h:1839
    #8  Res6AffMgrComputeSortIndices (mgr=mgr@entry=0x431eab8b9690, typeID=typeID@entry=FS3_ADDR_LARGE_FILE_BLOCK, forceBuild=forceBuild@entry=1 '\001') at bora/modules/vmkernel/vmfs/fs3ResAffinityVMFS6.c:7652
    #9  0x000042001a0b9f8e in Res6AffMgrSetRegionMapAsValid (mgr=0x431eab8b9690) at bora/modules/vmkernel/vmfs/fs3ResAffinityVMFS6.c:7725
    #10 Res6AffmgrBatchRegionRead (mgrArg=0x431eab8b9690) at bora/modules/vmkernel/vmfs/fs3ResAffinityVMFS6.c:7943
    #11 0x0000420018b5bb80 in HelperProcessRequest (prevIRQL=, helper=0x431eaacf2f30, queue=0x431ea76589b0) at bora/vmkernel/main/helper.c:599
    #12 HelperQueueFunc (data=0x431eaacf2f30) at bora/vmkernel/main/helper.c:671
    #13 0x00004200190dc88f in CpuSched_StartWorld (destWorld=, previous=) at bora/vmkernel/sched/cpusched.c:15324
    #14 0x0000420018b44fb0 in ?? () at bora/vmkernel/main/debug.c:4125
    #15 0x0000000000000000 in ?? ()

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

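The PSOD backtrace shows that the insertSort routine in the VMFS kernel module (frame #7, called from Res6AffMgrComputeSortIndices in frame #8) was sorting an unusually large array of nmemb=139264 elements. Each element corresponds to an 8 GB region, so the array covers 139264 × 8 GB = 1,114,112 GB, which is a 1088 TB datastore, far beyond the 64 TB VMFS datastore limit. An insertion sort, as the routine name indicates, performs up to n(n-1)/2 comparisons in the worst case, so sorting this many elements consumed excessive CPU cycles.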

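The prolonged sort stalled the CPU. Another PCPU then requested a panic through a Non-Maskable Interrupt (NMI) IPI (frame #0: "NMI IPI: Panic requested by another PCPU"; source NMI_SRC_SP_SPINCOUNT in frame #1), and the host crashed with a PSOD.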
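The size and cost figures above can be reproduced with a small calculation. This is a sketch under stated assumptions: it takes the 8 GB region size stated in this article, converts with 1 TB = 1024 GB (the same conversion that yields 1088 TB), and models insertSort as a textbook insertion sort with a worst case of n(n-1)/2 comparisons. The actual VMkernel routine may behave differently; the helper function names are hypothetical.

```python
# Sketch: reproduce the datastore-size arithmetic from the PSOD backtrace and
# compare worst-case sort work against a datastore at the 64 TB limit.
# Assumptions: 8 GB per region (as stated in this article), 1 TB = 1024 GB,
# and insertSort modeled as a textbook insertion sort (hypothetical helpers).

REGION_SIZE_GB = 8   # region size stated in this article
VMFS_MAX_TB = 64     # supported maximum VMFS datastore size
GB_PER_TB = 1024


def datastore_size_gb(nmemb: int) -> int:
    """Datastore size implied by the number of regions being sorted."""
    return nmemb * REGION_SIZE_GB


def regions_for_size_tb(size_tb: int) -> int:
    """Number of 8 GB regions needed to cover a datastore of size_tb."""
    return size_tb * GB_PER_TB // REGION_SIZE_GB


def insertion_sort_worst_comparisons(n: int) -> int:
    """Worst-case comparison count for a textbook insertion sort of n elements."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    nmemb = 139264  # from frame #7: insertSort(..., nmemb=139264, ...)
    size_gb = datastore_size_gb(nmemb)
    print(f"{nmemb} regions -> {size_gb} GB ({size_gb // GB_PER_TB} TB)")

    max_regions = regions_for_size_tb(VMFS_MAX_TB)
    print(f"{VMFS_MAX_TB} TB datastore -> {max_regions} regions")

    ratio = (insertion_sort_worst_comparisons(nmemb)
             / insertion_sort_worst_comparisons(max_regions))
    print(f"worst-case sort work relative to a {VMFS_MAX_TB} TB datastore: ~{ratio:.0f}x")
```

Under this model, a 1088 TB datastore needs 17 times as many regions as a 64 TB one, but roughly 289 times the worst-case comparison work, which illustrates why the sort can run long enough to stall a CPU.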


Resolution

  • Unpresent any VMFS datastores larger than 64 TB from the ESXi hosts. This prevents the VMFS module from processing datastores that exceed the supported size.
  • Review and adhere to the datastore size limits published in VMware vSphere Configuration Maximums (Broadcom Configuration Maximums tool).
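To find candidates for the first step, datastore capacities can be compared against the 64 TB limit. This is a minimal sketch, assuming capacities are available in bytes (for example, from the Size column of `esxcli storage filesystem list`) and that 1 TB = 1024^4 bytes; the datastore names and sizes below are hypothetical, and the script does not unpresent anything itself.

```python
# Sketch: flag datastores whose capacity exceeds the 64 TB VMFS limit.
# Assumptions: capacities are in bytes and 1 TB = 1024**4 bytes.
# The sample datastore names and sizes are hypothetical.

TIB = 1024 ** 4
VMFS_MAX_BYTES = 64 * TIB  # 64 TB limit cited in the Resolution


def oversized_datastores(capacities: dict[str, int]) -> list[str]:
    """Return the names of datastores larger than the supported maximum, sorted."""
    return sorted(name for name, size in capacities.items() if size > VMFS_MAX_BYTES)


if __name__ == "__main__":
    sample = {  # hypothetical datastores
        "ds-prod-01": 32 * TIB,
        "ds-prod-02": 64 * TIB,
        "ds-archive": 1088 * TIB,
    }
    for name in oversized_datastores(sample):
        print(f"{name}: {sample[name] // TIB} TB exceeds 64 TB; unpresent from ESXi hosts")
```

A datastore of exactly 64 TB is treated as within the limit; only strictly larger datastores are flagged.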