ESXi host crashes with PSOD: NOT_IMPLEMENTED bora/vmkernel/main/world.c:2307

Article ID: 316522

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • ESXi host may experience a Purple Screen of Death (PSOD) specifically on HPE Gen10, Gen10 Plus, or Gen11 hardware platforms
  • In /var/log/vmkernel.log, you will see the following alert:

ALERT: Heap: 2746: Unable to complete wait for non-empty heap (worldGroup.######): Timeout

  • The PSOD backtrace will be similar to:
@BlueScreen: NOT_IMPLEMENTED bora/vmkernel/main/world.c:2294
0x453a8b81bc00:[0x420031114d31]PanicvPanicInt@vmkernel#nover+0x1f5 stack: 0x100
0x453a8b81bcb0:[0x4200311153a0]Panic_NoSave@vmkernel#nover+0x4d stack: 0x453a8b81bd10
0x453a8b81bd10:[0x4200311158ad]Panic_OnAssertAt@vmkernel#nover+0xba stack: 0x8f600000000
0x453a8b81bd90:[0x42003116855f]Int6_UD2Assert@vmkernel#nover+0x260 stack: 0x0
0x453a8b81bdc0:[0x420031161067]gate_entry@vmkernel#nover+0x68 stack: 0x0
0x453a8b81be80:[0x420031147136]World_DestroyHeap@vmkernel#nover+0x4e stack: 0x431753200000
0x453a8b81bea0:[0x420031147251]WorldGroupCleanup@vmkernel#nover+0xe6 stack: 0x453a8b81bef0
0x453a8b81bec0:[0x4200310f1dee]InitTable_Cleanup@vmkernel#nover+0x27 stack: 0x43143f401220
0x453a8b81bee0:[0x42003114cd46]World_TryReap@vmkernel#nover+0x3d3 stack: 0x453ace21f000
0x453a8b81bfa0:[0x420031117582]ReaperWorkerWorld@vmkernel#nover+0xaf stack: 0x453a8b79f100
0x453a8b81bfe0:[0x420031428eca]CpuSched_StartWorld@vmkernel#nover+0x7b stack: 0x0
0x453a8b81c000:[0x4200310d788b]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
base fs=0x0 gs=0x420040800000 Kgs=0x0
Heap: 2746: Unable to complete wait for non-empty heap (worldGroup.62487428): Timeout

Note: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on your environment.
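
To check a saved log for the alert shown above, a short script can help. The following is a minimal sketch in Python, not a supported tool. The function names and the regular expression are illustrative assumptions built only from the alert text quoted in this article, and the number after "Heap:" is assumed to vary between builds.

```python
import re

# Pattern derived from the alert text quoted in this article.
# Assumption: the number after "Heap:" and the world group ID vary by host.
HEAP_ALERT = re.compile(
    r"Heap: \d+: Unable to complete wait for non-empty heap "
    r"\(worldGroup\.(\d+)\): Timeout"
)


def find_heap_timeouts(lines):
    """Return the world group IDs named in matching heap timeout alerts."""
    ids = []
    for line in lines:
        match = HEAP_ALERT.search(line)
        if match:
            ids.append(match.group(1))
    return ids


def scan_log(path="/var/log/vmkernel.log"):
    """Scan a vmkernel log file (hypothetical helper; path may differ)."""
    with open(path, errors="replace") as handle:
        return find_heap_timeouts(handle)
```

A non-empty result from scan_log() only indicates that the alert text is present in the log. It does not by itself confirm that the host is affected by this issue.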

Environment

  • VMware vSphere ESXi 8.x
  • VMware vSphere ESXi 7.x

Cause

When a kernel module that exposes a character device does not behave as the vmkernel expects, a vmkpollcontext object can, in certain cases, leak in the vmkernel after a poll() syscall from userspace on that device. Later, when the userspace process terminates, the vmkernel will PSOD with a NOT_IMPLEMENTED assert if such an object has leaked, because the leaked poll context object is still associated with the process.

The HPE "ilo" kernel module used by the HPE SMAD (System Management Assistant daemon) is known to cause this issue.

Resolution

For ESXi hosts running ESXi 7.0 (or later), update HPE iLO Native driver component to v10.8.2 (or later).

For ESXi hosts running ESXi 8.0 (or later), update HPE iLO Native driver component to v10.8.2 (or later) and update ESXi to ESXi 8.0 Update 2b or later.
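
When auditing several hosts, the driver version threshold above can be checked programmatically. The following is a minimal sketch in Python. It assumes the installed driver version has already been extracted as a plain dotted string such as 10.8.2. Version strings reported by a host may carry extra prefixes or suffixes that need trimming first, and the function names here are hypothetical.

```python
# Minimum HPE iLO Native driver version named in the Resolution section.
MIN_ILO_DRIVER = (10, 8, 2)


def parse_version(text):
    """Turn '10.8.2' or 'v10.8.2-suffix' into the tuple (10, 8, 2).

    Assumption: the text before the first '-' is a dotted list of integers.
    """
    core = text.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def ilo_driver_is_fixed(installed):
    """Return True if the installed version is 10.8.2 or later."""
    return parse_version(installed) >= MIN_ILO_DRIVER
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating 10.10.0 as older than 10.8.2.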