ESXi host crashes with PSOD reporting error as "Unable to complete wait for non-empty heap"

Article ID: 387534

Products

  • VMware vSphere ESXi
  • VMware vSphere ESXi 7.0
  • VMware vSphere ESXi 8.0

Issue/Introduction

  • The ESXi host crashes with a purple diagnostic screen (PSOD) reporting "Unable to complete wait for non-empty heap (worldGroup.xxxxxxxx): Timeout"
  • Backtrace in /var/run/log/LogEFI.log:

YYYY-MM-DDTHH:MM:SS cpu80:2097406)VMware ESXi 7.0.3 [Releasebuild-20328353 x86_64]
NOT_IMPLEMENTED bora/vmkernel/main/world.c:2293
YYYY-MM-DDTHH:MM:SS cpu80:2097406)cr0=0x8001003d cr2=0x970013a860 cr3=0x6913e000 cr4=0x10216c
YYYY-MM-DDTHH:MM:SS cpu80:2097406)FMS=06/55/7 uCode=0x5003303
*PCPU80:2097406/reapWorker-4
PCPU  0: SSVVUSSVVVVUVUVUUSVVSVVVUIVVVVVSVSVVSVSUVVUVVUVVVSVVVVSVUSSUUUSV
PCPU 64: UVUUUUVUSUVVVVUVSVSVVVVUUUSSVVUVSVUVVSVVVUUUVVVS
YYYY-MM-DDTHH:MM:SS cpu80:2097406)Code start: 0x420021800000 VMK uptime: 127:22:28:13.813
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bc20:[0x4200218fee4f]PanicvPanicInt@vmkernel#nover+0x327 stack: 0x453907f1bcf8
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bcf0:[0x4200218ff3a8]Panic_NoSave@vmkernel#nover+0x4d stack: 0x453907f1bd50
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bd50:[0x4200218ff939]Panic_OnAssertAt@vmkernel#nover+0xba stack: 0x8f500000000
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bdd0:[0x420021955716]Int6_UD2Assert@vmkernel#nover+0x27f stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1be00:[0x42002194e067]gate_entry@vmkernel#nover+0x68 stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bec0:[0x420021931c70]World_DestroyHeap@vmkernel#nover+0x50 stack: 0x4331d3200000
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bee0:[0x420021932d2d]WorldGroupCleanup@vmkernel#nover+0x16e stack: 0x420021f02f00
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bf10:[0x4200218dd726]InitTable_Cleanup@vmkernel#nover+0x27 stack: 0x431464401220
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bf30:[0x420021937de0]World_TryReap@vmkernel#nover+0x385 stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bfa0:[0x420021901873]ReaperWorkerWorld@vmkernel#nover+0xd8 stack: 0x453907d9f140
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bfe0:[0x420021bb33e1]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1c000:[0x4200218c4b4f]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)base fs=0x0 gs=0x420054000000 Kgs=0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)Heap: 2749: Unable to complete wait for non-empty heap (worldGroup.56902719): Timeout

  • Related entries in /var/run/log/vmkernel.log:

YYYY-MM-DDTHH:MM:SS cpu19:4365595)Admission failure in path: host:user:pool137:vm.4365595:uwWorldStore.4365595
YYYY-MM-DDTHH:MM:SS cpu19:4365595)uwWorldStore.4365595 (19814862) extraMin/extraFromParent: 1/1, host (0) childEmin/eMinLimit: 150544522/150544522
YYYY-MM-DDTHH:MM:SS cpu19:4365595)Admission failure in path: host:user:pool137:vm.4365595:uwWorldStore.4365595
YYYY-MM-DDTHH:MM:SS cpu19:4365595)uwWorldStore.4365595 (19814862) extraMin/extraFromParent: 1/1, host (0) childEmin/eMinLimit: 150544522/150544522
YYYY-MM-DDTHH:MM:SS cpu19:4365595)WARNING: World: 2706: Could not allocate new world handle for world ID: 4365598: Admission check failed for memory resource
YYYY-MM-DDTHH:MM:SS cpu0:2097455)ALERT: Heap: 2749: Unable to complete wait for non-empty heap (worldGroup.4365595): Timeout
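To check whether another host's logs show the same signature, a short scan of vmkernel.log can look for the heap-timeout alert and the admission failures that precede it. The Python sketch below is a hypothetical helper (scan_log and SAMPLE are illustrative names, not a VMware tool). Its regular expressions are copied from the excerpts above and may not match the message wording in other builds. Treating childEmin at or above eMinLimit as "no reservable memory left on the host" is an assumption inferred from the field names, not documented behavior.

```python
import re
from collections import Counter

# Patterns copied from the vmkernel.log excerpt in this article; other ESXi
# builds may word these messages differently.
HEAP_TIMEOUT = re.compile(
    r"Unable to complete wait for non-empty heap \(worldGroup\.(\d+)\): Timeout"
)
ADMISSION_FAILURE = re.compile(r"Admission failure in path: (\S+)")
EMIN = re.compile(r"childEmin/eMinLimit: (\d+)/(\d+)")


def scan_log(lines):
    """Summarize the heap-timeout / admission-failure signature in log lines.

    Returns a tuple of:
      - worldGroup IDs whose heap wait timed out,
      - a Counter of admission-failure resource-pool paths,
      - the number of lines where childEmin has reached eMinLimit
        (ASSUMPTION: read as "host reservations fully consumed").
    """
    timed_out = []
    paths = Counter()
    saturated = 0
    for line in lines:
        m = HEAP_TIMEOUT.search(line)
        if m:
            timed_out.append(m.group(1))
        m = ADMISSION_FAILURE.search(line)
        if m:
            paths[m.group(1)] += 1
        m = EMIN.search(line)
        if m and int(m.group(1)) >= int(m.group(2)):
            saturated += 1
    return timed_out, paths, saturated


# Hypothetical sample built from the vmkernel.log excerpt above.
SAMPLE = [
    "cpu19:4365595)Admission failure in path: host:user:pool137:vm.4365595:uwWorldStore.4365595",
    "cpu19:4365595)uwWorldStore.4365595 (19814862) extraMin/extraFromParent: 1/1, "
    "host (0) childEmin/eMinLimit: 150544522/150544522",
    "cpu19:4365595)WARNING: World: 2706: Could not allocate new world handle for "
    "world ID: 4365598: Admission check failed for memory resource",
    "cpu0:2097455)ALERT: Heap: 2749: Unable to complete wait for non-empty heap "
    "(worldGroup.4365595): Timeout",
]

if __name__ == "__main__":
    groups, paths, saturated = scan_log(SAMPLE)
    print("worldGroups with heap timeout:", groups)
    print("admission-failure paths:", dict(paths))
    print("lines with childEmin >= eMinLimit:", saturated)
```

Because scan_log accepts any iterable of strings, it can also be run against a copy of the log file taken off the host, for example `scan_log(open("vmkernel.log", errors="replace"))`.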

Cause

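NUMASchedNUMAVMInfoInit() allocates memory on the heap by calling World_Alloc(). If NUMASched_WorldInit() fails at a later step after NUMASchedNUMAVMInfoInit() has succeeded, the error path does not call World_Free() to release that memory, so the allocation leaks. As the backtrace shows, when the world group is later cleaned up (ReaperWorkerWorld → WorldGroupCleanup → World_DestroyHeap), the heap is still non-empty; the wait for it to drain times out, and the resulting assertion failure causes the PSOD.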
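The defect follows a common pattern: an error path that returns without undoing an earlier successful allocation. The Python sketch below is a toy model only. ToyHeap, numa_info_init, world_init_leaky and world_init_fixed are hypothetical names standing in for the world-group heap, NUMASchedNUMAVMInfoInit() and NUMASched_WorldInit(). The real teardown waits with a timeout and then panics, whereas the model simply raises an exception.

```python
class ToyHeap:
    """Hypothetical stand-in for a world-group heap: counts live allocations."""

    def __init__(self):
        self.live = 0

    def alloc(self):
        self.live += 1
        return object()

    def free(self, _block):
        self.live -= 1

    def destroy(self):
        # Models the expectation behind World_DestroyHeap: the heap must be
        # empty at teardown. (The real code waits, times out, then asserts.)
        if self.live:
            raise RuntimeError("heap still non-empty at teardown")


def numa_info_init(heap):
    # Stands in for NUMASchedNUMAVMInfoInit(), which allocates from the heap.
    return heap.alloc()


def world_init_leaky(heap, later_step_fails):
    info = numa_info_init(heap)
    if later_step_fails:
        return False  # BUG: returns without freeing `info`
    return True


def world_init_fixed(heap, later_step_fails):
    info = numa_info_init(heap)
    if later_step_fails:
        heap.free(info)  # undo the earlier allocation on the error path
        return False
    return True


if __name__ == "__main__":
    leaky, fixed = ToyHeap(), ToyHeap()
    world_init_leaky(leaky, later_step_fails=True)
    world_init_fixed(fixed, later_step_fails=True)
    print("leaky heap live allocations:", leaky.live)  # 1
    print("fixed heap live allocations:", fixed.live)  # 0
```

The fixed variant applies the usual rule for multi-step initialization: any step that can fail after a successful allocation must release that allocation before returning, so the heap is empty when teardown runs.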

Resolution

This issue is resolved in VMware vSphere ESXi 7.0u3v and later, and in VMware vSphere ESXi 8.0 and later.

This issue can also be caused by over-commitment or incorrect reservation of compute (CPU/memory) resources on the cluster. Verify that the reservations configured at the cluster and resource pool level fit within the resources actually available.
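A first-pass check of the reservation configuration is simple arithmetic: the summed CPU and memory reservations of the child resource pools or VMs should not exceed what the cluster or parent pool can supply. The Python sketch below assumes these figures have already been collected. The Pool type and check_reservations function are hypothetical and do not call any vSphere API. The check also deliberately ignores factors that vSphere admission control takes into account, such as expandable reservations and per-VM memory overhead, so it can only flag obvious over-commitment.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical record of one child pool's (or VM's) reservations."""
    name: str
    cpu_reservation_mhz: int
    mem_reservation_mb: int


def check_reservations(pools, cpu_capacity_mhz, mem_capacity_mb):
    """Return warning strings for reservation totals that exceed capacity.

    Simplified model: ignores expandable reservations and VM memory overhead.
    """
    cpu_total = sum(p.cpu_reservation_mhz for p in pools)
    mem_total = sum(p.mem_reservation_mb for p in pools)
    problems = []
    if cpu_total > cpu_capacity_mhz:
        problems.append(
            f"CPU reservations {cpu_total} MHz exceed {cpu_capacity_mhz} MHz"
        )
    if mem_total > mem_capacity_mb:
        problems.append(
            f"memory reservations {mem_total} MB exceed {mem_capacity_mb} MB"
        )
    return problems


if __name__ == "__main__":
    # Illustrative numbers only.
    pools = [Pool("pool-a", 4000, 32768), Pool("pool-b", 2000, 40000)]
    for warning in check_reservations(pools, cpu_capacity_mhz=10000,
                                      mem_capacity_mb=65536):
        print(warning)
```

An empty result means only that the reservations are arithmetically consistent; the authoritative view is still the reservation and usage figures that vSphere reports for the cluster and its resource pools.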