ESXi host crashes with PSOD reporting error as "Unable to complete wait for non-empty heap"

Article ID: 387534

Updated On:

Products

VMware vSphere ESXi, VMware vSphere ESXi 7.0, VMware vSphere ESXi 8.0

Issue/Introduction

  • The ESXi host crashes with a PSOD reporting "Unable to complete wait for non-empty heap (worldGroup.xxxxxxxx): Timeout"
  • /var/run/log/LogEFI.log:

YYYY-MM-DDTHH:MM:SS cpu80:2097406)ESC[45mESC[33;1mVMware ESXi 7.0.3 [Releasebuild-20328353 x86_64]ESC[0m
NOT_IMPLEMENTED bora/vmkernel/main/world.c:2293
YYYY-MM-DDTHH:MM:SS cpu80:2097406)cr0=0x8001003d cr2=0x970013a860 cr3=0x6913e000 cr4=0x10216c
YYYY-MM-DDTHH:MM:SS cpu80:2097406)FMS=06/55/7 uCode=0x5003303
*PCPU80:2097406/reapWorker-4
PCPU  0: SSVVUSSVVVVUVUVUUSVVSVVVUIVVVVVSVSVVSVSUVVUVVUVVVSVVVVSVUSSUUUSV
PCPU 64: UVUUUUVUSUVVVVUVSVSVVVVUUUSSVVUVSVUVVSVVVUUUVVVS
YYYY-MM-DDTHH:MM:SS cpu80:2097406)Code start: 0x420021800000 VMK uptime: 127:22:28:13.813
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bc20:[0x4200218fee4f]PanicvPanicInt@vmkernel#nover+0x327 stack: 0x453907f1bcf8
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bcf0:[0x4200218ff3a8]Panic_NoSave@vmkernel#nover+0x4d stack: 0x453907f1bd50
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bd50:[0x4200218ff939]Panic_OnAssertAt@vmkernel#nover+0xba stack: 0x8f500000000
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bdd0:[0x420021955716]Int6_UD2Assert@vmkernel#nover+0x27f stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1be00:[0x42002194e067]gate_entry@vmkernel#nover+0x68 stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bec0:[0x420021931c70]World_DestroyHeap@vmkernel#nover+0x50 stack: 0x4331d3200000
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bee0:[0x420021932d2d]WorldGroupCleanup@vmkernel#nover+0x16e stack: 0x420021f02f00
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bf10:[0x4200218dd726]InitTable_Cleanup@vmkernel#nover+0x27 stack: 0x431464401220
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bf30:[0x420021937de0]World_TryReap@vmkernel#nover+0x385 stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bfa0:[0x420021901873]ReaperWorkerWorld@vmkernel#nover+0xd8 stack: 0x453907d9f140
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1bfe0:[0x420021bb33e1]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)0x453907f1c000:[0x4200218c4b4f]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)base fs=0x0 gs=0x420054000000 Kgs=0x0
YYYY-MM-DDTHH:MM:SS cpu80:2097406)Heap: 2749: Unable to complete wait for non-empty heap (worldGroup.56902719): Timeout

  • /var/run/log/vmkernel.log:

YYYY-MM-DDTHH:MM:SS cpu19:4365595)Admission failure in path: host:user:pool137:vm.4365595:uwWorldStore.4365595
YYYY-MM-DDTHH:MM:SS cpu19:4365595)uwWorldStore.4365595 (19814862) extraMin/extraFromParent: 1/1, host (0) childEmin/eMinLimit: 150544522/150544522
YYYY-MM-DDTHH:MM:SS cpu19:4365595)Admission failure in path: host:user:pool137:vm.4365595:uwWorldStore.4365595
YYYY-MM-DDTHH:MM:SS cpu19:4365595)uwWorldStore.4365595 (19814862) extraMin/extraFromParent: 1/1, host (0) childEmin/eMinLimit: 150544522/150544522
YYYY-MM-DDTHH:MM:SS cpu19:4365595)WARNING: World: 2706: Could not allocate new world handle for world ID: 4365598: Admission check failed for memory resource
YYYY-MM-DDTHH:MM:SS cpu0:2097455)ALERT: Heap: 2749: Unable to complete wait for non-empty heap (worldGroup.4365595): Timeout
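
To check whether a collected log contains the same signatures, the messages shown above can be matched with a short script. The following is a minimal sketch, not a VMware tool: the function name scan_lines and the result keys are hypothetical, and the patterns are taken only from the log excerpts in this article.

```python
import re
from collections import Counter

# Patterns copied from the log excerpts in this article.
HEAP_TIMEOUT = re.compile(
    r"Unable to complete wait for non-empty heap \((worldGroup\.\d+)\): Timeout"
)
ADMISSION_FAILURE = "Admission failure in path:"
WORLD_HANDLE_FAILURE = "Could not allocate new world handle"


def scan_lines(lines):
    """Count the signatures of this issue in an iterable of log lines.

    Hypothetical helper: returns the per-signature counts and the names
    of the world group heaps reported in the timeout messages.
    """
    counts = Counter()
    heaps = []
    for line in lines:
        match = HEAP_TIMEOUT.search(line)
        if match:
            counts["heap_timeout"] += 1
            heaps.append(match.group(1))
        if ADMISSION_FAILURE in line:
            counts["admission_failure"] += 1
        if WORLD_HANDLE_FAILURE in line:
            counts["world_handle_failure"] += 1
    return {"counts": dict(counts), "heaps": heaps}
```

The function accepts any iterable of strings, so an open file handle for a copy of vmkernel.log can be passed directly. Admission failures that appear shortly before the heap timeout for the same world group ID are consistent with the sequence shown above.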

Cause

NUMASchedNUMAVMInfoInit() allocates memory on the heap via a call to World_Alloc(). If NUMASched_WorldInit() fails after NUMASchedNUMAVMInfoInit() has completed successfully, there is no call to World_Free() to release the memory allocated by World_Alloc(). This causes a memory leak and a subsequent PSOD.
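
The leak pattern can be illustrated with a toy model. This is not VMkernel code: ToyHeap, world_init, and their parameters are hypothetical names, and the assumption that a heap can only be torn down once it holds no outstanding allocations is inferred from the panic message and the World_DestroyHeap frame in the backtrace.

```python
class ToyHeap:
    """Toy stand-in for a per-world-group heap that tracks live allocations."""

    def __init__(self):
        self.outstanding = set()
        self._next_id = 0

    def alloc(self):
        # Hand out a new handle and record it as outstanding.
        self._next_id += 1
        self.outstanding.add(self._next_id)
        return self._next_id

    def free(self, handle):
        self.outstanding.discard(handle)

    def can_destroy(self):
        # Assumption: teardown requires every allocation to have been freed.
        return not self.outstanding


def world_init(heap, later_step_fails, free_on_error):
    """Model an init routine whose first step allocates and a later step may fail."""
    numa_info = heap.alloc()  # stands in for the allocation made during NUMA info init
    if later_step_fails:
        if free_on_error:
            heap.free(numa_info)  # the cleanup that the leaky error path omits
        return False
    return True


leaky = ToyHeap()
world_init(leaky, later_step_fails=True, free_on_error=False)

fixed = ToyHeap()
world_init(fixed, later_step_fails=True, free_on_error=True)
```

In this model, leaky.can_destroy() stays False because the allocation from the failed init is never returned, while fixed.can_destroy() is True. A teardown path that waits for the heap to drain would time out in the first case.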

Resolution

Broadcom is working towards a permanent fix for this issue in vSphere 7.0. This issue is resolved in vSphere 8.0 and later.

This issue may also be caused by over-commitment or incorrect reservation of compute (CPU/memory) resources on the cluster. Verify the reservation configuration against the resources available at the Cluster/Resource Pool level.
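
As a rough aid for that verification, the configured reservations can be summed and compared with the capacity of the parent pool. The sketch below is a simplified assumption, not the admission control algorithm used by ESXi: it ignores expandable reservations and per-VM overhead memory, and the function names and sample values are hypothetical.

```python
def reservation_headroom(capacity_mb, reservations_mb):
    """Return pool capacity minus the sum of child reservations, in MB.

    A negative result means the children reserve more than the pool offers.
    """
    return capacity_mb - sum(reservations_mb.values())


def over_reserved(capacity_mb, reservations_mb):
    """True when the summed reservations exceed the pool capacity."""
    return reservation_headroom(capacity_mb, reservations_mb) < 0


# Hypothetical example: a 64 GB pool with three VM memory reservations.
example_reservations = {"vm-a": 32768, "vm-b": 24576, "vm-c": 16384}
example_headroom = reservation_headroom(65536, example_reservations)  # -8192
```

In this example the reservations exceed the pool capacity by 8192 MB, so at least one reservation would need to be lowered or the pool enlarged.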