A virtual machine may crash and create a vmx-zdump file in its working directory.
The backtrace from the dump file will resemble the following:
Core created from a live dump.
#0 VMKernel_LiveCoreDump (coreFileNameLen=4196, coreFileName=0xc1bcb89400 "") at bora/vmkernel/public/x86/uwvmk.h:945
945 if (linuxrc != 0) {
(gdb) bt
#0 VMKernel_LiveCoreDump (coreFileNameLen=4196, coreFileName=0xc1bcb89400 "") at bora/vmkernel/public/x86/uwvmk.h:945
#1 Sig_CoreDump () at bora/lib/sig/sigPosix.c:2326
#2 0x000000c1734527f6 in LogPanicAndDump (
text=0xc1b9aefa00 "vmk: vcpu-0:VMM unresponsive to lowmem swap actions. :mpn alloc=1553 free=280, :swap action post=1, tgt=596893, reclaimed=140800, :attempt swap=0 balloon=0 cow=0 :done swap=0")
at bora/vmcore/vmx/main/monitorLog.c:507
#3 MonitorLogMonitorPanic () at bora/vmcore/vmx/main/monitorLog.c:586
#4 0x000000c17345cb28 in MonitorLog_CheckPanic () at bora/vmcore/vmx/main/monitorLog.h:110
#5 MonitorLoopHostVCPULoop (vcpuid=vcpuid@entry=7) at bora/vmcore/vmx/main/monitorLoopVmkernel.c:99
#6 0x000000c173453b52 in MonitorLoopVCPUThreadBase (voidvcpuid=0x7) at bora/vmcore/vmx/main/monitor_loop.c:536
#7 0x000000c173588097 in VThreadThreadWrapper (clientData=0x0) at bora/lib/thread/vthreadPosix.c:405
#8 0x000000c1b6257d3b in start_thread (arg=0xc1bcb8e700) at pthread_create.c:308
#9 0x000000c1b655a16d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
VMware vSphere ESXi 7.0.x
The issue occurs while the VMM is performing a NUMA migration.
One vCPU handles the memory release request while the others concurrently try to allocate anonymous memory.
If the pending bit is not set to 1 by the platform then, in theory, all vCPUs except the releasing vCPU (vCPU 1) can repeatedly return to the VMM, see no pending requests, and start new rounds of allocations while only vCPU 1 releases memory, so the allocation rate exceeds the release rate.
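The imbalance described above can be illustrated with a simplified model. The function below is a hypothetical sketch, not VMM code: the round counts, per-round allocation and release amounts, and the pending flag are illustrative assumptions only.

```python
# Simplified model of the race: several allocating vCPUs vs. one releasing vCPU.
# When the "pending" flag is never observed as set, the allocators see no
# outstanding release request and keep starting new rounds of allocations,
# so net memory use grows even though one vCPU is steadily releasing.

def simulate(rounds, n_vcpus, alloc_per_round, release_per_round, pending_set):
    allocated = 0
    for _ in range(rounds):
        if not pending_set:
            # All vCPUs except the releaser start a new round of allocations.
            allocated += (n_vcpus - 1) * alloc_per_round
        # The releasing vCPU reclaims memory at a fixed rate.
        allocated = max(0, allocated - release_per_round)
    return allocated

# Pending bit never set: allocation outpaces release and memory use grows.
print(simulate(rounds=100, n_vcpus=8, alloc_per_round=10,
               release_per_round=20, pending_set=False))  # 5000

# Pending bit set: allocators back off and the releaser keeps up.
print(simulate(rounds=100, n_vcpus=8, alloc_per_round=10,
               release_per_round=20, pending_set=True))   # 0
```

With seven allocators each adding 10 units per round against a single releaser reclaiming 20, the net gain is 50 units per round, which mirrors how the allocation rate can exceed the release rate when the pending bit is never seen.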
"set /config/Numa/intOpts/MonMigLocality 50"
Please keep in mind that only one workaround should be applied, not all of them. Workaround (1) is the most conservative; it guarantees the VM will not be affected by any slow reclamation. Workaround (2) eliminates one scenario in which the issue can be triggered, but the issue may still be triggered in another rare case. Workaround (3) achieves the same effect as (2), just at the host level; it will not protect the VM if it migrates to a different host without that setting.