VM crash due to slow memory reclamation

Article ID: 369758


Products

VMware vSphere ESXi

Issue/Introduction

A virtual machine may crash and create a vmx-zdump file in its virtual machine files directory.

The backtrace from the dump file will contain frames similar to the following:

Core created from a live dump.
#0  VMKernel_LiveCoreDump (coreFileNameLen=4196, coreFileName=0xc1bcb89400 "") at bora/vmkernel/public/x86/uwvmk.h:945
945        if (linuxrc != 0) {
(gdb) bt
#0  VMKernel_LiveCoreDump (coreFileNameLen=4196, coreFileName=0xc1bcb89400 "") at bora/vmkernel/public/x86/uwvmk.h:945
#1  Sig_CoreDump () at bora/lib/sig/sigPosix.c:2326
#2  0x000000c1734527f6 in LogPanicAndDump (
    text=0xc1b9aefa00 "vmk: vcpu-0:VMM unresponsive to lowmem swap actions. :mpn alloc=1553 free=280, :swap action post=1, tgt=596893, reclaimed=140800, :attempt swap=0 balloon=0 cow=0 :done swap=0")
    at bora/vmcore/vmx/main/monitorLog.c:507
#3  MonitorLogMonitorPanic () at bora/vmcore/vmx/main/monitorLog.c:586
#4  0x000000c17345cb28 in MonitorLog_CheckPanic () at bora/vmcore/vmx/main/monitorLog.h:110
#5  MonitorLoopHostVCPULoop (vcpuid=vcpuid@entry=7) at bora/vmcore/vmx/main/monitorLoopVmkernel.c:99
#6  0x000000c173453b52 in MonitorLoopVCPUThreadBase (voidvcpuid=0x7) at bora/vmcore/vmx/main/monitor_loop.c:536
#7  0x000000c173588097 in VThreadThreadWrapper (clientData=0x0) at bora/lib/thread/vthreadPosix.c:405
#8  0x000000c1b6257d3b in start_thread (arg=0xc1bcb8e700) at pthread_create.c:308
#9  0x000000c1b655a16d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

 

Environment

VMware vSphere ESXi 7.0.x

Cause

The issue occurs while the VMM is performing NUMA migration.

One vCPU handles the memory release request while the other vCPUs concurrently try to allocate anonymous memory.

If the pending bit is not set to 1 by the platform then, in theory, all of the vCPUs (except for vCPU 1) can repeatedly come back to the VMM, see no pending requests, and start new rounds of allocations while only vCPU 1 is trying to release memory, so the allocation rate exceeds the release rate.
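
As a purely conceptual illustration (this is not VMM source code), the short C program below models the imbalance described above under assumed numbers: seven allocating vCPUs and one releasing vCPU, each handling one page per round. Because the pending bit stays 0, the allocating vCPUs never back off, so net memory usage grows every round.

    /* Conceptual illustration only -- not VMM code. All counts are assumptions. */
    #include <stdio.h>

    int main(void)
    {
        long pages = 0;               /* net anonymous pages held by the VM         */
        int  pending = 0;             /* reclaim-request bit; never set to 1 here   */
        int  allocating_vcpus = 7;    /* every vCPU except the one releasing memory */

        for (int round = 0; round < 1000; round++) {
            if (!pending)
                pages += allocating_vcpus;  /* each allocator takes one page per round  */
            pages -= 1;                     /* the single releasing vCPU frees one page */
        }

        /* Allocation rate (7 pages/round) exceeds the release rate (1 page/round). */
        printf("net pages after 1000 rounds: %ld\n", pages);
        return 0;
    }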

Resolution

Workarounds the customer can choose to apply (illustrative examples follow the list):

  1. Reserve full memory for critical VMs: set sched.mem.min equal to the memsize of the VM in the vmx config file. This can also be configured by editing the VM's memory reservation settings in the UI.
  2. Disable NUMA migration on critical VMs: add monitor_control.disable_numamigrate = TRUE to the vmx config file.
  3. Apply a host-wide config to reduce or disable monitor NUMA migration using the vsish command:
    "set /config/Numa/intOpts/MonMigLocality 50"

Please keep in mind that only one workaround should be applied, not all of them. Workaround (1) is the most conservative; it guarantees the VM will not be affected by any slow reclamation. Workaround (2) eliminates one scenario in which the issue can be triggered, but the issue may still be triggered in another rare case. Workaround (3) is simply a way to achieve what (2) does at the host level; it will not protect the VM if the VM migrates to a different host without that setting.