Error: "KLMCall_RunVCPU terminated unexpectedly" powering on high-vCPU VM with memory tiering

Article ID: 426485

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

You attempt to power on a virtual machine configured with more than 60 vCPUs on an ESXi 8.0 Update 3 or vSphere 9.0 host with NVMe memory tiering enabled. The VM fails to power on and the vmware.log file shows the following error:

KLMCall_RunVCPU terminated unexpectedly

The full error sequence in vmware.log appears as:

Msg_Post: Error
[msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-XX)
KLMCall_RunVCPU terminated unexpectedly

The failure occurs at a consistent vCPU threshold. Power-on succeeds if you reduce the vCPU count below the threshold or disable memory tiering on the VM. This prevents you from using the full CPU capacity of the host for large-scale VM configurations.

Additional symptoms reported:

  • VM fails to power on when assigning more than a certain number of vCPUs (for example, 82 vCPUs)
  • Available reservable CPU capacity appears less than actual host CPU capacity
  • vSphere UI shows a gap between "Cluster Total Capacity" and "Total Reservation Capacity"
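To confirm that you are hitting this symptom, you can search the affected VM's log from an ESXi shell (SSH session). The datastore and VM directory names below are illustrative; substitute your own:

```shell
# Search the affected VM's log for the failure signature.
# Replace datastore1/MyVM with the actual datastore and VM directory (illustrative names).
grep "KLMCall_RunVCPU" /vmfs/volumes/datastore1/MyVM/vmware.log
```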

Environment

  • ESXi 8.0 Update 3 (all subversions) with NVMe Memory Tiering Tech Preview enabled
  • vSphere 9.0 with NVMe Memory Tiering enabled
  • Virtual machine configured with more than 60 vCPUs
  • Memory tiering enabled on the VM (sched.mem.enableTiering = "TRUE")

Cause

This is a known issue with overhead memory allocation in the NVMe Memory Tiering feature.

Each vCPU in a virtual machine requires overhead memory for its vCPU world initialization. The Mem.VMOverheadGrowthLimit advanced setting controls how much overhead memory the system can allocate for VM operations. When memory tiering is enabled, the default value of this setting (1) does not provide sufficient headroom for VMs with high vCPU counts.

When you attempt to power on a VM with more than 60 vCPUs and memory tiering is active, the system cannot allocate enough overhead memory to initialize all the vCPU worlds. The power-on process fails at the vCPU where the overhead memory limit is exceeded, resulting in the "KLMCall_RunVCPU terminated unexpectedly" error.

Resolution

A fix is planned for a future release. Until then, use one of the following workarounds.


Workaround 1: Increase the overhead memory growth limit (recommended)

This workaround keeps memory tiering enabled while supporting high-vCPU VMs. Increase Mem.VMOverheadGrowthLimit from its default value of 1 to the recommended value of 3; higher values may be appropriate for very high vCPU counts.

  1. Log in to the vSphere Client.
  2. Navigate to the affected ESXi host.
  3. Click the Configure tab.
  4. Under System, click Advanced System Settings.
  5. Click Edit.
  6. In the filter box, type VMOverheadGrowthLimit.
  7. Locate Mem.VMOverheadGrowthLimit.
  8. Change the value from 1 to 3.
  9. Click OK.

No host reboot is required. You can now power on the VM with memory tiering enabled.
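If you prefer the command line, the same change can be made from an ESXi shell with esxcli. This is a sketch of the equivalent commands, using the values from the steps above:

```shell
# Check the current overhead growth limit (default is 1).
esxcli system settings advanced list -o /Mem/VMOverheadGrowthLimit

# Raise the limit to the recommended value of 3. Takes effect without a host reboot.
esxcli system settings advanced set -o /Mem/VMOverheadGrowthLimit -i 3
```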


Workaround 2: Disable memory tiering on the VM

  1. Power off the VM.
  2. Edit the VM's VMX file or use the vSphere Client to add/modify the following advanced parameter:
    sched.mem.enableTiering = "FALSE"
    
  3. Power on the VM.
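If you edit the VMX file directly from an ESXi shell, the VM must remain powered off, and hostd must reload the configuration afterward. The datastore path below is illustrative:

```shell
# Append the setting to the powered-off VM's .vmx file (illustrative path).
echo 'sched.mem.enableTiering = "FALSE"' >> /vmfs/volumes/datastore1/MyVM/MyVM.vmx

# Find the VM's ID, then reload its configuration so hostd picks up the change.
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/reload <vmid>
```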

Workaround 3: Disable memory tiering at the host level

Use this approach only if memory tiering is not required for any workloads on the host.

  1. Log in to the vSphere Client.
  2. Navigate to the affected ESXi host.
  3. Click the Configure tab.
  4. Under System, click Advanced System Settings.
  5. Click Edit.
  6. In the filter box, type memoryTiering.
  7. Locate VMkernel.Boot.memoryTiering.
  8. Change the value to FALSE.
  9. Click OK.
  10. Reboot the host.
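As a sketch of the command-line equivalent, assuming the kernel boot option is named MemoryTiering (verify the exact name on your build):

```shell
# Disable memory tiering at the host level; requires a host reboot to take effect.
esxcli system settings kernel set -s MemoryTiering -v FALSE

# Verify the configured value, then reboot the host.
esxcli system settings kernel list -o MemoryTiering
reboot
```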

Additional Information