vmx crash during last stage svmotion

Article ID: 323383

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
The VMX process of a virtual machine might crash during the last stage of a Storage vMotion.

Log entries similar to the following appear in the vmkernel and hostd logs of the ESXi host on which the VM is running.

/var/run/log/vmkernel.log:

<YYYY-MM-DD>T<time> cpu33:4028434)WARNING: PFrame: vm XXXXXXX: YYYY: Trying to fault in page dir during the memory handoff phase of a fast suspend/resume!
<YYYY-MM-DD>T<time> cpu33:4028434)Log: 1487: Generating backtrace for XXXXXXX: vmm0:wuhcuremd
<YYYY-MM-DD>T<time> cpu33:4028434)Backtrace for current CPU #33, worldID=XXXXXXX, fp=0x453961ca1000
<YYYY-MM-DD>T<time> cpu33:4028434)0x453981a1bef8:[0x420034150715]WorldSwitch_out_label@vmkernel#nover+0x0 stack: 0x466080001220, 0x453981a21000, 0xfffffffffc407e8f, 0x401, 0x453981a1bfec
<YYYY-MM-DD>T<time> cpu33:4028434)WARNING: UserMem: 9746: vmx-svga:wuhcuremd: PF failed to handle a fault on mmInfo at va 0x6010cf4ff0: Out of memory. Terminating...

/var/run/log/hostd.log:

<YYYY-MM-DD>T<time> cpu33:4028434)WARNING: PFrame: vm XXXXXXX: 2005: Trying to fault in page dir during the memory handoff phase of a fast suspend/resume!
<YYYY-MM-DD>T<time> cpu33:4028434)Log: 1487: Generating backtrace for XXXXXXX: vmm0:wuhcuremd
<YYYY-MM-DD>T<time> cpu33:4028434)Backtrace for current CPU #33, worldID=4028434, fp=0x453961ca1000
<YYYY-MM-DD>T<time> cpu33:4028434)0x453981a1bef8:[0x420034150715]WorldSwitch_out_label@vmkernel#nover+0x0 stack: 0x466080001220, 0x453981a21000, 0xfffffffffc407e8f, 0x401, 0x453981a1bfec
<YYYY-MM-DD>T<time> cpu33:4028434)WARNING: UserMem: 9746: vmx-svga:wuhcuremd: PF failed to handle a fault on mmInfo at va 0x6010cf4ff0: Out of memory. Terminating...

-->    arguments = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          key = "1",
-->          value = (vim.event.VmEventArgument) {
-->             name = "<VM name>",
-->             vm = 'vim.VirtualMachine:YY'
-->          }
-->       }
-->    ],
-->    objectId = "52",
-->    objectType = "vim.VirtualMachine",
-->    objectName = "<VM name>",
--> }

<YYYY-MM-DD>T<time> info hostd[2100584] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
-->    key = 169,
-->    chainId = -1895273568,
-->    createdTime = "1970-01-01T00:00:00Z",
-->    userName = "",
-->    host = (vim.event.HostEventArgument) {
-->       name = "<host_name>",
-->       host = 'vim.HostSystem:ha-host'
-->    },
-->    vm = (vim.event.VmEventArgument) {
-->       name = "<VM name>",
-->       vm = 'vim.VirtualMachine:YY'
-->    },
-->    eventTypeId = "esx.problem.vm.kill.unexpected.vmx.fault.failure.2",
-->    arguments = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          key = "1",
-->          value = (vim.event.VmEventArgument) {
-->             name = "<VM name>",
-->             vm = 'vim.VirtualMachine:YY'
-->          }
-->       }
-->    ],
-->    objectId = "52",
-->    objectType = "vim.VirtualMachine",
-->    objectName = "<VM name>",
--> }
<YYYY-MM-DD>T<time> info hostd[2100584] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event XXXXXX: The user world daemon of wuhcuremd could not fault in a page. The virtual machine is terminated as further progress is impossible.
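
The log excerpts above can be matched programmatically when reviewing a support bundle. The following is a minimal sketch in Python, not a supported tool: the signature strings are copied from the excerpts in this article, while the function name `find_signatures` and the dictionary keys are hypothetical names chosen for illustration.

```python
import re

# Signature strings copied from the log excerpts in this article.
# The dictionary keys are hypothetical labels, not VMware identifiers.
SIGNATURES = {
    "vmkernel_pframe": re.compile(
        r"Trying to fault in page dir during the memory handoff phase "
        r"of a fast suspend/resume"
    ),
    "vmkernel_usermem": re.compile(
        r"PF failed to handle a fault on mmInfo at va 0x[0-9a-fA-F]+: Out of memory"
    ),
    "hostd_event": re.compile(
        r"esx\.problem\.vm\.kill\.unexpected\.vmx\.fault\.failure"
    ),
}


def find_signatures(log_text):
    """Return the sorted labels of all signatures found in log_text."""
    return sorted(name for name, rx in SIGNATURES.items() if rx.search(log_text))


sample = (
    "cpu33:4028434)WARNING: PFrame: vm 1: 2005: Trying to fault in page dir "
    "during the memory handoff phase of a fast suspend/resume!\n"
    'eventTypeId = "esx.problem.vm.kill.unexpected.vmx.fault.failure.2",\n'
)
print(find_signatures(sample))  # ['hostd_event', 'vmkernel_pframe']
```

A match on these strings only indicates that the logs resemble the excerpts; it does not by itself confirm the root cause.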

Environment

VMware vSphere 7.x

Cause

This issue has been identified as a VMware software defect: the SVGA device must not access guest main memory while the VM is quiesced/stunned. A race condition can trigger the issue on VMs with virtual hardware version 18 (HWv18) and VMware Tools version 11.2.0 or later. The issue can occur during either vMotion or Storage vMotion.
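
The trigger conditions can be expressed as a simple check when auditing an inventory. This is a minimal sketch in Python under stated assumptions: the function names are hypothetical, the hardware and Tools versions are supplied by the caller, and VMware Tools 11.2.0 itself is treated as in scope because the workaround list refers to versions "on/post 11.2.0".

```python
def parse_version(text):
    """Turn a dotted version such as '11.2.5' into a 3-part tuple of ints."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad '11.2' to (11, 2, 0)
    return tuple(parts[:3])


def matches_trigger_conditions(hw_version, tools_version):
    """True if the VM matches the conditions named in the Cause section.

    Assumption: Tools 11.2.0 and later are in scope (see the lead-in).
    """
    return hw_version == 18 and parse_version(tools_version) >= (11, 2, 0)


print(matches_trigger_conditions(18, "11.2.5"))  # True
print(matches_trigger_conditions(18, "11.1.5"))  # False
print(matches_trigger_conditions(17, "11.3.0"))  # False
```

A VM that matches is only a candidate for the issue, since the defect also depends on a race condition during the migration.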

 

Resolution

The issue is resolved in ESXi 7.0 U2c.


If an upgrade or update is not possible, apply one of the following workarounds:

a. Downgrade VMware Tools to a version earlier than 11.2.0. Because this changes the SVGA driver, VMware Tools must be completely uninstalled, the VM rebooted, and the older VMware Tools version then installed.

b. If the VM is HWv18 with VMware Tools version 11.2.0 or later, and downgrading VMware Tools is not an option:

  • Edit the VM configuration file (.vmx) and add the following line:
    guestInfo.svga.wddm.enableMobCursor=FALSE
  • Reboot the VM for the change to take effect.
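
The configuration change in workaround b can be scripted so that the option is added exactly once. This is a minimal sketch in Python that works on the text of a configuration file. The function name `set_vmx_option` is hypothetical, and writing the entry in the quoted `key = "value"` form is an assumption based on how .vmx entries are commonly written; the article itself shows the option as `guestInfo.svga.wddm.enableMobCursor=FALSE`.

```python
OPTION = "guestInfo.svga.wddm.enableMobCursor"  # option name from this article


def set_vmx_option(vmx_text, key=OPTION, value="FALSE"):
    """Return vmx_text with key set to value, replacing any existing entry."""
    # Assumption: entries use the quoted key = "value" form.
    new_line = f'{key} = "{value}"'
    out, replaced = [], False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if "=" in line and name.lower() == key.lower():
            if not replaced:
                out.append(new_line)  # overwrite the first existing entry
                replaced = True
            continue  # drop any further duplicate entries
        out.append(line)
    if not replaced:
        out.append(new_line)  # option was absent, so append it
    return "\n".join(out) + "\n"


original = 'displayName = "vm1"\nvirtualHW.version = "18"\n'
print(set_vmx_option(original), end="")
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to reapply. As a general precaution, keep a copy of the original file before writing the result back.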