ESXi 6.5 host fails with PSOD: GP Exception 13 in multiple VMM world at VmAnon_AllocVmmPages

Article ID: 317559

Products

VMware vSphere ESXi

Issue/Introduction

To work around this issue for virtual machines migrated to an ESXi 6.5 host from an earlier version, recover the virtual machines from the ESXi 6.5 host and set Numa.FollowCoresPerSocket to 1 on all ESXi 6.5 hosts.

Symptoms:

  • The ESXi 6.5 host fails with a purple diagnostic screen.
  • You see stack traces similar to:

    #GP Exception 13 in world 187416:vmm3:XXXXXXXX @ 0x418021d8adc7
    xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)Backtrace for current CPU #16, worldID=187416, fp=0x0
    xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)0x4393e0c1bf20:[0x418021d8adc7]VmAnon_AllocVmmPages@vmkernel#nover+0x3b stack: 0x100008785, 0x192d6c43b, 0x4393e84a7000, 0x6, 0x4393e84a7100
    xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)0x4393e0c1bf80:[0x418021d18ad7]VMMVMKCall_Call@vmkernel#nover+0x157 stack: 0x4393e0c1bfec, 0x4024600000000, 0x418021d4a64b, 0xfffffffffc606d00, 0x0
    xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)0x4393e0c1bfe0:[0x418021d4a6d2]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x0, 0xfffffffffc4074b3, 0x0, 0x0, 0x0
    xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)Panic: 623: Halting PCPU 16.
    xxxx-xx-xxTxx:xx:xx.xxxZ cpu20:187418)Panic: 514: Panic from another CPU (cpu 20, world 187418): ip=0x418021d4a2e2 randomOff=0x21c00000:

     
  • Virtual machines were migrated to the host from either:
     
    • an ESXi host running ESXi 6.5
    • an ESXi host running an earlier version
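
The stack trace in the symptoms above can be matched mechanically. The following is a minimal Python sketch, not an official diagnostic tool: the function name matches_psod_signature and the sample variable are hypothetical, and the two patterns are taken only from the example backtrace in this article. It reports a match when a text contains both the #GP Exception 13 header for a vmm world and the VmAnon_AllocVmmPages frame.

```python
import re

# Patterns taken from the example backtrace in this article.
GP_EXCEPTION = re.compile(r"#GP Exception 13 in world \d+:vmm\d+")
FAULT_SYMBOL = "VmAnon_AllocVmmPages@vmkernel"


def matches_psod_signature(log_text: str) -> bool:
    """Return True if the text has both the #GP header and the faulting frame."""
    return bool(GP_EXCEPTION.search(log_text)) and FAULT_SYMBOL in log_text


# Two lines copied from the example trace (timestamps omitted).
sample = (
    "#GP Exception 13 in world 187416:vmm3:XXXXXXXX @ 0x418021d8adc7\n"
    "cpu16:187416)0x4393e0c1bf20:[0x418021d8adc7]"
    "VmAnon_AllocVmmPages@vmkernel#nover+0x3b stack: 0x100008785\n"
)

print(matches_psod_signature(sample))
```

A match only indicates that the trace resembles the one shown here. Confirm the migration history and the virtual machine settings before concluding that this article applies.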

Environment

VMware vSphere ESXi 6.5

Cause

vSphere 6.5 optimizes the vNUMA layout of NUMA-aware virtual machines based on the physical NUMA configuration. As a result, a virtual machine may see a vNUMA topology on the destination ESXi 6.5 host that differs from the topology it had on the source ESXi host. If the destination ESXi 6.5 host fails to recognize this change, the host may fail with a purple diagnostic screen or the VMM may panic during vMotion of the virtual machine.

This issue occurs if the migrated virtual machine contains custom VMX settings, such as cpuid.coresPerSocket or numa.autosize.once.
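
To check whether a virtual machine carries such settings, the text of its .vmx file can be scanned. The sketch below is illustrative only: the function name find_custom_numa_settings and the sample values are hypothetical, it assumes the usual key = "value" layout of .vmx entries, and it compares keys case-insensitively as a precaution. It looks only for the two settings named above, which the article gives as examples rather than a complete list.

```python
# Keys named in this article as examples of custom settings.
# Both spellings of the cores-per-socket key are accepted to be safe.
SUSPECT_KEYS = {
    "cpuid.corespersocket",
    "cpuid.corepersocket",
    "numa.autosize.once",
}


def find_custom_numa_settings(vmx_text: str) -> dict:
    """Return the suspect entries found in the text of a .vmx file."""
    found = {}
    for line in vmx_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.lower() in SUSPECT_KEYS:
            found[key] = value.strip().strip('"')
    return found


# Hypothetical .vmx fragment used only to exercise the function.
sample_vmx = (
    'numvcpus = "8"\n'
    'cpuid.coresPerSocket = "4"\n'
    'numa.autosize.once = "FALSE"\n'
)

print(find_custom_numa_settings(sample_vmx))
```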

Resolution

This issue is resolved in ESXi 6.5.0a, available at Broadcom Downloads.

To work around this issue for virtual machines migrated to an ESXi 6.5 host from a previous version, recover the virtual machines from the ESXi 6.5 host and set Numa.FollowCoresPerSocket to 1 on all ESXi 6.5 hosts.

To recover virtual machines that were migrated to an ESXi 6.5 host:

  1. Remove the virtual machine from the ESXi 6.5 host's inventory and add it to the inventory of an earlier ESXi host.
  2. Manually remove the numa.autosize.cookie and numa.autosize.vcpu.maxPerVirtualNode entries from the virtual machine's .vmx file. For more information, see Editing configuration files in VMware ESXi and ESX (1017022).
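
The edit in step 2 can be sketched in Python as a pure text transformation. This is an assumption-laden illustration, not a supported tool: the function name strip_numa_autosize and the sample values are hypothetical, and the code assumes the usual key = "value" layout of .vmx entries. It works on a string rather than on the file itself, so it can be tried on a backup copy of the .vmx file first.

```python
# The two entries this article says to remove, lowercased for comparison.
KEYS_TO_REMOVE = {
    "numa.autosize.cookie",
    "numa.autosize.vcpu.maxpervirtualnode",
}


def strip_numa_autosize(vmx_text: str) -> str:
    """Return the .vmx text without the two numa.autosize entries."""
    kept = []
    for line in vmx_text.splitlines(keepends=True):
        # Everything before the first "=" is the key; other lines pass through.
        key = line.partition("=")[0].strip().lower()
        if key in KEYS_TO_REMOVE:
            continue
        kept.append(line)
    return "".join(kept)


# Hypothetical .vmx fragment used only to exercise the function.
before = (
    'numvcpus = "8"\n'
    'numa.autosize.cookie = "80001"\n'
    'numa.autosize.vcpu.maxPerVirtualNode = "8"\n'
    'memSize = "4096"\n'
)

print(strip_numa_autosize(before), end="")
```

All other lines are passed through unchanged, including their original line endings.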

To manually set Numa.FollowCoresPerSocket to 1 on the ESXi 6.5 hosts:

  1. Connect to vCenter Server using the vSphere Web Client.
  2. Select an ESXi host in the inventory.
  3. Ensure that no virtual machines reside on the selected ESXi host.
  4. Click the Manage tab.

    Note: If you are using vCenter Server 6.5, click the Configure tab > Advanced Settings and proceed to step 7.
     
  5. Click Settings.
  6. Under the System heading, click Advanced System Settings.
  7. Search for the Numa.FollowCoresPerSocket setting.
  8. Click Edit.
  9. Change the value to 1.
  10. Click OK to accept the changes.
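
The same setting can usually be changed from the command line. The sketch below only builds the command and does not run it. It assumes that this option is exposed to esxcli under the path /Numa/FollowCoresPerSocket, in the same way that other advanced settings map from Group.Option to /Group/Option. The article itself documents only the Web Client procedure, and the helper name follow_cores_per_socket_cmd is hypothetical.

```python
def follow_cores_per_socket_cmd(value: int = 1) -> list:
    """Build the esxcli argument list that sets Numa.FollowCoresPerSocket.

    Assumption: the option path is /Numa/FollowCoresPerSocket.
    """
    return [
        "esxcli", "system", "settings", "advanced", "set",
        "-o", "/Numa/FollowCoresPerSocket",
        "-i", str(value),
    ]


# Print the command line so it can be reviewed before being run on a host.
print(" ".join(follow_cores_per_socket_cmd(1)))
```

As with the Web Client procedure, make sure that no virtual machines are on the host before changing the value.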