Memory corruption in the PShare subsystem causing host PSOD (purple screen of death)

Article ID: 318429

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • In the /var/log/vmkernel.log file, you see entries similar to:
    04:40:54.761Z cpu21:14392120)WARNING: UserMem: 14034: vmx-vthread-6: vpn 0xa00bc795 status: "Invalid address" (bad0026)
    04:40:54.763Z cpu21:14392120)WARNING: UserMem: 14034: vmx-vthread-6: vpn 0xa00bc7b5 status: "Invalid address" (bad0026)
    04:40:54.764Z cpu21:14392120)WARNING: UserMem: 14034: vmx-vthread-6: vpn 0xa00bc7d5 status: "Invalid address" (bad0026)
    04:40:54.765Z cpu21:14392120)WARNING: UserMem: 14034: vmx-vthread-6: vpn 0xa00bc7f5 status: "Invalid address" (bad0026)
    04:40:54.766Z cpu21:14392120)WARNING: UserMem: 14034: vmx-vthread-6: vpn 0xa00bc815 status: "Invalid address" (bad0026)
    04:40:54.768Z cpu21:14392120)WARNING: UserMem: 14034: vmx-vthread-6: vpn 0xa00bc835 status: "Invalid address" (bad0026)
  • You see PSOD (purple screen of death) stacks similar to:
    VmMemCowPShareRemoveWithCheck@vmkernel#nover+0x10f stack: 0x418011d0
    VmMemCow_CopyPageWithMPN@vmkernel#nover+0x19f stack: 0x3fffffffff, 0
    VmMemPf@vmkernel#nover+0x133 stack: 0x449fd475255d69,
    PShareHashTableWalkMatchMPN@vmkernel#nover+0x2d stack: 0x3110dc
    PShare_RemoveHint@vmkernel#nover+0xb3 stack: 0x4391ccaa7000
    VmMemCow_PShareRemoveHint@vmkernel#nover+0x72 stack: 0x4391ccc1bef8
    VmMemCowPFrameRemoveHint@vmkernel#nover+0xc6 stack: 0x304
    VmMemCowPShareFn@vmkernel#nover+0x5c3 stack: 0x6422bec
    VmAssistantProcessTasks@vmkernel#nover+0x144 stack: 0x0
    CpuSched_StartWorld@vmkernel#nover+0x99 stack: 0x0
  • You see messages indicating that multiple CPUs may be locked up, similar to:
    10:49:52.654Z cpu4:155673)WARNING: Heartbeat: 794: PCPU 51 didn't have a heartbeat for 9 seconds; *may* be locked up.
    10:49:52.654Z cpu52:172048)WARNING: Heartbeat: 794: PCPU 27 didn't have a heartbeat for 12 seconds; *may* be locked up.


Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
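
To check a collected log for the first and third symptoms, a short script can count matching lines. The following is a minimal sketch, not a supported tool: the function name scan_vmkernel_log and both regular expressions are assumptions derived only from the example lines in this article, so message wording that differs in your environment will not match.

```python
import re

# Patterns inferred from the example log lines in this article (assumption:
# real messages follow the same wording).
USERMEM_RE = re.compile(
    r'WARNING: UserMem: \d+: .*status: "Invalid address" \(bad0026\)'
)
HEARTBEAT_RE = re.compile(
    r"WARNING: Heartbeat: \d+: PCPU (\d+) didn't have a heartbeat for (\d+) seconds"
)


def scan_vmkernel_log(lines):
    """Return (count of UserMem invalid-address warnings,
    sorted list of PCPU numbers reported without a heartbeat)."""
    usermem_hits = 0
    stalled_pcpus = set()
    for line in lines:
        if USERMEM_RE.search(line):
            usermem_hits += 1
        match = HEARTBEAT_RE.search(line)
        if match:
            stalled_pcpus.add(int(match.group(1)))
    return usermem_hits, sorted(stalled_pcpus)


# Usage with two of the example lines shown above.
sample = [
    '04:40:54.761Z cpu21:14392120)WARNING: UserMem: 14034: vmx-vthread-6: '
    'vpn 0xa00bc795 status: "Invalid address" (bad0026)',
    "10:49:52.654Z cpu4:155673)WARNING: Heartbeat: 794: PCPU 51 didn't have "
    "a heartbeat for 9 seconds; *may* be locked up.",
]
hits, pcpus = scan_vmkernel_log(sample)
print("UserMem invalid-address warnings: {}".format(hits))
print("PCPUs reported without a heartbeat: {}".format(pcpus))
```

To scan a real file, pass an open file object instead of the sample list, for example scan_vmkernel_log(open("/var/log/vmkernel.log", errors="replace")). Matching lines only suggest this issue; the PSOD backtrace remains the more specific indicator.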

Environment

VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.0

Cause

This issue occurs due to memory corruption in the PShare subsystem.

Resolution

This is a known issue affecting VMware ESXi 6.0 and 6.5.

The fix is available in ESXi 6.5 patch 2 and ESXi 6.0 patch 7.

Workaround:
To work around this issue, disable page sharing.

To disable page sharing:
  1. Log in to the ESX/ESXi host or vCenter Server using the vSphere Client.

    Note: If you are connected to vCenter Server, select the relevant ESX/ESXi host.

  2. In the Configuration tab, click Advanced Settings under the Software section.
  3. In the Advanced Settings window, click Mem.
  4. Locate Mem.ShareScanGHz and set the value to 0.
  5. Click OK.
  6. Perform either of the following steps to make the page sharing change take effect immediately:
    • Migrate all the virtual machines to another host in the same cluster, then migrate them back to the original host.
    • Shut down and power on all the virtual machines.
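
The setting in step 4 can also be changed from the ESXi command line instead of the vSphere Client. The following is a hedged sketch rather than part of the documented procedure: it assumes that the esxcli utility is on the PATH (as in the ESXi Shell) and that Mem.ShareScanGHz maps to the esxcli option path /Mem/ShareScanGHz. The helper names build_set_command, build_list_command, and disable_page_sharing are hypothetical. By default the function only returns the command string so that it can be reviewed before anything runs.

```python
import subprocess

# Assumption: the advanced setting Mem.ShareScanGHz is addressed by this
# option path in esxcli.
OPTION_PATH = "/Mem/ShareScanGHz"


def build_set_command(value=0):
    """Build the esxcli argv that sets the page-sharing scan rate."""
    return ["esxcli", "system", "settings", "advanced", "set",
            "-o", OPTION_PATH, "-i", str(value)]


def build_list_command():
    """Build the esxcli argv that shows the current value of the option."""
    return ["esxcli", "system", "settings", "advanced", "list",
            "-o", OPTION_PATH]


def disable_page_sharing(dry_run=True):
    """Set the scan rate to 0. With dry_run=True, only return the command."""
    cmd = build_set_command(0)
    if dry_run:
        return " ".join(cmd)
    subprocess.check_call(cmd)
    # Read the option back so the change can be confirmed.
    return subprocess.check_output(build_list_command(),
                                   universal_newlines=True)


print(disable_page_sharing(dry_run=True))
```

Calling disable_page_sharing(dry_run=False) on the host would run the command. Step 6 still applies afterward: the virtual machines must be migrated or power-cycled for the change to take effect immediately.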