NVMe Drive Failure Triggers Unrecoverable Machine Check Exception (MCE) and PSOD

Article ID: 435833

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

A VMware ESXi host fails with a Purple Screen of Death (PSOD) immediately after an NVMe capacity drive fails. Instead of the software-defined storage layer (vSAN) gracefully retiring the drive, the host encounters a fatal hardware exception.

Symptoms:

  • Host crashes with a Machine Check Exception (MCE).

  • The PSOD message may name the netCoalesce2 world, and the backtrace may reference Power_ArchPerformWait.

  • Log snippets (if captured) show:

    • MCA: 191: UC Excp G5 B6 Sbb80000000000e0b A0 M10100000 P0/0 I/O error reported by PCI 0000:10:02.0.

    • System has encountered a Hardware Error - Please contact the hardware vendor

  Panic Details: Crash at 2026-03-30T23:16:22.045Z on CPU 5 running world 2097171. VMK Uptime:375:03:04:26.878
   PSOD Message: @BlueScreen: Machine Check Exception on PCPU5 in world 2097171:netCoalesce2
   Backtrace for Current CPU - 5:
     0x45398099ba78:[0x42002c090e24]Power_ArchPerformWait@vmkernel#nover+0xd4 stack: 0x420041401880, 0x0, 0x0, 0x420041400000, 0x420041400000
     0x45398099ba80:[0x42002c090f75]Power_ArchSetCState@vmkernel#nover+0xba stack: 0x0, 0x0, 0x420041400000, 0x420041400000, 0x0
     0x45398099bad0:[0x42002c6d4111]CpuSchedIdleLoopInt@vmkernel#nover+0x292 stack: 0x0, 0x7fffffffffffffff, 0x1, 0x7fffffffffffffff, 0x453989c1f100
     0x45398099bb40:[0x42002c6d863c]CpuSchedDispatch@vmkernel#nover+0x1e31 stack: 0x452100000001, 0x420041401040, 0x420041401110, 0x420041401128, 0x420041401040
     0x45398099bd80:[0x42002c6d904e]CpuSchedWait@vmkernel#nover+0x35b stack: 0x8000000000000006, 0x0, 0x1000000000000, 0x6, 0x2
     0x45398099bef0:[0x42002c6d91b0]CpuSchedSleepUntilTC@vmkernel#nover+0xb5 stack: 0x4, 0x430000200013, 0x420041406b08, 0x431074e12008, 0x42002c6d4e50
     0x45398099bf90:[0x42002c24af73]NetCoalesce2WorldCB@vmkernel#nover+0x9c stack: 0x0, 0x108099f000, 0x45398099f000, 0x45398091f100, 0x45398099f100
     0x45398099bfe0:[0x42002c6d67b2]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0, 0x42002c144cf0, 0x0, 0x0, 0x0
     0x45398099c000:[0x42002c144cef]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
   Errors from MCE.
     2026-03-30T23:16:21.957Z cpu5:2097171)ALERT: MCA: 191: UC Excp G5 B6 Sbb80000000000e0b A0 M10100000 P0/0 I/O error reported by PCI 0000:10:02.0.
     Machine Check Exception on PCPU5 in world 2097171:netCoalesce2
     System has encountered a Hardware Error - Please contact the hardware vendor
     2026-03-30T23:16:22.045Z cpu5:2097171)@BlueScreen: Machine Check Exception on PCPU5 in world 2097171:netCoalesce2
     System has encountered a Hardware Error - Please contact the hardware vendor
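
The ALERT line in the output above packs several fields into one string. As a rough sketch (not a VMware tool), the following Python pulls those fields apart so the reporting PCI address can be handed to the hardware vendor. The function name parse_mca_line is hypothetical, and the meaning assigned to each field letter (G = global status, B = bank, S = status, A = address, M = misc) is an assumption rather than something this article states. The PCI address is split using the standard segment:bus:device.function hexadecimal notation.

```python
import re

# Pattern for the "MCA:" ALERT line. The field meanings (G = global status,
# B = bank, S = status, A = address, M = misc) are assumptions, not taken
# from this article.
MCA_PATTERN = re.compile(
    r"MCA: \d+: (?P<kind>\w+) Excp "
    r"G(?P<global_status>[0-9a-f]+) "
    r"B(?P<bank>[0-9a-f]+) "
    r"S(?P<status>[0-9a-f]+) "
    r"A(?P<address>[0-9a-f]+) "
    r"M(?P<misc>[0-9a-f]+)"
    r".*?reported by PCI "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])",
    re.IGNORECASE,
)


def parse_mca_line(line):
    """Return a dict of fields from an MCA ALERT line, or None if no match."""
    match = MCA_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Keep the status value as an integer so its bits can be inspected later.
    fields["status"] = int(fields["status"], 16)
    # PCI addresses use hexadecimal segment:bus:device.function notation.
    segment, bus, device_function = fields["pci"].split(":")
    device, function = device_function.split(".")
    fields.update(
        segment=int(segment, 16),
        bus=int(bus, 16),
        device=int(device, 16),
        function=int(function, 16),
    )
    return fields


sample = (
    "2026-03-30T23:16:21.957Z cpu5:2097171)ALERT: MCA: 191: UC Excp G5 B6 "
    "Sbb80000000000e0b A0 M10100000 P0/0 I/O error reported by PCI 0000:10:02.0."
)
parsed = parse_mca_line(sample)
print(parsed["pci"], hex(parsed["status"]))
```

Lines that do not match the pattern return None, so the function can be run over a whole log file and the misses skipped.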

Environment

  • Product: VMware ESXi (All Versions)

Cause

The PSOD is caused by an unrecoverable Machine Check Exception (MCE), which indicates a hardware fault on the server. In the example above, the MCA log line reports an I/O error from the PCI device at 0000:10:02.0.
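
To illustrate why this MCE is treated as unrecoverable, the sketch below decodes the S value from the MCA log line. It assumes that S holds the raw machine-check bank status register (IA32_MCi_STATUS on Intel processors) and that the architectural bit layout from the Intel Software Developer's Manual applies: VAL (bit 63), OVER (62), UC (61), EN (60), MISCV (59), ADDRV (58), PCC (57), with the MCA error code in bits 15:0. The function name decode_mci_status is hypothetical, and the result is an aid for the vendor conversation, not a diagnosis.

```python
# Architectural flag bits of a machine-check bank status register.
# Assumption: the "S" field of the ESXi MCA log line holds this register.
STATUS_FLAGS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # the MISC register holds additional information
    "ADDRV": 58,  # the ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupted
}


def decode_mci_status(status):
    """Split a 64-bit status value into flags and error-code fields."""
    decoded = {
        name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()
    }
    decoded["mca_error_code"] = status & 0xFFFF
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF
    return decoded


result = decode_mci_status(0xBB80000000000E0B)
for name, value in result.items():
    print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

Under these assumptions, the sample value has UC and PCC set, meaning the error was uncorrected and the processor context may be corrupted. ADDRV is clear and MISCV is set, which agrees with the A0 and M10100000 fields in the same log line.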

Resolution

Engage your hardware vendor for further troubleshooting, and provide the MCA log lines from the PSOD, including the PCI address reported in the error.

Additional Information

For instructions on enabling additional alerts for NVMe drives in a vSAN environment, see the following KB article:

Enabling vSAN alerts for NVMe SMART data in vCenter