PSOD on ESXi Host Due to NVIDIA Drivers

Article ID: 388308

Products

VMware vSphere ESXi
VMware vSphere ESXi 7.0

Issue/Introduction

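The ESXi host fails with a purple diagnostic screen (PSOD) and a backtrace similar to the following:

PCPU 48: no heartbeat (2/3 IPIs received)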
 cpu51:2097305)cr0=0x8001003d cr2=0x90148a6490 cr3=0x61c000 cr4=0x10016c
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)FMS=19/11/1 uCode=0xa101148
*PCPU51:2097305/RCUWorld
PCPU  0: SSSSVSSVUVVSSVSVSUSUSSSISSSSVSSVSUSUVVSSVVVVVVVVUUSSVVVUVSVUVIVV
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)Code start: 0x420026e00000 VMK uptime: 0:23:21:02.095
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9b670:[0x420026eff107]PanicvPanicInt@vmkernel#nover+0x327 stack: 0x453984c9b748
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9b740:[0x420026eff6b9]Panic_WithBacktrace@vmkernel#nover+0x56 stack: 0x453984c9b7b0
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9b7b0:[0x420027196b1f]Heartbeat_DetectCPULockups@vmkernel#nover+0x50c stack: 0x17e8
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9b820:[0x420026f0f56f]TimerWheelHandler@vmkernel#nover+0x88 stack: 0x42004cc05f60
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9b8a0:[0x420026f0f7b3]Timer_BHHandler@vmkernel#nover+0x48 stack: 0x42004cc00000
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9b8d0:[0x420026ec044e]BH_DrainAndDisableInterrupts@vmkernel#nover+0x97 stack: 0x0
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9b950:[0x420026edfeaa]IntrCookie_VmkernelInterrupt@vmkernel#nover+0xb3 stack: 0xffffffffffffffef
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9b970:[0x420026f55aac]IDT_IntrHandler@vmkernel#nover+0x9d stack: 0x0
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9b990:[0x420026f4e067]gate_entry@vmkernel#nover+0x68 stack: 0x0
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9ba58:[0x420026e848aa]Power_ArchPerformWait@vmkernel#nover+0x106 stack: 0x42004cc00980
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9ba60:[0x420026e84982]Power_ArchSetCState@vmkernel#nover+0x8f stack: 0x0
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9bab0:[0x4200271af368]CpuSchedIdleLoopInt@vmkernel#nover+0x275 stack: 0x42004cc00100
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9bb20:[0x4200271b342e]CpuSchedDispatch@vmkernel#nover+0x1aff stack: 0x42004cc00140
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9bd60:[0x4200271b4183]CpuSchedWait@vmkernel#nover+0x2f4 stack: 0x0
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9bee0:[0x4200271b42f7]CpuSchedSleepUntilTC@vmkernel#nover+0xbc stack: 0x430334602a10
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9bf70:[0x420026e23776]RCUWorldFunc@vmkernel#nover+0x37 stack: 0x0
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9bfe0:[0x4200271b4d55]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)0x453984c9c000:[0x420026ec4ddf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
[YYYY-MM-DDTHH:MM:SS]  cpu51:2097305)base fs=0x0 gs=0x42004cc00000 Kgs=0x0
[YYYY-MM-DDTHH:MM:SS]  cpu48:2123178)NMI: 712: NMI IPI: RIPOFF(base):RBP:CS [0x108ddb(0x420026e00000):0x0:0xf48] (Src 0x1, CPU48)
[YYYY-MM-DDTHH:MM:SS]  cpu48:2123178)NMI: 712: NMI IPI: RIPOFF(base):RBP:CS [0x108ddb(0x420026e00000):0x0:0xf48] (Src 0x1, CPU48)
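
When comparing a host's PSOD text against the backtrace above, it can help to pull out the PCPU that stopped heartbeating and the function names in the stack. The following is a minimal Python sketch, assuming the PSOD text has already been saved as plain text. The function name summarize_psod and both regular expressions are illustrative assumptions based only on the line formats shown in this article; they are not part of any VMware tool.

```python
import re

# Heartbeat-loss line, e.g. "PCPU 48: no heartbeat (2/3 IPIs received)"
HEARTBEAT_RE = re.compile(r"PCPU\s+(\d+): no heartbeat \((\d+)/(\d+) IPIs received\)")

# Stack frame, e.g. "[0x420027196b1f]Heartbeat_DetectCPULockups@vmkernel#nover+0x50c"
FRAME_RE = re.compile(r"\[0x[0-9a-f]+\](\w+)@(\S+?)\+0x[0-9a-f]+")


def summarize_psod(text):
    """Return the stalled PCPU, the stack frames, and a CPU-lockup flag.

    Hypothetical helper: it only recognizes the line shapes seen in this article.
    """
    stalled_pcpu = None
    frames = []
    for line in text.splitlines():
        heartbeat = HEARTBEAT_RE.search(line)
        if heartbeat:
            stalled_pcpu = int(heartbeat.group(1))
        frame = FRAME_RE.search(line)
        if frame:
            # (function name, module) in the order the frames appear
            frames.append((frame.group(1), frame.group(2)))
    return {
        "stalled_pcpu": stalled_pcpu,
        "frames": frames,
        # The lockup detector in the stack marks this as a no-heartbeat panic
        "cpu_lockup": any(name == "Heartbeat_DetectCPULockups" for name, _ in frames),
    }
```

A result with cpu_lockup set and a stalled_pcpu value only shows that the panic has the same shape as this one. It does not by itself identify the NVIDIA device as the cause.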

Environment

VMware vSphere ESXi 7.0.3

Cause

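The PSOD is caused by an unresponsive NVIDIA device.

A vmkernel thread appears to be stuck accessing the PCI configuration space of the NVIDIA device. The affected PCPU then stops responding to heartbeats, and the CPU lockup check (Heartbeat_DetectCPULockups in the backtrace) panics the host.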

Resolution

Upgrade the NVIDIA driver and firmware to the latest versions.
Engage NVIDIA support for assistance with the upgrade.