Panic Details: Crash at 2020-01-01T00:00:00.718Z on CPU ## running world ### - nvidia_wq. VMK Uptime:9:00:37:17.308
Panic Message: @BlueScreen: NMI IPI: Panic requested by another PCPU. RIPOFF(base):RBP:CS [hex code] (Src 0x1, CPU##)
Backtrace:
0x4529c0472cf0:[0x4200034ff107]PanicvPanicInt@vmkernel#nover+0x327 stack: 0x4529c0472dc8, 0x4303b900a0d8, 0x4200034ff107, 0x4200039f4600, 0x4529c0472cf0
0x4529c0472dc0:[0x4200034ff6b9]Panic_WithBacktrace@vmkernel#nover+0x56 stack: 0x4529c0472e30, 0x4529c0472de0, 0x4529c0472e40, 0x4529c0472df0, 0x1bc6e97
0x4529c0472e30:[0x4200034fbe0c]NMI_Interrupt@vmkernel#nover+0x561 stack: 0x0, 0xf48, 0x0, 0x0, 0x0
0x4529c0472f00:[0x420003553392]IDTNMIWork@vmkernel#nover+0x7f stack: 0x420051c00000, 0x4200035546dd, 0x0, 0x4529c0472fd0, 0x0
0x4529c0472f20:[0x4200035546dc]Int2_NMI@vmkernel#nover+0x19 stack: 0x0, 0x42000354e068, 0xf50, 0xf50, 0x0
0x4529c0472f40:[0x42000354e067]gate_entry@vmkernel#nover+0x68 stack: 0x0, 0x4521ec000000, 0xb80000, 0x453a16c9afcf, 0x2e0405
0x453a16c9afc0:[0x420004fc6e97]_nv033315rm@(nvidia)#<None>+0x3b stack: 0x0, 0x20, 0xb81014, 0x4316eece83f8, 0x453a16c9b050
0x453a16c9b000:[0x420004aa53bd]_nv011481rm@(nvidia)#<None>+0x17a stack: 0x453a16c9b02c, 0x0, 0x0, 0x4316eece83f8, 0x4316eed42638
0x453a16c9b060:[0x420004aa5c14]_nv036150rm@(nvidia)#<None>+0x5d stack: 0x453a16c9b0d0, 0x420004ba682f, 0x4316eece83f8, 0x453a16c9b124, 0x300000005
0x453a16c9b080:[0x420004ba682e]_nv027945rm@(nvidia)#<None>+0xc3 stack: 0x300000005, 0x0, 0x4316eedda2a8, 0x4316eece83f8, 0x4316eecf3998
0x453a16c9b0e0:[0x420004ba6492]_nv027963rm@(nvidia)#<None>+0xaf stack: 0x0, 0x453a16c9b278, 0x453a16c9b110, 0x420004fc6e7d, 0x0
0x453a16c9b180:[0x420004bb80c9]_nv029586rm@(nvidia)#<None>+0x1e stack: 0x4316eecf40d8, 0x4316eece83f8, 0x109, 0x453a16c9b278, 0x0
0x453a16c9b190:[0x420004bb07e3]_nv029578rm@(nvidia)#<None>+0x6c stack: 0x109, 0x453a16c9b278, 0x0, 0x4316eecf0c18, 0x453a16c9b230
0x453a16c9b1d0:[0x420004bb6611]_nv029579rm@(nvidia)#<None>+0x2a stack: 0x453a16c9b230, 0x420004bb7ac9, 0x0, 0x4316eecf3998, 0x0
0x453a16c9b240:[0x420004bb105f]_nv029648rm@(nvidia)#<None>+0xe4 stack: 0x453a16c9b278, 0x300000002, 0x4316eecf3998, 0x36eecf3998, 0x4316eea5fd38
0x453a16c9b2d0:[0x420004feb52a]_nv000952rm@(nvidia)#<None>+0xf3 stack: 0x453a16c9b3d0, 0x453a16c9b3d0, 0x453a16c9b420, 0x453a16c9b3d0, 0x4316eece83f8
0x453a16c9b5a0:[0x420005082e0a]nv_interrupt_handler@(nvidia)#<None>+0x173 stack: 0xc6, 0x4302c4e56b80, 0xc6, 0x4302c4e56b10, 0x1

vmkernel.all:2020-01-01T00:00:00.718Z cpu##:5358443)NVRM: Xid (PCI:0000:8c:00): 109, pid=#######, Ch 00000723, errorString CTX SWITCH TIMEOUT, Info 0x134050
vmkernel.all:2020-01-01T00:00:00.718Z cpu##:5358443)NVRM: Xid (PCI:0000:8c:00): 109, pid=#######, Ch 00000724, errorString CTX SWITCH TIMEOUT, Info 0x2b4050
vmkernel.all:2020-01-01T00:00:00.718Z cpu##:2109968)NVRM: failed to submit workqueue item: No free handles
vmkernel.all:2020-01-01T00:00:00.718Z cpu##:2109968)NVRM: failed to submit workqueue item: No free handles
vmkernel.all:2020-01-01T00:00:00.718Z cpu##:4922889)NVRM: Xid (PCI:0000:3b:00): 13, pid=0, Graphics Exception: ILLEGAL_OPCODE
vmkernel.all:2020-01-01T00:00:00.718Z cpu##:4922889)NVRM: Xid (PCI:0000:3b:00): 13, pid=0, Graphics Exception: ESR 0x404490=0x80000004

ESXi 7.x
ESXi 8.x
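The backtrace shows nv_interrupt_handler() in the NVIDIA driver running for too long on the affected PCPU, which keeps that CPU from servicing other work. This can lead to heartbeat timeouts and CPU lockups, which is consistent with the "NMI IPI: Panic requested by another PCPU" panic message.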
This behavior can occur when the GPU, its driver, or its firmware is not supported for the installed ESXi release (that is, not listed on the VMware Compatibility Guide), or when a hardware- or driver-related lockup condition occurs on the NVIDIA card.
Check that the NVIDIA GPU card, driver, and firmware are supported with the current ESXi version and apply the latest supported NVIDIA driver and firmware package.
If the issue persists, collect diagnostic logs and work with NVIDIA support for further debugging.
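When reviewing the collected logs, it can help to summarize the NVRM Xid events per GPU before engaging NVIDIA support. The following is a minimal Python sketch, not an official tool: the function name summarize_xids is hypothetical, and the pattern is an assumption based only on the Xid line format quoted in this article, so it may need adjusting for other driver versions.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt in this article, e.g.:
#   NVRM: Xid (PCI:0000:8c:00): 109, pid=..., Ch 00000723, ...
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),")


def summarize_xids(lines):
    """Count Xid events per (PCI address, Xid code) pair.

    `lines` is any iterable of log lines, such as an open file object.
    Lines that do not contain an NVRM Xid record are ignored.
    """
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            pci_address = match.group(1)
            xid_code = int(match.group(2))
            counts[(pci_address, xid_code)] += 1
    return counts
```

Passing an open vmkernel log file from the extracted support bundle to summarize_xids() returns a Counter keyed by PCI address and Xid code, which shows at a glance which GPU reported which errors and how often.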