PSOD occurred with "PCPU X locked up. Failed to ack TLB invalidate"

Article ID: 417686

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

The PSOD screen reports information similar to the following:

PCPU X locked up. Failed to ack TLB invalidate (at least 1 locked up, PCPU(s): X).
PCPU(s) did not respond to NMI. Possible hardware problem; contact hardware vendor.


Before the PSOD, vmkernel.log records that the NVIDIA device was unresponsive after a reset:

YYYY-MM-DDTHH:MM:SS.536Z cpu4:2097455)WARNING: PCI: 740: Dev ####:##:##.1 is unresponsive after reset
YYYY-MM-DDTHH:MM:SS.154Z cpu8:2097387)WARNING: PCI: 740: Dev ####:##:##.2 is unresponsive after reset
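
To check a saved copy of vmkernel.log for the same warning, the lines above can be matched with a small script. The following is a minimal Python sketch, not a VMware or NVIDIA tool. The function name `find_unresponsive_devices` and the regular expression are assumptions derived only from the two sample lines above, so the real message format may differ between builds.

```python
import re

# Pattern derived from the sample lines in this article (an assumption,
# not an official log format). Captures the timestamp, the logging CPU,
# and the PCI device address.
UNRESPONSIVE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+cpu(?P<cpu>\d+):\d+\)WARNING: PCI: \d+: "
    r"Dev (?P<device>\S+) is unresponsive after reset"
)


def find_unresponsive_devices(lines):
    """Return (timestamp, cpu, device) tuples for each matching warning line."""
    hits = []
    for line in lines:
        match = UNRESPONSIVE_RE.match(line.strip())
        if match:
            hits.append(
                (match["timestamp"], int(match["cpu"]), match["device"])
            )
    return hits
```

One way to use it is to open a copy of the log and pass the file object directly, for example `with open("vmkernel.log") as f: hits = find_unresponsive_devices(f)`, where the file name is a placeholder for wherever the log was saved. Any hits timestamped shortly before the PSOD would be consistent with the symptom described here.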

Environment

ESXi 8.0

Cause

This issue is caused by an NVIDIA device becoming unresponsive.

A PCPU became stuck, or was delayed, while accessing the PCI configuration space of the NVIDIA device.
As a result, the PCPU on the same physical core failed to acknowledge the TLB invalidate request, which caused the PSOD.

Resolution

Please contact NVIDIA to determine why the NVIDIA device became unresponsive.

Additional Information

Understanding a "Failed to ack TLB invalidate" purple diagnostic screen