The PSOD screen reports information similar to the following:
PCPU X locked up. Failed to ack TLB invalidate (at least 1 locked up, PCPU(s): X).
PCPU(s) did not respond to NMI. Possible hardware problem; contact hardware vendor.
Before the PSOD, vmkernel.log records that the NVIDIA device was reset and remained unresponsive afterward, for example:
YYYY-MM-DDTHH:MM:SS.536Z cpu4:2097455)WARNING: PCI: 740: Dev ####:##:##.1 is unresponsive after reset
YYYY-MM-DDTHH:MM:SS.154Z cpu8:2097387)WARNING: PCI: 740: Dev ####:##:##.2 is unresponsive after reset
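To check a collected vmkernel.log for this symptom, the warning above can be matched with a regular expression. The following Python function is a minimal sketch, not a VMware tool. It assumes each line follows the format of the two examples (timestamp, `cpuN:world)`, then the `WARNING: PCI:` message), and it matches the numeric field shown as `740` loosely on the assumption that it may differ between builds. The names `find_unresponsive_devices` and `UnresponsiveDevice` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Pattern for the "unresponsive after reset" warning. Assumption: the numeric
# field after "PCI:" (740 in the examples) may vary, so any digits are accepted.
UNRESPONSIVE_RE = re.compile(
    r"^(?P<ts>\S+)\s+cpu(?P<cpu>\d+):\d+\)WARNING: PCI: \d+: "
    r"Dev (?P<dev>\S+) is unresponsive after reset"
)


class UnresponsiveDevice(NamedTuple):
    timestamp: str  # log timestamp, e.g. the YYYY-MM-DDTHH:MM:SS.mmmZ field
    cpu: int        # PCPU number from the "cpuN:" prefix
    device: str     # PCI address reported after "Dev"


def find_unresponsive_devices(lines: Iterable[str]) -> List[UnresponsiveDevice]:
    """Return one entry per 'unresponsive after reset' warning, in log order."""
    hits = []
    for line in lines:
        match = UNRESPONSIVE_RE.match(line.strip())
        if match:
            hits.append(
                UnresponsiveDevice(match["ts"], int(match["cpu"]), match["dev"])
            )
    return hits
```

Because the function accepts any iterable of strings, it can be given an open file object for a copy of vmkernel.log taken from the host. The PCI addresses it returns can then be compared with the address of the NVIDIA device installed in the host.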
Environment: ESXi 8.0
This issue is caused by an NVIDIA device becoming unresponsive.
A PCPU was stuck on, or slow to complete, an access to the PCI configuration space of the NVIDIA device. The PSOD occurred when the PCPU on the same physical core failed to handle a TLB invalidate request.
Contact NVIDIA to determine why the NVIDIA device became unresponsive.