LINT1/NMI when using VPMC or vmkstats features on an AMD EPYC 7002 processor

Article ID: 317660

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • The host may fail with a purple diagnostic screen (PSOD) on a system using an AMD EPYC 7002 Series processor (codename Zen 2/Rome) if you use either of the following two ESXi features:
    • VPMC
    • vmkstats
  • You may see the following in the PSOD:
    • LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed. This may be a hardware problem; please contact your hardware vendor.
    • "*PCPUx:", where x is a number other than 0.
    • NMIProfiler_DisableInt and/or VMMVMKCall_Call
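
The symptom strings above can be checked mechanically against saved PSOD text. The following is a minimal sketch, not an official diagnostic: the function name `matches_late_nmi_signature` is hypothetical, and requiring all three indicators at once is a conservative assumption, since the article only says they may appear.

```python
import re

# Strings copied from the symptom list in this article.
NMI_MESSAGE = "LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed"
BACKTRACE_SYMBOLS = ("NMIProfiler_DisableInt", "VMMVMKCall_Call")

# Matches "*PCPUx:" and captures the PCPU number x.
PCPU_PATTERN = re.compile(r"\*PCPU(\d+):")


def matches_late_nmi_signature(psod_text: str) -> bool:
    """Return True if the text shows all three indicators (assumed rule)."""
    has_nmi_message = NMI_MESSAGE in psod_text
    # The article states that x is a number other than 0.
    has_nonzero_pcpu = any(
        int(number) != 0 for number in PCPU_PATTERN.findall(psod_text)
    )
    has_symbol = any(symbol in psod_text for symbol in BACKTRACE_SYMBOLS)
    return has_nmi_message and has_nonzero_pcpu and has_symbol
```

A match only suggests that the PSOD is consistent with this issue; it does not rule out a genuine hardware fault.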


Cause

On the AMD EPYC Series 7002, a performance counter NMI can arrive slightly late, sometimes a few microseconds after such NMIs have been disabled. In ESXi 6.7 U3 and 6.5 EP15, the VMkernel cannot diagnose the cause of such a late NMI and incorrectly classifies it as a fatal error reported by the motherboard, causing the host to fail.

Resolution

This issue is resolved in ESXi 6.5 P04 (ESXi650-201912002), which can be found at VMware Downloads.
This issue is resolved in ESXi 6.7 P01 (ESXi670-201912001), which can be found at VMware Downloads.

Workaround:
To work around this issue, avoid using the VPMC and vmkstats features.

Alternatively, you can configure the host to ignore the NMI by setting the advanced configuration option /Misc/NMILint1IntAction to 3.

Warning: With this setting, undiagnosed NMIs are logged and ignored, which can lead to data corruption.
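
As an illustration of the alternative workaround, the sketch below builds the `esxcli system settings advanced set` command for the option named above. It assumes the option is exposed through esxcli on the affected build, which this article does not confirm; the helper names `build_set_command` and `apply_workaround` are hypothetical. By default it only returns the command so it can be reviewed before anything is changed.

```python
import subprocess

OPTION_PATH = "/Misc/NMILint1IntAction"  # option name taken from this article
IGNORE_VALUE = 3                         # value given in this article


def build_set_command(option=OPTION_PATH, value=IGNORE_VALUE):
    """Build the esxcli argument list that sets an integer advanced option."""
    return [
        "esxcli", "system", "settings", "advanced", "set",
        "-o", option,          # -o: advanced option path
        "-i", str(int(value)), # -i: integer value to assign
    ]


def apply_workaround(dry_run=True):
    """Return the command; run it only when dry_run is False."""
    command = build_set_command()
    if not dry_run:
        # Assumption: executed in a shell on the ESXi host itself.
        subprocess.run(command, check=True)
    return command
```

Calling `apply_workaround()` with no arguments changes nothing on the host. The warning above applies as soon as the command is actually run.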