"LINT1 motherboard interrupt" error in an ESX/ESXi host

Article ID: 333947

Products

VMware vSphere ESXi, VMware vSphere ESXi 7.0, VMware vSphere ESXi 8.0

Issue/Introduction

  • ESXi/ESX hosts are unstable and may fail with a purple diagnostic screen citing an NMI, Non-Maskable, or LINT1 Interrupt.
     
  • The console displays an entry similar to:

    LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed. This may be a hardware problem; please contact your hardware vendor.
     
  • A purple screen may also occur when a device is passed through to a virtual machine. In that case, the VMkernel core dump contains events similar to:

    WARNING: IOMMUIntel: 2211: IOMMU Unit # 0: R/W = 1, Device 007:00.0 Faulting PA = 0xdf63e000 Fault Reason = 6

    Note: NMI log entries appear in the /var/log/vmkernel.log file, on the console, or in the VMkernel core dump file if the condition triggers a VMkernel purple diagnostic screen.
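
The entries above can be located mechanically in a copy of the log. The following Python sketch is illustrative only: the regular expressions are modeled on the two sample messages quoted in this article, the function name scan_lines is hypothetical, and real messages may differ between ESXi builds and IOMMU vendors.

```python
import re

# Patterns modeled on the sample entries quoted in this article (assumption:
# other ESXi builds or IOMMU vendors may word these messages differently).
NMI_RE = re.compile(r"LINT1/NMI|nonmaskable interrupt", re.IGNORECASE)
IOMMU_RE = re.compile(
    r"IOMMU Unit #\s*(?P<unit>\d+):.*?"
    r"Device\s+(?P<device>[0-9A-Fa-f:.]+)\s+"
    r"Faulting PA\s*=\s*(?P<pa>0x[0-9A-Fa-f]+)\s+"
    r"Fault Reason\s*=\s*(?P<reason>\S+)"
)


def scan_lines(lines):
    """Yield one dict per NMI or IOMMU fault entry found in `lines`."""
    for number, line in enumerate(lines, start=1):
        match = IOMMU_RE.search(line)
        if match:
            yield {
                "line": number,
                "kind": "iommu_fault",
                "unit": int(match["unit"]),
                "device": match["device"],
                # The faulting physical address is logged in hexadecimal.
                "faulting_pa": int(match["pa"], 16),
                # Kept as text; the numeric base is not stated in the article.
                "fault_reason": match["reason"],
            }
        elif NMI_RE.search(line):
            yield {"line": number, "kind": "nmi"}
```

For example, calling list(scan_lines(open("vmkernel.log", errors="replace"))) on a copy of /var/log/vmkernel.log returns the matching entries with their line numbers, which can help when assembling data for the hardware vendor.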

Environment

VMware ESXi - All versions

Cause

A non-maskable interrupt (NMI) is a physical hardware event. It is typically the result of a condition that the system BIOS and/or management chipset encounters and cannot recover from while continuing to operate during that boot cycle.

NMI events are routed by the CPU through the Advanced Programmable Interrupt Controller (APIC) to the operating system (in this case, the ESXi host) through the operating system kernel (in this case, the VMkernel). 

An NMI event occurs due to hardware issues such as:

  • A PCI bus error, typically caused by a misbehaving I/O device or an electrical glitch.
  • A bad memory module or processor.
  • Severe thermal cycling of a critical component, usually after an extended downtime or a cooling component failure.
  • Components running out of specification, such as an over-voltage or under-voltage condition due to a hardware fault involving a voltage regulator module.
  • Unapproved or incompatible components, such as an active memory backplane whose design revision is too early for the chassis.
  • A firmware, BIOS, or other component mismatch. For example, an option card of revision X may require a minimum option-card firmware revision Y and a minimum chassis BIOS revision Z.
  • On some systems, the CPU IOMMU feature that is used to map the DMA memory for a device from the host operating system to the guest operating system is configured by firmware to raise an NMI when it encounters an error, instead of allowing the operating system to catch and diagnose the error.  IOMMU errors are typically caused by misbehaving I/O device drivers or firmware.

Resolution

If an NMI event is experienced:

  • Identify the virtual machines (if any) that were powered on at the time of the NMI event.
  • Check if powering on a specific virtual machine triggers an NMI event.
  • Reseat the PCI cards and/or move them to different slots.

To resolve the NMI event, contact the hardware vendor and provide the following data:

  • The timeframe in which the event occurred.
  • At least 10 minutes of logs leading up to the event.
  • Chassis diagnostics log output and management chipset log output.
  • Chassis vital product data.
  • A copy of the vm-support output.
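
The 10-minute log window listed above can be cut from a log file with a short script. The following Python sketch assumes each entry begins with an ISO 8601 UTC timestamp such as 2024-01-01T11:52:30.000Z, which is the usual vmkernel.log layout but is not stated in this article; the function names parse_timestamp and lines_before_event are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(line):
    """Return the leading ISO 8601 timestamp of a log line, or None."""
    token = line.split(" ", 1)[0]
    if token.endswith("Z"):
        # Older Python versions do not accept the "Z" suffix directly.
        token = token[:-1] + "+00:00"
    try:
        stamp = datetime.fromisoformat(token)
    except ValueError:
        return None
    if stamp.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def lines_before_event(lines, event_time, minutes=10):
    """Return the lines logged in the `minutes` leading up to `event_time`.

    `event_time` must be a timezone-aware datetime. Lines without a
    timestamp (continuation lines) follow the decision made for the
    most recent timestamped line.
    """
    start = event_time - timedelta(minutes=minutes)
    selected = []
    keep = False
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is not None:
            keep = start <= stamp <= event_time
        if keep:
            selected.append(line)
    return selected
```

The default of 10 minutes is a lower bound taken from the list above; pass a larger minutes value if the vendor asks for more context.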

Notes:

  • Chassis management chipsets often function as an intelligent handler for chassis faults and can capture significant amounts of information during an NMI event.
  • The IBM xSeries chassis includes a BIOS option named Reboot on System NMI. When enabled, it causes an immediate chassis reboot rather than a chassis halt. In that case, the ESXi host logs do not mention the NMI. Other enterprise hardware vendors may offer a similar BIOS option.

Additional Information