NMI watchdog: watchdog detected hard LOCKUP on cpu for NSX Managers

Article ID: 418201

Products

VMware NSX

Issue/Introduction

  • In the vSphere VM console of the affected NSX Managers, you see the error:
    "NMI watchdog: Watchdog detected hard LOCKUP on cpu #"
  • The NSX Manager or the entire cluster is unavailable through the UI.
  • Retrieving the cluster status from the CLI might fail with the error below:
    nsxmanager> get cluster status

    % The get cluster status operation cannot be processed currently, please try again later

  • The cluster might be degraded or down even if only one manager in a three-node cluster is affected.
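
When console output or kernel logs have been saved to a file, the lockup messages quoted in this article can be located with a simple pattern match. The following is a minimal sketch, not a supported tool: the function name find_lockups is hypothetical, and the patterns are based only on the message text shown in this article, so the exact wording may differ between kernel versions.

```python
import re

# Patterns derived from the messages quoted in this article (assumption:
# other kernel versions may word them slightly differently).
HARD_LOCKUP = re.compile(r"watchdog detected hard LOCKUP on cpu\s*(\d+)", re.IGNORECASE)
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s", re.IGNORECASE)


def find_lockups(lines):
    """Return (kind, cpu, seconds) tuples for every lockup message found.

    'seconds' is None for hard lockups, because that message carries no duration.
    """
    events = []
    for line in lines:
        match = HARD_LOCKUP.search(line)
        if match:
            events.append(("hard", int(match.group(1)), None))
            continue
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append(("soft", int(match.group(1)), int(match.group(2))))
    return events
```

Passing the lines of a saved log to find_lockups gives a quick overview of which CPUs were reported and whether the events were soft or hard lockups, which can help when correlating timestamps with the hypervisor logs.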

Environment

VMware NSX

Cause

A soft lockup timeout (or, rarely, a hard lockup timeout) can occur when a Linux system runs in a virtual machine and the hypervisor does not schedule the guest for a prolonged time. This behavior is not specific to NSX environments.

To find the cause, analyse the ESXi logs and, if relevant, the vSAN logs.

Resolution

Once the underlying issue is resolved, restart the NSX Manager appliances one at a time until the cluster status shows STABLE.
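
If the restarts are scripted, the wait between them can be expressed as a polling loop. The sketch below is illustrative only: wait_for_stable and its parameters are hypothetical names, and get_status stands for any caller-supplied function that returns the current cluster status as a string (for example, parsed from the output of "get cluster status"). How that status is retrieved is left out on purpose and should be checked against the NSX documentation for your version.

```python
import time


def wait_for_stable(get_status, timeout_s=1800, poll_s=30, sleep=time.sleep):
    """Poll get_status() until it returns "STABLE" or timeout_s has elapsed.

    get_status is a caller-supplied (hypothetical) callable. Exceptions are
    treated as "not stable yet", because a manager is unreachable while it reboots.
    """
    waited = 0
    while True:
        try:
            status = get_status()
        except Exception:
            status = None
        if status == "STABLE":
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

A script would call wait_for_stable after restarting each manager and continue with the next one only if it returns True. The default timeout and poll interval are arbitrary placeholders, not recommended values.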

Additional Information

NSX Manager node(s) become unresponsive and the VM console shows errors like "BUG: soft lockup - CPU#<##> stuck for <##>s!..."
Error: "kernel: BUG: soft lockup - CPU#Y stuck for Xs" within VM
Edge failovers caused by CPU lockups on the Edge leading to the BFD tunnels\process to time out