NSX Manager node becomes unresponsive and its VM console has errors like "BUG: soft lockup - CPU# stuck for #s!"

Article ID: 441504

Updated On:

Products

VMware NSX, VMware vSphere ESXi

Issue/Introduction

  • In the NSX Manager UI of a working node, System > Appliances shows the cluster in a DEGRADED state.



  • The VM console of the affected NSX Manager VM shows "BUG: soft lockup - CPU#<n> stuck for <n>s!" messages, as in the screenshot below.



  • The following messages are logged in /var/log/kern.log on the impacted NSX Manager VM:

    <Time-stamp> <NSX-Manager-hostname> kernel - - - [588###.585###] watchdog: BUG: soft lockup - CPU#20 stuck for 10s! [swapper/1:0]
    <Time-stamp> <NSX-Manager-hostname> kernel - - - [588###.585###] watchdog: BUG: soft lockup - CPU#6 stuck for 9s!

  • The following messages in /var/run/log/vmkernel.log on the ESXi host running the NSX Manager VM indicate that an All Paths Down (APD) event has affected the datastore on which the NSX Manager VMs reside:

    2026-05-12T10:50:43.435Z In(182) vmkernel: cpu19:2097454)StorageApdHandlerEv: 106: Device or filesystem with identifier [####-####] has entered the All Paths Down state.
    2026-05-12T10:52:43.441Z In(182) vmkernel: cpu18:2097454)StorageApdHandlerEv: 106: Device or filesystem with identifier [####-####] has entered the All Paths Down state.
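The following sketch scans both logs for the messages shown above. It assumes the files are read as plain text lines and that vmkernel.log timestamps are ISO-8601 UTC, as in the sample. The class and function names are hypothetical, and the regular expressions are built only from the sample lines, so other kernel or ESXi versions may word these messages differently.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Guest side: kernel watchdog message from /var/log/kern.log, e.g.
# "watchdog: BUG: soft lockup - CPU#20 stuck for 10s! [swapper/1:0]"
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
    r"(?: \[(?P<task>[^\]]+)\])?"
)

# Host side: APD entry from /var/run/log/vmkernel.log. Assumes each line
# starts with an ISO-8601 UTC timestamp, as in the sample lines.
APD_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6}Z)\s.*"
    r"StorageApdHandlerEv: \d+: Device or filesystem with identifier "
    r"\[(?P<device>[^\]]+)\] has entered the All Paths Down state"
)


@dataclass
class SoftLockup:
    cpu: int
    seconds: int
    task: Optional[str]  # "[comm:pid]" suffix; missing on some lines


def find_soft_lockups(lines) -> List[SoftLockup]:
    """Return one SoftLockup per kern.log line that reports a soft lockup."""
    events = []
    for line in lines:
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            events.append(SoftLockup(int(m["cpu"]), int(m["secs"]), m["task"]))
    return events


def find_apd_events(lines) -> List[Tuple[datetime, str]]:
    """Return (UTC timestamp, device identifier) for each APD entry."""
    events = []
    for line in lines:
        m = APD_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
            events.append((ts.replace(tzinfo=timezone.utc), m["device"]))
    return events
```

For example, pass an open handle on /var/log/kern.log to `find_soft_lockups` on the NSX Manager, and an open handle on /var/run/log/vmkernel.log to `find_apd_events` on the ESXi host. Both functions accept any iterable of strings.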

Environment

VMware NSX

VMware ESXi

Cause

CPU soft lockups are reported when a virtual machine's vCPU cannot run other tasks for longer than the guest kernel's soft-lockup threshold, which is 20 seconds by default on Linux. Possible causes include, but are not limited to, resource (CPU and memory) contention, storage outages (APD/PDL), snapshot quiesce and other VM stun operations, and guest OS kernel bugs.
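On Linux, the 20-second figure is derived from a tunable. The kernel reports a soft lockup once a CPU has been stuck for twice the `watchdog_thresh` value (default 10 seconds), and setting `watchdog_thresh` to 0 disables lockup detection. The shorter durations in the kern.log sample above suggest that the threshold on that node may differ from the default. The minimal sketch below reads the live value, assuming shell access to the guest. The helper name is hypothetical, and it falls back to the kernel default when the file cannot be read.

```python
from pathlib import Path
from typing import Optional

# Linux exposes the watchdog period here; the soft-lockup threshold is twice it.
WATCHDOG_THRESH_PATH = Path("/proc/sys/kernel/watchdog_thresh")
DEFAULT_WATCHDOG_THRESH = 10  # kernel default, in seconds


def soft_lockup_threshold(watchdog_thresh: Optional[int] = None) -> Optional[int]:
    """Seconds a CPU must stay stuck before a soft lockup is reported.

    Returns None when watchdog_thresh is 0, which disables lockup detection.
    """
    if watchdog_thresh is None:
        try:
            watchdog_thresh = int(WATCHDOG_THRESH_PATH.read_text().strip())
        except (OSError, ValueError):
            # Not on Linux, or the file is unreadable: assume the default.
            watchdog_thresh = DEFAULT_WATCHDOG_THRESH
    if watchdog_thresh == 0:
        return None
    return 2 * watchdog_thresh
```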

In this case, the cause is an APD event on the NSX Manager's underlying storage.
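To attribute the lockups to APD, you can check that the lockups follow the APD entries in time. The minimal sketch below assumes that both sets of timestamps have already been parsed into comparable `datetime` values (the kern.log timestamps are redacted in the samples above) and that the ESXi host and NSX Manager clocks are synchronized. The function name and the five-minute default window are hypothetical choices, not values published by VMware.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def lockups_near_apd(
    apd_times: Iterable[datetime],
    lockup_times: Iterable[datetime],
    window: timedelta = timedelta(minutes=5),  # hypothetical tolerance
) -> List[Tuple[datetime, datetime]]:
    """Pair each lockup with the latest APD entry at or before it, if within window."""
    apd_sorted = sorted(apd_times)
    pairs = []
    for t in sorted(lockup_times):
        i = bisect_right(apd_sorted, t)  # number of APD entries at or before t
        if i == 0:
            continue  # no APD entry precedes this lockup
        apd = apd_sorted[i - 1]
        if t - apd <= window:
            pairs.append((t, apd))
    return pairs
```

Lockups that fall outside every window point to one of the other causes listed above, such as resource contention or a snapshot quiesce.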

Resolution

Further investigation of the underlying storage is necessary. Either address the storage issue that is causing the APD events, or use Storage vMotion to migrate the NSX Manager VMs to a stable datastore and avoid this issue in the future.

Additional Information

Note: Snapshots of NSX Manager appliance VMs should not be taken, as explained in the documentation topic Disable Snapshots on an NSX Appliance. Because snapshot quiesce operations can also cause the CPU lockup state, confirm that no snapshots exist for the NSX Manager VMs.

See NSX Manager node(s) become unresponsive and the VM console shows errors like "BUG: soft lockup - CPU#<##> stuck for <##>s!..." for another cause of this error.