Error: "kernel: BUG: soft lockup - CPU#Y stuck for Xs" within VM

Article ID: 170185

Products

VMware vSphere ESXi

Issue/Introduction

The following message appears on the VM's console or terminal, or in the /var/log/messages file:

kernel: BUG: soft lockup - CPU#Y stuck for Xs!

where Y is the number of the affected vCPU and X is the number of seconds it has been stuck.
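
To check whether a guest has logged this message, the kernel log can be searched. The following is a minimal sketch, assuming a POSIX shell with grep. The function name find_soft_lockups is hypothetical. The default /var/log/messages path only applies to distributions that write kernel messages there.

```shell
# Hypothetical helper: print soft lockup reports found in the given log files.
# Defaults to /var/log/messages when no file is named.
find_soft_lockups() {
  [ "$#" -eq 0 ] && set -- /var/log/messages
  # -h: omit file names; -E: extended regex for "CPU#<n> stuck for <n>s".
  grep -hE 'soft lockup - CPU#[0-9]+ stuck for [0-9]+s' "$@"
}
```

For example, run `find_soft_lockups /var/log/messages`. On distributions that log through the systemd journal, `journalctl -k | grep 'soft lockup'` or `dmesg | grep 'soft lockup'` performs a similar search.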

Environment

Virtual Machines

Cause

Possible causes include, but are not limited to, the following:

  1. The VM is undergoing a quiesced snapshot.
  2. The VM is on an overcommitted host with insufficient RAM, CPU, or disk throughput to support its guests.
  3. Other resource-intensive activity on the host.

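A soft lockup is reported when a vCPU stays busy in kernel mode for longer than the soft lockup threshold (20 seconds by default on current Linux kernels) without letting other tasks, including the kernel's per-CPU watchdog task, run on it.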
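
On kernels that expose kernel.watchdog_thresh, the 20-second figure is derived rather than set directly. The soft lockup threshold is twice watchdog_thresh, whose default is 10 seconds. The following is a minimal sketch that reports the threshold currently in effect, assuming such a kernel. The function name is hypothetical, and its optional file argument exists only to make the sketch testable.

```shell
# Hypothetical helper: print the soft lockup threshold (in seconds) in effect.
# Assumes a kernel exposing watchdog_thresh; the optional argument overrides
# the path (for testing only).
softlockup_threshold() {
  thresh_file=${1:-/proc/sys/kernel/watchdog_thresh}
  if [ ! -r "$thresh_file" ]; then
    echo "cannot read $thresh_file (older kernels use softlockup_thresh instead)" >&2
    return 1
  fi
  # The kernel reports a soft lockup after 2 x watchdog_thresh seconds.
  echo $(( 2 * $(cat "$thresh_file") ))
}
```

On a kernel with the default watchdog_thresh of 10, `softlockup_threshold` prints 20.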

Soft lockups can cause VMs to become unresponsive for short periods of time and trigger application timeouts or failover. VMs that are experiencing a soft lockup might also have unusually high or unusually low CPU utilization, depending on the exact cause of the soft lockup.

Resolution

Possible solutions include the following:

  • Investigate the ESXi host for CPU contention, for example high CPU ready time for the VM. If the host is overcommitted, the VM's vCPUs may not be scheduled on physical CPUs often enough to get the resources they need.
  • In some cases the soft lockup timeout can be increased within the guest OS. This lengthens the time before a soft lockup is reported and reduces the detector's sensitivity. However, this is generally not recommended because it can mask underlying issues, and it should only be used for troubleshooting in a controlled environment.
    • Adjust the soft lockup timeout value:
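      • Access the file: Log in to the guest OS as root via SSH or a terminal. On older kernels the tunable is /proc/sys/kernel/softlockup_thresh. On newer kernels it is /proc/sys/kernel/watchdog_thresh, and the soft lockup threshold is twice its value.
      • Modify the value (this may vary by Linux distribution and kernel version; consult the distribution's documentation before making any changes):
        • echo new_timeout_value > /proc/sys/kernel/softlockup_thresh
          Note: Replace "new_timeout_value" with the desired time in seconds before a soft lockup is reported. On kernels that use watchdog_thresh instead, write half of the desired timeout to /proc/sys/kernel/watchdog_thresh. Values written to /proc are not preserved across a reboot.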
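
Newer kernels replace softlockup_thresh with watchdog_thresh, and the soft lockup threshold is twice watchdog_thresh. As a result, the echo command above only applies to kernels that still provide softlockup_thresh. The following is a minimal sketch, assuming a POSIX shell run as root. It writes to whichever of the two files exists and halves the value for watchdog_thresh. The function name set_softlockup_timeout and the PROC_KERNEL override are hypothetical, and the override exists for testing only. As with the manual steps, use this only for controlled troubleshooting.

```shell
# Hypothetical helper: set the soft lockup timeout on older or newer kernels.
# PROC_KERNEL defaults to /proc/sys/kernel; override it only for testing.
set_softlockup_timeout() {
  seconds=$1
  dir=${PROC_KERNEL:-/proc/sys/kernel}
  case $seconds in
    ''|*[!0-9]*) echo "usage: set_softlockup_timeout SECONDS" >&2; return 2 ;;
  esac
  if [ -w "$dir/softlockup_thresh" ]; then
    # Older kernels: the value is the soft lockup timeout itself.
    echo "$seconds" > "$dir/softlockup_thresh"
  elif [ -w "$dir/watchdog_thresh" ]; then
    # Newer kernels: lockups fire at 2 x watchdog_thresh, so halve (round up).
    echo $(( (seconds + 1) / 2 )) > "$dir/watchdog_thresh"
  else
    echo "no writable soft lockup tunable under $dir (are you root?)" >&2
    return 1
  fi
}
```

For example, running `set_softlockup_timeout 60` as root sets softlockup_thresh to 60 on an older kernel, or watchdog_thresh to 30 on a newer one. Like the manual echo, this setting does not persist across a reboot. On most distributions, a persistent value can be set in a sysctl configuration file, such as a `kernel.watchdog_thresh = 30` line in a file under /etc/sysctl.d/.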

Additional Information

Alternative approaches to managing soft lockups:

  • Analyze system logs and performance metrics to identify the root cause of the soft lockup, such as a resource bottleneck.
  • Adjust application settings or resource allocation to minimize CPU-intensive operations in the guest.
  • Update the guest to the latest kernel version available for its distribution, as newer kernels may include bug fixes related to soft lockups.
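
When analyzing logs, it can help to see whether soft lockups recur on a single vCPU or across several, and how long the longest stall lasted. This can help when correlating the reports with host activity such as snapshots or contention. The following is a minimal sketch, assuming a POSIX shell with sed, awk, and sort. The function name summarize_soft_lockups is hypothetical, and it reads log text on standard input.

```shell
# Hypothetical helper: summarize soft lockup reports read on stdin.
# Prints one line per vCPU: "<cpu> <number of reports> <longest stall in s>".
summarize_soft_lockups() {
  # Reduce each matching line to "<cpu> <seconds>"; drop everything else.
  sed -nE 's/.*soft lockup - CPU#([0-9]+) stuck for ([0-9]+)s.*/\1 \2/p' |
    awk '{
           n[$1]++
           v = $2 + 0
           if (!($1 in max) || v > max[$1]) max[$1] = v
         }
         END { for (cpu in n) print cpu, n[cpu], max[cpu] }' |
    sort -n
}
```

For example, run `summarize_soft_lockups < /var/log/messages` or `journalctl -k | summarize_soft_lockups`.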