Cause and possible remedies for kernel: BUG: soft lockup - CPU#Y stuck for XXs!


Article ID: 170185



Products

Endpoint Detection and Response Advanced Threat Protection Platform

Issue/Introduction

On the ATP Manager, the following message appears at the terminal or in the /var/log/messages file:

kernel: BUG: soft lockup - CPU#Y stuck for XXs!

where Y is the number of the affected CPU and XX is the number of seconds it was stuck.
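
To see how often the message occurs and which CPUs are affected, the matching lines can be pulled out of the log. The following Python sketch is one way to do this, assuming the message text has the form shown above. Many kernels append a `[process:pid]` suffix and newer kernels may prefix the text with `watchdog:`; the pattern only matches the core of the message, so it tolerates both. The sample log line (host name, process name, PID) is hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Core of the soft lockup message; any prefix or "[process:pid]" suffix is ignored.
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
)


class SoftLockup(NamedTuple):
    cpu: int      # Y in the message: the CPU number
    seconds: int  # XX in the message: how long the CPU was stuck


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the CPU number and stall duration, or None if the line does not match."""
    match = SOFT_LOCKUP_RE.search(line)
    if match is None:
        return None
    return SoftLockup(int(match.group("cpu")), int(match.group("secs")))


def scan_log(path: str = "/var/log/messages") -> List[SoftLockup]:
    """Collect every soft lockup report from a syslog-style file."""
    results = []
    with open(path, errors="replace") as log:
        for line in log:
            hit = parse_soft_lockup(line)
            if hit is not None:
                results.append(hit)
    return results


if __name__ == "__main__":
    # Hypothetical log line used only for illustration.
    sample = "Jan  5 03:12:44 atp kernel: BUG: soft lockup - CPU#2 stuck for 23s! [java:4121]"
    print(parse_soft_lockup(sample))  # SoftLockup(cpu=2, seconds=23)
```

On the appliance, `scan_log()` returns one entry per report, which makes it easy to count reports per CPU or find the longest stall.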

On the hardware appliance, this message indicates a kernel bug. However, on the virtual appliance, it can appear for many reasons. Most often, the physical host lacks the resources to service the virtual CPUs in a timely fashion, so the watchdog in the guest OS notices that a CPU has stalled and logs the error message.
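
The guest's soft lockup detector reports a CPU once it has gone longer than a threshold without running its per-CPU watchdog task. On modern Linux kernels that threshold is twice the `kernel.watchdog_thresh` sysctl (default 10, so roughly 20 seconds), and a value of 0 disables lockup detection. The Python sketch below reads the setting from `/proc/sys/kernel/watchdog_thresh`; it assumes the appliance kernel exposes that file, which older kernels using a different tunable may not.

```python
from pathlib import Path
from typing import Optional

# Assumption: the kernel exposes this sysctl. Modern Linux kernels do;
# older kernels used a different tunable and may not have this file.
WATCHDOG_THRESH = Path("/proc/sys/kernel/watchdog_thresh")


def soft_lockup_threshold(watchdog_thresh: int) -> int:
    """Seconds a CPU can go without running its watchdog task before a soft lockup is reported."""
    # The soft lockup detector uses twice watchdog_thresh.
    return 2 * watchdog_thresh


def read_watchdog_thresh(path: Path = WATCHDOG_THRESH) -> Optional[int]:
    """Return the current watchdog_thresh, or None if the file is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    thresh = read_watchdog_thresh()
    if thresh is None:
        print("watchdog_thresh is not available on this kernel")
    elif thresh == 0:
        print("lockup detectors are disabled (watchdog_thresh = 0)")
    else:
        print(f"soft lockups are reported after about {soft_lockup_threshold(thresh)}s")
```

Raising this threshold would only make the message less frequent; it does not give the guest the CPU time it was missing, so the host-side causes are still the thing to investigate.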

 

Cause

Possible causes include the following:

1. The VM is undergoing a snapshot with RAM recording.
2. The VM is on an overcommitted host with insufficient RAM, CPU, or disk throughput to support its guests.
3. Some other resource-intensive activity is running on the host.
4. The ATP virtual appliance itself is under heavy load from either network traffic or operations such as backup and restore.
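
For the second cause above, one guest-side signal is CPU steal time: time the hypervisor spent running something else while the virtual CPU was ready to run. The Python sketch below samples the aggregate `cpu` line of `/proc/stat` twice and computes the steal percentage over the interval. It assumes a Linux guest. Not every hypervisor reports steal time to the guest, so a value near 0 does not rule out host contention; the host's own performance counters remain the authoritative view.

```python
import time
from typing import List


def read_cpu_counters(stat_text: str) -> List[int]:
    """Return aggregate jiffies: user, nice, system, idle, iowait, irq, softirq, steal."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # guest/guest_nice (if present) are already counted in user/nice,
            # so keep only the first eight columns to avoid double counting.
            values = [int(v) for v in fields[1:9]]
            # Very old kernels report fewer columns; pad so missing steal reads as 0.
            return values + [0] * (8 - len(values))
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before: List[int], after: List[int]) -> float:
    """Percentage of elapsed CPU time that was stolen by the hypervisor."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total  # index 7 is the steal column


def sample_steal(interval: float = 5.0) -> float:
    """Measure steal time on this guest over `interval` seconds."""
    with open("/proc/stat") as f:
        before = read_cpu_counters(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_cpu_counters(f.read())
    return steal_percent(before, after)
```

On the appliance, `sample_steal(5.0)` returns the steal percentage over a five-second window. A consistently high value while the lockup messages appear points toward the host rather than the ATP software.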

Environment

Hardware appliances and virtual appliances.

Resolution

Possible solutions include the following:

1. Determine whether any activity on the host computer might be consuming significant resources. Do the performance counters show a spike in storage I/O or latency?
2. Has network traffic to the ATP VM spiked, or has it been running high for an extended period?
3. Are any ATP maintenance operations, such as backup and restore, in progress?
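
For the maintenance question above, and for the snapshot and backup causes listed earlier, it can help to check whether the lockups cluster inside known activity windows. The Python sketch below is a minimal example that assumes the traditional syslog timestamp format (`Jan  5 03:12:44 host kernel: ...`). That format omits the year, so the caller supplies it. The activity windows are hypothetical inputs that would come from your backup or snapshot schedule; nothing here reads them from ATP or vSphere.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

Window = Tuple[datetime, datetime]  # (start, end) of a known activity period


def lockup_times(lines: Iterable[str], year: int) -> List[datetime]:
    """Timestamps of soft lockup lines in traditional syslog format."""
    times = []
    for line in lines:
        if "BUG: soft lockup" not in line:
            continue
        parts = line.split()
        if len(parts) < 3:
            continue
        month, day, clock = parts[:3]
        try:
            times.append(
                datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
            )
        except ValueError:
            # Skip lines that use a different timestamp format.
            continue
    return times


def correlate(lines: Iterable[str], year: int,
              windows: List[Window]) -> Tuple[List[datetime], List[datetime]]:
    """Split lockup timestamps into those inside an activity window and those outside."""
    inside: List[datetime] = []
    outside: List[datetime] = []
    for ts in lockup_times(lines, year):
        if any(start <= ts <= end for start, end in windows):
            inside.append(ts)
        else:
            outside.append(ts)
    return inside, outside
```

If most lockups fall inside the windows, the scheduled activity is the likely trigger; lockups spread across the whole day point more toward sustained load or an overcommitted host.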

 

VMware offers extensive documentation on troubleshooting and improving datastore performance. Some suggested articles:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2013160

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006821

https://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.monitoring.doc_50%2FGUID-E813116C-9D72-4464-BF3E-1B19F70F45BE.html