Error "watchdog: BUG: soft lockup - CPU## stuck for ##s![containerd-##]"

Article ID: 422871

Products

VMware Telco Cloud Automation

Issue/Introduction

  • The VM loses its IP address, becomes unresponsive, and the control plane changes its status to 'NotReady'.
  • SSH works, but the VM console shows errors similar to the following:
watchdog: BUG: soft lockup - CPU## stuck for 20s! [containerd-###]
watchdog: BUG: soft lockup - CPU#4 stuck for 16s! [node_exporter:####]
watchdog: BUG: soft lockup - CPU#1 stuck for 12s! [kworker/1:1:####] 
watchdog: BUG: soft lockup - CPU#4 stuck for 32s! [node_exporter:####] 
watchdog: BUG: soft lockup - CPU#0 stuck for 33s! [runc:#######] 
watchdog: BUG: soft lockup - CPU#1 stuck for 31s! [containerd-shim:#######]
watchdog: BUG: soft lockup - CPU#1 stuck for 54s! [kthreadd:#######]

Environment

VMware Telco Cloud Automation 3.x

Cause

  • The kubelet agent must send a status update (Heartbeat) to the API Server every 10 seconds.

  • An internal kubelet loop checks the status of the container runtime on each iteration; this check is CPU intensive.

  • Under CPU starvation, the pod lifecycle checks become unhealthy and the kubelet fails to send its heartbeat (HTTPS request) to the control plane.

  • The control plane marks the node NotReady or Unknown. The controller may then fail to retrieve node metadata (ExternalIP/InternalIP) and remove those fields from the Kubernetes Node object, which is why the node's IP appears to disappear (see the diagnostic sketch below).
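
The behavior described above can be confirmed from any workstation with access to the workload cluster. The following is a minimal diagnostic sketch, assuming the official 'kubernetes' Python client is installed; the node name used is an illustrative placeholder, not taken from this article. It prints the node's Ready condition, the kubelet's last heartbeat time, and the address entries that may be missing while the node is marked NotReady or Unknown.

# Minimal diagnostic sketch (assumes the official 'kubernetes' Python client;
# the node name below is a placeholder).
from kubernetes import client, config

config.load_kube_config()          # or load_incluster_config() when run inside a pod
v1 = client.CoreV1Api()
node = v1.read_node("workload-cluster-worker-0")

# The kubelet refreshes the Ready condition roughly every 10 seconds;
# a stale last_heartbeat_time indicates the kubelet is starved.
for cond in node.status.conditions:
    if cond.type == "Ready":
        print("Ready:", cond.status, "last heartbeat:", cond.last_heartbeat_time)

# ExternalIP/InternalIP entries disappear from status.addresses when the
# controller can no longer retrieve the node metadata.
for addr in node.status.addresses:
    print(addr.type, addr.address)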

 

Resolution

  • If the worker node is running Photon OS 3 (kernel version 4.x):
    • Restart the VM to recover from the CPU starvation.

  • If the worker node is running Photon OS 5 (kernel version 6.x):