Aria Automation / Orchestrator node stuck when booting up with error message: "Soft lockup - CPU stuck for 30s"

Article ID: 398814

Products

VMware Aria Suite

Issue/Introduction

  • The Aria Orchestrator node is marked as NotReady in the output of kubectl get nodes.
  • The Aria Orchestrator node is not accessible over SSH.
  • The node's remote console displays the message: "Soft lockup - CPU stuck for 30s"
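The first symptom above can be checked from a node that is still reachable. The following is a minimal sketch, assuming kubectl is usable on that node and that STATUS is the second column of the kubectl get nodes --no-headers output; the function name not_ready_nodes is illustrative and not part of the product.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): print the names of nodes whose
# STATUS column does not start with "Ready".
# Input: the output of `kubectl get nodes --no-headers` on stdin.
not_ready_nodes() {
    # Skip blank lines; field 1 is NAME, field 2 is STATUS.
    awk 'NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}

# Usage on a reachable cluster node:
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every listed node reports a status beginning with Ready.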

Environment

VMware Aria Automation 8.x

VMware Aria Automation Orchestrator 8.x 

Cause

  • This can occur when the node is stunned because of high resource utilization on the host.
  • Because the node was stunned while booting, it is not accessible via SSH, and its pods and services are impacted.

Resolution

  • Validate in vCenter that the VM does not show high CPU utilization.
    • Validate that the host on which the VM resides has enough resources available.
    • Review KB 395309.
  • Take non-memory snapshots of the Aria Automation / Orchestrator cluster nodes.
  • Power off the Aria Automation / Orchestrator node from vCenter.
  • Power on the Aria Automation / Orchestrator node from vCenter.
    • Validate that the node now boots successfully; the power cycle should have reset its CPU utilization.
  • Wait for the first boot to complete. Monitor its status from an SSH session to the node with the following command:
    • watch -d vracli status first-boot
  • Run the below command to validate that all three nodes in the cluster are marked as Ready:
    • kubectl get nodes
  • Initiate a rebuild of the pods:
    • /opt/scripts/deploy.sh
  • Wait for the script execution to complete.
  • The execution can also be tracked using the command:
    • watch -d kubectl get pods -n prelude
  • Once all the pods are running successfully, validate that the pod deployment has completed successfully using the below command:
    • watch -d vracli status deploy
  • Now attempt to access the UI.
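The manual checks in the steps above (all nodes Ready, all pods in the prelude namespace running) can be scripted instead of watched. The following is a rough sketch under stated assumptions: kubectl is available in the SSH session, the STATUS column is field 2 for nodes and field 3 for pods in --no-headers output, and the function names, retry count, and delay are hypothetical choices made for illustration.

```shell
#!/bin/sh
# Count nodes whose STATUS is exactly "Ready".
# Input: the output of `kubectl get nodes --no-headers` on stdin.
count_ready_nodes() {
    awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Print pods whose STATUS is neither "Running" nor "Completed".
# Input: the output of `kubectl get pods -n prelude --no-headers` on stdin.
pods_not_running() {
    awk 'NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# wait_for_ready_nodes EXPECTED [TRIES] [DELAY_SECONDS]
# Poll until at least EXPECTED nodes are Ready; return 1 if TRIES run out.
wait_for_ready_nodes() {
    expected=$1
    tries=${2:-60}   # assumed default: 60 attempts
    delay=${3:-10}   # assumed default: 10 seconds between attempts
    while [ "$tries" -gt 0 ]; do
        ready=$(kubectl get nodes --no-headers 2>/dev/null | count_ready_nodes)
        if [ "$ready" -ge "$expected" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Usage on a three-node cluster, before and after running deploy.sh:
#   wait_for_ready_nodes 3 && echo "all nodes Ready"
#   kubectl get pods -n prelude --no-headers | pods_not_running
```

The sketch only reads cluster state. The vracli status first-boot, /opt/scripts/deploy.sh, and vracli status deploy steps are still run as described in the resolution.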