Pods fail to come up in Aria Automation 8.x with "ERROR Release 'Pod Name' in namespace 'prelude' failed to come up"

Article ID: 315494

Products

VMware Aria Suite

Issue/Introduction

  • Running deploy.sh fails, and deploy.log shows:
    Exit code of install/update of release postgres is 0
    + return 0
    [yyyy-mm-dd hh:mm:ss] ERROR Release 'Pod Name' in namespace 'prelude' failed to come up
  • Running "kubectl describe pod <Pod-Name-pod_id> -n prelude" for each pod in Pending status shows a message at the bottom of the output similar to:
    "Warning FailedScheduling 10m default-scheduler 0/N node(s) are available: N node(s) were unschedulable."
    Here N is 1 to 3, depending on whether the deployment is a single-node or three-node cluster and on how many nodes are unschedulable.
  • Running "kubectl get nodes" on any node shows SchedulingDisabled in the STATUS column:

    NAME       STATUS                     ROLES                  AGE   VERSION
    AA-Node1   Ready,SchedulingDisabled   control-plane,master   77d   v1.20.11-1+1f8a47eae6d024-dirty
    AA-Node2   Ready,SchedulingDisabled   control-plane,master   77d   v1.20.11-1+1f8a47eae6d024-dirty
    AA-Node3   Ready,SchedulingDisabled   control-plane,master   77d   v1.20.11-1+1f8a47eae6d024-dirty
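The checks above can be bundled into a few shell functions. This is a minimal sketch, not a VMware-supplied tool: the function names cordoned_nodes, pending_prelude_pods and describe_pending_pods are hypothetical, and it assumes kubectl is on the PATH and already configured, as it is when logged in to an Aria Automation node.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions for confirming the symptoms described above.
# Assumes kubectl is on PATH and configured, as on an Aria Automation node.

# Print the NAME of each node whose STATUS column contains SchedulingDisabled.
# Reads `kubectl get nodes --no-headers` output on stdin.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Print the names of the pods in the prelude namespace that are in Pending phase.
pending_prelude_pods() {
  kubectl get pods -n prelude --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# For each Pending pod, print the Events section of `kubectl describe pod`,
# which is where the FailedScheduling warning appears.
describe_pending_pods() {
  local pod
  for pod in $(pending_prelude_pods); do
    echo "== ${pod}"
    kubectl describe pod "${pod}" -n prelude | sed -n '/^Events:/,$p'
  done
}
```

For example, "kubectl get nodes --no-headers | cordoned_nodes" lists the affected nodes, and "describe_pending_pods" shows the scheduling events for every Pending pod in one pass instead of describing each pod by hand.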

Environment

Aria Automation 8.x

Cause

  • In the output above, the STATUS column shows SchedulingDisabled, which means the nodes are cordoned (marked unschedulable). This article refers to that state as Kubernetes Maintenance Mode on the Aria Automation nodes.
  • While a node is cordoned, pods already running on it continue to run unless they are stopped, but the scheduler cannot place any new pods on it. Pods that deploy.sh needs to create therefore remain in Pending.
  • Recent resource constraints, such as low disk space or low memory, that placed the nodes in Maintenance Mode can leave the cluster in this state.
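To look for the resource constraints mentioned above, the node's cordon flag and its DiskPressure and MemoryPressure conditions can be read directly from the Kubernetes API. The sketch below is illustrative only: the function names are hypothetical, it assumes kubectl is on the PATH and configured on the node, and it uses only generic Linux commands (df, /proc/meminfo) for the local check rather than any Aria-specific tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking why the nodes may have been cordoned.
# Assumes kubectl is on PATH and configured, as on an Aria Automation node.

# One tab-separated line per node: name, spec.unschedulable (true when
# cordoned, empty otherwise), DiskPressure status, MemoryPressure status.
node_scheduling_report() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.unschedulable}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'
}

# Read node_scheduling_report output on stdin and print the nodes that are
# cordoned or that report disk or memory pressure.
flag_problem_nodes() {
  awk -F '\t' '$2 == "true" || $3 == "True" || $4 == "True" { print $1 }'
}

# Local checks on the node you are logged in to: disk usage per
# filesystem and total/available memory.
local_resource_check() {
  df -h
  grep -E '^(MemTotal|MemAvailable):' /proc/meminfo
}
```

Running "node_scheduling_report | flag_problem_nodes" lists nodes that need attention; a node that is cordoned but shows no current pressure suggests the constraint has already cleared and the node can be uncordoned.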

Resolution

  • To bring the Kubernetes nodes out of the unschedulable state, follow these steps:
  • SSH into any one Aria Automation node.
  • To take Kubernetes out of Maintenance Mode and make the nodes schedulable again, run the following commands, substituting each node's FQDN as it appears in the NAME column of "kubectl get nodes":

    • For a three-node cluster:
      kubectl uncordon <AAnode1fqdn>
      kubectl uncordon <AAnode2fqdn>
      kubectl uncordon <AAnode3fqdn>

    • For a single-node deployment:
      kubectl uncordon <AAnodefqdn>
  • Run "kubectl get nodes" to verify that the STATUS column now shows only Ready.
  • The Pending pods should then be scheduled and start running. If they still do not start, run the following command to rebuild them:
    /opt/scripts/deploy.sh
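The resolution steps above can also be scripted so that every cordoned node is uncordoned without typing each FQDN. This is a hedged sketch rather than a supported procedure: the function names are hypothetical, kubectl is assumed to be on the PATH and configured, and the 600-second timeout and 10-second polling interval in wait_for_no_pending are arbitrary assumptions, not product values.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that automate the resolution steps above.
# Assumes kubectl is on PATH and configured, as on an Aria Automation node.

# Uncordon every node currently reported as SchedulingDisabled, then show
# the node list so the STATUS column can be checked.
uncordon_all_cordoned() {
  local node
  for node in $(kubectl get nodes --no-headers | awk '$2 ~ /SchedulingDisabled/ { print $1 }'); do
    kubectl uncordon "${node}"
  done
  kubectl get nodes
}

# Poll the prelude namespace until no pod is Pending. The default timeout
# (600 s) and interval (10 s) are arbitrary assumptions.
# Returns 1 on timeout; per the steps above, that is when to run
# /opt/scripts/deploy.sh.
wait_for_no_pending() {
  local timeout="${1:-600}" waited=0
  while [ -n "$(kubectl get pods -n prelude --field-selector=status.phase=Pending --no-headers 2>/dev/null)" ]; do
    if [ "${waited}" -ge "${timeout}" ]; then
      echo "Pods still Pending after ${timeout}s" >&2
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
  echo "No Pending pods in prelude"
}
```

A typical sequence would be "uncordon_all_cordoned; wait_for_no_pending || /opt/scripts/deploy.sh", which only falls back to a full redeploy if pods are still Pending after the wait.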

Additional Information

While the nodes are in Maintenance Mode, no new pods can be scheduled, so any service whose pods need to be created or restarted stays unavailable. The nodes must be taken out of Maintenance Mode for Aria Automation to operate correctly.