Pods fail to come up in Aria Automation 8.x with "ERROR Release 'Pod Name' in namespace 'prelude' failed to come up"

Article ID: 315494

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

Execution of deploy.sh fails with deploy.log showing:
Exit code of install/update of release postgres is 0
+ return 0
[yyyy-mm-dd hh:mm:ss] ERROR Release 'Pod Name' in namespace 'prelude' failed to come up
Running the command "kubectl describe pod <Pod-Name-pod_id> -n prelude" for each pod in Pending status shows a message at the bottom similar to:
"Warning FailedScheduling 10m default-scheduler 0/N node(s) are available: N node(s) were unschedulable."
Here "N" is between 1 and 3, depending on whether the deployment is a single node or a 3 node cluster and on how many nodes are unschedulable.
Running the command "kubectl get nodes" on each node shows a SchedulingDisabled status:

NAME STATUS ROLES AGE VERSION
AA-Node1 Ready,SchedulingDisabled control-plane,master 77d v1.20.11-1+1f8a47eae6d024-dirty
AA-Node2 Ready,SchedulingDisabled control-plane,master 77d v1.20.11-1+1f8a47eae6d024-dirty
AA-Node3 Ready,SchedulingDisabled control-plane,master 77d v1.20.11-1+1f8a47eae6d024-dirty
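
On a larger listing, the affected nodes can be picked out of the output above with a small filter. The following is a minimal sketch, not part of the original procedure: the function name cordoned_nodes is hypothetical, and it assumes the default column layout shown above (NAME first, STATUS second).

```shell
# Hypothetical helper: print the names of nodes whose STATUS column
# contains "SchedulingDisabled".
# Reads `kubectl get nodes` output (including the header row) on stdin.
cordoned_nodes() {
  # NR > 1 skips the header; $1 is NAME and $2 is STATUS.
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Example usage on an appliance node:
#   kubectl get nodes | cordoned_nodes
```

An empty result suggests that no node is currently cordoned.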

Environment

Aria Automation 8.X

Cause

In the output above, the STATUS column shows "SchedulingDisabled", which indicates that Kubernetes on the Aria Automation nodes is in Maintenance Mode.
In this state, existing pods continue running unless they are stopped manually; however, no new pods can be created on the K8s node.
Recent resource constraints, such as low disk space or memory, may have put the nodes in Maintenance Mode and led the cluster into this state.

Resolution

To return the Kubernetes cluster nodes from the "Unschedulable" state, follow the steps below:
SSH into any one AA node.
To bring Kubernetes out of Maintenance Mode and make K8s schedulable again, execute the following commands:
- In case of a 3 node cluster:
  kubectl uncordon <AAnode1fqdn>
  kubectl uncordon <AAnode2fqdn>
  kubectl uncordon <AAnode3fqdn>
- In case of a single node instance:
  kubectl uncordon <AAnodefqdn>
Run the command "kubectl get nodes" to verify that the STATUS field now shows only "Ready".
After completing this, the pods should begin running again. If they still do not start, run the following command to rebuild the pods:
/opt/scripts/deploy.sh
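
The uncordon steps above can also be scripted so that the node names do not have to be typed by hand. This is a minimal sketch under stated assumptions, not an official procedure: the function name uncordon_all is hypothetical, and it assumes kubectl is on the PATH of the AA node and that the STATUS column is the second field of the "kubectl get nodes" output.

```shell
# Hypothetical helper: uncordon every node that currently reports
# "SchedulingDisabled" in its STATUS column.
uncordon_all() {
  # --no-headers drops the header row, so every line is a node.
  kubectl get nodes --no-headers \
    | awk '$2 ~ /SchedulingDisabled/ { print $1 }' \
    | while read -r node; do
        # Redirect stdin so the command cannot consume the node list.
        kubectl uncordon "$node" < /dev/null
      done
}

# Example usage on an appliance node:
#   uncordon_all
#   kubectl get nodes
```

Nodes that are already schedulable are skipped, so the function works for both the single node and the 3 node case.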

Additional Information

New pods cannot be scheduled while the nodes are in Maintenance Mode, so affected services are not delivered. The nodes therefore need to be removed from Maintenance Mode for AA to operate correctly.

For further information, see Troubleshooting "CrashLoopBackOff" Status for Pods in Aria Automation 8.X.
