Aria Automation nodes in not ready state and deploy.sh fails
Article ID: 377044

Products

VMware Aria Suite

Issue/Introduction

  • Unable to shut down Aria Automation by running /opt/scripts/deploy.sh --shutdown
  • The deploy.sh script fails with the following error:

Running check eth0-ip

Running check node-name

Running check non-default-hostname

Running check single-aptr

Running check nodes-ready
make: *** [/opt/health/Makefile:56: nodes-ready] Error 1
Running check nodes-count

Running check fips

make: Target 'deploy' not remade because of errors.

 

  • Running kubectl get nodes shows one node in a NotReady state
  • Running kubectl -n prelude get pods -o wide shows the postgres-0 pod in a Pending state
  • Running kubectl describe nodes | grep "Name:\|Taints:" shows that the node where postgres-0 is scheduled is tainted (see the example output below)
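
Example output showing these symptoms (the AGE, ROLES, VERSION, and pod detail values below are illustrative and will differ in your environment):

root@servername01 [ ~ ]# kubectl get nodes
NAME                         STATUS     ROLES                  AGE    VERSION
servername01.example.local   NotReady   control-plane,master   300d   v1.x.x
servername02.example.local   Ready      control-plane,master   300d   v1.x.x
servername03.example.local   Ready      control-plane,master   300d   v1.x.x

root@servername01 [ ~ ]# kubectl -n prelude get pods -o wide | grep postgres-0
postgres-0   0/1   Pending   0   10m   <none>   <none>   <none>   <none>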


Environment

  • Aria Automation 8.x three-node cluster

Cause

  • One of the nodes is tainted and is therefore in a NotReady state, which causes the health checks to fail when deploy.sh is run.

Resolution

  • Work around this issue by removing the taint.
  • Workaround Steps:

    • Run kubectl get nodes to determine which node is in a NotReady state
    • Run kubectl -n prelude get pods -o wide to verify that the postgres-0 pod is in a Pending state
    • Run kubectl describe nodes | grep "Name:\|Taints:" to verify that the node where postgres-0 is scheduled is tainted

root@servername01 [ ~ ]# kubectl describe nodes | grep "Name:\|Taints:"
Name:               servername01.example.local
Taints:             node.kubernetes.io/unreachable:NoSchedule
Name:               servername02.example.local
Taints:             <none>
Name:               servername03.example.local
Taints:             <none>

    • Run the following command to remove the taint, replacing servername01.example.local with the name of the tainted node identified above:

      kubectl taint nodes servername01.example.local node.kubernetes.io/unreachable:NoSchedule-
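
If the taint is removed successfully, kubectl prints a confirmation similar to:

root@servername01 [ ~ ]# kubectl taint nodes servername01.example.local node.kubernetes.io/unreachable:NoSchedule-
node/servername01.example.local untainted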
    • Run the describe command again to verify that no nodes are tainted:

      kubectl describe nodes | grep "Name:\|Taints:" 

It should now show that there is no taint:

root@servername01 [ ~ ]# kubectl describe nodes | grep "Name:\|Taints:"
Name:               servername01.example.local
Taints:             <none>
Name:               servername02.example.local
Taints:             <none>
Name:               servername03.example.local
Taints:             <none>

Note: In some cases the taint may still appear. Rerun the commands above until the taint no longer shows on the affected node or on any other node.
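
To avoid retyping the check, the command can optionally be polled with watch (a standard Linux utility; this is a convenience and assumes watch is available on the appliance):

watch -n 5 'kubectl describe nodes | grep "Name:\|Taints:"'

Press Ctrl+C to exit watch once the taint no longer appears.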

    • Run kubectl get nodes again to check the node status.
    • After all three nodes show as "Ready", the shutdown command can be run again (example output from a healthy cluster is shown below the command):

/opt/scripts/deploy.sh --shutdown
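
For reference, a healthy cluster should look similar to the following before the shutdown is rerun (the AGE, ROLES, and VERSION values are illustrative):

root@servername01 [ ~ ]# kubectl get nodes
NAME                         STATUS   ROLES                  AGE    VERSION
servername01.example.local   Ready    control-plane,master   300d   v1.x.x
servername02.example.local   Ready    control-plane,master   300d   v1.x.x
servername03.example.local   Ready    control-plane,master   300d   v1.x.x

Once the node returns to Ready, the postgres-0 pod should also move from Pending to Running, which can be confirmed with kubectl -n prelude get pods -o wide.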