VMware Aria Automation 8.x
If a node is determined to be faulty and must be removed from the cluster and rejoined, take the following steps.
To identify the node that hosts the primary database pod, run:
kubectl get pod `vracli status | jq -r '.databaseNodes[] | select(.["Role"] == "primary") | .["Node name"]' | cut -d '.' -f 1` -n prelude -o wide --no-headers=true
Example output:
postgres-0 1/1 Running 0 39h ##.###.#.## healthy_node-fqdn-xxx-xx.company.com <none> <none>
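The one-liner above can be easier to troubleshoot when split into named steps. The following is a minimal sketch, not an official procedure. The function names short_name, primary_pod and show_primary_pod are hypothetical helpers introduced here for illustration. The jq filter and the kubectl arguments are copied from the command above, and the sketch assumes that vracli status reports exactly one primary database node.

```shell
#!/bin/sh
# Sketch only: the same lookup as the one-liner, split into steps.
# All function names below are hypothetical helpers, not part of vracli.

# Keep only the text before the first dot of a single name
# (same effect as: cut -d '.' -f 1 on one line of input).
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Read the JSON printed by `vracli status` on stdin and print the
# pod name of the primary database node.
# Assumes exactly one entry has "Role" == "primary".
primary_pod() {
    node_name=$(jq -r '.databaseNodes[] | select(.["Role"] == "primary") | .["Node name"]')
    short_name "$node_name"
}

# Show the primary database pod; with -o wide, the NODE column is the
# appliance node that currently hosts it.
show_primary_pod() {
    pod=$(vracli status | primary_pod) || return 1
    if [ -z "$pod" ]; then
        echo "no primary database node found in vracli status output" >&2
        return 1
    fi
    kubectl get pod "$pod" -n prelude -o wide --no-headers=true
}
```

After sourcing the file on an appliance node where vracli, jq and kubectl are available, run show_primary_pod. Splitting the steps makes it possible to run vracli status | primary_pod on its own and confirm the pod name before querying Kubernetes.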
If the faulty node still has a damaged etcd database or other damaged Kubernetes components after being removed from the cluster, you can reset the Kubernetes system by running this command on the faulty node:
This can allow the faulty node to join the cluster in cases where the vracli cluster join command above hangs indefinitely (giving no output after 10-15 minutes).