Aria Automation Postgres pods (postgres-0, postgres-1, and postgres-2 in a 3-node cluster) are in the "Running" state but report "0/1" under "Ready".
This can be verified by connecting to any of the Aria Automation appliances using SSH and running the following command:
kubectl get pods -n prelude -o wide --selector=app=postgres
This should result in output like:
NAME         READY   STATUS    RESTARTS   AGE   IP                NODE                  NOMINATED NODE   READINESS GATES
postgres-0   0/1     Running   #          #d    ###.###.###.###   server1.example.com   <none>           <none>
postgres-1   0/1     Running   #          #d    ###.###.###.###   server2.example.com   <none>           <none>
postgres-2   0/1     Running   #          #d    ###.###.###.###   server3.example.com   <none>           <none>
When we query the logs for a postgres pod using:
kubectl logs -n prelude postgres-#
We consistently see the following error:
"ERROR: no node information was found please supply a configuration file"
We further confirm the issue when we see:
"No active masters found"
Aria Automation 8.18.x
Infrastructure network issues, such as a change in DNS or NTP, can cause this issue, leaving the cluster with none of the Postgres nodes elected as the Primary database node.
The election of the Primary node is handled by the Postgres Replication Manager daemon (repmgrd).
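To see the cluster state from repmgr's point of view, you can run "repmgr cluster show" inside one of the postgres pods. This is a diagnostic sketch; the configuration file path (/etc/repmgr.conf) is an assumption and may differ in your environment:
kubectl exec -n prelude postgres-0 -- repmgr -f /etc/repmgr.conf cluster show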
On all three nodes, check whether the Postgres database path ("/data/db/live/") contains the flag file ("standby.signal"). If the file is present on every node, all 3 nodes are Standby nodes and no Primary database node is assigned and/or available to the Replication Manager. One way to check all three pods from a single appliance is shown below.
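A sketch of that check, assuming kubectl exec access to the pods from the appliance; an error for a pod means the file is absent on that node:
for pod in postgres-0 postgres-1 postgres-2; do
  # List standby.signal inside each pod's data directory
  kubectl exec -n prelude "$pod" -- ls -l /data/db/live/standby.signal
done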
Identify which node has the most up-to-date copy of the database.
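One way to compare the nodes, as a sketch that assumes the pg_controldata utility is available on the PATH inside the postgres containers, is to read the "Latest checkpoint location" from each data directory; the node reporting the highest LSN holds the most up-to-date copy:
for pod in postgres-0 postgres-1 postgres-2; do
  echo "$pod:"
  # pg_controldata reads control information directly from the data directory
  kubectl exec -n prelude "$pod" -- pg_controldata /data/db/live | grep "checkpoint location"
done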
Back up the "standby.signal" file on each node:
cp -p /data/db/live/standby.signal /home/root/standby.signal.bak
Remove the "standby.signal" from the "/data/db/live" path on the node with the most up to date copy of the database (this will become the Primary database node):
rm /data/db/live/standby.signal
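Leave "standby.signal" in place on the other two nodes so that they rejoin the cluster as Standby nodes of the new Primary.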
Re-deploy the Aria Automation services using:
/opt/scripts/deploy.sh
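Once deploy.sh completes, confirm that all three postgres pods report "1/1" under "READY":
kubectl get pods -n prelude -o wide --selector=app=postgres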