Aria Automation Postgres pods (postgres-0, postgres-1, and postgres-2 in a 3-node cluster) are in the "Running" state but report "0/1" under "Ready".
This can be verified by connecting to any of the Aria Automation appliances using SSH and running the following command:
kubectl get pods -n prelude -o wide --selector=app=postgres
This should result in output like:
NAME         READY   STATUS    RESTARTS   AGE   IP                NODE                  NOMINATED NODE   READINESS GATES
postgres-0   0/1     Running   #          #d    ###.###.###.###   server1.example.com   <none>           <none>
postgres-1   0/1     Running   #          #d    ###.###.###.###   server2.example.com   <none>           <none>
postgres-2   0/1     Running   #          #d    ###.###.###.###   server3.example.com   <none>           <none>
When we query the logs for a postgres pod using:
kubectl logs -n prelude postgres-#
We consistently see the following error:
"ERROR: no node information was found please supply a configuration file"
We further confirm the issue when we see:
"No active masters found"
Aria Automation 8.18.x
Infrastructure network issues, such as a change in DNS or NTP, can cause this issue, leaving the cluster with none of the Postgres nodes elected as the Primary database node.
The election of the Primary node is handled by the Postgres Replication Manager daemon (repmgrd).
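To see the cluster state from repmgr's point of view, you can run "repmgr cluster show" inside one of the postgres pods. This is a diagnostic sketch; the configuration file path (/etc/repmgr.conf) is an assumption and may differ in your environment:
kubectl exec -n prelude postgres-0 -- repmgr -f /etc/repmgr.conf cluster show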
On all three nodes, check whether the Postgres database path ("/data/db/live/") contains the flag file ("standby.signal"). If the file is present on every node, all 3 nodes are Standby nodes and no Primary database node is assigned and/or available to the Replication Manager. One way to check all three pods from a single appliance is shown below.
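A sketch of that check, assuming kubectl exec access to the pods from the appliance; an error for a pod means the file is absent on that node:
for pod in postgres-0 postgres-1 postgres-2; do
  # List standby.signal inside each pod's data directory
  kubectl exec -n prelude "$pod" -- ls -l /data/db/live/standby.signal
done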
Identify which node has the most up-to-date copy of the database.
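One way to compare the nodes, as a sketch that assumes the pg_controldata utility is available on the PATH inside the postgres containers, is to read the "Latest checkpoint location" from each data directory; the node reporting the highest LSN holds the most up-to-date copy:
for pod in postgres-0 postgres-1 postgres-2; do
  echo "$pod:"
  # pg_controldata reads control information directly from the data directory
  kubectl exec -n prelude "$pod" -- pg_controldata /data/db/live | grep "checkpoint location"
done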
Back up the "standby.signal" file on each node:
cp -p /data/db/live/standby.signal /home/root/standby.signal.bak
Remove the "standby.signal" from the "/data/db/live" path on the node with the most up to date copy of the database (this will become the Primary database node):
rm /data/db/live/standby.signal
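Leave "standby.signal" in place on the other two nodes so that they rejoin the cluster as Standby nodes of the new Primary.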
Re-deploy the Aria Automation services using:
/opt/scripts/deploy.sh
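Once deploy.sh completes, confirm that all three postgres pods report "1/1" under "READY":
kubectl get pods -n prelude -o wide --selector=app=postgres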