Reconnecting postgres-0 and postgres-1 DB fails if restarted

VMware Aria Suite

Symptoms:
In a vRA (or vRO) 8.0.x or 8.1.x cluster where the postgres-2 pod has become the primary database, you experience these symptoms:

The postgres-0 and postgres-1 pods fail to reconnect to postgres-2 after they are restarted.
This breaks failover, because no standby node remains to be promoted if the primary fails.

Environment:
VMware vRealize Automation 8.1.x
VMware vRealize Automation 8.0.x
VMware vRealize Orchestrator 8.0.x
VMware vRealize Orchestrator 8.1.x

Cause:
This issue occurs due to a known bug in the find_current_master() function in the /scripts/utils.sh file in the postgres pods.
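
The article does not describe the bug itself. The corrected line in the resolution adds </dev/null to the ssh call, which suggests that ssh was consuming the standard input of the loop that supplies ${LINE}. The following is a minimal sketch of that failure mode, under that assumption only. The probe function is a hypothetical stand-in for the ssh call, and the loop is not the actual code from utils.sh.

```shell
#!/bin/sh
# Hypothetical stand-in for: ssh postgres@"${LINE}" repmgr cluster show --csv
# Like ssh, it reads whatever is available on its standard input.
probe() {
  cat >/dev/null        # swallow stdin, as ssh would
  echo "probed $1"
}

# Without the redirect: probe inherits the loop's stdin and eats the
# remaining host names, so the loop ends after the first host.
buggy=$(printf 'postgres-0\npostgres-1\npostgres-2\n' |
  while read -r LINE; do probe "$LINE" || true; done)

# With the redirect: probe reads from /dev/null, so the loop's input is
# left intact and every host is visited.
fixed=$(printf 'postgres-0\npostgres-1\npostgres-2\n' |
  while read -r LINE; do probe "$LINE" </dev/null 2>/dev/null || true; done)

echo "without redirect:"; echo "$buggy"
echo "with redirect:";    echo "$fixed"
```

If the assumption holds, a host list that ends at the first entry would explain why restarted pods never find postgres-2 as the current primary.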

To resolve this issue:

Edit the postgres-scripts configmap in the prelude Kubernetes namespace by running this command:

kubectl edit configmap -n prelude postgres-scripts

Apply these changes:

In the find_current_master function definition in the file, replace the line:

local CLUSTER=$(ssh postgres@"${LINE}" repmgr cluster show --csv || true)

with this line, preserving the existing indentation:

local CLUSTER=$(ssh postgres@"${LINE}" repmgr cluster show --csv </dev/null 2>/dev/null || true)

This should resolve the issue on the running cluster, and postgres-0 and postgres-1 will be able to reconnect to postgres-2.
Once this is confirmed, make the fix persistent by applying the same change to /opt/charts/postgres/templates/scripts/configmap.yaml on each node.
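
The replacement can also be scripted rather than typed by hand. The following is a rough sketch that works on a scratch file, not on the real configmap.yaml. The scratch file and the function wrapper around the line are illustrative assumptions; only the CLUSTER line is taken from this article. Because sed rewrites only the matched text, the leading indentation is left untouched.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch file standing in for a copy of configmap.yaml.
work=$(mktemp)
cat >"$work" <<'EOF'
    find_current_master() {
        local CLUSTER=$(ssh postgres@"${LINE}" repmgr cluster show --csv || true)
    }
EOF

# Keep a backup, then write the edited version from it.
cp "$work" "$work.bak"

# '#' is the sed delimiter so that the '/' in /dev/null needs no escaping.
sed 's#repmgr cluster show --csv || true)#repmgr cluster show --csv </dev/null 2>/dev/null || true)#' \
  "$work.bak" >"$work"

# Show what changed; diff exits non-zero when the files differ.
diff "$work.bak" "$work" || true
```

Running the substitution a second time changes nothing, because the original text no longer matches once the redirects are in place. Reviewing the diff before copying any result back is advisable.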
