Reconnecting postgres-0 and postgres-1 DB fails if restarted

Article ID: 345164

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
In a vRA (or vRO) 8.0.x or 8.1.x cluster where the postgres-2 pod has become the primary database, you experience these symptoms:
  • The postgres-0 and postgres-1 pods fail to reconnect to postgres-2 after they are restarted.
  • Failover is affected, because no standby node is available for promotion if the primary fails.


Environment

VMware vRealize Automation 8.1.x
VMware vRealize Automation 8.0.x
VMware vRealize Orchestrator 8.0.x
VMware vRealize Orchestrator 8.1.x

Cause

This issue occurs due to a known bug in the find_current_master() function in the /scripts/utils.sh file in the postgres pods.
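
The article does not describe the bug itself. One plausible reading, inferred only from the fix (adding </dev/null) and the ${LINE} variable name, is that find_current_master calls ssh inside a loop that reads host names from standard input, and ssh swallows the remaining input. The sketch below reproduces that general shell pitfall. It assumes bash; fake_ssh and the host list are hypothetical stand-ins, not the contents of /scripts/utils.sh.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `ssh ... repmgr cluster show --csv`.
# Like ssh, it reads whatever is available on standard input.
fake_ssh() {
  cat >/dev/null
  echo "queried $1"
}

# Illustrative host list; the real script's input is not shown in the article.
hosts=$'postgres-0\npostgres-1\npostgres-2'

# Without a redirect, the first call drains the remaining host names,
# so the loop ends after a single iteration.
without_redirect=$(while read -r LINE; do fake_ssh "$LINE"; done <<<"$hosts")

# With </dev/null, each call gets an empty stdin and the loop sees every host.
with_redirect=$(while read -r LINE; do fake_ssh "$LINE" </dev/null; done <<<"$hosts")

echo "without redirect:"
echo "$without_redirect"
echo "with </dev/null:"
echo "$with_redirect"
```

In this sketch the first loop queries only postgres-0, while the second queries all three hosts. If the real function behaves the same way, the hosts listed after the first one would never be checked.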

Resolution

To resolve this issue:
  1. Edit the postgres-scripts configmap in the prelude Kubernetes namespace by running the kubectl edit configmap -n prelude postgres-scripts command.
  2. Apply these changes:

    In the find_current_master function definition in the file, replace the line:

    local CLUSTER=$(ssh postgres@"${LINE}" repmgr cluster show --csv || true)

    with this line, keeping the existing indentation:

    local CLUSTER=$(ssh postgres@"${LINE}" repmgr cluster show --csv </dev/null 2>/dev/null || true)
     
  3. The change takes effect on the running system, and postgres-0 and postgres-1 should then be able to connect to postgres-2.
  4. After you confirm this, make the fix persistent by applying the same change to /opt/charts/postgres/templates/scripts/configmap.yaml on each node.
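
For the persistent change in step 4, the replacement can also be scripted rather than typed by hand. The following is a minimal sketch, not a VMware-provided tool: patch_find_current_master is a hypothetical helper name, and the demonstration runs against a temporary file so that nothing on a node is modified. It assumes the target line matches the original text quoted in step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the stdin/stderr redirects to the repmgr call in FILE.
# Only the text after "--csv" is rewritten, so the existing indentation is kept.
# A backup copy is left next to the file as FILE.bak.
patch_find_current_master() {
  local file=$1
  sed -i.bak \
    's#repmgr cluster show --csv || true)#repmgr cluster show --csv </dev/null 2>/dev/null || true)#' \
    "$file"
}

# Demonstration on a scratch file containing the original line.
tmp=$(mktemp)
printf '%s\n' '    local CLUSTER=$(ssh postgres@"${LINE}" repmgr cluster show --csv || true)' >"$tmp"

patch_find_current_master "$tmp"
patch_find_current_master "$tmp"   # a second run finds nothing left to change

patched=$(cat "$tmp")
echo "$patched"
rm -f "$tmp" "$tmp.bak"
```

On a node, the helper would be pointed at /opt/charts/postgres/templates/scripts/configmap.yaml. To check that the live configmap carries the change, a command such as kubectl get configmap -n prelude postgres-scripts -o yaml | grep 'repmgr cluster show' should print the patched line.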