Replica nodes are out of sync in the VAMI Cluster page and cannot be reset

Article ID: 319628

Updated On:

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • On the VAMI Cluster page, the Replica appliances are reported as Down (State column) and/or N/A (Valid column).
  • Clicking the Reset button on the Primary appliance does not repair the Replicas.
  • The following messages appear in the /storage/db/pgdata/pg_log/postgres.csv log on the Replicas:
YYYY-MM-DD HH:MM:SS.MS UTC,"vcac_replication","vcac",40314,"127.0.0.1:36268",5c7e8955.9d7a,1,"",2019-03-05 14:36:05 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
YYYY-MM-DD HH:MM:SS.MS UTC,,,5785,,5c7e839a.1699,300,,2019-03-05 14:11:38 UTC,,0,LOG,00000,"invalid resource manager ID 58 at 0/DC298988",,,,,,,,,""


Running the command "vcac-vami psql-set-replica -M <master-node-ip>" manually on the Replica fails with the following output:

Located file '/etc/vr/psql/db.env' to preset current environment settings for the database.
[YYYY-MM-DD HH:MM:SS] [root] [INFO] Default settings are overridden!
[YYYY-MM-DD HH:MM:SS] [root] [INFO] <psql-set-replica> Replication user is: <vcac_replication>
[YYYY-MM-DD HH:MM:SS] [root] [INFO] <psql-set-replica> Alter current Postgres user role for replication
[YYYY-MM-DD HH:MM:SS] [root] [INFO] <psql-set-replica>: psql-set-replica cannot be run in parallel on the same node!
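
To check a Replica for the two postgres.csv signatures quoted above, a small grep wrapper may help. This is only a sketch: check_replica_log is a hypothetical helper name, not a vRealize Automation command, and the patterns are taken directly from the sample log lines in this article.

```shell
#!/bin/sh
# Sketch: scan a Replica's postgres.csv for the two messages quoted in this article.
# check_replica_log is a hypothetical helper name (assumption), not a product command.

check_replica_log() {
    # Default path is the log location named in the Symptoms section.
    log_file="${1:-/storage/db/pgdata/pg_log/postgres.csv}"
    found=1   # 1 = no signature seen

    if [ ! -r "$log_file" ]; then
        echo "Cannot read $log_file" >&2
        return 2
    fi

    # FATAL 57P03: connection rejected because the database is still starting up.
    if grep -q 'FATAL,57P03,"the database system is starting up"' "$log_file"; then
        echo "Found: the database system is starting up (57P03)"
        found=0
    fi

    # LOG message about an invalid resource manager ID, as quoted in the article.
    if grep -q '"invalid resource manager ID [0-9]* at ' "$log_file"; then
        echo "Found: invalid resource manager ID"
        found=0
    fi

    # 0 if at least one signature was found, 1 otherwise.
    return "$found"
}
```

Calling check_replica_log with no argument reads the default log path; a return status of 0 means at least one of the two messages is present, 1 means neither was found, and 2 means the file could not be read.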


Environment

VMware vRealize Automation 7.5.x
VMware vRealize Automation 7.6.x

Cause

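This is a corner case in which the psql-manager service fails to remove a lock file during the Reset operation. While the leftover lock file (/tmp/psql-set-replica) is present, psql-set-replica reports that it cannot be run in parallel on the same node.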

Resolution

To work around the issue:
  1. Through an SSH session on each failing Replica appliance:
    • Stop the psql-manager service: service psql-manager stop
    • Remove the temporary lock file: rm /tmp/psql-set-replica
  2. Click the Reset button for the failing Replica nodes, either from the Primary appliance or from the Replicas directly.
  3. On each of those Replicas, start the psql-manager service again: service psql-manager start
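
The Replica-side commands in the steps above could be wrapped in two shell functions, one to run before the Reset and one after it. This is a minimal sketch, not an official script: the function names and the LOCK_FILE and SERVICE_CMD variables are hypothetical and exist only to make the commands easy to rehearse; the service name and lock file path are the ones given in this article.

```shell
#!/bin/sh
# Sketch of the Replica-side workaround commands. Run as root on the failing Replica.
# LOCK_FILE, SERVICE_CMD, and both function names are illustrative assumptions.

LOCK_FILE="${LOCK_FILE:-/tmp/psql-set-replica}"   # lock file path from the article
SERVICE_CMD="${SERVICE_CMD:-service}"             # overridable for dry runs

# Step 1: stop psql-manager, then remove the leftover lock file.
prepare_replica_for_reset() {
    if ! "$SERVICE_CMD" psql-manager stop; then
        echo "Could not stop psql-manager; lock file left in place" >&2
        return 1
    fi
    if [ -e "$LOCK_FILE" ]; then
        rm -f -- "$LOCK_FILE" || return 1
        echo "Removed lock file $LOCK_FILE"
    else
        echo "No lock file found at $LOCK_FILE"
    fi
}

# Step 3: start psql-manager again once the Reset has been triggered from VAMI.
finish_replica_reset() {
    "$SERVICE_CMD" psql-manager start
}
```

A possible sequence is to run prepare_replica_for_reset on each failing Replica, click Reset in the VAMI Cluster page (step 2), and then run finish_replica_reset on the same Replicas. The sketch deliberately leaves the lock file alone if the service cannot be stopped, which is a cautious choice rather than something the article requires.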