Replica nodes are out-of-sync in the VAMI Cluster page and cannot be reset



Article ID: 319628


Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

Symptoms:

In the VAMI Cluster page, the Replica appliances are reported as Down (State column), N/A (Valid column), or both.
Clicking the Reset button on the Primary appliance does not repair the Replicas.
The following error messages appear in the /storage/db/pgdata/pg_log/postgres.csv log on the Replicas:

YYYY-MM-DD HH:MM:SS.MS UTC,"vcac_replication","vcac",40314,"127.0.0.1:36268",5c7e8955.9d7a,1,"",2019-03-05 14:36:05 UTC,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
YYYY-MM-DD HH:MM:SS.MS UTC,,,5785,,5c7e839a.1699,300,,2019-03-05 14:11:38 UTC,,0,LOG,00000,"invalid resource manager ID 58 at 0/DC298988",,,,,,,,,""
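
To check a Replica for these two messages without reading the whole log, something like the following sketch may help. It assumes the log path given above and a POSIX shell with grep; the function name scan_replica_log is hypothetical and not part of the appliance tooling.

```shell
#!/bin/sh
# Sketch: count the two log signatures quoted in this article.
# LOG_FILE defaults to the path named above; override it if your log lives elsewhere.
LOG_FILE="${LOG_FILE:-/storage/db/pgdata/pg_log/postgres.csv}"

# scan_replica_log is a hypothetical helper, not a vRealize Automation command.
scan_replica_log() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # grep -c prints the number of matching lines; -F matches a fixed string.
    startup=$(grep -c -F 'the database system is starting up' "$log")
    invalid=$(grep -c -F 'invalid resource manager ID' "$log")
    echo "starting_up=$startup invalid_resource_manager=$invalid"
}

# Example call on a Replica (uncomment to run):
# scan_replica_log "$LOG_FILE"
```

Non-zero counts only indicate that the messages are present in the log; they do not by themselves confirm this specific issue.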

Manually running the command "vcac-vami psql-set-replica -M <master-node-ip>" from the Replica results in the following error:

Located file '/etc/vr/psql/db.env' to preset current environment settings for the database.
[YYYY-MM-DD HH:MM:SS] [root] [INFO] Default settings are overridden!
[YYYY-MM-DD HH:MM:SS] [root] [INFO] <psql-set-replica> Replication user is: <vcac_replication>
[YYYY-MM-DD HH:MM:SS] [root] [INFO] <psql-set-replica> Alter current Postgres user role for replication
[YYYY-MM-DD HH:MM:SS] [root] [INFO] <psql-set-replica>: psql-set-replica cannot be run in parallel on the same node!

Environment

VMware vRealize Automation 7.5.x
VMware vRealize Automation 7.6.x

Cause

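In rare cases, the psql-manager service fails to remove a lock file during the Reset operation. The leftover lock file is consistent with the "psql-set-replica cannot be run in parallel on the same node!" error shown above.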
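
A minimal sketch for checking whether a lock file is present on a Replica is shown here. It assumes the lock file is /tmp/psql-set-replica (the path removed in the Resolution section) and that a running psql-set-replica operation would show that string in its command line, which is an assumption and not something this article confirms. The function name check_replica_lock is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether the psql-set-replica lock file exists and whether
# a matching process appears to be running.
LOCK_FILE="${LOCK_FILE:-/tmp/psql-set-replica}"

# check_replica_lock is a hypothetical helper, not an appliance command.
check_replica_lock() {
    lock="$1"
    if [ ! -e "$lock" ]; then
        echo "no lock file"
        return 0
    fi
    # Assumption: a live operation has 'psql-set-replica' in its command line.
    # pgrep -f matches the pattern against full command lines.
    if pgrep -f 'psql-set-replica' >/dev/null 2>&1; then
        echo "lock file present, psql-set-replica appears to be running"
    else
        echo "lock file present, no psql-set-replica process found (possibly stale)"
    fi
    return 1
}

# Example call on a Replica (uncomment to run):
# check_replica_lock "$LOCK_FILE"
```

A "possibly stale" result is only a hint; follow the Resolution steps before removing anything.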

Resolution

To work around the issue:

Perform the following steps in order. Run the commands through an SSH session on each failing Replica appliance:
- Stop the psql-manager service by running: service psql-manager stop
- Remove the temporary lock file by running: rm /tmp/psql-set-replica
- Click the Reset button for the failing Replica nodes, either from the Primary appliance or from the Replicas directly.
- Start the psql-manager service by running: service psql-manager start
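
The command-line parts of the steps above can be sketched as two small shell functions, with the Reset click in the VAMI happening between them. This is an illustration under stated assumptions, not a supported script: the function names and the SERVICE_CMD and LOCK_FILE variables are hypothetical, and rm -f is used instead of plain rm so that a missing lock file is not treated as an error.

```shell
#!/bin/sh
# Sketch of the workaround's shell steps on a failing Replica.
# SERVICE_CMD and LOCK_FILE are hypothetical overrides for dry runs.
SERVICE_CMD="${SERVICE_CMD:-service}"
LOCK_FILE="${LOCK_FILE:-/tmp/psql-set-replica}"

# Run before clicking Reset: stop psql-manager, then remove the lock file.
prepare_replica_for_reset() {
    "$SERVICE_CMD" psql-manager stop || return 1
    # -f: do not fail if the lock file is already gone.
    rm -f "$LOCK_FILE"
}

# Run after clicking Reset in the VAMI: start psql-manager again.
finish_replica_reset() {
    "$SERVICE_CMD" psql-manager start
}

# Typical order on the Replica (uncomment to run):
# prepare_replica_for_reset
# ... click Reset for this Replica in the VAMI Cluster page ...
# finish_replica_reset
```

Setting SERVICE_CMD=echo prints the service commands instead of running them, which allows a dry run before touching the appliance. Note that the lock file is still removed in that case.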
