In a clustered VMware Aria Suite Lifecycle environment managing VMware Identity Manager, the PCP recovery operation may fail with the following error:
LCMVIDM74055
Unable to perform pcp recovery on the host. Check if there exists a primary node in the set-up.
Unable to recover all the postgres nodes which are marked down. Ensure the nodes are powered on and delegateIp is assigned to primary node.
This issue occurs when there is an inconsistency between the Pgpool watchdog leader and the actual Postgres primary node in the vIDM cluster.
In a healthy cluster:
The node reported as MASTER by the pcp_watchdog_info command must match the node reported as primary in the show pool_nodes output.
However, in the affected environment:
pcp_watchdog_info reports vIDM01 as MASTER
show pool_nodes reports vIDM03 as PRIMARY
This mismatch indicates a cluster state inconsistency, typically caused by an unclean failover, improper service restart sequence, delegate IP misalignment, or watchdog election inconsistency.
VMware Identity Manager 3.3.7
This issue occurs when a split-brain condition develops in the Postgres cluster: the Pgpool watchdog master node and the Postgres primary node are different systems.
Identify the Pgpool watchdog master node:
su root -c "echo -e 'password'|/usr/local/bin/pcp_watchdog_info -p 9898 -h localhost -U pgpool"
Identify the current Postgres primary node:
su root -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""
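The comparison between the two outputs can be sketched as a few shell helpers. This is a minimal sketch, not part of the documented procedure: the function names (wd_master, pool_primary, check_alignment) are hypothetical, and the parsing assumes that each pcp_watchdog_info node line ends with "hostname pgpool_port watchdog_port state_code state_name" and that show pool_nodes prints a header row containing "hostname" and "role" columns. Verify both assumptions against the actual output before relying on it.

```shell
# Assumption: node lines from pcp_watchdog_info end with
#   <hostname> <pgpool_port> <wd_port> <state_code> <state_name>
# so the hostname is the fifth field counted from the end.
wd_master() {
  awk 'NF >= 5 && ($NF == "MASTER" || $NF == "LEADER") { print $(NF-4) }'
}

# Assumption: show pool_nodes prints a '|'-separated table whose header
# row names the "hostname" and "role" columns.
pool_primary() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    !hcol {                       # still looking for the header row
      for (i = 1; i <= NF; i++) {
        name = trim($i)
        if (name == "hostname") hcol = i
        if (name == "role") rcol = i
      }
      next
    }
    rcol && trim($rcol) == "primary" { print trim($hcol) }
  '
}

# $1 = watchdog master host, $2 = Postgres primary host.
# The names are compared as plain strings, so both tools must report
# the host in the same form (both FQDN or both short name).
check_alignment() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "UNKNOWN"; return 2
  fi
  if [ "$1" = "$2" ]; then
    echo "OK"
  else
    echo "MISMATCH"; return 1
  fi
}
```

To use it, pipe the output of the pcp_watchdog_info command into wd_master and the output of the show pool_nodes command into pool_primary, then pass both results to check_alignment. A result of MISMATCH corresponds to the condition described in this article.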
Because the watchdog master and the Postgres primary are different nodes, stop Postgres on the two nodes that are not the watchdog master:
/etc/init.d/vpostgres stop
Confirm that show pool_nodes now reflects the master node as primary.
Start Postgres again on the two non-master nodes:
/etc/init.d/vpostgres start
Verify that the cluster is stabilized and confirm that the vIDM UI is accessible.
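As one way to check that the cluster has stabilized, the show pool_nodes output can be scanned for nodes still marked down. This is a minimal sketch under stated assumptions: pool_down_nodes is a hypothetical helper name, and the parsing assumes the output is a '|'-separated table whose header row contains "hostname" and "status" columns, with "down" as the status of a failed node.

```shell
# Print the hostname of every node that show pool_nodes marks as "down".
# Assumption: the header row names the "hostname" and "status" columns.
pool_down_nodes() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    !hcol {                       # still looking for the header row
      for (i = 1; i <= NF; i++) {
        name = trim($i)
        if (name == "hostname") hcol = i
        if (name == "status") scol = i
      }
      next
    }
    scol && trim($scol) == "down" { print trim($hcol) }
  '
}
```

Pipe the output of the show pool_nodes command into pool_down_nodes. Empty output suggests that no node is marked down; any hostname printed still needs attention before the recovery can be considered complete.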