When you connect between nodes in a VMware Identity Manager (vIDM) cluster via SSH as the postgres user, the connection prompts for a password instead of authenticating with the node's key.
This indicates that password-less SSH authentication is broken.
To restore database replication and cluster health, the authorized_keys file for the postgres user must be rebuilt manually from the public RSA keys of all nodes and placed back on every node.
VMware Identity Manager 3.3.7
Patched environments (starting with CSP-102092) where the embedded PostgreSQL database has been upgraded from version 9.6 to version 14.
Note: The steps below apply to older versions as well, but the postgres user's home directory path will differ. You can verify the home directory path by running
grep postgres /etc/passwd.
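If only the path is needed, the sixth colon-separated field of the passwd entry can be extracted directly. The following is a minimal sketch assuming the standard passwd layout; the helper name home_of is hypothetical and not part of vIDM.

```shell
# Print the home directory (field 6) of a user from a passwd-format file.
# home_of is a hypothetical helper name; the file defaults to /etc/passwd.
home_of() {
    awk -F: -v user="$1" '$1 == user { print $6 }' "${2:-/etc/passwd}"
}

# Example: home_of postgres
```

The printed directory is the one whose .ssh subdirectory should replace /var/vmware/vpostgres/14/.ssh in the commands below if your version differs.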
This issue is generally caused by a failure or desynchronization event within the vIDM cluster, which can corrupt or overwrite the authorized_keys file required for secure, password-less communication between the database nodes.
To resolve this issue, you will collect the public keys from all three nodes, combine them into a single authorized_keys file, and distribute this corrected file back to all nodes.
Prerequisites:
* Take an offline snapshot of all three vIDM nodes before proceeding.
* Identify your nodes by hostname or FQDN (referred to below as <node0>, <node1>, and <node2>).
* Execute all of the following steps from <node0> as the root user, from the /root directory.
Step 1: Back up the existing authorized_keys files
Run the following commands from <node0> to rename the existing authorized_keys file on each of the three nodes:
ssh <node0> "mv /var/vmware/vpostgres/14/.ssh/authorized_keys /var/vmware/vpostgres/14/.ssh/authorized_keys.bkup"
ssh <node1> "mv /var/vmware/vpostgres/14/.ssh/authorized_keys /var/vmware/vpostgres/14/.ssh/authorized_keys.bkup"
ssh <node2> "mv /var/vmware/vpostgres/14/.ssh/authorized_keys /var/vmware/vpostgres/14/.ssh/authorized_keys.bkup"
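Because the same remote command is repeated once per node in this and later steps, a small loop can reduce typing errors. This is only a sketch: run_on_nodes and the NODES variable are hypothetical names, and the default node list is a placeholder to be replaced with your real hostnames or FQDNs.

```shell
# NODES is a placeholder list; replace it with the real hostnames or FQDNs.
NODES="${NODES:-node0 node1 node2}"

# Run one remote command on every node in NODES and report any failure.
# run_on_nodes is a hypothetical helper name.
run_on_nodes() {
    for node in $NODES; do
        ssh "$node" "$1" || echo "command failed on $node" >&2
    done
}

# Example (Step 1):
# run_on_nodes "mv /var/vmware/vpostgres/14/.ssh/authorized_keys /var/vmware/vpostgres/14/.ssh/authorized_keys.bkup"
```

A failure on one node does not stop the loop, so check the output for every node before moving on.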
Step 2: Collect the postgres public keys
Retrieve the id_rsa.pub key from each node and save the copies locally on <node0>:
scp <node0>:/var/vmware/vpostgres/14/.ssh/id_rsa.pub id_rsa.pub.0
scp <node1>:/var/vmware/vpostgres/14/.ssh/id_rsa.pub id_rsa.pub.1
scp <node2>:/var/vmware/vpostgres/14/.ssh/id_rsa.pub id_rsa.pub.2
Step 3: Build the new authorized_keys file
Concatenate the three collected public keys into a single file:
cat id_rsa.pub.0 id_rsa.pub.1 id_rsa.pub.2 > authorized_keys
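Before distributing the file, it may be worth confirming that it holds three keys, one per line. If a collected .pub file lacks a trailing newline, cat joins two keys on one line, and the second key would then be read as part of the first key's comment. A minimal sketch follows; count_rsa_keys is a hypothetical helper name.

```shell
# Count the lines in a file that begin with an RSA public key.
# count_rsa_keys is a hypothetical helper name.
count_rsa_keys() {
    grep -c '^ssh-rsa ' "$1"
}

# Example:
# [ "$(count_rsa_keys authorized_keys)" -eq 3 ] && echo "three keys found"
```

If the count is lower than three, inspect the collected id_rsa.pub.* files and rebuild authorized_keys before continuing.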
Step 4: Distribute the new file
Copy the newly created authorized_keys file to the postgres .ssh directory on all three nodes:
scp authorized_keys <node0>:/var/vmware/vpostgres/14/.ssh/
scp authorized_keys <node1>:/var/vmware/vpostgres/14/.ssh/
scp authorized_keys <node2>:/var/vmware/vpostgres/14/.ssh/
Step 5: Correct file ownership
Ensure the postgres user owns the new file on all nodes:
ssh <node0> "chown postgres:root /var/vmware/vpostgres/14/.ssh/authorized_keys"
ssh <node1> "chown postgres:root /var/vmware/vpostgres/14/.ssh/authorized_keys"
ssh <node2> "chown postgres:root /var/vmware/vpostgres/14/.ssh/authorized_keys"
Step 6: Correct file permissions
Set the required 640 permissions (owner read/write, group read) on the file on all nodes:
ssh <node0> "chmod 640 /var/vmware/vpostgres/14/.ssh/authorized_keys"
ssh <node1> "chmod 640 /var/vmware/vpostgres/14/.ssh/authorized_keys"
ssh <node2> "chmod 640 /var/vmware/vpostgres/14/.ssh/authorized_keys"
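The result of Steps 5 and 6 can be checked in one call per node. The sketch below assumes GNU stat (the -c format option); show_owner_mode is a hypothetical helper name.

```shell
# Print "owner:group mode" for a file, for example "postgres:root 640".
# Relies on GNU stat; show_owner_mode is a hypothetical helper name.
show_owner_mode() {
    stat -c '%U:%G %a' "$1"
}

# Example, run remotely from <node0>:
# ssh <node0> "stat -c '%U:%G %a' /var/vmware/vpostgres/14/.ssh/authorized_keys"
```

Each node should report postgres:root 640 for the file after the two preceding steps.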
Step 7: Verify password-less SSH
Switch to the postgres user on <node0> and verify that you can SSH into every node (including <node0> itself) without a password prompt:
su - postgres
ssh <node0>
exit
ssh <node1>
exit
ssh <node2>
exit
Repeat this verification process starting from <node1> and <node2> to ensure full mesh connectivity.
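The check can also be run non-interactively. With the OpenSSH option BatchMode=yes, ssh fails instead of prompting for a password, so a broken key setup shows up as an error rather than a prompt. This is a sketch only; check_passwordless is a hypothetical helper name, and it should be run as the postgres user on each node in turn.

```shell
# Try a key-only login to each host given as an argument.
# BatchMode=yes makes ssh fail rather than prompt for a password.
# check_passwordless is a hypothetical helper name.
check_passwordless() {
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: OK"
        else
            echo "$host: FAILED"
        fi
    done
}

# Example: check_passwordless <node0> <node1> <node2>
```

A host whose key is not yet in the postgres user's known_hosts file may also be reported as FAILED, because BatchMode cannot answer the host-key confirmation prompt; connect once interactively in that case.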
Step 8: Clean up temporary files
Once validation is successful, remove the temporary key files generated on <node0>:
exit # (To return to the root user if you are still logged in as postgres)
rm id_rsa.pub.0 id_rsa.pub.1 id_rsa.pub.2 authorized_keys
(Optional) You may now run the vidm-cluster-validation.sh script to confirm overall cluster health.