Fixing password-less SSH for the postgres user in a VMware Identity Manager cluster

Article ID: 434709

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

When connecting between nodes in a VMware Identity Manager (vIDM) cluster via SSH as the postgres user, key-based authentication fails and the session unexpectedly prompts for a password.

This indicates that password-less SSH authentication is broken.

To restore database replication and cluster health, the authorized_keys file for the postgres user must be manually rebuilt from the public RSA keys of all nodes and copied back to every node.

Environment

VMware Identity Manager 3.3.7

  • Patched environments (starting with CSP-102092) where the embedded PostgreSQL database has been upgraded from version 9.6 to version 14.

  • Note: The steps below apply to older versions as well, but the postgres user's home directory path will differ. You can verify the home directory path by running: grep postgres /etc/passwd
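
The home directory is the sixth colon-separated field of the passwd entry. As a minimal sketch, the helper below extracts it so the path can be reused in later commands. The function name home_of and its optional file argument are illustrative assumptions, not part of the product.

```shell
# Print the home directory (6th field) of a user from a passwd-format file.
# home_of is a hypothetical helper; the file defaults to /etc/passwd.
home_of() {
    user="$1"
    passwd_file="${2:-/etc/passwd}"
    awk -F: -v u="$user" '$1 == u { print $6 }' "$passwd_file"
}

# Example: PG_HOME=$(home_of postgres)
```

If the printed path differs from /var/vmware/vpostgres/14, substitute it in the commands that follow.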

 

Cause

This issue is generally caused by a failure or desynchronization event within the vIDM cluster, which can corrupt or overwrite the authorized_keys file required for secure, password-less communication between the database nodes.

Resolution

To resolve this issue, you will collect the public keys from all three nodes, combine them into a single authorized_keys file, and distribute this corrected file back to all nodes.

Prerequisites:

  • Take an offline snapshot of all 3 vIDM nodes before proceeding.

  • Identify your nodes by hostname or FQDN (referred to below as <node0>, <node1>, and <node2>).

  • Execute all of the following steps on <node0> as the root user, from the /root directory.

Step 1: Back up the existing authorized_keys files

Run the following commands from <node0> to rename the existing authorized_keys file on each of the three nodes:

ssh <node0> "mv /var/vmware/vpostgres/14/.ssh/authorized_keys /var/vmware/vpostgres/14/.ssh/authorized_keys.bkup"
ssh <node1> "mv /var/vmware/vpostgres/14/.ssh/authorized_keys /var/vmware/vpostgres/14/.ssh/authorized_keys.bkup"
ssh <node2> "mv /var/vmware/vpostgres/14/.ssh/authorized_keys /var/vmware/vpostgres/14/.ssh/authorized_keys.bkup"

Step 2: Collect the postgres public keys

Retrieve the id_rsa.pub key from each node and save the copies locally on <node0>:

scp <node0>:/var/vmware/vpostgres/14/.ssh/id_rsa.pub id_rsa.pub.0
scp <node1>:/var/vmware/vpostgres/14/.ssh/id_rsa.pub id_rsa.pub.1
scp <node2>:/var/vmware/vpostgres/14/.ssh/id_rsa.pub id_rsa.pub.2

Step 3: Build the new authorized_keys file

Concatenate the three collected public keys into a single file:

cat id_rsa.pub.0 id_rsa.pub.1 id_rsa.pub.2 > authorized_keys
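
Before distributing the file, it can be worth confirming that it holds exactly three keys, one per line. If a collected .pub file lacks a trailing newline, cat joins two keys on one line, and the second key would likely be read as part of the first key's comment. The sketch below is one way to check this; the function name check_authorized_keys is a hypothetical helper, and it assumes every line begins with the ssh-rsa key type, as id_rsa.pub files normally do.

```shell
# Succeed only if the file has exactly `expected` non-empty lines and
# every one of them starts with the "ssh-rsa" key type.
# check_authorized_keys is a hypothetical helper name.
check_authorized_keys() {
    file="$1"
    expected="${2:-3}"
    awk -v n="$expected" '
        NF == 0          { next }      # ignore blank lines
                         { total++ }
        $1 == "ssh-rsa"  { rsa++ }
        END { exit !((total + 0) == (n + 0) && (rsa + 0) == (n + 0)) }
    ' "$file"
}

# Example: check_authorized_keys authorized_keys 3 && echo "looks good"
```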

Step 4: Distribute the new file

Copy the newly created authorized_keys file to the postgres .ssh directory on all three nodes:

scp authorized_keys <node0>:/var/vmware/vpostgres/14/.ssh/
scp authorized_keys <node1>:/var/vmware/vpostgres/14/.ssh/
scp authorized_keys <node2>:/var/vmware/vpostgres/14/.ssh/

Step 5: Correct file ownership

Ensure the postgres user owns the new file on all nodes:

ssh <node0> "chown postgres:root /var/vmware/vpostgres/14/.ssh/authorized_keys"
ssh <node1> "chown postgres:root /var/vmware/vpostgres/14/.ssh/authorized_keys"
ssh <node2> "chown postgres:root /var/vmware/vpostgres/14/.ssh/authorized_keys"

Step 6: Correct file permissions

Set the required 640 permissions on the file on all nodes:

ssh <node0> "chmod 640 /var/vmware/vpostgres/14/.ssh/authorized_keys"
ssh <node1> "chmod 640 /var/vmware/vpostgres/14/.ssh/authorized_keys"
ssh <node2> "chmod 640 /var/vmware/vpostgres/14/.ssh/authorized_keys"
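
To confirm that Steps 5 and 6 took effect, the owner, group, and mode can be read back with stat. This is a minimal sketch that assumes GNU coreutils stat (the -c option) is available on the appliance; describe_perms is a hypothetical helper name.

```shell
# Print "owner:group mode" for a file, for example "postgres:root 640".
# Assumes GNU coreutils stat; describe_perms is a hypothetical helper.
describe_perms() {
    stat -c '%U:%G %a' "$1"
}

# Example: describe_perms /var/vmware/vpostgres/14/.ssh/authorized_keys
```

The same stat command can be run remotely, for example ssh <node1> "stat -c '%U:%G %a' /var/vmware/vpostgres/14/.ssh/authorized_keys". Based on the two steps above, the expected output is postgres:root 640.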

Step 7: Verify password-less SSH

Switch to the postgres user on <node0> and verify that you can SSH into every node (including <node0> itself) without a password prompt:

su - postgres
ssh <node0>
exit
ssh <node1>
exit
ssh <node2>
exit

Repeat this verification process starting from <node1> and <node2> to ensure full mesh connectivity.
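
The manual check above can also be sketched as a non-interactive loop. With the standard OpenSSH option BatchMode=yes, ssh fails instead of prompting for a password, so a broken key shows up as a non-zero exit status. The function name check_passwordless and the SSH_BIN override are assumptions made for illustration.

```shell
# Attempt a key-only login to each host and report the result.
# BatchMode=yes makes ssh fail rather than prompt for a password.
# SSH_BIN is a hypothetical override for dry runs; it defaults to ssh.
check_passwordless() {
    failed=0
    for host in "$@"; do
        if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Run it as the postgres user on each node in turn, for example check_passwordless <node0> <node1> <node2>. A FAIL line can also result from an unknown host key, because BatchMode prevents ssh from asking to accept it, so that is worth ruling out before revisiting the keys.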

Step 8: Clean up temporary files

Once validation is successful, remove the temporary key files generated on <node0>:

exit    # return to the root user if you are still logged in as postgres
rm id_rsa.pub.0 id_rsa.pub.1 id_rsa.pub.2 authorized_keys

 

Additional Information

(Optional) You may now run the vidm-cluster-validation.sh script to confirm overall cluster health.