VCHA setup for vCenter Server 8.0 getting stuck with "PostgreSQL replication is not in progress." after reducing the resource configuration for the passive node

Article ID: 401174

Products

VMware vCenter Server

Issue/Introduction

  • You deploy vCenter High Availability (VCHA) for a vCenter Server Appliance 8.0.
  • After cloning the passive and witness nodes, you reduce the memory size and/or the virtual CPU count of the passive node, on the reasoning that this node only takes over in emergency situations.
  • Following this change, the VCHA status never becomes active. Instead, the setup gets stuck with the following error message:

PostgreSQL replication is not in progress. For an initial deployment, PostgreSQL standby creation may take time depending on the database size. If this issue persists, verify that PostgreSQL is running on the passive node and that the passive node is reachable on the vCenter HA network.

  • The required TCP ports for VCHA have already been confirmed to be open between the nodes on the VCHA network.
  • The vcha service logs, vcha-<number>.log in /var/log/vmware/vcha/, show errors like the ones below:
    <time stamp> info vcha[###] [Originator@6876 sub=vpxUtil] System command failed; '/usr/bin/rsync', args: [--recursive,--checksum,--perms,--times,--group,--owner,--links,--protect-args,--temp-dir=/storage/vcha/.tmpfiles,--info=progress,--timeout=60,--rsh=ssh -i /home/vcha/.ssh/id_rsa -o UserKnownHostsFile=/home/vcha/.ssh/known_hosts,/etc/vmware-vpx/extensions/com.vmware.vsan.health/locale/es/locmsg.vmsg,vcha@<passive_node_vcha_ip_address>:/etc/vmware-vpx/extensions/com.vmware.vsan.health/locale/es/], exit code: 255
    --> stdout:
    --> stderr: ssh: connect to host <passive_node_vcha_ip_address> port 22: Connection refused^M
    --> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
    --> rsync error: unexplained error (code 255) at io.c(232) [sender=3.4.1]
    -->
    <time stamp> error vcha[23596] [Originator@6876 sub=RsyncRepl-smallFrp] Rsync failed, retcode: 255, error: ssh: connect to host <passive_node_vcha_ip_address> port 22: Connection refused^M
    --> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
    --> rsync error: unexplained error (code 255) at io.c(232) [sender=3.4.1]
    -->
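To check how often this failure occurs in a log, the "Rsync failed" lines can be counted by return code and error text. The following is a minimal sketch, not a VMware tool: the function name `summarize_rsync_failures` is hypothetical, and the pattern is derived only from the excerpt above, so other log formats may need a different expression.

```python
import re
from collections import Counter

# Pattern taken from the "Rsync failed" error line in the excerpt above.
RSYNC_FAILED = re.compile(r"Rsync failed, retcode: (?P<code>\d+), error: (?P<error>.*)")


def summarize_rsync_failures(log_text: str) -> Counter:
    """Count rsync failures, keyed by (return code, first line of the error)."""
    failures = Counter()
    for line in log_text.splitlines():
        match = RSYNC_FAILED.search(line)
        if match:
            # The log renders a trailing carriage return as a literal "^M".
            error = match.group("error").replace("^M", "").strip()
            failures[(int(match.group("code")), error)] += 1
    return failures
```

The log text could be read with something like `open(path).read()` for a vcha-<number>.log file copied from the appliance. A large count of return code 255 with "Connection refused" would match the symptom described here.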

Environment

VMware vCenter Server 8.0.x

Cause

The configuration for the vPostgreSQL database, stored in /storage/db/vpostgres/postgresql.conf.auto on the vCenter Server Appliance (VCSA), is generated automatically based on the overall memory size of the appliance. This includes the maximum number of accepted connections.

When the memory is reduced, this number is lowered accordingly. For example, a VCSA 8 with 28 GB of memory usually has the maximum number of database connections set to about 700, while with 8 GB of memory that number is only about 300.

If these limits differ between the active and passive nodes, the active node fails to open new connections to the database on the passive node, which disrupts the data transfer used for PostgreSQL replication.
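One way to confirm such a discrepancy is to compare the connection limit in the configuration file of each node. The sketch below assumes that the file uses the standard PostgreSQL `name = value` syntax and that the limit is stored under the standard `max_connections` parameter. The function names are hypothetical, and the configuration text is expected to have been copied from each node beforehand.

```python
import re
from typing import Optional

# Standard PostgreSQL syntax: the "=" is optional and the value may be quoted.
MAX_CONNECTIONS = re.compile(r"^\s*max_connections\s*=?\s*'?(\d+)'?")


def read_max_connections(conf_text: str) -> Optional[int]:
    """Return the effective max_connections value, or None if it is not set."""
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = MAX_CONNECTIONS.match(line)
        if match:
            value = int(match.group(1))  # the last occurrence in a file wins
    return value


def limits_match(active_conf: str, passive_conf: str) -> bool:
    """True only if both nodes define max_connections with the same value."""
    active = read_max_connections(active_conf)
    passive = read_max_connections(passive_conf)
    return active is not None and active == passive
```

With the approximate values quoted above (about 700 on a 28 GB appliance and about 300 on an 8 GB appliance), `limits_match` would return `False`.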

Resolution

Do not change the virtual hardware configuration of the additional nodes that are created as part of the VCHA deployment.

Such changes are not supported. The passive node is meant to be an exact mirror of the active node, especially because multiple components in the operating system of the VCSA are configured automatically, mainly based on the overall memory size of the appliance.

Therefore, both nodes should have the same virtual hardware configuration to prevent issues such as the one outlined above.
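As a simple pre-check before enabling VCHA, the virtual hardware of both nodes can be compared side by side. This is an illustrative sketch only: the `NodeHardware` type and the `hardware_differences` function are hypothetical, and the memory and vCPU values have to be collected manually, for example from the virtual machine settings.

```python
from dataclasses import dataclass, fields
from typing import Dict, Tuple


@dataclass(frozen=True)
class NodeHardware:
    """Virtual hardware of one VCHA node (values entered by hand)."""
    memory_gb: int
    vcpus: int


def hardware_differences(
    active: NodeHardware, passive: NodeHardware
) -> Dict[str, Tuple[int, int]]:
    """Return {setting: (active value, passive value)} for every mismatch."""
    return {
        field.name: (getattr(active, field.name), getattr(passive, field.name))
        for field in fields(NodeHardware)
        if getattr(active, field.name) != getattr(passive, field.name)
    }
```

An empty result means that the two nodes match on the compared settings. Any entry in the result points to a setting that should be restored to the value used by the active node.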