-- In the vSphere Client UI, the Passive Node status is up, but the vmware-vcha and vmware-vpostgres services are stopped.
The UI displays the following message and the cluster state is "Degraded".
----
PostgreSQL replication is not in progress. Verify if PostgreSQL server is running on the Passive node and that the Passive node is reachable on the vCenter HA network
----
-- VCHA logs on the Active Node and Passive Node show output similar to the following:
- Active Node -
----
XXXX-XX-XXTXX:XX:XX.XXX+09:00 info vcha[24295] [Originator@6876 sub=ClusterMgr opID=WorkQueue-11d6c4ab] Slave id is : XXX.XXX.XXX.XXX
XXXX-XX-XXTXX:XX:XX.XXX+09:00 info vcha[24295] [Originator@6876 sub=ClusterMgr opID=WorkQueue-11d6c4ab] Slave id is : XXX.XXX.XXX.XXX
XXXX-XX-XXTXX:XX:XX.XXX+09:00 info vcha[24295] [Originator@6876 sub=ClusterMgr opID=WorkQueue-11d6c4ab] MASTER XXX.XXX.XXX.XXX
XXXX-XX-XXTXX:XX:XX.XXX+09:00 info vcha[24295] [Originator@6876 sub=ClusterMgr opID=WorkQueue-11d6c4ab] Quorum: YES
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[24295] [Originator@6876 sub=Cluster opID=WorkQueue-11d6c4ab] Setting Key = /pcluster/livenodes Value = 2
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[24295] [Originator@6876 sub=Cluster opID=WorkQueue-11d6c4ab] New version 8589934798 {2, 206}
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[24295] [Originator@6876 sub=Cluster opID=WorkQueue-11d6c4ab] SetKvStoreInt version: 8589934798 isUpdate: true
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[24295] [Originator@6876 sub=Cluster opID=WorkQueue-11d6c4ab] compressed from size 2457 to size 519 (max 2470)
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[24295] [Originator@6876 sub=Cluster opID=WorkQueue-11d6c4ab] name kvstore version (8589934798 ?> 8589934797) force true
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[24295] [Originator@6876 sub=Cluster opID=WorkQueue-11d6c4ab] Sent proposal to XXX.XXX.XXX.XXX (version 8589934798)
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[24295] [Originator@6876 sub=Cluster opID=WorkQueue-11d6c4ab] Sent proposal to XXX.XXX.XXX.XXX (version 8589934798)
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[24287] [Originator@6876 sub=Cluster opID=WorkQueue-75191a64] Received ack=true from XXX.XXX.XXX.XXX for kvstore (version 8589934798)
XXXX-XX-XXTXX:XX:XX.XXX+09:00 info vcha[24288] [Originator@6876 sub=Message opID=WorkQueue-11d6c4ab] WriteComplete: Error N7Vmacore16TimeoutExceptionE(Operation timed out: Stream: SSL(<io_obj p:0x00007f8290001b80, h:-1, <TCP 'XXX.XXX.XXX.XXX : XXXX'>, <TCP 'XXX.XXX.XXX.XXX : XXXXX'>>), duration: XX:XX:XX.XXXXX (hh:mm:ss.us))
--> [context]zKq7AVECAQAAAEuUVQEZdmNoYQAAxbVTbGlidm1hY29yZS5zbwAAUglDAIwxRACaSEsARzg3ABQ5NwCaYDcBxc8UdmNoYQABBWkUAQ66EAFDJRIBhb0QAbrpDQFp+Q0BFFoSASBcEgHObR0B6HodAdDCGgGPwxoA5ss3APkkOACTwFECro4AbGlicHRocmVhZC5zby4wAAMv3g9saWJjLnNvLjYA[/context] - pending writes dropped
----
- Passive Node -
----
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[12526] [Originator@6876 sub=VchaUtil] Executing system command; /opt/vmware/vpostgres/current/bin/psql, args: [--dbname=host=XXX.XXX.XXX.XXX port=5432 user=replicator password=xxxxxxxxxxxxxxxx dbname=postgres application_name=vcha sslmode=verify-ca sslrootcert=/storage/db/vpostgres_ssl/root_ca.pem replication=1,--command=IDENTIFY_SYSTEM,--no-password]
XXXX-XX-XXTXX:XX:XX.XXX+09:00 info vcha[12526] [Originator@6876 sub=vpxUtil] System command failed; '/opt/vmware/vpostgres/current/bin/psql', args: [--dbname=host=XXX.XXX.XXX.XXX port=5432 user=replicator password=xxxxxxxxxxxxxxxx dbname=postgres application_name=vcha sslmode=verify-ca sslrootcert=/storage/db/vpostgres_ssl/root_ca.pem replication=1,--command=IDENTIFY_SYSTEM,--no-password], exit code: 2
--> stdout:
--> stderr: psql.bin: error: connection to server at "XXX.XXX.XXX.XXX", port 5432 failed: SSL error: certificate verify failed
-->
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[12522] [Originator@6876 sub=Election opID=clusterElection.cpp:1570-3b0d1f35] CheckVersion: Version[3] Other host GT : 8589934791 > 8589934789
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[12522] [Originator@6876 sub=Election opID=clusterElection.cpp:1570-3b0d1f35] CheckVersion: Pending version change 8589934791 >= 8589934791
XXXX-XX-XXTXX:XX:XX.XXX+09:00 verbose vcha[12526] [Originator@6876 sub=VchaUtil] Executing system command; /opt/vmware/vpostgres/current/bin/psql, args: [--dbname=host=XXX.XXX.XXX.XXX port=5432 user=replicator password=xxxxxxxxxxxxxxxx dbname=postgres application_name=vcha sslmode=verify-ca sslrootcert=/storage/db/vpostgres_ssl/root_ca.pem replication=1,--command=IDENTIFY_SYSTEM,--no-password]
XXXX-XX-XXTXX:XX:XX.XXX+09:00 info vcha[12526] [Originator@6876 sub=vpxUtil] System command failed; '/opt/vmware/vpostgres/current/bin/psql', args: [--dbname=host=XXX.XXX.XXX.XXX port=5432 user=replicator password=xxxxxxxxxxxxxxxx dbname=postgres application_name=vcha sslmode=verify-ca sslrootcert=/storage/db/vpostgres_ssl/root_ca.pem replication=1,--command=IDENTIFY_SYSTEM,--no-password], exit code: 2
--> stdout:
--> stderr: psql.bin: error: connection to server at "XXX.XXX.XXX.XXX", port 5432 failed: SSL error: certificate verify failed
-->
----
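To confirm that a saved vcha log contains the certificate verification failure shown above, a small helper along the following lines may be useful. This is only a sketch. The function name count_pg_ssl_failures is hypothetical, and the log path in the usage portion (/var/log/vmware/vcha/vcha.log) is an assumption that this article does not state.

```shell
#!/bin/sh
# Sketch: count psql SSL verification failures in a vcha log file.
# count_pg_ssl_failures is a hypothetical helper name, not a vCenter tool.
count_pg_ssl_failures() {
    logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read: $logfile" >&2
        return 2
    fi
    # -F matches the message literally; -c prints the number of matching lines.
    # "|| true" keeps a count of 0 (grep exit status 1) from being treated as an error.
    grep -F -c 'port 5432 failed: SSL error: certificate verify failed' "$logfile" || true
}

# Assumed log location; adjust it if the vcha log is stored elsewhere.
LOG=/var/log/vmware/vcha/vcha.log
if [ -r "$LOG" ]; then
    echo "psql SSL verification failures: $(count_pg_ssl_failures "$LOG")"
fi
```

A non-zero count on the Passive Node is consistent with the symptom described in this article, but it does not by itself identify which certificate is at fault.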
VMware vCenter Server 7.x
VMware vCenter Server 8.x
This issue occurs when the Postgres SSL certificate on the active and passive nodes has expired.
- /storage/db/vpostgres_ssl/server.crt
You can check the certificate expiration date with the following command:
----
# openssl x509 -in /storage/db/vpostgres_ssl/server.crt -text -noout | grep -ie "Not Before" -ie "Not After";
----
Example output:
----
# openssl x509 -in /storage/db/vpostgres_ssl/server.crt -text -noout | grep -ie "Not Before" -ie "Not After";
Not Before: Jul 20 20:07:48 2023 GMT
Not After : Jul 20 08:07:48 2025 GMT
----
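As an alternative to reading the dates by eye, the expiration check can be scripted with the openssl -checkend option. The following is a minimal sketch. The function name check_cert_expiry is hypothetical, and the only path used is the one given in this article.

```shell
#!/bin/sh
# Sketch: report whether a certificate has already expired.
# check_cert_expiry is a hypothetical helper name, not a vCenter tool.
check_cert_expiry() {
    cert="$1"
    # First make sure the file can be parsed as an X.509 certificate.
    if ! openssl x509 -in "$cert" -noout >/dev/null 2>&1; then
        echo "UNREADABLE: $cert"
        return 2
    fi
    # -checkend 0 exits 0 if the certificate is still valid at this moment,
    # and non-zero if it has already expired.
    if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1; then
        echo "VALID: $cert"
        return 0
    fi
    echo "EXPIRED: $cert"
    return 1
}

# Path taken from this article; run the check on each node.
CERT=/storage/db/vpostgres_ssl/server.crt
if [ -f "$CERT" ]; then
    check_cert_expiry "$CERT"
fi
```

Passing a larger value to -checkend (in seconds) would instead report certificates that are about to expire within that window.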
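Remove the vCenter HA (VCHA) configuration, replace the expired SSL certificate, and then configure VCHA again so that the new certificate is synchronized across the nodes.
- To replace the certificate, see the KB: vCert - Scripted vCenter Expired Certificate Replacement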
- The same symptoms can occur even when the certificate has not expired. In that case, the Machine SSL certificate and the Postgres certificate may not match. Refer to the following KB:
vCenter HA configuration is failing with error message "PostgreSQL replication is not in progress. Verify if PostgreSQL server is running on the Passive node and that the Passive node is reachable on the vCenter HA network"
- Japanese KB: https://knowledge.broadcom.com/external/article/414953
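For the certificate mismatch case mentioned above, one way to compare the two certificates is by SHA-256 fingerprint. The following is only a sketch. It assumes that "mismatch" means the Machine SSL certificate and /storage/db/vpostgres_ssl/server.crt are different certificates, which should be confirmed against the referenced KB. The helper names cert_fingerprint and certs_match are hypothetical, and the vecs-cli store and alias (MACHINE_SSL_CERT, __MACHINE_CERT) are assumptions not stated in this article.

```shell
#!/bin/sh
# Sketch: compare two certificates by SHA-256 fingerprint.
# cert_fingerprint and certs_match are hypothetical helper names.
cert_fingerprint() {
    # Output looks like "sha256 Fingerprint=AA:BB:..."; keep only the hex part.
    openssl x509 -in "$1" -noout -fingerprint -sha256 2>/dev/null | cut -d= -f2
}

certs_match() {
    fp_a=$(cert_fingerprint "$1")
    fp_b=$(cert_fingerprint "$2")
    # An empty fingerprint means the file could not be parsed; treat that as no match.
    [ -n "$fp_a" ] && [ "$fp_a" = "$fp_b" ]
}

# Assumption: the Machine SSL certificate is exported from VECS with vecs-cli.
VECS_CLI=/usr/lib/vmware-vmafd/bin/vecs-cli
PG_CERT=/storage/db/vpostgres_ssl/server.crt
if [ -x "$VECS_CLI" ] && [ -f "$PG_CERT" ]; then
    MACHINE_CERT=$(mktemp)
    "$VECS_CLI" entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT > "$MACHINE_CERT"
    if certs_match "$MACHINE_CERT" "$PG_CERT"; then
        echo "MATCH"
    else
        echo "MISMATCH"
    fi
    rm -f "$MACHINE_CERT"
fi
```

A MISMATCH result only indicates that the two files differ. The referenced KB remains the authority on whether that is the cause and how to correct it.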