Replica node fails to join cluster after upgrading to vRealize Automation 7.5 with "Waiting for services to start"

Article ID: 325966

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • After upgrading vRealize Automation to 7.5:
    • A new replica node cannot be joined to the cluster
    • The join cluster operation appears to stall at 85% with the status "Waiting for services to start"
  • Component-Registry and other dependent services do not register.
  • Errors similar to the following appear in catalina.out:
Exception handled during retry operation with message: 503 Service Unavailable.  No Server is available to handle this request.
  • Errors may be seen in Horizon.log:
com.vmware.vcac.cli.configurator.services.cluster.impl.ClusterNodeServiceImpl.isNodeInClusterMode:92 - Exception while loading node info with id: cafe.node.104141903.16299

org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is org.postgresql.util.PSQLException: Connection to 127.0.0.1:5433 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
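To check an appliance for the two log signatures above, a small shell helper can search for them. This is a sketch, not a VMware-provided tool: the function name `check_join_symptoms` is hypothetical, and the locations of catalina.out and Horizon.log are passed in as arguments because they vary by appliance and are not stated here.

```shell
# Hypothetical helper (not a VMware tool): report which of the symptom
# messages listed above appear in the two log files passed as arguments.
# grep -F matches the strings literally (the dots in 127.0.0.1 are not
# regex wildcards). A missing or unreadable file counts as "no match"
# because grep's error output is discarded.
check_join_symptoms() {
  local catalina_log="$1" horizon_log="$2"
  if grep -qF "503 Service Unavailable" "$catalina_log" 2>/dev/null; then
    echo "catalina.out: 503 Service Unavailable"
  fi
  if grep -qF "Connection to 127.0.0.1:5433 refused" "$horizon_log" 2>/dev/null; then
    echo "Horizon.log: JDBC connection to 127.0.0.1:5433 refused"
  fi
}
```

Usage: `check_join_symptoms <path-to-catalina.out> <path-to-Horizon.log>`. Each line printed corresponds to one of the symptoms above; no output means neither message was found.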


Environment

VMware vRealize Automation 7.5.x

Cause

  • Some configuration files in the following folder may not have been copied to the replica node during the join cluster operation:

/opt/vmware/horizon/workspace/conf/

Resolution

To verify, compare the SHA-1 checksums of the configuration files on both nodes:

  1. SSH into both vRealize Automation appliances and arrange the SSH sessions side by side to simplify the comparison. On each node, run:

cd /opt/vmware/horizon/workspace/conf

ls -1 | sort | xargs -n 1 sha1sum

  2. Compare the sha1sum values of the following files between the two nodes:

catalina.policy
server.xml
web.xml

  3. Determine which of these files differ between the nodes:
    1. Copy each differing file from the master node to the replica node.
Note: Do not copy the entire /conf folder; copy only the individual files identified in the comparison above.
  4. On the replica node, verify the ownership of the copied files by running:

ls -l

  5. Set the correct ownership on each copied file:
chown horizon:www <filename>
  6. Restart the horizon-workspace service, or reboot the appliance:
service horizon-workspace restart
  7. Attempt to rejoin the replica node to the cluster.
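The compare, copy, and ownership-check steps above can be sketched as shell functions run on the replica node. This is an illustration under stated assumptions, not a VMware-provided procedure: it assumes the master node's copies of catalina.policy, server.xml, and web.xml have first been staged in a local directory on the replica (for example with scp), and the function names and staging directory are hypothetical.

```shell
# Sketch only, not a VMware tool. Run on the replica node.
# Assumption: master_dir is a hypothetical local staging directory that
# holds the master node's copies of the three files compared above.

# Copy each listed file from master_dir into replica_dir only when its
# SHA-1 checksum differs (or the file is missing on the replica), and
# print the name of every file that was copied. cp does not preserve the
# master's ownership, so the copied files still need chown afterwards.
sync_differing_conf_files() {
  local master_dir="$1" replica_dir="$2" f master_sum replica_sum
  for f in catalina.policy server.xml web.xml; do
    [ -f "$master_dir/$f" ] || continue
    master_sum=$(sha1sum < "$master_dir/$f" | cut -d' ' -f1)
    replica_sum=""
    if [ -f "$replica_dir/$f" ]; then
      replica_sum=$(sha1sum < "$replica_dir/$f" | cut -d' ' -f1)
    fi
    if [ "$master_sum" != "$replica_sum" ]; then
      cp "$master_dir/$f" "$replica_dir/$f"
      echo "$f"
    fi
  done
}

# Print the names of regular files directly inside dir whose owner is
# not user or whose group is not group (uses GNU find's -printf).
list_wrong_ownership() {
  local dir="$1" user="$2" group="$3"
  find "$dir" -maxdepth 1 -type f \( ! -user "$user" -o ! -group "$group" \) -printf '%f\n'
}
```

For example, with the master's files staged in a hypothetical /tmp/master-conf, running `sync_differing_conf_files /tmp/master-conf /opt/vmware/horizon/workspace/conf` prints the files it replaced; run `chown horizon:www <filename>` on each of them, check that they no longer appear in `list_wrong_ownership /opt/vmware/horizon/workspace/conf horizon www`, and then restart horizon-workspace. Copying only the listed files, rather than the whole folder, follows the note in the resolution steps.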