vpostgres-repl fails to start on Primary node in Aria Operations Manager

Article ID: 311422

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • Aria Operations cluster HA is enabled, or an attempt to disable it failed.
  • Attempting to start the vpostgres-repl service fails with an error similar to:
    Reset vRealize Operations vPostgres Replication Database...
    Test connection to ...
    Failed testing connection to
  • The /storage/db/vcops/vpostgres/repl/serverlog file shows errors similar to:
    FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                    Is the server running on host "###.###.###.###" and accepting
                    TCP/IP connections on port 5433?
  • The /storage/vcops/log/analytics-UUID.log file shows errors similar to:
    2018-03-29 17:50:53,484 ERROR [Analytics Main Thread ]  com.integrien.analytics.AnalyticsMain.run - AnalyticsMain.run failed with error: CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.lang.reflect.UndeclaredThrowableException
    org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.lang.reflect.UndeclaredThrowableException
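
To confirm the replication symptom quickly, the server log can be searched for the connection error quoted above. The following is a minimal sketch only: the function name repl_log_has_conn_error is hypothetical and not part of the product, and the search string is taken from the sample log line in this article.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): report whether a
# vPostgres replication server log contains the connection failure above.
repl_log_has_conn_error() {
    log_file="$1"
    # Return a distinct status if the log cannot be read.
    [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 2; }
    # -q: no output, exit status only; -F: match the text literally.
    grep -qF 'could not connect to the primary server' "$log_file"
}

# Example call, using the log path named in this article:
# repl_log_has_conn_error /storage/db/vcops/vpostgres/repl/serverlog \
#     && echo "replication connection error found"
```

The function returns 0 when the error is present, 1 when it is absent, and 2 when the log is not readable.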

Environment

Aria Operations 8.x

Cause

A failed failover between a Primary and Primary Replica node can leave inconsistencies in the cluster.

Resolution

If you believe you are encountering this issue, contact Broadcom Support.

Additional Information

***********BELOW STEPS MUST STAY INTERNAL****************

If one section does not resolve the issue, continue on to the next section.

Set Primary and Replica

  1. Log in to the Primary node as root via SSH or Console.
  2. Run the following command to stop the vcops service, and ensure the casa service is started:
service vmware-vcops stop; service vmware-casa start
  3. Change to the /usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/ directory:
cd /usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/
  4. Repeat steps 1-3 on all nodes in the Aria Operations Manager cluster.
  5. Run the following commands on the respective nodes to set the Primary and Replica nodes:
  • Run on Primary node
$VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin true --data true --ui true --adminCS IPOfPrimaryNode,IpOfReplicaNode --enableHA true --startServices false
  • Run on Replica node
$VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin true --data true --ui true --adminCS IPOfPrimaryNode,IpOfReplicaNode --enableHA true --replica true --startServices false
  • Run on Data nodes
$VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin false --data true --ui true --adminCS IPOfPrimaryNode,IpOfReplicaNode --enableHA true --startServices false
  • Run on Remote Collector nodes
$VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin false --data false --ui false --remoteCollector true --adminCS IPOfPrimaryNode,IpOfReplicaNode --enableHA true --startServices false

Note: Replace IPOfPrimaryNode and IpOfReplicaNode with the IP addresses of the Primary node and the Replica node, respectively.
Example: $VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin true --data true --ui true --adminCS 192.168.3.1,192.168.3.2 --enableHA true --startServices false
  6. Start the vcops service:
service vmware-vcops start
  7. Repeat step 6 on all nodes in the Aria Operations Manager cluster.
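
Because the four role commands differ only in a few flags, it can help to generate them rather than edit them by hand. The sketch below prints (it does not run) the HA-enabled command for a given role. The function name roles_cmd and the role keywords primary, replica, data, and collector are hypothetical; the flags themselves are copied from the commands in this section.

```shell
#!/bin/sh
# Hypothetical helper: print the vcopsConfigureRoles.py command for one node
# role with HA enabled. Nothing is executed; review the output before use.
roles_cmd() {
    role="$1"; primary_ip="$2"; replica_ip="$3"
    extra=""
    case "$role" in
        primary)   flags="--admin true --data true --ui true" ;;
        replica)   flags="--admin true --data true --ui true"; extra=" --replica true" ;;
        data)      flags="--admin false --data true --ui true" ;;
        collector) flags="--admin false --data false --ui false --remoteCollector true" ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    # \$ keeps $VMWARE_PYTHON_BIN literal so it expands on the target node.
    printf '%s\n' "\$VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles $flags --adminCS $primary_ip,$replica_ip --enableHA true$extra --startServices false"
}

# Example call:
# roles_cmd primary 192.168.3.1 192.168.3.2
```

With the sample addresses, the primary role call prints the same command as the example in the note above.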

Rename recovery.conf.bootstrap on Primary

  1. Log into the Primary node as root via SSH or Console.
  2. Run the following command to rename the recovery.conf.bootstrap file:
mv $STORAGE_DB_VCOPS/recovery.conf.bootstrap $STORAGE_DB_VCOPS/recovery.conf.bootstrap.bak
  3. Restart the vcops service:
service vmware-vcops restart
  4. Repeat step 3 on all nodes in the Aria Operations Manager cluster.
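
The rename in step 2 fails if the file is absent, and a plain mv overwrites any earlier .bak copy. A guarded rename avoids both cases. This is a sketch under assumptions: the function name rename_to_bak is hypothetical, and the same pattern could be applied to the backup_label file in the next section.

```shell
#!/bin/sh
# Hypothetical helper: rename a file to <name>.bak only if it exists and an
# earlier .bak copy would not be overwritten.
rename_to_bak() {
    src="$1"
    if [ ! -e "$src" ]; then
        # Nothing to rename; treat as success so later steps can continue.
        echo "skip: $src not found" >&2
        return 0
    fi
    if [ -e "$src.bak" ]; then
        echo "refusing to overwrite $src.bak" >&2
        return 1
    fi
    mv -- "$src" "$src.bak"
}

# Example call, using the path from step 2:
# rename_to_bak "$STORAGE_DB_VCOPS/recovery.conf.bootstrap"
```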

Rename backup_label on Primary

  1. Log into the Primary node as root via SSH or Console.
  2. Run the following command to rename the backup_label file:
mv /storage/db/vcops/vpostgres/repl/backup_label /storage/db/vcops/vpostgres/repl/backup_label.bak
  3. Restart the vcops service:
service vmware-vcops restart
  4. Repeat step 3 on all nodes in the Aria Operations Manager cluster.


Notes:

If Aria Operations Manager cluster HA needs to be disabled, and the Replica node needs to become a data node, run the following commands in lieu of step 5 under Set Primary and Replica:

  • Run on Primary node
$VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin true --data true --ui true --adminCS IPOfPrimaryNode --enableHA false --startServices false
  • Run on Replica node
$VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin false --data true --ui true --adminCS IPOfPrimaryNode --enableHA false --startServices false
  • Run on Data nodes
$VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin false --data true --ui true --adminCS IPOfPrimaryNode --enableHA false --startServices false
  • Run on Remote Collector nodes
$VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin false --data false --ui false --remoteCollector true --adminCS IPOfPrimaryNode --enableHA false --startServices false

Note: Replace IPOfPrimaryNode with the IP address of the Primary node.
Example: $VMWARE_PYTHON_BIN ./vcopsConfigureRoles.py --action configureRoles --admin true --data true --ui true --adminCS 192.168.3.1 --enableHA false --startServices false