Aria Operations node stuck on "Waiting for Analytics" status

Article ID: 378322

Products

VMware Aria Suite

Issue/Introduction

The cluster cannot be brought online because a cluster node is not online. The node is stuck in the "Waiting for Analytics" state, and the analytics service is not running.

Environment

Aria Operations 8.12

Aria Operations 8.14

Aria Operations 8.16

Aria Operations 8.17

Aria Operations 8.18

Cause

The internal vPostgres certificate was manually updated with the web certificate chain. The following files were modified:

  • /storage/vcops/user/conf/ssl/cacert.pem
  • /var/vmware/vpostgres/current/vpostgres_cacert.pem
  • /var/vmware/vpostgres/current/vpostgres_key.pem
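
To see how the modified files differ from their backups, it can help to list the owner, group, and mode of each file next to its .old copy. The following is a minimal sketch, not an official procedure: show_perms is a hypothetical helper name, and it assumes GNU stat is available on the node and that the .old backups referenced in the Resolution steps exist.

```shell
#!/bin/sh
# show_perms (hypothetical helper): print "owner:group mode path" for a file,
# or "MISSING path" when the file does not exist. Relies on GNU stat (-c).
show_perms() {
    if [ -e "$1" ]; then
        stat -c '%U:%G %a %n' "$1"
    else
        echo "MISSING $1"
    fi
}

# File list taken from the article; each file is shown next to its .old backup.
for f in \
    /storage/vcops/user/conf/ssl/cacert.pem \
    /var/vmware/vpostgres/current/vpostgres_cacert.pem \
    /var/vmware/vpostgres/current/vpostgres_key.pem
do
    show_perms "$f"
    show_perms "$f.old"
done
```

A difference in owner or mode between a file and its .old copy is one possible explanation for a "Permission denied" error when the service tries to read the file.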

The following log exception is identified in the analytics log file:

2024-09-25T20:36:51,391+0000 WARN  [C3P0PooledConnectionPoolManager[identityToken->30hbiyb6bwi9n912rp92j|40a77b42]-HelperThread-#6]  com.mchange.v2.resourcepool.BasicResourcePool.log - com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@3b9421fe -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (30). Last acquisition attempt exception:
org.postgresql.util.PSQLException: The SSLSocketFactory class provided org.postgresql.ssl.LibPQvROpsFactory could not be instantiated.
        at org.postgresql.Driver$ConnectThread.getResult(Driver.java:385) ~[postgresql-42.5.1.jar:42.5.1]
        at org.postgresql.Driver.connect(Driver.java:298) ~[postgresql-42.5.1.jar:42.5.1]

Caused by: org.postgresql.util.PSQLException: Could not open SSL root certificate file /storage/vcops/user/conf/ssl/cacert.pem.
        at org.postgresql.ssl.LibPQvROpsFactory.<init>(LibPQvROpsFactory.java:153) ~[vcops-postgres-sslfactory-1.0-SNAPSHOT.jar:42.5.1]

Caused by: java.io.FileNotFoundException: /storage/vcops/user/conf/ssl/cacert.pem (Permission denied)
        at java.io.FileInputStream.open0(Native Method) ~[?:?]
        at java.io.FileInputStream.open(Unknown Source) ~[?:?]
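
To confirm that a node shows the same signature, the analytics log can be searched for the two "Caused by" messages above. This is a minimal sketch: find_cert_errors is a hypothetical helper name, and the log file path is left as an argument because the exact file name varies per node.

```shell
#!/bin/sh
# find_cert_errors (hypothetical helper): print matching lines, with line
# numbers, for the two certificate errors quoted in this article.
# Argument: path to the analytics log file on the affected node.
find_cert_errors() {
    grep -n -E 'Could not open SSL root certificate file|cacert\.pem \(Permission denied\)' "$1"
}

# Usage (ANALYTICS_LOG is a placeholder you set to the log file path):
#   find_cert_errors "$ANALYTICS_LOG"
```

The function returns a non-zero status when no matching line is found, which is standard grep behavior.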

Resolution

NOTE: Modifying the internal vPostgres certificate is not advised because it can break internal database communication and cause the analytics service to stop running.

Perform the following steps to revert the changes.

  1. Take a snapshot of all cluster nodes before making any changes.
  2. Stop the vcops service and bring the node offline.
    • SSH into the Primary node.
    • Issue the following command: $VMWARE_PYTHON_BIN $VCOPS_BASE/../vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py --action bringSliceOffline --offlineReason "Maintenance"
  3. Revert the updated internal certificate files by issuing the following commands:
    • cd /storage/vcops/user/conf/ssl/
    • cp -p cacert.pem.old cacert.pem
    • cd /var/vmware/vpostgres/current/
    • cp -p vpostgres_cacert.pem.old vpostgres_cacert.pem
    • cp -p vpostgres_key.pem.old vpostgres_key.pem
  4. Bring the node back online:
    • Issue the following command: $VMWARE_PYTHON_BIN $VCOPS_BASE/../vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py --action bringSliceOnline
  5. Bring the cluster online:
    • Issue the following command: $VMWARE_PYTHON_BIN $VCOPS_BASE/../vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py --action bringSliceOnline
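
The copy commands in step 3 assume that each .old backup exists. As a cautious variant, the sketch below refuses to overwrite a file when its backup is missing or empty. restore_from_old is a hypothetical helper name and is not part of the product tooling; the cp -p call is the same one used in step 3.

```shell
#!/bin/sh
# restore_from_old (hypothetical helper): copy <file>.old over <file> with
# cp -p (preserving mode, ownership, and timestamps), but only when the
# backup exists and is non-empty. Returns 1 and leaves <file> untouched otherwise.
restore_from_old() {
    target="$1"
    backup="$1.old"
    if [ ! -s "$backup" ]; then
        echo "SKIP $target: no usable backup at $backup" >&2
        return 1
    fi
    cp -p "$backup" "$target" && echo "RESTORED $target"
}

# Usage, with the file paths from step 3:
#   restore_from_old /storage/vcops/user/conf/ssl/cacert.pem
#   restore_from_old /var/vmware/vpostgres/current/vpostgres_cacert.pem
#   restore_from_old /var/vmware/vpostgres/current/vpostgres_key.pem
```

If any call reports SKIP, stop and locate a valid backup before bringing the node back online.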