The Aria Operations Continuous Availability Cluster fails to initialize displaying the status "Waiting for Analytics" with error message "Attempt to start node from not accessible zone"

Article ID: 368819

Updated On:

Products

VMware Aria Suite

Issue/Introduction

  • Aria Operations fails to come online after a cluster restart. This can happen after a change or disruption in the backend infrastructure, most often a network interruption.
  • The Continuous Availability (CA) status reports Degraded, and every analytics node shows the status "Waiting for Analytics".

Environment

VMware Aria Operations (formerly known as vRealize Operations) 8.14.X

Cause

This error occurs when services are started while the fault domain is marked as OFFLINE/FAILURE in CaSA.

Resolution

To resolve this issue:

1.   Log in to the Aria Operations admin UI as the local admin user.
2.   Click Take Cluster Offline under Cluster Status.
          Note: Wait for the Cluster Status to show as Offline.
3.   In vCenter, perform a Shut Down Guest OS action on each node VM. Take the Aria Operations nodes offline in the following order:

               1.  Witness node
               2.  Remote collector node(s)
               3.  FD2 Replica Data node(s)
               4.  FD2 Primary replica node
               5.  FD1 Data node(s)
               6.  FD1 Primary node

4.   In vCenter, perform a Power On action on each node VM. Bring the Aria Operations nodes online in the following order:

               1.  FD1 Primary node
               2.  FD1 Data node(s)
               3.  FD2 Primary replica node
               4.  FD2 Replica Data node(s)
               5.  Remote collector node(s)
               6.  Witness node


5.   Once each Aria Operations analytics node has restarted, log back in to the Aria Operations admin UI as the local admin user.
6.   Click Bring Cluster Online under Cluster Status.
          Note: Wait for the Cluster Status to show as Online.


If the cluster does not respond to the "Bring Cluster Online" button, contact VMware by Broadcom Technical Support for assistance and reference this KB article.
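
The shutdown order in step 3 is the exact reverse of the power-on order in step 4, so both sequences can be derived from one list. The following Python sketch shows that relationship for planning purposes only. The role labels and the (VM name, role) inventory format are hypothetical names chosen for this illustration; they are not Aria Operations or vCenter identifiers, and the sketch does not call any product API.

```python
# Roles listed in the power-on order from step 4.
# The role labels are hypothetical names used only for this sketch.
POWER_ON_ORDER = [
    "fd1-primary",
    "fd1-data",
    "fd2-primary-replica",
    "fd2-replica-data",
    "remote-collector",
    "witness",
]


def power_on_sequence(nodes):
    """Sort (vm_name, role) pairs into the step 4 power-on order."""
    rank = {role: index for index, role in enumerate(POWER_ON_ORDER)}
    # sorted() is stable, so nodes that share a role keep their input order.
    return sorted(nodes, key=lambda node: rank[node[1]])


def shutdown_sequence(nodes):
    """Return the step 3 shutdown order: the reverse of the power-on order."""
    return list(reversed(power_on_sequence(nodes)))


if __name__ == "__main__":
    # Hypothetical inventory; replace with the VM names in your environment.
    inventory = [
        ("ops-witness", "witness"),
        ("ops-fd2-data-1", "fd2-replica-data"),
        ("ops-fd1-primary", "fd1-primary"),
        ("ops-rc-1", "remote-collector"),
        ("ops-fd2-replica", "fd2-primary-replica"),
        ("ops-fd1-data-1", "fd1-data"),
    ]
    print("Shut down in this order:")
    for name, role in shutdown_sequence(inventory):
        print(f"  {name} ({role})")
    print("Power on in this order:")
    for name, role in power_on_sequence(inventory):
        print(f"  {name} ({role})")
```

A node with a role label that is not in the list raises a KeyError, which is intentional here: an unclassified node should be reviewed manually rather than placed in the sequence by guesswork.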

Additional Information

  • The Analytics service crashes repeatedly on the analytics nodes, even after multiple manual restart attempts.
  • In the Aria Operations Analytics-wrapper.log file, you see entries similar to:

    | INFO   | jvm 1    | WARNING: An illegal reflective access operation has occurred
    | INFO   | jvm 1    | WARNING: Illegal reflective access by com.vmware.vcops.casarest.client.HttpRequesterURLConnectionImpl (file:/usr/lib/vmware-vcops/common/lib/casa-rest-client-1.0-SNAPSHOT.jar) to field java.lang.reflect.Field.modifiers
    | INFO   | jvm 1    | WARNING: Please consider reporting this to the maintainers of com.vmware.vcops.casarest.client.HttpRequesterURLConnectionImpl
    | INFO   | jvm 1    | WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    | INFO   | jvm 1    | WARNING: All illegal access operations will be denied in a future release
    | INFO   | jvm 1    | >>> AnalyticsMain.run failed with error: IllegalStateException: Attempt to start node from not accessible zone.
    | INFO   | jvm 1    | WrapperManager Debug: WrapperManager.stop(-1) called by thread: SystemExitThread
    | INFO   | jvm 1    | WrapperManager Debug: Send a packet STOP : -1
    | INFO   | jvm 1    | WrapperManager Debug: Pausing for 1,000ms to allow a clean shutdown...
    | DEBUG  | wrapperp | read a packet STOP : -1
    | DEBUG  | wrapper  | JVM requested a shutdown. (-1)
    | DEBUG  | wrapper  | wrapperStopProcess(-1, FALSE) called.
    | DEBUG  | wrapper  | Sending stop signal to JVM
    | DEBUG  | wrapper  | Enqueue Event 'jvm_stop'
    | INFO   | jvm 1    | WrapperManager Debug: Stopped checking for control events.

  • In the Aria Operations Analytics.log file, you see entries similar to:


    INFORMATION [Analytics Main Thread]  com.vmware.vcops.analytics.cluster.ClusterCoordinatorImpl.setAccessibleZones - Available zones are: [domain2]
    ERROR [Analytics Main Thread]  com.integrien.analytics.AnalyticsMain.run - AnalyticsMain.run failed with error: IllegalStateException: Attempt to start node from not accessible zone.
    java.lang.IllegalStateException: Attempt to start node from not accessible zone.
            at com.vmware.vcops.analytics.cluster.ClusterCoordinatorImpl.<init>(ClusterCoordinatorImpl.java:135) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
            at com.vmware.vcops.analytics.cluster.ClusterCoordinatorImpl.getInstance(ClusterCoordinatorImpl.java:100) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
            at com.integrien.analytics.AnalyticsMain.doRun(AnalyticsMain.java:475) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
            at com.integrien.analytics.AnalyticsMain.run(AnalyticsMain.java:2251) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
    INFO  [Analytics Main Thread]  com.vmware.vcops.platform.common.PlatformEnvironment.exitSystem - Terminating process with exit code -1 ...
    INFORMATION [SystemExitThread]  com.vmware.vcops.platform.common.PlatformEnvironment.run - exitSystem has been called by:
    java.lang.Throwable: null
            at com.vmware.vcops.platform.common.PlatformEnvironment.exitSystem(PlatformEnvironment.java:276) ~[alive_platform.jar:?]
            at com.vmware.vcops.platform.common.PlatformEnvironment.exitSystem(PlatformEnvironment.java:303) ~[alive_platform.jar:?]
            at com.integrien.analytics.AnalyticsMain.run(AnalyticsMain.java:2256) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
    INFORMATION [WrapperListener_stop_runner]  com.integrien.analytics.AnalyticsMain.stop - Analytics is stopping... exit code -1
    WARN  [WrapperListener_stop_runner]  com.integrien.analytics.AnalyticsMain.stop - Analytics is stopping... exit code -1. Generating thread dump in the call stack directory ..
    INFORMATION [Take Analytics Offline Thread]  com.integrien.analytics.AnalyticsMain.run - Analytics is going offline..


  • In the Aria Operations casa.db.script file, you see entries like the following, with "regionA":"FAILURE" and "regionB":"FAILURE":

    INSERT INTO CASA_DOCS VALUES('CA_INFO','{"document_time":1716437986633,"document_name":"CA_INFO","document_version":52,"document_body":{"regionA":"FAILURE","regionA_last_down_time":1716338218598,"regionA_last_processing_time":1716338137983,"regionB":"FAILURE","regionB_last_down_time":1712270794768,"regionB_last_processing_time":1712270687798,"split_brain":null,"regionA_display_name":null,"regionB_display_name":null,"document_version":54}}')
    INSERT INTO CASA_DOCS VALUES('FIPS','{"document_time":1698882002578,"document_name":"FIPS","document_version":4,"document_body":{"state":"DISABLED"}}')
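
To check for these signatures without reading the files by eye, the following Python sketch extracts the zone list from an "Available zones are" log line and the regionA/regionB states from the CA_INFO row. It is a minimal illustration that assumes the line formats match the excerpts above and that the JSON document contains no escaped single quotes. The function names are hypothetical, and the sketch takes file contents as text because file locations are not given in this article. Only the FAILURE value appears in the excerpts above, so the sketch reports the states without judging which other values are healthy.

```python
import json
import re

# Patterns modeled on the excerpts in this article (assumed formats).
ZONES_RE = re.compile(r"Available zones are: \[([^\]]*)\]")
CA_INFO_RE = re.compile(r"INSERT INTO CASA_DOCS VALUES\('CA_INFO','(\{.*\})'\)")


def available_zones(log_text):
    """Return the zones from the last 'Available zones are' line, or None."""
    matches = ZONES_RE.findall(log_text)
    if not matches:
        return None
    return [zone.strip() for zone in matches[-1].split(",") if zone.strip()]


def fault_domain_states(script_text):
    """Return the regionA/regionB states from the first CA_INFO row, or None."""
    for line in script_text.splitlines():
        match = CA_INFO_RE.search(line)
        if match:
            body = json.loads(match.group(1))["document_body"]
            return {key: body.get(key) for key in ("regionA", "regionB")}
    return None


if __name__ == "__main__":
    # Shortened sample lines based on the excerpts above.
    sample_log = (
        "INFORMATION [Analytics Main Thread]  "
        "com.vmware.vcops.analytics.cluster.ClusterCoordinatorImpl."
        "setAccessibleZones - Available zones are: [domain2]"
    )
    sample_script = (
        "INSERT INTO CASA_DOCS VALUES('CA_INFO','{\"document_name\":\"CA_INFO\","
        "\"document_body\":{\"regionA\":\"FAILURE\",\"regionB\":\"FAILURE\","
        "\"split_brain\":null}}')"
    )
    print("Zones:", available_zones(sample_log))
    states = fault_domain_states(sample_script)
    print("Fault domain states:", states)
    if states and "FAILURE" in states.values():
        print("At least one fault domain is marked FAILURE in CaSA.")
```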