The Aria Operations Continuous Availability Cluster fails to initialize, displaying the status "Waiting for Analytics" and the error message "Attempt to start node from not accessible zone"
Article ID: 368819
Products
VMware Aria Suite
Issue/Introduction
Aria Operations fails to come online after a cluster restart. This issue can occur after changes in the backend infrastructure, most often a network glitch.
The Continuous Availability (CA) status reports Degraded, and every analytics node shows the status "Waiting for Analytics".
Environment
VMware Aria Operations (formerly known as vRealize Operations) 8.14.X
Cause
This error occurs when services are started while a Fault Domain is marked as OFFLINE or FAILURE in CaSA.
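The condition described above can be pictured with the following simplified Python model. It is a hypothetical sketch inferred from the error text and from the region states recorded in casa.db.script, not the product's actual ClusterCoordinatorImpl logic. The function names, and the assumption that every state other than OFFLINE or FAILURE counts as accessible, are illustrative.

```python
from typing import Dict, List

# States treated as "not accessible". OFFLINE and FAILURE come from the
# Cause text; treating every other state as accessible is an assumption.
UNAVAILABLE_STATES = {"OFFLINE", "FAILURE"}


def accessible_zones(zone_states: Dict[str, str]) -> List[str]:
    """Return the fault domains whose state is neither OFFLINE nor FAILURE."""
    return sorted(
        zone for zone, state in zone_states.items()
        if state.upper() not in UNAVAILABLE_STATES
    )


def start_node(node_zone: str, zone_states: Dict[str, str]) -> str:
    """Hypothetical model of the start-up check: refuse to start a node
    whose fault domain is not in the accessible set."""
    if node_zone not in accessible_zones(zone_states):
        # Same message that the Java service logs as an IllegalStateException.
        raise RuntimeError("Attempt to start node from not accessible zone.")
    return f"starting node in {node_zone}"


if __name__ == "__main__":
    # Region states as recorded in casa.db.script in this issue.
    states = {"regionA": "FAILURE", "regionB": "FAILURE"}
    print(accessible_zones(states))  # []
    try:
        start_node("regionA", states)
    except RuntimeError as err:
        print(err)
```

In this model no node can start while both regions are marked FAILURE, which is consistent with every analytics node remaining in "Waiting for Analytics".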
Resolution
To resolve this issue:
1. Log in to the Aria Operations admin UI as the local admin user.
2. Click Take Cluster Offline under Cluster Status.
Note: Wait for the Cluster Status to show as Offline.
3. In vCenter, perform a Shut Down Guest OS action on each Aria Operations node VM. Aria Operations nodes should be taken offline in the following order:
5. Once each Aria Operations Analytics node has restarted, log back in to the Aria Operations admin UI as the local admin user.
6. Click Bring Cluster Online under Cluster Status.
Note: Wait for the Cluster Status to show as Online.
If the cluster does not respond to the "Bring Cluster Online" button, contact VMware by Broadcom Technical Support for assistance and reference this KB article.
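Steps 2 and 6 both wait for the Cluster Status to reach a given value, and the paragraph above covers the case where it never does. If you script that wait, a bounded polling loop such as the Python sketch below keeps it from hanging indefinitely. The `get_status` callable is hypothetical: how it reads the Cluster Status is left to you. The default timeout and interval are arbitrary values, not Broadcom guidance.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    expected: str,
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
) -> bool:
    """Poll get_status() until it returns `expected` or timeout_s elapses.

    Returns True if the expected status was seen, False on timeout.
    get_status is a hypothetical callable supplied by the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == expected:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
```

For example, `wait_for_status(read_cluster_status, "Online")`, where `read_cluster_status` is whatever mechanism you use to read the status. A `False` result corresponds to the case above, where the cluster does not come online and support should be contacted.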
Additional Information
The Analytics service crashes repeatedly on the analytics nodes, even after multiple manual restart attempts.
In the Aria Operations Analytics-wrapper.log file, you see entries similar to:
| INFO | jvm 1 | WARNING: An illegal reflective access operation has occurred
| INFO | jvm 1 | WARNING: Illegal reflective access by com.vmware.vcops.casarest.client.HttpRequesterURLConnectionImpl (file:/usr/lib/vmware-vcops/common/lib/casa-rest-client-1.0-SNAPSHOT.jar) to field java.lang.reflect.Field.modifiers
| INFO | jvm 1 | WARNING: Please consider reporting this to the maintainers of com.vmware.vcops.casarest.client.HttpRequesterURLConnectionImpl
| INFO | jvm 1 | WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
| INFO | jvm 1 | WARNING: All illegal access operations will be denied in a future release
| INFO | jvm 1 | >>> AnalyticsMain.run failed with error: IllegalStateException: Attempt to start node from not accessible zone.
| INFO | jvm 1 | WrapperManager Debug: WrapperManager.stop(-1) called by thread: SystemExitThread
| INFO | jvm 1 | WrapperManager Debug: Send a packet STOP : -1
| INFO | jvm 1 | WrapperManager Debug: Pausing for 1,000ms to allow a clean shutdown...
| DEBUG | wrapperp | read a packet STOP : -1
| DEBUG | wrapper | JVM requested a shutdown. (-1)
| DEBUG | wrapper | wrapperStopProcess(-1, FALSE) called.
| DEBUG | wrapper | Sending stop signal to JVM
| DEBUG | wrapper | Enqueue Event 'jvm_stop'
| INFO | jvm 1 | WrapperManager Debug: Stopped checking for control events.
In the Aria Operations Analytics.log file, you see entries similar to:
INFORMATION [Analytics Main Thread] com.vmware.vcops.analytics.cluster.ClusterCoordinatorImpl.setAccessibleZones - Available zones are: [domain2]
ERROR [Analytics Main Thread] com.integrien.analytics.AnalyticsMain.run - AnalyticsMain.run failed with error: IllegalStateException: Attempt to start node from not accessible zone.
java.lang.IllegalStateException: Attempt to start node from not accessible zone.
    at com.vmware.vcops.analytics.cluster.ClusterCoordinatorImpl.<init>(ClusterCoordinatorImpl.java:135) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
    at com.vmware.vcops.analytics.cluster.ClusterCoordinatorImpl.getInstance(ClusterCoordinatorImpl.java:100) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
    at com.integrien.analytics.AnalyticsMain.doRun(AnalyticsMain.java:475) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
    at com.integrien.analytics.AnalyticsMain.run(AnalyticsMain.java:2251) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
INFO [Analytics Main Thread] com.vmware.vcops.platform.common.PlatformEnvironment.exitSystem - Terminating process with exit code -1
...
INFORMATION [SystemExitThread] com.vmware.vcops.platform.common.PlatformEnvironment.run - exitSystem has been called by: java.lang.Throwable: null
    at com.vmware.vcops.platform.common.PlatformEnvironment.exitSystem(PlatformEnvironment.java:276) ~[alive_platform.jar:?]
    at com.vmware.vcops.platform.common.PlatformEnvironment.exitSystem(PlatformEnvironment.java:303) ~[alive_platform.jar:?]
    at com.integrien.analytics.AnalyticsMain.run(AnalyticsMain.java:2256) ~[vcops-analytics-1.0-SNAPSHOT.jar:?]
INFORMATION [WrapperListener_stop_runner] com.integrien.analytics.AnalyticsMain.stop - Analytics is stopping... exit code -1
WARN [WrapperListener_stop_runner] com.integrien.analytics.AnalyticsMain.stop - Analytics is stopping... exit code -1. Generating thread dump in the call stack directory ..
INFORMATION [Take Analytics Offline Thread] com.integrien.analytics.AnalyticsMain.run - Analytics is going offline..
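To check whether a node is hitting this specific failure, you can search its analytics logs for the error text and the last logged zone list. The Python sketch below does this for any iterable of log lines. The signature string and the `Available zones are: [...]` pattern are copied from the excerpts above; the function name and the shape of the returned dictionary are illustrative assumptions. Log file locations are not covered here.

```python
import re
from typing import Iterable, List, Optional

# Strings copied from the Analytics-wrapper.log and Analytics.log excerpts.
SIGNATURE = "Attempt to start node from not accessible zone"
ZONES_RE = re.compile(r"Available zones are: \[([^\]]*)\]")


def scan_log(lines: Iterable[str]) -> dict:
    """Count lines containing the error and keep the last logged zone list."""
    matching_lines = 0
    available_zones: Optional[List[str]] = None
    for line in lines:
        if SIGNATURE in line:
            matching_lines += 1
        found = ZONES_RE.search(line)
        if found:
            # "[domain2]" -> ["domain2"]; "[]" -> []
            available_zones = [z.strip() for z in found.group(1).split(",") if z.strip()]
    return {"matching_lines": matching_lines, "available_zones": available_zones}
```

To use it, open a copy of the log with `open(path, encoding="utf-8", errors="replace")` and pass the file object to `scan_log`. A non-zero `matching_lines` count, together with a zone list that omits the node's own fault domain, points to this issue.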
In the Aria Operations casa.db.script file, you see entries in which both regions are marked as failed ("regionA":"FAILURE" and "regionB":"FAILURE"):
INSERT INTO CASA_DOCS VALUES('CA_INFO','{"document_time":1716437986633,"document_name":"CA_INFO","document_version":52,"document_body":{"regionA":"FAILURE","regionA_last_down_time":1716338218598,"regionA_last_processing_time":1716338137983,"regionB":"FAILURE","regionB_last_down_time":1712270794768,"regionB_last_processing_time":1712270687798,"split_brain":null,"regionA_display_name":null,"regionB_display_name":null,"document_version":54}}')
INSERT INTO CASA_DOCS VALUES('FIPS','{"document_time":1698882002578,"document_name":"FIPS","document_version":4,"document_body":{"state":"DISABLED"}}' )
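To read the region states without editing the file, you can parse the CA_INFO row. The Python sketch below assumes each row has the form shown in the excerpt above (`INSERT INTO CASA_DOCS VALUES('CA_INFO','<json>')`), and that the only single quotes inside the JSON are SQL-escaped ones. The function name is hypothetical. Run it against a copy of casa.db.script rather than the live file.

```python
import json
import re
from typing import Dict, Optional

# Matches the CA_INFO row and captures its JSON payload. Non-greedy, so the
# match stops at the end of this row even if several rows share one line.
CA_INFO_RE = re.compile(r"INSERT INTO CASA_DOCS VALUES\('CA_INFO','(\{.*?\})'\s*\)")
# "regionA" and "regionB" hold the fault-domain state; keys such as
# "regionA_last_down_time" are skipped.
REGION_KEY_RE = re.compile(r"region[A-Z]")


def region_states(script_text: str) -> Optional[Dict[str, str]]:
    """Return {region: state} from the CA_INFO document, or None if absent."""
    match = CA_INFO_RE.search(script_text)
    if match is None:
        return None
    # SQL string literals escape a single quote as two single quotes.
    document = json.loads(match.group(1).replace("''", "'"))
    body = document["document_body"]
    return {key: value for key, value in body.items() if REGION_KEY_RE.fullmatch(key)}
```

For the rows shown above, `region_states` returns `{"regionA": "FAILURE", "regionB": "FAILURE"}`. This is the state described under Cause.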