Degraded," or nodes stuck in "Waiting for Analytics."
service vmware-casa status on the appliance shell returns active (running) on all nodes.
The /storage/log/vcops/analytics.log file may contain the following entries:
WARN [Threshold checker worker thread 3] com.vmware.vcops.platform.common.fsdb.FsdbClientBase.executeOnResourceMembersWithTimeout - Failed to execute function 'FsdbInterface.saveObservations' on server vRealize Ops Fsdb-##### for resourceId=####: FunctionException: The requested server(s) are not running
[AlarmQuery--thread-4] com.vmware.vcops.platform.common.sharding.ShardingGemfireFunctionExecutor.executeForPlatformResultOnNamedServer - Failed to execute function AlarmShardServerInterface.cancelAlarms : FunctionException: The requested server(s) are not running
org.apache.geode.cache.execute.FunctionException: The requested server(s) are not running
Caused by: org.apache.geode.cache.execute.FunctionInvocationTargetException: The requested server(s) are not running
com.vmware.vcops.platform.common.fsdb.FsdbClientBase.executeOnResourceMembersWithTimeout - Failed to execute function 'FsdbInterface.saveObservations' on server vRealize Ops Fsdb-###### for resourceId=#####: FunctionException: The requested server(s) are not running
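To check how often this signature appears on a node, the log can be scanned with a short script. The following is a minimal sketch, not a supported tool: the function names are hypothetical, the default path is the one named in this article, and the regular expression is written only against the entries quoted above, so other message shapes may not match.

```python
import re
from collections import Counter

# Signature text taken from the log excerpts in this article.
SIGNATURE = "The requested server(s) are not running"

# Captures the function name from "Failed to execute function 'X'"
# (quoted) or "Failed to execute function X :" (unquoted).
FUNCTION_RE = re.compile(r"Failed to execute function '?([\w.]+)'?")


def count_failed_functions(lines):
    """Count failed function calls on lines that carry the signature."""
    counts = Counter()
    for line in lines:
        if SIGNATURE not in line:
            continue
        match = FUNCTION_RE.search(line)
        # Stack-trace lines carry the signature but no function name.
        counts[match.group(1) if match else "(no function name)"] += 1
    return counts


def scan_log(path="/storage/log/vcops/analytics.log"):
    """Scan a log file; adjust the path if the log lives elsewhere."""
    with open(path, errors="replace") as handle:
        return count_failed_functions(handle)


if __name__ == "__main__":
    for name, hits in scan_log().most_common():
        print(f"{hits:6d}  {name}")
```

A count that keeps growing for FsdbInterface.saveObservations or AlarmShardServerInterface.cancelAlarms while vmware-casa reports active (running) is consistent with the symptoms described here.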
Aria Operations sharding refers to the scalability and High Availability (HA) configuration of the VMware Aria Operations platform, in which data and operations are distributed across multiple nodes (shards) for performance, scale, and redundancy.
The issue is caused by a GemFire cluster partition, or "split brain" scenario, in which the Analytics service on one or more Data Nodes loses connectivity with the Primary Node.
Impact on Alerting: Aria Operations uses a sharding mechanism to distribute data. If the node hosting the specific "Alarm Shard" for an object (e.g., a Virtual Machine) is unreachable, the system cannot process metrics against alert definitions.
After validating the cluster status, perform a cluster-wide power cycle to reset GemFire's internal node coordination.
Validate Cluster Status
Log in to the Admin UI (https://<primary-node-ip>/admin).
Confirm the status of the nodes (e.g., Offline, Degraded).
Take Cluster Offline
In the Admin UI, select the cluster and click Take Offline.
Wait for the status to show "Offline" for all nodes.
Reboot Nodes
Power cycle all nodes in the Aria Operations cluster. Refer to Shutdown and Startup sequence for Aria Operations cluster.
Bring Cluster Online
Once all nodes are powered on and reachable, log in to the Admin UI.
Click Bring Online.
Verify Resolution
Ensure the Cluster Status is Running in the Admin UI.
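The two waiting points in this procedure (all nodes "Offline" before the reboot, and the cluster "Running" afterwards) follow the same pattern: poll until every node reports the expected state, or give up after a timeout. The sketch below shows only that pattern. The fetch_states callable is hypothetical and must be supplied by the caller; this article does not describe an API for reading node states, and the Admin UI remains the documented way to check them. The timeout and interval values are arbitrary assumptions.

```python
import time


def wait_for_state(fetch_states, target, timeout_s=1800, interval_s=30,
                   sleep=time.sleep):
    """Poll until every node reports `target`.

    fetch_states: hypothetical caller-supplied callable returning a dict
    of {node_name: state_string}.
    Returns True once all nodes match, False if the timeout elapses first.
    """
    waited = 0
    while True:
        states = fetch_states()
        # An empty result is treated as "not ready" rather than success.
        if states and all(state == target for state in states.values()):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For example, wait_for_state(my_fetcher, "Offline") would be called after Take Offline and before the power cycle, and wait_for_state(my_fetcher, "Running") after Bring Online, where my_fetcher is whatever status source is available in the environment.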