In some situations, the primary node's ADMIN role can be removed from the CACHED_ROLES document stored in the '/storage/db/casa/webapp/hsqldb/casa.db.script' file on HA/CA clusters.
Editing this file manually is risky and error-prone, and mistakes can cause additional cluster issues. The scripts attached to this article detect and correct the cached role values automatically.
This issue has been observed during cluster maintenance operations (online, offline, etc.) and during upgrades.
This issue can cause a range of problems in an Aria Operations cluster. For example, during an upgrade, invalid roles in the CACHED_ROLES document prevent analytics from running successfully, which causes the upgrade to fail or hang.
Errors similar to the following appear in /storage/log/vcops/log/casa/casa.log:
2025-05-08T05:46:41,798+0000 INFO [ajp-nio-xxx.x.0x.x-8011-exec-2] [xxxxxxxx] support.subprocess.GeneralCommand:255 - Command '/usr/bin/sudo -n /usr/lib/vmware-python-3/bin/python /usr/lib/vmware-vcopssuite/utilities/pakManager/bin/vcopsPakManager.py --action new_validate --pak vRealizeOperationsManagerEnterprise-818324521385 --json --force_content_update false --roles ADMIN,DATA,UI' threw exception: CommandLineExitException: key=general.failure; args=1,; cause=
2025-05-08T05:46:41,798+0000 WARN [ajp-nio-xxx.x.0x.x-8011-exec-2] [xxxxxxxx] casa.exception.CasaControllerExceptionHandler:212 - cause for exception = CommandLineExitException: key=general.failure
Another example is an "Inventory sync" failure in VMware Aria Suite Lifecycle with error code LCMVROPCONFIG20066, because Aria Suite Lifecycle uses the casa API to retrieve node roles. In addition, an Aria Operations node with invalid roles in the CACHED_ROLES document is not displayed in VMware Aria Suite Lifecycle under the Aria Operations environment.
In general, if the /casa/cluster/status API returns an invalid role for one of the nodes (most likely the primary node), you have encountered this issue.
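To search casa.log for the exception shown in the log excerpt above, a minimal Python sketch could look like the following. The helper names find_failures and scan_casa_log are hypothetical and are not part of the scripts attached to this article; the pattern matches only the "CommandLineExitException: key=general.failure" text from the quoted log lines.

```python
import re

# Log path as given in this article.
CASA_LOG = "/storage/log/vcops/log/casa/casa.log"

# Exception text seen in the casa.log errors quoted in this article.
FAILURE_PATTERN = re.compile(r"CommandLineExitException: key=general\.failure")


def find_failures(lines):
    """Return (line_number, text) pairs for lines containing the failure marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if FAILURE_PATTERN.search(line)
    ]


def scan_casa_log(path=CASA_LOG):
    """Open the casa log (replacing undecodable bytes) and return matching lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_failures(handle)
```

On a node, iterating over scan_casa_log() lists every matching line with its line number. A match on its own does not prove the cached-roles problem, since other command failures can raise the same exception; confirm it with /casa/cluster/status or the CACHED_ROLES dump.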
VMware Aria Operations 8.x
The root cause of this failure is currently unknown.
To confirm this problem, check the CACHED_ROLES document on each cluster member.
Use the getCachedRoles.py and restoreCachedRoles.py scripts attached to this article.
Run service vmware-casa status to confirm that the vmware-casa service is running.
Run $VMWARE_PYTHON_3_BIN getCachedRoles.py to dump the CACHED_ROLES document from all nodes. The output is saved as "cachedRoles.json" in the same directory.
Example of a valid cluster:
> Primary: ADMIN, DATA, UI
> Primary Replica: ADMIN, DATA, UI, REPLICA
> Data: DATA, UI
> Remote Collector: REMOTE_COLLECTOR
> Witness: WITNESS
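To compare dumped roles against the valid combinations listed above, a minimal Python sketch is shown below. It assumes the roles have already been converted into a simple JSON mapping of node name to role list, for example {"node-a": ["ADMIN", "DATA", "UI"]}. This is an assumption: the actual layout of cachedRoles.json is not documented here and may need converting first. The rule "exactly one node carries ADMIN, DATA, UI" is also an assumption drawn from the example cluster. The function names check_roles and check_roles_file are hypothetical.

```python
import json

# Valid role combinations per node type, as listed in this article.
VALID_ROLE_SETS = {
    "Primary": frozenset({"ADMIN", "DATA", "UI"}),
    "Primary Replica": frozenset({"ADMIN", "DATA", "UI", "REPLICA"}),
    "Data": frozenset({"DATA", "UI"}),
    "Remote Collector": frozenset({"REMOTE_COLLECTOR"}),
    "Witness": frozenset({"WITNESS"}),
}


def check_roles(node_roles):
    """Return a list of problems found in a node -> roles mapping.

    node_roles is an ASSUMED shape, e.g. {"node-a": ["ADMIN", "DATA", "UI"]};
    the real cachedRoles.json layout may differ.
    """
    valid = set(VALID_ROLE_SETS.values())
    problems = []
    primaries = []
    for node, roles in sorted(node_roles.items()):
        # Normalize whitespace and case before comparing role sets.
        role_set = frozenset(role.strip().upper() for role in roles)
        if role_set not in valid:
            problems.append(f"{node}: unexpected roles {sorted(role_set)}")
        if role_set == VALID_ROLE_SETS["Primary"]:
            primaries.append(node)
    # A primary that lost ADMIN looks like a plain data node (DATA, UI),
    # so also require exactly one node carrying the primary role set.
    if len(primaries) != 1:
        problems.append(
            f"expected exactly one primary (ADMIN, DATA, UI), found {len(primaries)}"
        )
    return problems


def check_roles_file(path="cachedRoles.json"):
    """Load an already-normalized node -> roles JSON file and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_roles(json.load(handle))
```

The cluster-level primary count matters because the failure described in this article (the primary losing ADMIN) produces a role set that is valid for a data node. A per-node check alone would therefore miss it.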
Run $VMWARE_PYTHON_3_BIN restoreCachedRoles.py --restore to restore the CACHED_ROLES document, which should correct the roles.