A long-running provisioning job using REST API calls caused the access manager databases of secondary site nodes to go out of sync. The resulting health check failure made these nodes unavailable to PAM users.
PAM cluster nodes compare each other's access databases at regular intervals. The interval is the "Cluster Access Database Consistency Check Period" described at https://docops.ca.com/ca-privileged-access-manager/3-2-4/en/deploying/set-up-a-cluster/cluster-configuration, and it defaults to 15 minutes. Once a mismatch is detected, the database is checked again every 5 minutes. If the check fails five consecutive times, the database is regarded as out of sync. Secondary site nodes are not synchronized in real time; they poll for updates every 10 seconds by default. This can result in database differences while access data is changing, for example during the provisioning of new devices or access policies. If changes continue without interruption for more than about half an hour, they can produce the observed behavior, even though there is no synchronization problem other than the delayed updates due to the polling frequency.
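The "about half an hour" figure can be checked with simple arithmetic. The sketch below is only an illustration built from the default values quoted above. The function names are hypothetical, and whether the first detected mismatch counts as one of the five consecutive failures is an assumption, so both readings are supported. The 10-second polling delay is ignored.

```python
def minutes_until_out_of_sync(recheck_interval_min=5,
                              consecutive_failures=5,
                              first_detection_counts=True):
    """Minutes from the first detected mismatch until the database
    is regarded as out of sync.

    Assumption: if the first detection counts as failure number one,
    only (consecutive_failures - 1) rechecks are still needed.
    """
    if first_detection_counts:
        rechecks = consecutive_failures - 1
    else:
        rechecks = consecutive_failures
    return rechecks * recheck_interval_min


def change_window_bounds(check_period_min=15, **kwargs):
    """Shortest and longest span of continuous changes (in minutes)
    that ends with a node marked out of sync.

    Shortest: changes begin just before a regular check runs.
    Longest: changes begin just after a regular check has passed,
    so a full check period elapses before the first detection.
    """
    base = minutes_until_out_of_sync(**kwargs)
    return base, check_period_min + base


print(change_window_bounds())                              # first detection counts
print(change_window_bounds(first_detection_counts=False))  # it does not count
```

Under either reading, the window falls between roughly 20 and 40 minutes with the default settings, which is consistent with the half-hour estimate.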
PAM 3.0.2 multi-site cluster. The problem may be observed on any current PAM release, including the latest 3.2 release.
Avoid running large jobs that create or update access data in PAM and cause continuous changes over a long period of time. Instead, split the job into shorter tasks. The risk of running into the problem may also be reduced by changing the cluster tuning parameters discussed on the documentation page mentioned above. Per the online documentation, use cluster tuning only under the direction of CA Support. If the jobs can be spread out, that is the preferred option.
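One way to split a large job is to process the items in batches with a quiet period between them. The following is a minimal sketch, not a PAM-specific implementation. `run_in_batches` and `provision_one` are hypothetical names, and `provision_one` stands in for whatever REST API call the job makes per item. The batch size and the 6-minute pause are illustrative assumptions; the idea is that a pause longer than the 5-minute recheck interval gives a consistency check the chance to run while the nodes agree.

```python
import time


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_in_batches(items, provision_one, batch_size=50,
                   pause_seconds=360, sleep=time.sleep):
    """Call provision_one(item) for every item, pausing between batches.

    provision_one is a placeholder for the caller's own provisioning
    call (hypothetical, not a PAM API). Returns the number of batches.
    """
    batches = 0
    for batch in chunked(items, batch_size):
        if batches:
            # Quiet period so secondary site nodes can catch up.
            sleep(pause_seconds)
        for item in batch:
            provision_one(item)
        batches += 1
    return batches
```

Each batch should itself finish well within the time window described above, so choose the batch size based on how long a single provisioning call takes in your environment.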