CA Privileged Access Manager - Cloakware Password Authority (PA), PAM SAFENET LUNA HSM, CA Privileged Access Manager (PAM)
Issue/Introduction
In a 3.2.2 or 3.2.3 multisite cluster, the session manager on the primary site nodes often goes out of sync, as do the session manager and credential manager on the nodes in the secondary sites.
Sometimes restarting the cluster or resynchronizing the secondary node in its site brings the cluster back in sync. In other cases, the credential managers on the secondary site nodes go out of sync and do not recover regardless of the action taken, whether site synchronization or a cluster restart.
The logs of the master node in the primary site then show a large number of errors like the following:
Jan 24, 2019 10:50:07 AM com.cloakware.cspm.server.app.SiteReplicationServlet a SEVERE: Unauthorized request from site at host:63.90.3.170 [63.90.3.170]
On any secondary site node where the problem occurs, lines like the following are logged:
Jan 24, 2019 10:50:07 AM com.cloakware.cspm.server.replication.ReplicationPoller poll SEVERE: ReplicationPoller.poll got failed commandResult from master: 401 : Site Not Authenticated
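To confirm that a node is affected, it can help to count how often the two messages above appear. The following is a minimal Python sketch, not a CA PAM tool: the function name summarize_replication_errors is hypothetical, the only strings it relies on are the two signatures quoted above, and how the log lines are obtained from the appliance is left open because this article does not state a log location.

```python
import re
from collections import Counter

# Signatures copied from the log lines quoted in this article.
PRIMARY_SIGNATURE = re.compile(r"Unauthorized request from site at host:(\S+)")
SECONDARY_SIGNATURE = "401 : Site Not Authenticated"


def summarize_replication_errors(lines):
    """Count both error signatures in an iterable of log lines.

    Hypothetical helper for illustration only. Returns a Counter of
    'Unauthorized request' errors keyed by host, plus the number of
    'Site Not Authenticated' errors.
    """
    unauthorized_by_host = Counter()
    not_authenticated = 0
    for line in lines:
        match = PRIMARY_SIGNATURE.search(line)
        if match:
            unauthorized_by_host[match.group(1)] += 1
        if SECONDARY_SIGNATURE in line:
            not_authenticated += 1
    return unauthorized_by_host, not_authenticated


# The two sample lines are the ones shown in this article.
sample = [
    "Jan 24, 2019 10:50:07 AM com.cloakware.cspm.server.app.SiteReplicationServlet a "
    "SEVERE: Unauthorized request from site at host:63.90.3.170 [63.90.3.170]",
    "Jan 24, 2019 10:50:07 AM com.cloakware.cspm.server.replication.ReplicationPoller poll "
    "SEVERE: ReplicationPoller.poll got failed commandResult from master: "
    "401 : Site Not Authenticated",
]
hosts, not_authenticated_count = summarize_replication_errors(sample)
print(dict(hosts), not_authenticated_count)
```

A steadily growing count for the same host on the primary master, together with matching 401 errors on that secondary node, is consistent with the condition described here.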
Cause
This issue can be triggered by network conditions, or by any other condition that leads the primary master to believe the secondary site is far behind in replication, so that it marks the site as inactive.
Environment
CA PAM 3.2.0, 3.2.2 or 3.2.3
Resolution
Version 3.2.4 includes several fixes for this and other cluster synchronization issues, including the one where the session manager database goes out of sync.
As a possible workaround, increase the "Max number of queued replication records before member deactivation" setting in the cluster. To do so:
Log in to the primary master
Turn off the cluster
Enable cluster tuning under Configuration->Diagnostics->System
Go to the cluster configuration, open the "Tuning" tab, and change "Max number of queued replication records before member deactivation" from the default 10000 to 20000