Secondary node fails to join active cluster in PROD and shows Elapsed Time Since Last Contact as "Contact failed"

Article ID: 193494

Products

  • CA Privileged Access Manager (PAM)
  • CA Privileged Access Manager - Cloakware Password Authority (PA)
  • PAM SAFENET LUNA HSM
  • CA Privileged Access Manager - Server Control (PAMSC)

Issue/Introduction

On version 3.2.2, or any other version earlier than 3.3.1, a secondary node on a secondary site fails to join the cluster. The node stays in RED status for Credential Manager with a request to resync, and the column "Elapsed Time Since Last Contact" shows "Contact failed" for that node. Neither of the following helps:

  • Re-synch Site Member: the issue persists
  • Leave Cluster, then join the node again: the issue persists

Cause

The issue can be caused by the PAM version in use.

This is a known problem that is resolved in version 3.3.1.

Check the Tomcat log on the secondary node for messages like these:

Jun 19, 2020 6:26:24 PM com.cloakware.cspm.server.app.SiteReplicationServlet a
SEVERE: ReplicationPoller.poll got failed commandResult from master: 401 : Site Not Authenticated
Jun 19, 2020 6:21:26 PM com.cloakware.cspm.server.replication.ReplicationPoller poll
SEVERE: ReplicationPoller.poll got failed commandResult from master: 401 : Site Not Authenticated
Jun 19, 2020 6:21:36 PM com.cloakware.cspm.server.replication.ReplicationPoller poll
SEVERE: ReplicationPoller.poll got failed commandResult from master: 401 : Site Not Authenticated
Jun 19, 2020 6:21:46 PM com.cloakware.cspm.server.replication.ReplicationPoller poll
SEVERE: ReplicationPoller.poll got failed commandResult from master: 401 : Site Not Authenticated
Jun 19, 2020 6:21:57 PM com.cloakware.cspm.server.replication.ReplicationPoller poll
SEVERE: ReplicationPoller.poll got failed commandResult from master: 401 : Site Not Authenticated
Jun 19, 2020 6:22:07 PM com.cloakware.cspm.server.replication.ReplicationPoller poll
SEVERE: ReplicationPoller.poll got failed commandResult from master: 401 : Site Not Authenticated
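
The check on the secondary node can be scripted. The following is a minimal sketch, assuming the Tomcat log has been copied to a location you can read; the function name `count_auth_failures` is hypothetical and not part of PAM. It only counts lines that contain the message text shown above.

```python
# Message text copied from the SEVERE lines in the secondary node's Tomcat log.
AUTH_FAILURE_MARKER = (
    "ReplicationPoller.poll got failed commandResult from master: "
    "401 : Site Not Authenticated"
)


def count_auth_failures(lines):
    """Return how many log lines contain the 401 replication failure message.

    `lines` is any iterable of strings, for example an open file object.
    """
    return sum(1 for line in lines if AUTH_FAILURE_MARKER in line)
```

To use it, pass an open log file, for example `with open(path, errors="replace") as log: print(count_auth_failures(log))`, where `path` is wherever you saved the log. A non-zero count matches the symptom described in this article.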

At the same time, check the Tomcat logs on the primary node of the primary site for errors like these:

Jun 19, 2020 6:26:22 PM com.cloakware.cspm.server.replication.ReplicationManager deactivateSitesTooFarBehind
SEVERE: ReplicationManager.deactivateInactiveSites Site secondarysitenodename.domain.com (lastRecordProcessed=0) is more than 20000 behind the primary site (MaxReplRecordId=4738956).  Deactivating site so events can be purged.
Jun 19, 2020 6:26:22 PM com.cloakware.cspm.server.replication.ReplicationManager deactivateSitesTooFarBehind
SEVERE: ReplicationManager.deactivateInactiveSites Site secondarysitenodename.domain.com (lastRecordProcessed=0) is more than 20000 behind the primary site (MaxReplRecordId=4738956).  Deactivating site so events can be purged.
Jun 19, 2020 6:26:24 PM com.cloakware.cspm.server.app.SiteReplicationServlet a
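
To pull the relevant numbers out of the primary node's log, a small parser can help. This is a sketch only: the regular expression is built from the sample message above, the function name `parse_deactivation` is hypothetical, and the `records_behind` value is simply `MaxReplRecordId` minus `lastRecordProcessed`, which is an assumption about how the gap is measured.

```python
import re

# Pattern derived from the sample "deactivateInactiveSites" message above.
DEACTIVATION_PATTERN = re.compile(
    r"Site (?P<site>\S+) \(lastRecordProcessed=(?P<last>\d+)\) is more than "
    r"(?P<limit>\d+) behind the primary site \(MaxReplRecordId=(?P<max>\d+)\)"
)


def parse_deactivation(line):
    """Return the site name and record counters from one log line, or None."""
    match = DEACTIVATION_PATTERN.search(line)
    if match is None:
        return None
    last = int(match.group("last"))
    max_id = int(match.group("max"))
    return {
        "site": match.group("site"),
        "last_record_processed": last,
        "limit": int(match.group("limit")),
        "max_repl_record_id": max_id,
        # Assumption: the gap is the difference between the two counters.
        "records_behind": max_id - last,
    }
```

A `last_record_processed` of 0, as in the sample, indicates that the secondary node has not processed any replication records from the primary site.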

 

Environment

Release : 3.2

Component : PRIVILEGED ACCESS MANAGEMENT

Resolution

As a workaround, if you see this behavior and have verified the messages above in the Tomcat logs of the primary and secondary nodes, stop the cluster and start it again.

The node will NOT sync unless the cluster is stopped and started.

To resolve the issue permanently, upgrade PAM to the most recent version if you are running PAM 3.2.2.

The fix was published in the intermediate version 3.3.1 and is included in later versions.

https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/layer7-privileged-access-management/privileged-access-manager/3-4/release-information/resolve-issues-in-earlier-3-x-releases/resolved-issues-in-3-3_1.html
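
When checking several appliances, the version comparison can be scripted. The sketch below assumes plain dotted numeric version strings such as "3.2.2"; the helper names are hypothetical, and the only fact taken from this article is that the fix first appears in 3.3.1.

```python
# First release that contains the fix, per this article.
FIXED_IN = (3, 3, 1)


def parse_version(text):
    """Turn a dotted version string such as '3.2.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(version_text):
    """Return True when the version is earlier than the release with the fix."""
    return parse_version(version_text) < FIXED_IN
```

For example, `needs_upgrade("3.2.2")` is true and `needs_upgrade("3.3.1")` is false.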

Salesforce Case Number: 20001227
Internal Defect ID: DE420853
Resolved Issue: Credential Manager database out-of-sync on secondary cluster node and attempts to resync that node fail.