Premature cluster start after accidental stop causes a full outage

Article ID: 232140

Products

CA Privileged Access Manager (PAM)

Issue/Introduction

We were applying a hotfix to the PAM cluster nodes one at a time when the cluster was accidentally turned off from one of the secondary nodes. To recover quickly, we went to the first node and turned the cluster back on. This left every node except the master in a bad state: the secondary site nodes became inaccessible, and the primary site nodes came back only after a very long time and did not synchronize with the master.

Environment

Release : 3.4

Component : PRIVILEGED ACCESS MANAGEMENT

Cause

When the cluster is turned off from one node, the other cluster nodes activate the "TURN CLUSTER ON" button on the Configuration > Clustering page too soon, before the shutdown has finished. Attempting to turn the cluster on while it is not completely shut down breaks synchronization.

Resolution

Always wait for the cluster to be reported as off on the node from which you started turning it off before you turn it back on. Make it a habit to turn the cluster on and off only from the first node in the primary site.
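For teams that script their maintenance steps, the "wait until off" rule can be sketched as a simple polling guard. This is only an illustration and not a PAM API: the function name `wait_until_cluster_off`, the `get_status` callback, and the `"off"` status string are all hypothetical, and how you actually read the cluster state on the initiating node (for example, by checking the Configuration > Clustering page) is left outside the sketch.

```python
import time
from typing import Callable


def wait_until_cluster_off(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Block until the initiating node reports the cluster as off.

    get_status is a hypothetical callback supplied by the caller; it
    should return the cluster state as seen on the node where the stop
    was started. The "off" string is an assumed convention, not a PAM value.
    Returns True once the state is "off", or False if the timeout expires.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "off":
            return True  # safe to turn the cluster back on
        if clock() >= deadline:
            return False  # still stopping: do NOT turn the cluster on
        sleep(poll_s)
```

A caller would proceed to turn the cluster on only when the function returns True. If it returns False, the safer choice is to keep waiting or investigate rather than start the cluster.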

As of January 2022, the problem is fixed in 3.4.6 and will be fixed in 4.0.2 and later releases. With the fix, the TURN CLUSTER ON button is not active until the cluster is fully stopped.