PAM users report seeing error PAM-CMN-1161 on their dashboard: "The cluster is currently out of sync, or a node is missing. Please go to the Configuration Clustering page for more information." When the PAM administrator checks the cluster status, it confirms that the cluster is out of sync. The cluster is a single-site, two-node cluster, and both nodes rebooted within a short time of each other shortly before the problem was observed.
Environment
Release:
Component: CAPAMX
Cause
When a PAM cluster node in the primary site comes back online after a reboot, it can rejoin the cluster cleanly only if the current cluster master is up and running. If the master is also rebooting, both nodes temporarily try to assume the master role, because neither finds the other in a healthy state. Once both nodes are fully online, the cluster is left in a bad, out-of-sync state.
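The race described above can be illustrated with a small toy model. This is a hypothetical sketch, not how CAPAMX actually implements cluster membership: the `Node` class, the `reboot` function and the simplified rejoin rule (a rebooting node joins as a member only if it sees a peer that is online and acting as master, otherwise it claims the master role) are all assumptions made for illustration. It shows that when the master stays up, the rebooted node rejoins as a member, but when both nodes reboot together, each sees no healthy master and claims the role, leaving two masters.

```python
"""Toy model of the PAM cluster reboot race.

Hypothetical illustration only: this is NOT the CA PAM clustering code.
Assumed rule: a rebooting node joins as a member only if it observes a
peer that is online and acting as master; otherwise it claims master.
"""
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    online: bool = True
    role: str = "member"  # "master", "member" or "offline"


def observed_master(peers: list[Node]) -> bool:
    """Return True if any peer is online and currently acting as master."""
    return any(p.online and p.role == "master" for p in peers)


def reboot(nodes: list[Node], rebooting: set[str]) -> list[str]:
    """Reboot the named nodes at about the same time; return the masters."""
    # All rebooting nodes go down before any of them comes back.
    for n in nodes:
        if n.name in rebooting:
            n.online, n.role = False, "offline"
    # Each rebooting node inspects its peers before any other rebooting
    # node has finished rejoining (decisions are collected, not applied).
    decisions = {}
    for n in nodes:
        if n.name in rebooting:
            peers = [p for p in nodes if p is not n]
            decisions[n.name] = "member" if observed_master(peers) else "master"
    # Nodes come back online with the role each one chose.
    for n in nodes:
        if n.name in decisions:
            n.online, n.role = True, decisions[n.name]
    return [n.name for n in nodes if n.role == "master"]


def cluster() -> list[Node]:
    # node1 is the designated master (first node in the primary site).
    return [Node("node1", role="master"), Node("node2")]


if __name__ == "__main__":
    print(reboot(cluster(), {"node2"}))           # ['node1']
    print(reboot(cluster(), {"node1", "node2"}))  # ['node1', 'node2']
```

In the toy model the second scenario ends with two masters, which corresponds to the out-of-sync state; in the real product, recovering from it requires the cluster stop and start described under Resolution.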
Resolution
When several cluster nodes in the primary site, including the master node, go down at about the same time, stop the cluster and start it again as soon as possible after all nodes are back online. Syslog integration, or email notifications set up on the Configuration > Monitor page, can alert the PAM administrator when a node reboots. We recommend checking the cluster status at least after every reboot of the designated master node, which is the first node in the primary cluster site. PAM Engineering continues to work on improving cluster resilience, and future PAM releases may be able to recover from this situation without administrator intervention.