Adding a node to a cluster consistently fails with the following error: "Error restarting cluster. Please refer to appliance logs". If the node is in a secondary site, or in a primary site with more than two nodes, its replication status shows "Timeout" and the node is inaccessible.
The aactrl log entries for the affected node show the following messages:
11/09/20 12:33:41 - aactrl.sh: Requesting a full database dump
11/09/20 12:33:42 - aactrl.sh: Waiting for dump to be ready
11/09/20 12:33:47 - aactrl.sh: Requesting if the database is ready return error: Please check your Synchronization settings and try again., aborting ...!
11/09/20 12:33:47 - Syncing with the master database failed.
The database dump is requested, but the request fails almost immediately. The same messages appear whether the node is in a primary or a secondary site.
The session logs contain messages like the following:
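To spot this pattern in a longer log, the check can be scripted. The following is a minimal sketch, not a supported tool: it assumes the log lines have already been exported to text, that the timestamp format is MM/DD/YY HH:MM:SS as in the excerpt above, and that a gap of 30 seconds or less counts as "almost immediately". The function name and the threshold are illustrative choices.

```python
import re
from datetime import datetime

# Message fragments copied from the aactrl excerpt above.
DUMP_REQUEST = "Requesting a full database dump"
SYNC_FAILED = "Syncing with the master database failed"

# Assumed line layout: "MM/DD/YY HH:MM:SS - message"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - (.*)$")


def find_fast_sync_failures(lines, max_seconds=30):
    """Return (requested_at, failed_at) pairs where the sync failed
    within max_seconds of the dump request (hypothetical threshold)."""
    failures = []
    requested_at = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
        message = match.group(2)
        if DUMP_REQUEST in message:
            requested_at = stamp
        elif SYNC_FAILED in message and requested_at is not None:
            if (stamp - requested_at).total_seconds() <= max_seconds:
                failures.append((requested_at, stamp))
            requested_at = None
    return failures


SAMPLE_AACTRL = [
    "11/09/20 12:33:41 - aactrl.sh: Requesting a full database dump",
    "11/09/20 12:33:42 - aactrl.sh: Waiting for dump to be ready",
    "11/09/20 12:33:47 - aactrl.sh: Requesting if the database is ready "
    "return error: Please check your Synchronization settings and try again., aborting ...!",
    "11/09/20 12:33:47 - Syncing with the master database failed.",
]

for requested, failed in find_fast_sync_failures(SAMPLE_AACTRL):
    print(f"dump requested {requested}, sync failed {failed}")
```

A dump that fails this quickly suggests the request was rejected rather than interrupted during transfer, which is consistent with the cause described below.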
2020/11/09 12:34:02,system,alert, --, --, --, --, --, --,10.10.10.10, --, --, --, --,"PAM-CMN-1417: PAM appliance (0.0.0.0) attempted to perform cluster operation, but is not part of the cluster list.",0, --,,0
Release : 3.3.X and 3.4.X
Component : PRIVILEGED ACCESS MANAGEMENT
This is a communications error caused by a mismatch between the network settings of the node and those of the rest of the cluster.
For instance, consider two nodes, node A and node B.
Node A is configured with one network interface: GB1, with IP address 10.10.10.10/24 and gateway 10.10.10.1.
Node B is configured with two network interfaces: GB1, with IP address 192.168.10.100/24 and gateway 192.168.10.1, and GB2, with IP address 10.10.10.10/24 and gateway 10.10.10.1. This cluster member is configured to use GB2 for the cluster, which is in the same subnet as node A.
This can break cluster communication: the database dump may be requested through the subnet intended for cluster traffic, 10.10.10.0, while the response may come back through the other defined interface.
If this is the case, the session logs show messages like the one indicated previously, that is:
2020/11/09 12:34:02,system,alert, --, --, --, --, --, --,10.10.10.10,10.10.10.10, --, --, --, --,"PAM-CMN-1417: PAM appliance (0.0.0.0) attempted to perform cluster operation, but is not part of the cluster list.",0, --,,0
The (0.0.0.0) address in the message appears because the cluster cannot identify the origin of the cluster packets coming from the node being added.
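When reviewing a large session log export, these entries can be filtered out with a short script. This is a minimal sketch under the assumption that the log has been exported as comma-separated text in the layout shown above; the function name is hypothetical, and only the message text itself is taken from the excerpt.

```python
import re

# Matches the PAM-CMN-1417 message and captures the address in parentheses.
CMN_1417_RE = re.compile(r"PAM-CMN-1417: PAM appliance \(([^)]*)\)")


def find_unrecognized_peers(lines):
    """Return (timestamp, reported_address) for every PAM-CMN-1417 entry."""
    hits = []
    for line in lines:
        match = CMN_1417_RE.search(line)
        if match:
            # The timestamp is the first comma-separated field of the entry.
            timestamp = line.split(",", 1)[0].strip()
            hits.append((timestamp, match.group(1)))
    return hits


SAMPLE_SESSION = [
    '2020/11/09 12:34:02,system,alert, --, --, --, --, --, --,10.10.10.10,'
    '10.10.10.10, --, --, --, --,"PAM-CMN-1417: PAM appliance (0.0.0.0) '
    'attempted to perform cluster operation, but is not part of the cluster list.",0, --,,0',
]

for timestamp, address in find_unrecognized_peers(SAMPLE_SESSION):
    print(f"{timestamp}: cluster operation from unrecognized appliance {address}")
```

Entries reporting 0.0.0.0 rather than a real node address are the ones that point to the interface mismatch described here.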
Set the cluster to use the same interface on every node, and make sure that cluster traffic always goes through that interface. In the previous example, configure GB1 on both nodes with an IP address in the 10.10.10.0 subnet, and configure the cluster on both nodes to use GB1 as the cluster interface. The other interface, GB2, can be added afterwards, once it is confirmed that communications between the nodes follow the correct path.
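Before retrying the cluster join, the planned interface layout can be sanity-checked offline. The sketch below is illustrative only: it does not query the appliance, the dictionary layout and function name are hypothetical, and the address 10.10.10.20 used for node B in the corrected layout is a made-up example. It uses Python's standard ipaddress module to confirm that every node uses the same cluster interface and that the interface address lies in the cluster subnet.

```python
import ipaddress


def check_cluster_interfaces(nodes, cluster_subnet):
    """Return a list of problems found in a planned interface layout.

    nodes maps a node name to a dict with the keys "cluster_interface"
    and "interfaces" (interface name -> address in CIDR notation).
    """
    subnet = ipaddress.ip_network(cluster_subnet)
    problems = []

    # Every node should use the same interface name for the cluster.
    names = {cfg["cluster_interface"] for cfg in nodes.values()}
    if len(names) > 1:
        problems.append(
            "nodes use different cluster interfaces: " + ", ".join(sorted(names))
        )

    # The cluster interface of each node should sit in the cluster subnet.
    for node, cfg in nodes.items():
        iface = cfg["cluster_interface"]
        address = cfg["interfaces"].get(iface)
        if address is None:
            problems.append(f"{node}: cluster interface {iface} has no address")
        elif ipaddress.ip_interface(address).ip not in subnet:
            problems.append(f"{node}: {iface} ({address}) is outside {subnet}")
    return problems


# Layout from the example in this article.
before = {
    "node A": {"cluster_interface": "GB1",
               "interfaces": {"GB1": "10.10.10.10/24"}},
    "node B": {"cluster_interface": "GB2",
               "interfaces": {"GB1": "192.168.10.100/24", "GB2": "10.10.10.10/24"}},
}

# Corrected layout; node B's address is a hypothetical value.
after = {
    "node A": {"cluster_interface": "GB1",
               "interfaces": {"GB1": "10.10.10.10/24"}},
    "node B": {"cluster_interface": "GB1",
               "interfaces": {"GB1": "10.10.10.20/24"}},
}

print(check_cluster_interfaces(before, "10.10.10.0/24"))
print(check_cluster_interfaces(after, "10.10.10.0/24"))
```

The first call reports the mismatched cluster interfaces (GB1 on node A, GB2 on node B); the second returns an empty list. A clean result here only confirms that the plan is consistent; routing on the appliances still has to be verified separately.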