In rare circumstances that are not yet fully understood, after a cluster restart the appliances in the cluster start experiencing strange problems, such as cluster nodes showing up as inactive or incorrect certificates being shown even though the correct ones are apparently in place in the PAM store.
This is caused by a rare condition that wipes out the local configuration_f table of the node that rebooted. As a result, the node is unable to retrieve its configuration, and several functions of the cluster or node start misbehaving.
To determine whether this is the problem, run the following from an SSH session open to the node experiencing the problem:
The command should report around 50 rows in that table. If there are fewer than that, the node is probably affected by this issue and needs to be repaired.
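The row-count criterion above can be written down as a small check. This is only a sketch: the function name and the exact cutoff of 50 are assumptions made for illustration (the text says "around 50"), and the row count itself must come from the command run on the node.

```python
# Hypothetical helper, not part of CA PAM. The threshold of 50 is taken from
# the "around 50 rows" guidance in this article and is approximate.
EXPECTED_MIN_ROWS = 50


def configuration_table_looks_wiped(row_count: int,
                                    expected_min: int = EXPECTED_MIN_ROWS) -> bool:
    """Return True when the reported row count suggests configuration_f was wiped."""
    if row_count < 0:
        raise ValueError("row_count cannot be negative")
    # Fewer rows than expected points to the wiped-table condition.
    return row_count < expected_min


if __name__ == "__main__":
    # Example: a node reporting only 3 rows is flagged, one reporting 52 is not.
    for count in (3, 52):
        print(count, configuration_table_looks_wiped(count))
```

A count slightly below 50 is not conclusive on its own, so treat the result as a hint to investigate rather than a definitive diagnosis.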
After this error is corrected, a second error related to cluster startup frequently appears. In that case the catalina daemon keeps restarting with a message about a missing cluster site in the site table. This happens regardless of whether the cluster has been deactivated and xpa-clusctl -d has been issued on the node to delete the cluster configuration.
CA PAM versions 3.3.X, 3.4.0-3.4.5
The following command:
repopulates the entire configuration_f table if it has missing entries.
As for the error in the site table, where Tomcat fails to start because the site entry is missing from the site table in cspm, there is a healing procedure as well.