Cluster node missing configuration_f database table and trying to start cluster while offline

Article ID: 207654

Products

CA Privileged Access Manager (PAM)

Issue/Introduction

In rare circumstances that are not yet fully understood, appliances in a cluster start to misbehave after a cluster restart. Typical symptoms are cluster nodes showing up as inactive and incorrect certificates being presented even though the correct ones appear to be in place in the PAM store.

Cause

This is caused by a rare condition that wipes out the local configuration_f table of the node that rebooted. As a result, the node cannot retrieve its configuration, and several cluster and node functions start to misbehave.

To check whether this is the problem, run the following from an SSH session on the affected node:

machineId=`cat /var/uag/config/machineid`; mysql --defaults-extra-file=/var/uag/db/pamClient.cnf -h 127.0.0.1 -P 3306 --protocol=TCP uag -e "select count(*) from configuration_f where machine_id = 0x${machineId}"

The count returned by this command should be around 50. If it is noticeably lower, the node is probably affected by this issue and needs the repair described under Resolution.
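The interpretation of the count can be wrapped in a small helper so that it is applied consistently across nodes. This is a minimal sketch and not part of the product: the function name check_config_rows is hypothetical, and the default threshold of 50 is an assumption taken from the "around 50" guidance, so it may need adjusting for your release.

```shell
# check_config_rows ROWS [MIN_ROWS]
# Hypothetical helper: classifies the row count returned by the query above.
# MIN_ROWS defaults to 50, an assumption based on the "around 50" guidance.
check_config_rows() {
  rows="$1"
  min_rows="${2:-50}"
  # Reject empty or non-numeric input, for example when the mysql call failed.
  case "$rows" in
    ''|*[!0-9]*) echo "ERROR: '$rows' is not a row count"; return 2 ;;
  esac
  if [ "$rows" -lt "$min_rows" ]; then
    echo "SUSPECT: $rows rows in configuration_f (expected about $min_rows)"
    return 1
  fi
  echo "OK: $rows rows in configuration_f"
}

# On a node, pass the number printed by the query above. Adding -N to the
# mysql command suppresses the column header so that only the number remains.
```

The helper returns 0 when the count looks healthy, 1 when it is below the threshold and 2 when the input is not a number, so it can also be used in a script condition.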

Once this error is corrected, a second problem related to cluster startup frequently appears. In that case the catalina daemon keeps restarting with a message about a missing cluster site in the site table. This happens regardless of whether the cluster has been deactivated and xpa-clusctl -d has been run on the node to delete the cluster configuration.

 

Environment

CA PAM versions 3.3.X, 3.4.0-3.4.5

Resolution

The following command:

machineId=`cat /var/uag/config/machineid`;applianceType=`/sbin/identify-gk`;hostName=`uname -n`;cat /var/uag/db/init/install-initial-db-data.sql | sed -e "s/_MACHINEID_/${machineId}/" -e "s/_APPLIANCE_TYPE_/${applianceType}/" -e "s/_HOST_NAME_/${hostName}/" -e "s/_IP_ADDR_/127.0.0.1/" | mysql  --defaults-extra-file=/var/uag/db/pamClient.cnf -h 127.0.0.1 -P 3306 --protocol=TCP uag  --force

repopulates the whole configuration_f table if entries are missing.
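To see what the sed stage of that command does before running it against the database, the substitution can be tried on a sample line. This is only an illustrative sketch: the function name render_template, the sample SQL line and the values abc123, hw and pam-node-1 are made up, because the real content of install-initial-db-data.sql is not shown in this article.

```shell
# render_template MACHINE_ID APPLIANCE_TYPE HOST_NAME
# Hypothetical wrapper around the same sed expressions used in the command above.
# Reads the template on stdin and writes the substituted text to stdout.
render_template() {
  sed -e "s/_MACHINEID_/$1/" \
      -e "s/_APPLIANCE_TYPE_/$2/" \
      -e "s/_HOST_NAME_/$3/" \
      -e "s/_IP_ADDR_/127.0.0.1/"
}

# Made-up template line, for illustration only.
sample="INSERT INTO example_t VALUES (0x_MACHINEID_, '_APPLIANCE_TYPE_', '_HOST_NAME_', '_IP_ADDR_');"

printf '%s\n' "$sample" | render_template abc123 hw pam-node-1
# prints: INSERT INTO example_t VALUES (0xabc123, 'hw', 'pam-node-1', '127.0.0.1');
```

Because the expressions carry no g flag, sed replaces only the first occurrence of each placeholder on a line. Values containing / or & would need escaping before being passed to sed.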

The error in the site table, where a missing site entry in cspm.site prevents Tomcat from starting successfully, can also be repaired with the following procedure:

machineId=`cat /var/uag/config/machineid`; mysql --defaults-extra-file=/var/uag/db/pamClient.cnf -h 127.0.0.1 -P 3306 --protocol=TCP uag -e "select count(*) from configuration_f where machine_id = 0x${machineId}"

sitename=$(mysql -uroot -N -e "select propertyvalues from cspm.local_properties where propertyname = 'sitename'");echo $sitename;


count=$(mysql -uroot -N -e "select count(*) from cspm.site where name = '${sitename}'");echo $count;

Here count must be greater than zero; a value of 0 means the site entry is missing from the table.

siteid=$(/sbin/getClusterConfig --my-server-id);

siteHost=$(hostname -I | awk '{print $1}');
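The siteHost command keeps only the first address that hostname -I prints. The short sketch below reproduces the awk stage on a made-up sample so the effect is visible on a node with more than one interface. The function name first_address and the addresses are hypothetical, and whether the first address is the one the cluster uses is an assumption to verify on your node.

```shell
# first_address: prints the first whitespace-separated field of each stdin line,
# which is what the awk stage of the siteHost command does.
first_address() {
  awk '{print $1}'
}

# Made-up sample of what `hostname -I` may print on a node with two interfaces.
printf '%s\n' "10.0.0.12 192.168.1.5 " | first_address
# prints: 10.0.0.12
```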

if [ "$count" -eq 0 ]; then mysql -uroot cspm -e "INSERT INTO cspm.site(siteid, name, isMaster, hostName, au_createdate, au_createuserid, au_updatedate, au_updateuserid, active) VALUES ( ${siteid},'${sitename}',0,'${siteHost}', NOW(), 'super',NOW(), 'super',1)";fi
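Before letting the INSERT run, it can be useful to print the statement and review the values that were collected. The following is a minimal dry-run sketch, not a vendor tool: the function name build_site_insert is hypothetical, and the numeric check assumes the site id is a plain integer, as its unquoted use in the statement above suggests.

```shell
# build_site_insert SITEID SITENAME SITEHOST
# Hypothetical helper: prints the INSERT statement used above without executing it.
build_site_insert() {
  # siteid is inserted unquoted, so it is assumed to be a plain integer.
  case "$1" in
    ''|*[!0-9]*) echo "ERROR: siteid '$1' is not numeric" >&2; return 2 ;;
  esac
  # Refuse empty values, which would otherwise create an unusable site row.
  if [ -z "$2" ] || [ -z "$3" ]; then
    echo "ERROR: sitename and siteHost must not be empty" >&2
    return 2
  fi
  printf "INSERT INTO cspm.site(siteid, name, isMaster, hostName, au_createdate, au_createuserid, au_updatedate, au_updateuserid, active) VALUES (%s,'%s',0,'%s', NOW(), 'super',NOW(), 'super',1);\n" "$1" "$2" "$3"
}

# Example with made-up values; on a node, pass "$siteid" "$sitename" "$siteHost".
build_site_insert 2 "site-b" "10.0.0.12"
```

If the printed statement looks correct, the original command can be run as is. An empty sitename or siteid usually means one of the earlier lookups failed and should be investigated first.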