Broadcom API Gateway 10 - Scheduled task executed on multiple nodes although "One Node" option is checked



Article ID: 233171


Updated On:

Products

CA API Gateway

Issue/Introduction

After a custom scheduled task is set up in Policy Manager with the "One Node" option checked, the task is expected to run on a single node at all times.

However, audit records show the task being executed by two nodes despite this setting, which causes task executions to overlap.

Environment

Release : 10.x

 

Cause

Any scheduled task that is set to execute on "One Node" runs on the node identified as the *MASTER node.

*This is not to be confused with the master/slave role in the replication schema.

In a Gateway cluster, the master node is identified by the nodeid value stored in the database table ssg.cluster_master.

If the task is executed on multiple nodes although it is set to run on one node only, the cause may be a mismatch between the nodeid values stored in the ssg.cluster_master table on the Primary and Secondary DB nodes. The value must be identical on both nodes. This can be confirmed by issuing the following MySQL query on both the Primary and the Secondary DB node:

# select * from ssg.cluster_master;  

Here is an example of the output:

 

 

Resolution

Before proceeding:

We recommend scheduling a maintenance window, as the Gateway service needs to be stopped on all nodes while this is being fixed.

Ensure that replication between the Primary and Secondary DB nodes is healthy by running the MySQL statement "show slave status\G" on both. If replication is broken, fix it by following the steps in this KB article:

API Gateway: Reinitialize replication in a multi-node cluster - https://knowledge.broadcom.com/external/article/44402/api-gateway-reinitialize-replication-in.html
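The replication check above can also be scripted. The following is a minimal Python sketch, not part of the original article, that parses the text printed by "show slave status\G" and reports whether replication looks healthy. The function names are hypothetical, and the health criterion is an assumption: both replication threads must report "Yes" and Seconds_Behind_Master must not be NULL. The sample text is illustrative, not output captured from a real Gateway.

```python
from typing import Dict


def parse_slave_status(text: str) -> Dict[str, str]:
    """Parse the vertical output of SHOW SLAVE STATUS into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Data lines look like "   Slave_IO_Running: Yes". The row
        # separator line made of asterisks contains no colon and is skipped.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def replication_is_healthy(text: str) -> bool:
    """Assumed criterion: both threads running and a known replication lag."""
    status = parse_slave_status(text)
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and status.get("Seconds_Behind_Master") not in (None, "NULL")
    )


# Illustrative sample only; paste the real output of the statement here.
sample = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
"""

if __name__ == "__main__":
    print("healthy" if replication_is_healthy(sample) else "needs attention")
```

Run the check against the output from both DB nodes. If either one is reported as needing attention, repair replication before continuing.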

Resolution Steps:

1. Stop the Gateway service on all 4 nodes that are part of the same Gateway cluster by running the following command at the OS prompt:

# service ssg stop

2. Once the Gateway service is stopped on all nodes, open the MySQL console on the Primary DB node.

3. Execute the following SQL statements in the order shown below:

# delete from ssg.cluster_master;

# INSERT INTO ssg.cluster_master (nodeid, touched_time, version) VALUES (NULL, 0, 0);

Then confirm that the cluster_master table now shows empty values by running:

# select * from ssg.cluster_master;

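The output should show a single row with nodeid set to NULL and both touched_time and version set to 0, matching the values inserted above.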


4. Active replication should apply the same change to the cluster_master table on the Secondary DB node. Open the MySQL console on the Secondary DB node and confirm that this is the case by running:

# select * from ssg.cluster_master;

5. Restart the Gateway service on all nodes: first the Primary DB node, then the Secondary DB node, and finally the remaining two Gateway nodes.

6. Once the Gateway nodes have started, check the cluster_master table again on both the Primary and Secondary DB nodes by running the following command:

# select * from ssg.cluster_master;

Both should now show the same nodeid value.
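The final comparison can be expressed as a small check. The Python sketch below is an illustration rather than a Broadcom-provided tool: it takes the rows returned by the query on each DB node and reports whether both nodes agree on one non-NULL nodeid. The function names are hypothetical, the nodeid values in the usage example are made up, and the sketch assumes nodeid is the first column of each row, following the column order used in the INSERT statement.

```python
from typing import Optional, Sequence, Set, Tuple

# Assumed row layout: (nodeid, touched_time, version).
Row = Tuple[Optional[str], int, int]


def master_nodeids(rows: Sequence[Row]) -> Set[Optional[str]]:
    """Collect the distinct nodeid values found in cluster_master rows."""
    return {row[0] for row in rows}


def cluster_master_consistent(
    primary_rows: Sequence[Row], secondary_rows: Sequence[Row]
) -> bool:
    """True when both DB nodes hold the same single, non-NULL nodeid."""
    primary_ids = master_nodeids(primary_rows)
    secondary_ids = master_nodeids(secondary_rows)
    return (
        len(primary_ids) == 1
        and None not in primary_ids
        and primary_ids == secondary_ids
    )


if __name__ == "__main__":
    # Made-up values for illustration; use the rows returned by each node.
    primary = [("node-a", 1700000000, 5)]
    secondary = [("node-b", 1700000000, 5)]
    if cluster_master_consistent(primary, secondary):
        print("cluster_master matches on both DB nodes")
    else:
        print("cluster_master mismatch: review replication and repeat the steps")
```

A NULL nodeid on both sides is treated as inconsistent here, because after the Gateway nodes have started one of them is expected to have registered itself as master.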