Automation Point customers want to know that all of their Automation Point machines are running properly. They also want to be able to "hot swap" machines in case of a hardware error. You can use a combination of REXX programs, PPQs, and rules to "heartbeat check" between two Automation Point machines.
First, configure program-to-program queues (PPQs) on both Automation Point machines. PPQs are an interprocess communication tool: small data repositories that Automation Point machines can access over TCP/IP. This article discusses how to use PPQs to pass an "I'm Alive" message between two Automation Point computers.
To configure PPQs, on each of the two Automation Point machines, do the following:
The PPQ Service starts when you close Configuration Manager.
Once PPQs are configured, write a REXX program that attempts to create the queue shared between the Automation Point computers. The name of the shared queue is the Automation Point machine name.
In this example, a REXX program is configured on each of the Automation Point computers so that it starts as soon as Unicenter Automation Point starts. The first computer to start creates the shared queue. The REXX program looks like the following example:
/* Hbeat_start.rexx */
/* First we initialize our REXX variables to 0, then try to create */
/* the shared PPQ queue. If the create fails, we send a message    */
/* to the AP message window, which can be automated by rules       */
remotemachinename_status = 0
address GLV "putp remotemachinename_status"
remotemachinename_failure = 0
address GLV "putp remotemachinename_failure"
address PPQ "create queue(machinename) share(yes)"
if rc <> 0 then
  address axc "wtxc 'PPQ create failed. Please check network connectivity'"
Next, set up the rules file to perform the heartbeat checks. In this case, assume a time rule fires a REXX program every five minutes.
For example:
TIME(00:00), EVERY(5 MINUTES) REXX(HBEAT.REX)
PPQs can be manipulated directly from rules, but REXX programs are much more flexible. The REXX program first writes a heartbeat item to the local machine's PPQ queue, then reads the remote machine's PPQ queue, and sets two variables: remotemachinename_status and remotemachinename_failure. Replace machinename with the local Automation Point machine name and remotemachinename with the remote Unicenter Automation Point machine name. If the queue is read successfully, the remotemachinename_status variable is set to 1 and the failure count is reset to 0. Otherwise the status is set to 0. The remotemachinename_failure variable counts how many consecutive times the program fails to read the queue. If the remotemachinename_failure variable becomes greater than 3, the program sends a message to the Automation Point messages window.
Example REXX program:
/* HBEAT.REX */
/* We write our heartbeat message to the proper queue. If we do not */
/* get a 0 return code, we do a wtxc, which can have a rule written */
/* against it to do a notification. The user should change          */
/* machinename to the name of the local AP machine and              */
/* remotemachinename to the name of the remote AP machine           */
address PPQ "write queue(machinename) item(heartbeat)"
if rc <> 0 then
  address axc "wtxc 'PPQ write failed. Please check network connectivity'"
else
  call checkit
call resolve
exit

checkit:
  /* Now we look to see if we have received a heartbeat from */
  /* the remote AP machine */
  address PPQ "read queue(remotemachinename) prefix(item)"
  if rc == 0 then do
    remotemachinename_status = 1
    address GLV "putp remotemachinename_status"
    remotemachinename_failure = 0
    address GLV "putp remotemachinename_failure"
  end
  else do
    remotemachinename_status = 0
    address GLV "putp remotemachinename_status"
    address GLV "get remotemachinename_failure"
    /* Increase the failure count by 1 */
    nu_fail = remotemachinename_failure + 1
    remotemachinename_failure = nu_fail
    address GLV "putp remotemachinename_failure"
  end
  return

resolve:
  /* If the failure count gets above 3 we need to do something */
  /* so we send a message to the AP message window */
  address GLV "get remotemachinename_failure"
  if remotemachinename_failure > 3 then
    address axc "wtxc 'Heartbeat failure on remote AP'"
  return
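The status and failure-count behavior described above can be tried outside Automation Point before the rule is deployed. The following is a minimal stand-alone sketch in plain REXX, not part of the product: it makes no PPQ, GLV, or AXC calls, and the file name hbeat_sim.rex, the variable names, the update routine, and the sample return code 8 are all hypothetical. It assumes only what the article states: a return code of 0 means the read succeeded, any other value counts as a failure, and a message is due once the count is greater than 3.

```rexx
/* hbeat_sim.rex (hypothetical name): stand-alone model of the       */
/* status and failure counters. No PPQ, GLV, or AXC calls are made.  */
/* Each word in "reads" stands for one assumed PPQ read return code: */
/* 0 means success; 8 is an arbitrary nonzero value meaning failure. */
reads = '0 8 8 8 8 0'
status = 0
failure = 0
alert = 0
do i = 1 to words(reads)
  call update word(reads, i)
  say 'check' i || ': status=' || status 'failure=' || failure 'alert=' || alert
end
exit

/* update: apply the result of one heartbeat read to the counters */
update:
  parse arg readrc
  if readrc = 0 then do
    status = 1          /* remote machine answered */
    failure = 0         /* a success resets the consecutive count */
  end
  else do
    status = 0
    failure = failure + 1
  end
  /* the article's threshold: a message is due when the count exceeds 3 */
  alert = (failure > 3)
  return
```

With the sample sequence, the alert flag becomes 1 only on the fifth check, which is the fourth consecutive failed read, and it clears again on the sixth check when a read succeeds. At a five-minute rule interval, that corresponds to the remote machine being reported after four missed heartbeats in a row.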