The "HA" or High Availability probe is traditionally used to ensure availability of the Primary Hub core services (NAS, data_engine, etc.) and the use for this purpose is well documented.
Less well known, and less well documented, is that the HA probe can be used with any pair of hubs to provide a layer of redundancy/high availability.
A good example of this is providing redundancy for tunnel servers, so that a tunnel client can have more than one path to send data to the primary hub.
This document can serve as an example/guide for such a setup.
In this example we will focus on the following hubs:
Tunnel Redundancy
In this environment, Tunnel-Client-One has two client connections defined - one for each of the tunnel servers. These tunnel servers can be located in completely different datacenters/locations as long as the client can reach both of them over the network.
Further explanation of why this can take some time can be found in this article.
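Since the setup depends on the client being able to reach both tunnel servers over the network, it can be useful to confirm basic TCP reachability from the client host before defining the two client connections. The following is a minimal sketch only: the hostnames and the port number are hypothetical placeholders (substitute the addresses and the tunnel port actually configured in your environment), and a successful TCP connect shows only that the port is reachable, not that the tunnel itself will be established.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_tunnel_servers(servers: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Check each named (host, port) pair and report whether it is reachable."""
    return {name: can_reach(host, port) for name, (host, port) in servers.items()}


if __name__ == "__main__":
    # Hypothetical addresses and port; replace with your own values.
    tunnel_servers = {
        "Tunnel-Server-Main": ("tunnel-main.example.com", 48003),
        "Tunnel-Server-Standby": ("tunnel-standby.example.com", 48003),
    }
    for name, ok in check_tunnel_servers(tunnel_servers).items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

Run this from the tunnel client host. If either server reports as not reachable, resolve the network path first, since the redundant tunnel cannot help if the client can reach only one of the two servers.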
Queue Configuration
Once the redundant tunnels have been established, you should configure queues as follows:
At this point, data should be flowing from the client hub to the Main tunnel server through the GET queues on the Main tunnel server, and from there to the primary hub through the GET queues on the primary hub.
HA deployment
The next step is to deploy the HA probe to the Standby tunnel server.
The probe will deploy in a deactivated state. You can leave it deactivated for now, and double-click on it to bring up the configuration GUI.
Go to the 'Configure' tab, and select the Main Tunnel Server as the hub to synchronize with:
Next, under "Queues to enable" add the three GET queues (the ones which we left deactivated earlier):
Next, in the "Options" tab, if you do not have NAS probes on your tunnel servers, make sure to uncheck the NAS AO option:
Then click OK and activate the HA probe.
Upon activation, you should see a message indicating that contact was "restored" with the Main tunnel server:
Feb 26 18:24:28:404 0 HA: ****************[ Starting ]****************
Feb 26 18:24:29:407 0 HA: INFO: FAILBACK: Connection to '/ExampleDomain/Tunnel-Server-Main/tunnel-server-robot/hub' restored. Issuing state change.
Verification
To validate the setup, first, open the hub probe GUI on the Main tunnel server, and in the "Status" tab, verify that the three GET queues are active:
On the tunnel client the Status tab should show the Main tunnel server connected to the same queues:
Now, to simulate an outage, stop the robot (e.g. stop the underlying Service) on the Main tunnel server.
After a moment, you should see in the log that the HA probe has noticed the outage:
Feb 26 18:29:55:717 0 HA: WARN: FAILOVER: Failed to contact primary hub '/ExampleDomain/Tunnel-Server-Main/tunnel-server-robot/hub': communication error. Issuing state change.
If you check the hub GUI/Status tab on the Standby tunnel server, you will see that the queues there have activated:
As mentioned above, the Infrastructure Manager client or Admin Console will temporarily show the client hub as unreachable, along with the Main tunnel server:
This will normally take around 40-60 minutes to correct itself and allow communication with the client; however, be assured that alarms and data are still coming in from this hub during that time.
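If you want to confirm failover and failback without reading the HA log by eye, the state-change lines can be extracted with a short script. The sketch below is an illustration only: its pattern is inferred solely from the two sample log lines shown in this article (the FAILBACK line at startup and the FAILOVER line during the simulated outage), so other probe versions or log levels may format lines differently, and the function and field names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple


class HAEvent(NamedTuple):
    timestamp: str  # e.g. "Feb 26 18:29:55:717"
    level: str      # e.g. "INFO" or "WARN"
    event: str      # "FAILOVER" or "FAILBACK"
    hub: str        # hub address quoted in the message, or "" if none found


# Pattern inferred from the sample lines in this article (an assumption,
# not a documented log format).
_LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}:\d{3})\s+\d+\s+HA:\s+"
    r"(?P<level>\w+):\s+(?P<event>FAILOVER|FAILBACK):\s+(?P<msg>.*)$"
)
_HUB_RE = re.compile(r"'([^']+)'")


def parse_ha_events(lines: Iterable[str]) -> List[HAEvent]:
    """Return the FAILOVER/FAILBACK events found in HA log lines, in order."""
    events = []
    for line in lines:
        match = _LINE_RE.match(line.strip())
        if not match:
            continue  # Not a state-change line (e.g. the startup banner).
        hub = _HUB_RE.search(match.group("msg"))
        events.append(
            HAEvent(
                timestamp=match.group("ts"),
                level=match.group("level"),
                event=match.group("event"),
                hub=hub.group(1) if hub else "",
            )
        )
    return events


if __name__ == "__main__":
    sample = [
        "Feb 26 18:24:28:404 0 HA: ****************[ Starting ]****************",
        "Feb 26 18:24:29:407 0 HA: INFO: FAILBACK: Connection to "
        "'/ExampleDomain/Tunnel-Server-Main/tunnel-server-robot/hub' restored. "
        "Issuing state change.",
    ]
    for ev in parse_ha_events(sample):
        print(ev.timestamp, ev.event, ev.hub)
```

The last event returned indicates the probe's most recent state change: FAILOVER means the Standby tunnel server's queues should be active, and FAILBACK means contact with the Main tunnel server was restored.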
Note on Data Origins
In this example, the tunnel servers themselves do not have additional robots attached, so there is no monitoring data being submitted from their respective origins. It is assumed that all monitoring data is coming from robots attached to the "Client" server in which case the Origins will not change.
If you do have robots attached to the tunnel servers, you may need to update/override the Origin on the Standby hub to match that of the Main hub (assuming that the robots attached to the Main tunnel server will fail over to the Standby hub at the same time the tunnels fail over).
If there are no robots or monitoring data being submitted from the tunnel servers directly this is not necessary.
More information about Origin overrides is available here.