After an unplanned reboot of the Hyper-V cluster that hosts our primary and secondary hubs, we found issues and corruption in certain files on both hubs. We have been able to completely recover both hubs, but we have been unable to get the discovery_server probe to start on the secondary hub. This Discovery Server is our primary way to discover our environment. When I activate the probe, all I get is an 'error' status, and no log is created.
I have reinstalled the discovery_server probe with no success. I have also deleted the probe and reinstalled it, again with no success. I have no log files to provide, but I have attached a copy of discovery_server.cfg. I need help finding out what is wrong with this probe.
Release : 20.3
Component : UIM - DISCOVERY_SERVER
The discovery_server probe normally runs only on the primary hub. Because it sounds like the secondary hub is in place for HA purposes, it is normal for the discovery_server probe to also be deployed there, but it would be started only as part of a failover event: the HA probe starts it when it detects that the primary hub is down. The discovery_server probe also requires that the data_engine probe start first. The data_engine probe likewise normally runs only on the primary hub and would be started on the secondary hub only as part of an HA failover. If the data_engine probe is down, the discovery_server probe is not expected to start. So it sounds normal for the probe not to run on the secondary hub.
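The start conditions described above can be summarized as a small decision function. This is only an illustrative sketch in Python, not UIM code: the `HubState` class, its field names, and the function name are hypothetical, and the logic simply restates the two conditions above (the hub must be acting as the active hub, and data_engine must already be running).

```python
from dataclasses import dataclass


@dataclass
class HubState:
    """Hypothetical snapshot of one hub; the field names are illustrative only."""
    is_primary: bool           # True for the primary hub, False for the secondary
    primary_reachable: bool    # whether the primary hub is seen as up
    data_engine_running: bool  # whether data_engine is running on this hub


def discovery_server_expected_to_run(hub: HubState) -> bool:
    """Return True if discovery_server would be expected to run on this hub."""
    # A hub is "active" if it is the primary, or if it is the secondary
    # and the primary is down (the failover case handled by the HA probe).
    active = hub.is_primary or not hub.primary_reachable
    # discovery_server also depends on data_engine having started first.
    return active and hub.data_engine_running


# Secondary hub while the primary is healthy: the probe is not expected to run.
secondary_normal = HubState(is_primary=False, primary_reachable=True,
                            data_engine_running=False)
# Secondary hub after failover, once data_engine has been started.
secondary_failover = HubState(is_primary=False, primary_reachable=False,
                              data_engine_running=True)
```

Under these assumptions, `discovery_server_expected_to_run(secondary_normal)` is `False`, which matches the behavior reported on the secondary hub, while the failover case evaluates to `True`.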
Here is a link to the HA guide that describes how the HA probe will handle a failover event and start the probes that normally only run on the primary hub:
AIOps - DX Infrastructure Manager - UIM High Availability Guide