The customer applied the solution above, along with the related tasks below, to address the originally reported problem: the inability to run discovery or to query the discovery_server without errors.
- Significantly reduced the number of hub subscribers, which had grown to 75 or more.
- Analyzed the discovery_agent-to-discovery_server communication and the intermittent errors it produced. The discovery_agent (DA) was referring to another discovery_server that was NOT part of the same UIM domain.
Stopped the robot where the discovery_agent resides and deleted the following cached files (a scripted sketch follows this list):
- robot_env.sds
- robot.sds
- hubs.sds
- niscache (directory)
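For reference, the cleanup can be scripted. The following is a minimal Python sketch, not the exact procedure used; the install root, the nimbus init script location, and the file locations are assumptions that vary by platform and should be adjusted:

    import os
    import shutil
    import subprocess

    NIM_ROOT = "/opt/nimsoft"  # assumed Linux install root; adjust as needed

    # Stop the robot before removing its cached state.
    subprocess.run(["/etc/init.d/nimbus", "stop"], check=True)

    # Remove the cached .sds files from the robot directory.
    for name in ("robot_env.sds", "robot.sds", "hubs.sds"):
        path = os.path.join(NIM_ROOT, "robot", name)
        if os.path.exists(path):
            os.remove(path)

    # niscache is a directory; remove it recursively. It is rebuilt on startup.
    shutil.rmtree(os.path.join(NIM_ROOT, "niscache"), ignore_errors=True)

    # Start the robot again so it re-registers with the correct hub.
    subprocess.run(["/etc/init.d/nimbus", "start"], check=True)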
After clearing the cached information and starting the robot again, the discovery_agent log contained no further entries referring to the old discovery_server.
The log was clean from that point on and referred ONLY to the correct Primary hub.
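This check can also be scripted by scanning the discovery_agent log for any remaining references to the stale discovery_server. A minimal sketch; the log path and the old server's hostname below are placeholders, not values from this case:

    LOG = "/opt/nimsoft/probes/service/discovery_agent/discovery_agent.log"  # assumed path
    OLD_DS = "old-ds-host.example.com"  # hypothetical stale discovery_server host

    # Collect any log lines that still mention the old discovery_server.
    with open(LOG, encoding="utf-8", errors="replace") as fh:
        stale = [line.rstrip() for line in fh if OLD_DS in line]

    if stale:
        print(f"{len(stale)} stale reference(s) found:")
        print("\n".join(stale))
    else:
        print("Log is clean: no references to the old discovery_server.")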
Separately, snmpcollector/pollagent alarms had not been appearing in the Infrastructure Manager (IM) alarm console when snmpcollector's attempts to query the discovery_server failed. In some environments, alarms may load very slowly into the Infrastructure Manager console, and sometimes the complete list does not load at all.
Therefore, we added the following key to the "setup" section of nas.cfg and restarted the nas: get_alarms_force_wait = yes
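After the change, the relevant fragment of nas.cfg looks roughly like this (surrounding keys omitted; only the get_alarms_force_wait line was added):

    <setup>
       ...
       get_alarms_force_wait = yes
    </setup>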
Lastly, the customer logged in to the Admin Console and, using the snmpcollector probe, tested the query of the discovery_server; it succeeded without any further errors.