Discovery would not run and we have to reboot discovery_server

Article ID: 127835

Products

DX Infrastructure Management NIMSOFT PROBES

Issue/Introduction

Manual discovery would not run. The customer researched the error and found this workaround:

https://communities.ca.com/docs/DOC-231175201-tech-tip-discovery-wizard-missing-for-all-or-some-users

But the errors persisted.

Cause

- Configuration and cached data that were no longer valid

Environment

- UIM 9.0.2

Resolution

https://communities.ca.com/docs/DOC-231175201-tech-tip-discovery-wizard-missing-for-all-or-some-users

The customer applied the solution above and also performed the following tasks related to the originally reported problem, which was the inability to run discovery or query the discovery_server without errors.

- Significantly reduced the number of hub subscribers, which had been 75 or more
- Analyzed intermittent communication errors between the discovery_agent (DA) and the discovery_server (DS). The DA was referring to another discovery_server which was NOT part of the same UIM domain.

Stopped the robot where the discovery_agent sits and deleted the following files:

- robot_env.sds
- robot.sds
- hubs.sds
- niscache
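
The cleanup above can be sketched as a small script. This is only an illustration, not a supported tool: the function name `clear_robot_cache`, the `dry_run` flag, and the robot directory passed in are all assumptions, and the sketch treats niscache as either a file or a directory because the article does not say which. Stop the robot before running anything like this.

```python
import shutil
from pathlib import Path

# Names taken from the list above.
CACHE_ITEMS = ["robot_env.sds", "robot.sds", "hubs.sds", "niscache"]


def clear_robot_cache(robot_dir, dry_run=True):
    """Return the cache paths found in robot_dir; delete them unless dry_run."""
    found = []
    for name in CACHE_ITEMS:
        target = Path(robot_dir) / name
        if not target.exists():
            continue  # nothing to clean for this entry
        found.append(target)
        if dry_run:
            continue  # report only, leave the item in place
        if target.is_dir():
            shutil.rmtree(target)  # assumption: niscache may be a directory
        else:
            target.unlink()
    return found


# Example (hypothetical install path, adjust to the actual robot directory):
# print(clear_robot_cache("/opt/nimsoft/robot", dry_run=True))
```

Running with `dry_run=True` first shows which items exist before anything is deleted.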

After the cached information was cleared and the robot was started again, the discovery_agent log showed no further entries referring to the old DS.

The log was clean from that point on and referred ONLY to the correct Primary hub.

Meanwhile, snmpcollector/pollagent alarms had not been appearing in the Infrastructure Manager (IM) alarm console when snmpcollector failed to query the discovery_server. In some environments, alarms may load very slowly into the Infrastructure Manager console, and sometimes the complete list will not load.

Therefore, we added the config key "get_alarms_force_wait = yes" to the "setup" section of nas.cfg and restarted the nas probe.
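
For illustration, the change to the "setup" section can be expressed as a small text transformation. This is a sketch under assumptions: it presumes nas.cfg uses `<section>` / `</section>` markers with `key = value` lines inside them, and the helper name `set_setup_key` is hypothetical. In practice the key is normally added through the probe's raw configuration interface rather than a script.

```python
def set_setup_key(cfg_text, key, value):
    """Return cfg_text with `key = value` set at the top level of <setup>."""
    new_line = f"   {key} = {value}"
    out, depth, done = [], 0, False  # depth > 0 means we are inside <setup>
    for line in cfg_text.splitlines():
        s = line.strip()
        if depth == 0:
            if s == "<setup>":
                depth = 1
        elif s.startswith("</"):
            depth -= 1
            if depth == 0 and not done:
                out.append(new_line)  # key was absent: add before </setup>
                done = True
        elif s.startswith("<"):
            depth += 1  # nested subsection, its keys are left alone
        elif depth == 1 and not done and s.partition("=")[0].strip() == key:
            line, done = new_line, True  # key exists: overwrite its value
        out.append(line)
    if not done:
        raise ValueError("no <setup> section found")
    return "\n".join(out) + "\n"


sample = "<setup>\n   loglevel = 0\n</setup>\n"
print(set_setup_key(sample, "get_alarms_force_wait", "yes"))
```

The function only returns the modified text; writing it back to nas.cfg and restarting the nas probe remain manual steps.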

Lastly, the customer logged in to the Admin Console and, using the snmpcollector probe, tested a query of the discovery_server. The query succeeded without any further errors.