Technical details regarding discovery_server communications

Article ID: 14966

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)

  • Unified Infrastructure Management for Mainframe

  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

How does the discovery_server probe communicate with discovery_agents?

Environment

  • UIM 8.51 or higher

Resolution

  1. The discovery_server probe asks the primary hub for a list of hubs using the 'gethubs' callback.

  2. The discovery_server probe then sends a 'getrobots' request to each hub in that list to get a list of all robots.

  3. The discovery_server probe then does a 'nametoip' callback to find the IP address and port of each robot.

  4. If there IS a tunnel, nametoip returns the IP/port of a tunnel session. The discovery_server connects to that tunnel session, which handles the request to the robot.

  5. If there is NO tunnel, nametoip returns the actual IP address of the robot itself, on port 48000, and the discovery_server tries to connect directly to the robot. This may fail if:

    a) there is no route to the robot from the primary hub, or

    b) there is no tunnel between the hubs
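
To check the direct-connection case in step 5, a quick TCP reachability test can be run from the primary hub host. The following is a minimal Python sketch, not part of UIM: the function name can_reach and its timeout default are illustrative assumptions, and it only confirms that a TCP session to the robot port (48000) can be opened. It does not speak the UIM protocol.

```python
import socket


def can_reach(host: str, port: int = 48000, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it mirrors only the network-level part of the
    direct connection that discovery_server attempts when no tunnel exists.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, unreachable networks,
        # and name-resolution failures.
        return False
```

Calling can_reach with the address returned by nametoip, from the server that hosts the discovery_server probe, gives a quick indication of whether a missing route is the likely cause. A False result is consistent with failure condition a) above, but a firewall or a stopped robot would produce the same result.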

Additional Information

Example scenario (no metrics displayed in the Operator Console (OC) for one or more robots):

In one particular customer case, there was no tunnel between the primary and remote hubs. The discovery_server therefore relied on the nametoip callback, received the robot's address in the result (e.g., 10.##.#.###), and then attempted to connect directly to that IP on port 48000.

In the discovery_server log, search for errors that mention the affected robot hostname or any other hosts for which no metrics are being displayed.

The example below shows the type of error to expect in the discovery_server log. It indicates that the discovery_server tried to communicate directly with the 10.##.#.### robot to fetch the niscache elements, but the connection failed:

[robotWorker-2] WARN com.nimsoft.discovery.server.nimbus.scan.NisCacheUpdater - fetch nis cache failed on pass=0 with 0 total elems received for /<uim_domain>/<uim_secondary_hub>/<hostname> : (80) Session error, Unable to open a client session for 10.24.xx.xxx:48000: Connection timed out: connect 

Set the discovery_server loglevel to 5 with a large logsize (e.g., 50000 or higher), and check the actual logs on the file system.
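
Searching a large log by hand can be slow, so the search could be scripted. The following is a minimal Python sketch that assumes the failure lines follow the same wording as the example error above. The function name find_niscache_failures and the regular expressions are illustrative assumptions rather than anything shipped with UIM, and other log variants may need different patterns.

```python
import re
from typing import Iterable, List

# Matches the "fetch nis cache failed" warning shown in the example above.
FAILURE_RE = re.compile(
    r"fetch nis cache failed on pass=(?P<pass_no>\d+) "
    r"with (?P<elems>\d+) total elems received "
    r"for (?P<robot>\S+) : \((?P<code>\d+)\) (?P<message>.*)"
)
# Pulls the address and port that the session was opened against.
TARGET_RE = re.compile(r"client session for (?P<host>\S+?):(?P<port>\d+)")


def find_niscache_failures(lines: Iterable[str]) -> List[dict]:
    """Collect one record per niscache fetch failure found in the lines."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match is None:
            continue
        entry = {
            "robot": match["robot"],          # /domain/hub/robot path
            "code": int(match["code"]),       # error code, e.g. 80
            "message": match["message"].strip(),
            "host": None,
            "port": None,
        }
        target = TARGET_RE.search(entry["message"])
        if target:
            entry["host"] = target["host"]
            entry["port"] = int(target["port"])
        failures.append(entry)
    return failures
```

To use it, open the discovery_server log file from the probe's directory (the exact path depends on the installation) and pass the file object to find_niscache_failures. Each returned record names the robot path and the address that the discovery_server tried to reach directly.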