How does the discovery_server probe communicate with discovery_agents?
The discovery_server probe asks the primary hub for a list of hubs using the 'gethubs' callback.
The discovery_server probe then sends 'getrobots' to each hub in that list to get a list of all robots.
The discovery_server probe then issues a 'nametoip' callback to find the IP address and port of each robot.
If there IS a tunnel, nametoip returns the IP/port of a tunnel session. The discovery_server connects to that tunnel session, which handles the request to the robot.
If there is NO tunnel, nametoip returns the actual IP of the robot itself with port 48000, and the discovery_server tries to connect directly to the robot. This may fail if:
a) there is no route to the robot from the primary or
b) there is no tunnel between hubs
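The lookup sequence described above can be sketched roughly as follows. This is only an illustration of the order of the callbacks, not the probe's actual implementation: send_callback is a hypothetical function supplied by the caller, and the return shapes (a list of hub addresses, a list of robot names, an (ip, port) pair) as well as the choice of which hub answers 'nametoip' are assumptions made for the example.

```python
def resolve_robot_targets(primary_hub, send_callback):
    """Return {robot_name: (ip, port)} by walking hubs and robots.

    send_callback(address, callback_name, args) is a hypothetical stand-in
    for whatever issues a callback; its return shapes are assumed here.
    """
    targets = {}

    # Step 1: ask the primary hub for the list of hubs.
    hubs = send_callback(primary_hub, "gethubs", {})

    for hub in hubs:
        # Step 2: ask each hub for its robots.
        robots = send_callback(hub, "getrobots", {})

        for robot in robots:
            # Step 3: resolve the robot name to an IP/port.
            # Assumption: the lookup is sent to the primary hub. With a
            # tunnel the result points at a tunnel session; without one it
            # is the robot's own IP on port 48000.
            ip, port = send_callback(primary_hub, "nametoip", {"name": robot})
            targets[robot] = (ip, port)

    return targets
```

Passing the callback sender in as a parameter keeps the sketch independent of any particular client library, so the flow can be exercised with a stub.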
Example scenario (no metrics displayed in Operator Console or OC for one or more robots):
In one particular customer case, there was no tunnel between the primary and remote hubs. The discovery_server relied on the nametoip callback, received the robot's address in the result (e.g., 10.##.#.###), and then attempted to connect directly to that IP on port 48000.
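One way to confirm whether such a direct connection is possible is to attempt a plain TCP connection to the robot on port 48000 from the system where the discovery_server runs. The following is a minimal sketch using only the Python standard library. The function name can_connect and the 5-second default timeout are choices made for this example, and a successful TCP connection only shows that a route exists, not that the robot will answer the request.

```python
import socket

ROBOT_PORT = 48000  # robot port named in this article


def can_connect(host, port=ROBOT_PORT, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) on success, or (False, reason) when the attempt
    fails, for example on a timeout, a refused connection, or no route.
    """
    try:
        # create_connection resolves the host and connects within `timeout`.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # socket timeouts are also OSError subclasses
        return False, str(exc)
```

For example, calling can_connect with the IP address returned by nametoip yields (False, reason) when there is no route from the primary hub to the robot.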
In the discovery_server log, search for errors that mention the given robot hostname and/or other hosts for which no metrics are being displayed.
The example below shows the type of error to expect in the discovery_server log. It indicates that the discovery_server tried to communicate directly with the robot at 10.##.#.### to fetch the nis cache elements, but the connection failed:
[robotWorker-2] WARN com.nimsoft.discovery.server.nimbus.scan.NisCacheUpdater - fetch nis cache failed on pass=0 with 0 total elems received for /<uim_domain>/<uim_secondary_hub>/<hostname> : (80) Session error, Unable to open a client session for 10.24.xx.xxx:48000: Connection timed out: connect
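When many robots are affected, the log can be scanned for this type of message with a short script. The sketch below is based only on the single sample line shown above, so the patterns are an assumption about the message layout and may need adjusting for other versions or other error texts. The function name parse_failures is hypothetical.

```python
import re

# Layout assumed from the one sample line in this article.
FAILURE = re.compile(
    r"fetch nis cache failed on pass=(?P<pass_no>\d+) "
    r"with (?P<elems>\d+) total elems received "
    r"for (?P<robot>\S+) : \((?P<code>\d+)\) (?P<message>.*)"
)
# Target address as it appears in the "Unable to open a client session" text.
TARGET = re.compile(r"client session for (?P<host>[^\s:]+):(?P<port>\d+)")


def parse_failures(lines):
    """Return one dict per 'fetch nis cache failed' line found in `lines`."""
    failures = []
    for line in lines:
        match = FAILURE.search(line)
        if not match:
            continue  # unrelated log line
        entry = {
            "robot": match.group("robot"),
            "pass_no": int(match.group("pass_no")),
            "elems": int(match.group("elems")),
            "code": int(match.group("code")),
            "message": match.group("message").strip(),
            "host": None,
            "port": None,
        }
        target = TARGET.search(entry["message"])
        if target:
            entry["host"] = target.group("host")
            entry["port"] = int(target.group("port"))
        failures.append(entry)
    return failures
```

The function accepts any iterable of lines, so an open log file can be passed to it directly, and the resulting robot and host values give the list of robots the discovery_server could not reach.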
To capture these errors, set the discovery_server loglevel to 5 with a large logsize, e.g., 50000 or higher, and check the actual logs on the file system.