Robots register with the loopback (127.0.0.1) or an APIPA address and fail to collect data and alerts

Article ID: 145428

Products

NIMSOFT PROBES DX Infrastructure Management

Issue/Introduction

We had a major outage on a critical system caused by a disk space issue: UIM did not alert when the disk was full. When we checked the UIM robot, we found that it had registered with the IP address 127.0.0.1 and that all probes except the controller were in a failed state. As a workaround, we updated robot.cfg with the correct IP address and restarted the nimbus service. We need a permanent fix, because we see this behavior regularly on quite a few systems; it disrupts monitoring and gives a wrong impression of the tool's capability and design. We would also like to know whether the robot can be addressed by FQDN, so that the hub obtains the IP address from the FQDN.

Cause

- known issue

Environment

- UIM v9.02
- Robot v7.96 Build 230 / v7.97 Build 283
- Windows or Linux robots

Resolution

Prior to robot v9.20, rebooting a server would sometimes bring up the robot with the loopback address (127.0.0.1). Based on customer experience, this seems to be caused by a 'race condition' between UIM starting and the network becoming active. If the upgrade and the parameters listed below do not prevent the robot from taking the loopback or APIPA address, it may still be worth setting the Nimsoft Robot Watcher service to 'Automatic (Delayed Start)' on Windows.
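To spot affected robots, it can help to classify the address a robot registered with. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `classify_robot_ip` is hypothetical, and APIPA is assumed here to mean the IPv4 link-local range 169.254.0.0/16.

```python
import ipaddress


def classify_robot_ip(address: str) -> str:
    """Return 'loopback', 'apipa', or 'ok' for an IPv4 address string."""
    ip = ipaddress.IPv4Address(address)  # raises AddressValueError if invalid
    if ip.is_loopback:
        # 127.0.0.0/8, including 127.0.0.1
        return "loopback"
    if ip.is_link_local:
        # 169.254.0.0/16, the range used for APIPA self-assignment
        return "apipa"
    return "ok"
```

For example, `classify_robot_ip("127.0.0.1")` returns `"loopback"`, which would flag a robot showing the symptom described in this article.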

Please refer to the robot (controller) release notes:
https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/it-operations-management/ca-unified-infrastructure-management-probes/GA/alphabetical-probe-articles/controller/controller-release-notes.html

**********
Fixed an issue in which when a robot was unable to find a valid IP address in the robot.cfg or DNS, it was taking the loopback address. This was disrupting the communication. This issue has been resolved in this release. To resolve this issue, two configurable parameters have been added to robot.cfg:

max_retry_IfLoopBack_attempt
Specifies the maximum number of retries that controller can use to find the system IP address. In the case of loopback IP address, if robot_ip is not configured in robot.cfg, then controller tries to get the system IP. It tries to find the IP address for the number of retries that are specified in the parameter. After trying for the specified count, controller starts again with the loopback IP, resulting in the same initial behavior. The default value is 10.

sleep_btwRetry_IfLoopBack_attempt
Specifies the maximum interval for which controller waits before trying again for the subsequent attempt. The default value is 2 seconds.
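The retry behavior described by these two parameters can be pictured roughly as follows. This is only an illustrative sketch of the logic in the release notes, not the controller's actual implementation; `find_robot_ip`, `detect_ip`, and the injectable `sleep` argument are hypothetical names, and the defaults mirror the documented defaults of 10 retries and 2 seconds.

```python
import time
from typing import Callable

LOOPBACK = "127.0.0.1"


def find_robot_ip(
    detect_ip: Callable[[], str],
    max_retry: int = 10,          # stands in for max_retry_IfLoopBack_attempt
    sleep_between: float = 2.0,   # stands in for sleep_btwRetry_IfLoopBack_attempt
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry detect_ip() while it returns loopback; fall back to loopback."""
    for attempt in range(max_retry):
        ip = detect_ip()
        if ip != LOOPBACK:
            return ip
        # Wait before the next attempt, but not after the final one.
        if attempt < max_retry - 1:
            sleep(sleep_between)
    # All attempts exhausted: same initial behavior, the loopback address.
    return LOOPBACK
```

If the network comes up during the retry window, the sketch returns the real address; otherwise it ends with the loopback address, which matches the "same initial behavior" noted in the release notes.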

UIM admins indicated that this appears to occur after the robot IP address is changed and the system is rebooted. Note that other UIM customers have noticed this behavior after a reboot as well.

After the robot_update v9.20 was distributed to the robot, the robot restarted and no longer displayed the loopback address.

To further ensure that the robot would no longer use the loopback/APIPA address after an IP change and subsequent reboot, in the robot.cfg controller section, we set the following parameters:

max_retry_IfLoopBack_attempt = 10

sleep_btwRetry_IfLoopBack_attempt = 5
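As a rough way to verify these settings on a robot, the sketch below reads the two parameters out of robot.cfg text and estimates the retry window. It assumes the controller section is delimited by `<controller>` and `</controller>` with one `key = value` pair per line; the function name `controller_settings` and the sample text are hypothetical, and the window is only an upper-bound estimate (retries multiplied by the wait interval).

```python
import re


def controller_settings(cfg_text: str) -> dict:
    """Return key/value pairs found in the <controller> section of cfg_text."""
    match = re.search(r"<controller>(.*?)</controller>", cfg_text, re.DOTALL)
    if match is None:
        return {}
    settings = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not key = value pairs
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical sample; a real robot.cfg contains many more keys.
SAMPLE_CFG = """<controller>
   max_retry_IfLoopBack_attempt = 10
   sleep_btwRetry_IfLoopBack_attempt = 5
</controller>
"""

settings = controller_settings(SAMPLE_CFG)
# Approximate upper bound, in seconds, on how long the controller keeps retrying.
retry_window = (
    int(settings["max_retry_IfLoopBack_attempt"])
    * int(settings["sleep_btwRetry_IfLoopBack_attempt"])
)
```

With the values shown above, the estimated window is about 50 seconds, so a network that takes longer than that to come up after a reboot could still leave the robot on the loopback address.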