After upgrading to UIM 20.3x and/or hub 9.31 I am seeing issues with my robots in Infrastructure Manager, such as:
The root cause of this issue is generally related to robot cloning but that is not the only cause.
Very likely, someone in your environment has, recently or in the past, created robots by cloning: they keep a master image with the UIM robot already installed and make clones of that machine.
When you do this, you must follow certain best practices, which in this case were not followed:
https://knowledge.broadcom.com/external/article/33662/best-practices-for-cloning-systems-with.html
The underlying problem is this: every robot has a robot_device_id (also known as dev_id), and all metrics for the robot are linked to it through this ID. The device ID is generated at install time, is meant to be unique, and is based on a hash of the robot's name and IP address.
When you clone a machine with the robot already installed, you also clone the robot_device_id unless you follow the best practices above. Several robots then share one ID, which causes numerous problems with metrics and duplication.
Unforeseen changes in the new hub version, v9.31, additionally introduced this new problem with robot duplication in IM.
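To make the duplication concrete, here is a minimal Python sketch that groups robots by device ID and reports any ID shared by more than one robot. The function name find_duplicate_device_ids and the sample inventory are hypothetical; how you export robot names and device IDs from your own environment is not covered here.

```python
from collections import defaultdict


def find_duplicate_device_ids(robots):
    """Return {dev_id: [robot names]} for every dev_id used by more than one robot.

    `robots` is an iterable of (robot_name, dev_id) pairs.
    """
    names_by_id = defaultdict(list)
    for name, dev_id in robots:
        names_by_id[dev_id].append(name)
    return {dev_id: names for dev_id, names in names_by_id.items() if len(names) > 1}


# Hypothetical inventory: web02 was cloned from web01 and kept its device ID.
robots = [
    ("web01", "D1A2B3"),
    ("web02", "D1A2B3"),
    ("db01", "D9F8E7"),
]

print(find_duplicate_device_ids(robots))  # {'D1A2B3': ['web01', 'web02']}
```

Any ID that appears in the result points to robots that were probably cloned without the best practices being followed.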
The KB link above has more detail, and you will need to share this process with the teams who create new images and spin up new servers. In short, they must either remove the niscache folder contents on the "master image" before making clones, or clear the folder out on each clone after creating it.
If you are impacted by the duplication or any of the other symptoms listed in the introduction above, please do the following:
- Upgrade all hubs in the environment to hub 9.33HF1, which is available at the UIM hotfix site:
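As a rough illustration of the cleanup itself, the Python sketch below empties a folder while keeping the folder in place. The helper name clear_directory_contents is hypothetical, and the demonstration runs against a temporary folder, not a real robot installation. The location of niscache on your systems and any additional steps required by the KB above are assumptions you need to verify.

```python
import shutil
import tempfile
from pathlib import Path


def clear_directory_contents(directory):
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in Path(directory).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a subfolder and all of its contents
        else:
            entry.unlink()         # remove a regular file or a symlink
        removed += 1
    return removed


# Demonstration on a throwaway folder that stands in for niscache.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "niscache"
    cache.mkdir()
    (cache / "entry1").write_text("cached data")
    (cache / "sub").mkdir()
    (cache / "sub" / "entry2").write_text("cached data")

    print(clear_directory_contents(cache))  # 2
    print(any(cache.iterdir()))             # False
```

On a master image you would point the helper at the robot's niscache folder instead of the temporary one.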
or directly here:
- RENAME (do not delete, in case you need to restore it) the file ($NIMSOFT_HOME)/nimsoft/hub/robot.sds on every hub.
- After renaming this file, stop the hub (on Linux, run "service nimbus stop"; on Windows, stop the Nimsoft Robot Watcher service).
- Wait a full 6 minutes (use a timer/stopwatch) and then start the hub again.
- After the restart, many or all of your robots may be temporarily missing from the Infrastructure Manager (IM) GUI.
- Wait approximately 15 minutes. All active robots should check in and re-register, and your robots should then appear correctly, as they did prior to the upgrade.
Note that if some robots are missing, restarting the Nimsoft Robot service should make them show up in the console again. If the robots do not reappear, you may not have left the hub down long enough. Try the steps above again, and this time, wait a full 10 minutes.
If it still doesn't work, you will have to restore the robot.sds file that you renamed earlier, restart the hubs, and then contact support to troubleshoot why the robots aren't coming back.
Note: if the robots are able to reach a secondary hub, then within the 5-minute shutdown period, the robots may temporarily assign themselves to that secondary hub. If this is the case, then after you start the hub back up, you can either restart each robot to move them back to their original hub, or you can shut down the secondary hub for 5 minutes (without removing robot.sds) in which case they should then move back to their original hub.
- Next, follow this KB article to reset the device IDs on every active robot in your environment:
https://knowledge.broadcom.com/external/article?articleId=208622
After this step, you should be able to upgrade the hub back to the very latest version. Please make sure that no new clones are spun up without following the best practices linked above, or the issue will return.
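The "rename, do not delete" step in the procedure above can be sketched in Python as follows. The helpers set_aside and restore are hypothetical names, and the demonstration uses a placeholder file in a temporary folder rather than a real robot.sds. Stopping the hub, waiting, and starting it again are not scripted here.

```python
import tempfile
import time
from pathlib import Path


def set_aside(path, suffix=None):
    """Rename `path` to `<name>.<suffix>.bak` in the same folder and return the backup path."""
    path = Path(path)
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.name}.{suffix}.bak")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    path.rename(backup)
    return backup


def restore(backup, original):
    """Move a backup created by set_aside() back to its original name."""
    original = Path(original)
    if original.exists():
        raise FileExistsError(f"refusing to overwrite: {original}")
    Path(backup).rename(original)


# Demonstration with a placeholder file standing in for robot.sds.
with tempfile.TemporaryDirectory() as tmp:
    sds = Path(tmp) / "robot.sds"
    sds.write_text("placeholder")

    backup = set_aside(sds, suffix="demo")
    print(backup.name)   # robot.sds.demo.bak
    print(sds.exists())  # False

    restore(backup, sds)
    print(sds.exists())  # True
```

Because the original is only renamed, the restore described above is a single rename back to robot.sds.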
Best Practices for cloning systems with Nimsoft (UIM) Robot already installed
https://knowledge.broadcom.com/external/article/33662
How can I clear niscache and/or reset the robot_device_id of every active robot in my UIM domain?
https://knowledge.broadcom.com/external/article/208622
Background
- Why hub v9.33HF1 or higher must be deployed to resolve one or more of the symptoms listed in this article:
In hub v9.20HF16, which was merged forward into hub v9.31, a fix was applied for situations where renaming a robot caused the hub to send "robot inactive" alarms (Support Case: 31888942).
With this hotfix, the hub used the robot_device_id to check robot uniqueness so that it could handle robot name changes. Unfortunately, this caused issues in environments where duplicate device IDs were possible.
In hub 9.33HF1, Development reverted the changes made for defect DE459104.