After upgrading to UIM 20.3x and/or hub 9.31, I am seeing issues with my robots in Infrastructure Manager (IM), such as:
- some robots appear more than once
- some robots do not appear at all
- some robots appear at first, and then disappear later
- the set of robots that appear or are duplicated may change over time and affect different robots
- some robots that were previously fine now appear red
- some robots show no probes when you click on them
- any combination of the above behaviors
The root cause of this issue is related to robot cloning.
Most likely, someone in your environment has at some point (recently or in the past) created robots through a cloning process: they built a master image with the UIM robot already installed and then made clones of that machine.
Cloning machines this way requires following specific best practices (see "Best Practices for cloning systems with Nimsoft (UIM) Robot already installed"), and in this case those practices were not followed.
Here is the underlying problem: every robot has a robot_device_id (also known as dev_id), and all of the robot's metrics are linked to the robot through this ID. The ID is generated at install time from a hash of the robot's name and IP address, so it should be unique to each robot.
If you clone a machine that already has the robot installed without following those best practices, the clone inherits the master's robot_device_id. Several robots then share one ID, which causes numerous problems with metrics and duplication.
Because of changes in the new hub version, shared device IDs now also cause the robot duplication problems in IM described above.
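To see whether cloning has left several robots sharing one ID, it can help to group robots by dev_id. The Python sketch below is hypothetical and not a UIM tool: it assumes you already have a list of (robot name, dev_id) pairs from some inventory export, and the sample names and IDs are invented for illustration.

```python
from collections import defaultdict


def find_shared_device_ids(robots):
    """Return {dev_id: [robot names]} for every dev_id used by more than one robot.

    `robots` is an iterable of (robot_name, dev_id) pairs. Where these pairs
    come from (e.g. an inventory export) is an assumption of this sketch.
    """
    by_id = defaultdict(list)
    for name, dev_id in robots:
        by_id[dev_id].append(name)
    # A healthy environment has exactly one robot per device ID.
    return {dev_id: sorted(names) for dev_id, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical sample: web02 and web03 were cloned from web01's image
    # without clearing niscache, so they kept web01's device ID.
    sample = [
        ("web01", "D1A2B3"),
        ("web02", "D1A2B3"),
        ("web03", "D1A2B3"),
        ("db01", "E4F5A6"),
    ]
    for dev_id, names in find_shared_device_ids(sample).items():
        print(f"{dev_id} is shared by: {', '.join(names)}")
```

With the sample data this prints `D1A2B3 is shared by: web01, web02, web03`; db01 is not reported because its ID is unique.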
The best practices KB has more detail, and you will need to share that process with the teams who create new images and spin up new servers. In short, they must either delete the contents of the /niscache/ folder on the "master image" before making clones, or clear it out on each clone after creating it.
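Clearing niscache means deleting what is inside the folder, not the folder itself. The following Python sketch shows that step under stated assumptions: the niscache path is passed in by you (it lives under the robot's install directory, which varies by platform), and, as an assumption to confirm against the best practices KB, the robot should be stopped while it runs. The function name `clear_niscache` is hypothetical.

```python
import shutil
from pathlib import Path


def clear_niscache(niscache_dir):
    """Delete everything inside niscache_dir but keep the folder itself.

    Returns the number of top-level entries removed. Assumes the robot
    service is stopped so nothing recreates files mid-cleanup.
    """
    folder = Path(niscache_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"niscache folder not found: {folder}")
    removed = 0
    for entry in folder.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove subfolders and their contents
        else:
            entry.unlink()  # remove files (and symlinks, without following them)
        removed += 1
    return removed
```

Usage on a master image or fresh clone would look like `clear_niscache("/path/to/robot/niscache")`, with the path adjusted to your install. Keeping the folder in place avoids relying on the robot to recreate it.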
Release : 20.3
Component : UIM - HUB
If you are affected by the duplication issue, do the following:
- downgrade all hubs in the environment to hub 9.20HF1, which is available here:
- on every hub, RENAME (do not delete, in case you need to restore it) the file ($NIMSOFT_HOME)/nimsoft/hub/robot.sds, then restart the hub/robot
- after the restart, many or all of your robots may be missing from IM
- wait about 15 minutes; all active robots should check in and re-register, after which they should appear correctly, as they did before the upgrade
note: if the robots do not reappear, restore the robot.sds file you renamed earlier, restart the hubs, and then contact Support to troubleshoot why the robots are not coming back.
- now follow the KB "How can I clear niscache and/or reset the robot_device_id of every active robot in my UIM domain?" to reset the IDs on every active robot in your environment
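The rename-and-restore handling of robot.sds can be scripted per hub. The Python sketch below uses hypothetical helpers, not a Broadcom-provided tool: it takes the hub directory (the folder holding robot.sds in the path given in the steps) and renames the file to a timestamped backup instead of deleting it. It does not restart the hub; do that with your usual process.

```python
from datetime import datetime
from pathlib import Path


def backup_robot_sds(hub_dir, now=None):
    """Rename hub_dir/robot.sds to robot.sds.<timestamp>.bak and return the new path.

    The file is renamed, never deleted, so it can be restored later.
    """
    src = Path(hub_dir) / "robot.sds"
    if not src.is_file():
        raise FileNotFoundError(f"no robot.sds in {hub_dir}")
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"robot.sds.{stamp}.bak")
    if dst.exists():
        raise FileExistsError(f"backup already exists: {dst}")
    src.rename(dst)
    return dst


def restore_robot_sds(backup_path):
    """Rename a backup created by backup_robot_sds back to robot.sds."""
    backup = Path(backup_path)
    target = backup.with_name("robot.sds")
    if target.exists():
        # The hub may have written a fresh robot.sds since the rename;
        # refuse to overwrite it silently.
        raise FileExistsError(f"{target} already exists; move it aside first")
    backup.rename(target)
    return target
```

The timestamp keeps repeated runs from clobbering an earlier backup. Refusing to overwrite on restore is a deliberate choice of this sketch: after the restart the hub is likely to have created a new robot.sds, so you decide explicitly what happens to that file before restoring the old one.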
After this, you should be able to upgrade the hubs back to the latest version. However, make sure no new clones are spun up without following the cloning best practices, or the issue will return.
Best Practices for cloning systems with Nimsoft (UIM) Robot already installed
How can I clear niscache and/or reset the robot_device_id of every active robot in my UIM domain?