As cloning VMs becomes increasingly common, robot device IDs are also being duplicated, which prevents the affected robots from showing up in OC (USM).
At the appropriate logging level, the discovery_server log shows the probe keeping one robot and ignoring the others that share its device ID. For example, from the discovery_server log at loglevel 5:
13 May 2020 15:27:17,654 [scanTask-1] WARN com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot '/MyDomain/robot1'. IP: 'xx.xx.xxx.x'. Created time: '1589391810'.
13 May 2020 15:27:17,654 [scanTask-1] WARN com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot '/MyDomain/MyHub/robot2'. IP: 'xx.xx.xxx.xx'. Created time: '1589391810'.
13 May 2020 15:27:17,654 [scanTask-1] WARN com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot '/MyDomain/MyHub/robot3'. IP: 'xx.xx.xxx.xx'. Created time: '1589391810'
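To see at a glance which robots are affected, the pruner lines can be pulled out of a copy of the log. The following Python sketch is only an illustration: the regular expression is inferred from the sample lines above (it is not a documented log format), the "Keeping" wording is assumed to mirror the "Ignoring" wording, and the second sample line is a made-up unrelated entry.

```python
import re

# Pattern for the pruner's WARN lines. The layout is inferred from the sample
# log lines in this article, not from a documented log format.
PRUNER_LINE = re.compile(
    r"RobotWithDuplicateDeviceIdPruner - (?P<action>Keeping|Ignoring) robot "
    r"'(?P<robot>[^']+)'"
)


def find_pruned_robots(log_lines):
    """Return (action, robot_path) tuples for every pruner line found."""
    results = []
    for line in log_lines:
        match = PRUNER_LINE.search(line)
        if match:
            results.append((match.group("action"), match.group("robot")))
    return results


sample = [
    "13 May 2020 15:27:17,654 [scanTask-1] WARN "
    "com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner"
    " - Ignoring robot '/MyDomain/MyHub/robot2'. IP: 'xx.xx.xxx.xx'."
    " Created time: '1589391810'.",
    # Hypothetical unrelated entry, included only to show it is skipped.
    "13 May 2020 15:27:18,001 [scanTask-1] INFO some.other.Logger - unrelated entry",
]

for action, robot in find_pruned_robots(sample):
    print(action, robot)
```

In practice the list passed to find_pruned_robots would come from reading discovery_server.log line by line.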
The goal of this article is to use that logging to generate alarms that a nas Auto-Operator profile then parses to automatically reset the robot's device ID.
*** Important: implementing the nas Auto-Operator portion of this procedure restarts each affected robot as soon as its duplicate device ID is detected.
If you need to control this process, do not implement the nas Auto-Operator portion so you can act on the alarms manually.
Discovery Server setup
RDP or SSH into the Primary hub and access the discovery_server probe folder.
Windows:
default location is C:\Program Files (x86)\Nimsoft\probes\service\discovery_server
Linux:
default location is /opt/nimsoft/probes/service/discovery_server
Open log4j.xml in a text editor
Locate the "MaxFileSize" parameter and increase its value from the default "5242880" to "15242880".
This gives logmon more time to capture the entries in discovery_server.log before the file rolls over.
You can reduce the "MaxBackupIndex" to store fewer rolled log files if disk space is a concern.
For example, here are the default values to be modified:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" threshold="null">
<appender name="ProbeLogFile" class="org.apache.log4j.RollingFileAppender" >
<param name="File" value="discovery_server.log"/>
<param name="Append" value="true"/>
<param name="MaxFileSize" value="5242880"/>
<param name="MaxBackupIndex" value="5"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{DATE} [%t] %-5p %c - %m%n"/>
</layout>
</appender>
Save the file.
Open Infrastructure Manager and open the discovery_server configuration in raw configure.
Change the loglevel to 5 and click OK
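If the log4j.xml change has to be repeated on several systems, the MaxFileSize edit can be scripted. The following Python sketch is one possible approach, not part of the product: the function name set_max_file_size is hypothetical, and it works on the file's text with a regular expression so that the rest of the file is left byte-for-byte unchanged. Back up log4j.xml before writing any result back to disk.

```python
import re

# Matches the MaxFileSize <param> element as it appears in the default file.
MAX_FILE_SIZE = re.compile(r'(<param\s+name="MaxFileSize"\s+value=")\d+(")')


def set_max_file_size(xml_text, new_size):
    """Return xml_text with the MaxFileSize param value replaced by new_size."""
    updated, count = MAX_FILE_SIZE.subn(
        lambda m: f"{m.group(1)}{new_size}{m.group(2)}", xml_text
    )
    if count == 0:
        raise ValueError("MaxFileSize param not found")
    return updated


# Two lines taken from the default configuration shown in this article.
snippet = (
    '<param name="MaxFileSize" value="5242880"/>\n'
    '<param name="MaxBackupIndex" value="5"/>'
)
print(set_max_file_size(snippet, 15242880))
```

To apply it to the real file, read log4j.xml into a string, pass it through set_max_file_size, and write the returned text back.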
Logmon setup
From Infrastructure Manager, access your archive and download logmon if it has not already been downloaded.
Deploy logmon to your Primary hub
Open the logmon configuration UI
Right-click in the left pane and click New
Give the profile a name like "RobotWithDuplicateDeviceId"
Click the check box next to the profile name to activate it.
On the General tab, browse to the location of your discovery_server.log.
In Windows the default location is C:\Program Files (x86)\Nimsoft\probes\service\discovery_server
In Linux the default location is /opt/nimsoft/probes/service/discovery_server
Uncheck 'Generate Quality of Service'
Click the Watcher Rules tab
Right-click in the left pane of this tab and choose New
Give the watcher a name of "Keeping"
Click the checkbox next to the Watcher name to activate it.
On the Standard tab,
Match Expression: *WARN ?com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Keeping robot*
Message to Send on Match: Keeping duplicate robot ${robot}
On the Variables tab
Right-click in the list section and choose New
Name: robot
Source FROM Position
Select Column and enter 10 for the value
Source TO Position
Select To Column and enter 10 for the value
Click OK
On the Alarm tab
Suppression Key: ${robot}
Now we need to create a second Watcher
Right-click in the left pane of the Watcher Rules tab again and choose New
Give the watcher a name of "Ignoring"
Click the checkbox next to the watcher name to activate it.
On the Standard tab,
Match Expression: *WARN ?com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot*
Message to Send on Match: Ignoring duplicate robot ${robot}
On the Variables tab
Right-click in the list section and choose New
Name: robot
Source FROM Position
Select Column and enter 10 for the value
Source TO Position
Select To Column and enter 10 for the value
Click OK
On the Alarm tab
Suppression Key: ${robot}
Click OK on the probe GUI to save and activate the new profile.
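Before relying on the watchers, it can help to preview what they should match and what ${robot} should contain. The Python sketch below only approximates logmon: it treats the match expression as a shell-style wildcard (where "?" stands for exactly one character) and assumes that columns are whitespace-separated and counted from zero, which is an assumption about logmon rather than documented behavior. The sample line has two spaces after WARN, which is what the "?" in the expression implies (the log4j pattern %-5p pads the level to five characters). The function name watcher_preview is hypothetical.

```python
from fnmatch import fnmatchcase

# Watcher match expression from this article ("*" = any run, "?" = one character).
MATCH_EXPRESSION = (
    "*WARN ?com.nimsoft.discovery.server.nimbus.scan."
    "RobotWithDuplicateDeviceIdPruner - Ignoring robot*"
)


def watcher_preview(line, column=10):
    """Return the text a ${robot} variable would hold, or None if no match.

    Assumption: columns are whitespace-separated and counted from zero.
    """
    if not fnmatchcase(line, MATCH_EXPRESSION):
        return None
    fields = line.split()
    return fields[column] if len(fields) > column else None


# "WARN" is followed by two spaces here, as the "?" in the expression implies.
line = (
    "13 May 2020 15:27:17,654 [scanTask-1] WARN  "
    "com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner"
    " - Ignoring robot '/MyDomain/MyHub/robot2'. IP: 'xx.xx.xxx.xx'."
)
print(watcher_preview(line))
```

If the alarms generated by logmon do not contain the quoted robot address, check the column value against a real line from your own discovery_server.log.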
At this point, logmon will generate Informational alarms related to duplicate robot IDs.
For some customers, this may be sufficient to simply alert on which robots need further action.
To automatically generate a new robot device ID and close the alarms, continue with the following instructions.
Remember, this portion causes the affected robots to restart automatically so that they generate new device IDs.
Setting up the NAS AO profile
Download the ResetRobotDeviceId.zip file attached to this article, which contains the ResetRobotDeviceId Lua script.
Extract the ResetRobotDeviceId Lua script from the zip archive and save it to your primary UIM server, placing it in your nas\scripts directory.
On Windows the default location is C:\Program Files (x86)\Nimsoft\probes\service\nas\scripts
On Linux the default location is /opt/nimsoft/probes/service/nas/scripts
From Infrastructure Manager, open your nas configuration.
Click the Auto-Operator tab
Click the Profiles tab
Right-click in the Auto-Operator list and choose New
Action type: script
Script: ResetRobotDeviceId (please see attached zip file for the script)
Action mode: On message arrival
Severity Level: Informational
Message string: /*Keeping duplicate robot*/
Click OK
Enter new profile name: KeepingDuplicateRobot
Click OK
Right-click in the Auto-Operator list and choose New
Action type: script
Script: ResetRobotDeviceId
Action mode: On message arrival
Severity Level: Informational
Message string: /*Ignoring duplicate robot*/
Click OK
Enter new profile name: IgnoringDuplicateRobot
Click OK
Click OK to save the nas configuration and restart.
***Please find the attached ResetRobotDeviceId.zip file.***
ResetRobotDeviceId script contents:
===========================================================
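-- Reset robot device ID
-- Caution: This will cause the robot service to restart
local Alarm = alarm.get()
local AlarmMessage = Alarm.message
-- Extract the robot address found between single quotes in the alarm message
local TargetRobot = AlarmMessage:match("'(.*)'")
if TargetRobot ~= nil then
   -- Clean the robot's NIS cache, then reset its device ID (restarts the robot)
   nimbus.request(TargetRobot, "_nis_cache_clean")
   nimbus.request(TargetRobot, "_reset_device_id_and_restart")
   -- Close the alarm once the reset has been requested
   action.close(Alarm.nimid)
end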
============================================================
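The script finds its target by taking whatever sits between single quotes in the alarm message. The Python sketch below mirrors that extraction outside of nas so it can be tried against sample messages; it is an illustration only. The function name extract_target_robot is hypothetical, and the sample messages assume the alarm text produced by the logmon watchers described above.

```python
import re


def extract_target_robot(alarm_message):
    """Return the text between the first and last single quote, or None."""
    # Same greedy pattern as the Lua script: '(.*)'
    match = re.search(r"'(.*)'", alarm_message)
    return match.group(1) if match else None


print(extract_target_robot("Ignoring duplicate robot '/MyDomain/MyHub/robot3'."))
print(extract_target_robot("Ignoring duplicate robot"))
```

Because the pattern is greedy, a message that contains more than one quoted value would return everything from the first quote to the last one, so the alarm message should carry only the robot address in quotes. A message without a quoted address yields no target at all.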