How to automatically fix missing Operator Console robots due to cloning

Article ID: 56948

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • Unified Infrastructure Management for Mainframe
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

As cloning VMs becomes more common, we are seeing cases where the robot device ID is duplicated across the clones, which prevents the affected robots from showing up in the Operator Console (OC/USM).

At the appropriate logging level, the discovery_server log shows the probe keeping one robot and ignoring the others that share its device ID. For example, from the discovery_server log at loglevel 5:

13 May 2020 15:27:17,654 [scanTask-1] WARN  com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot '/MyDomain/robot1'.  IP: 'xx.xx.xxx.x'.  Created time:  '1589391810'.
13 May 2020 15:27:17,654 [scanTask-1] WARN  com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot '/MyDomain/MyHub/robot2'.  IP: 'xx.xx.xxx.xx'.  Created time:  '1589391810'.
13 May 2020 15:27:17,654 [scanTask-1] WARN  com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot '/MyDomain/MyHub/robot3'.  IP: 'xx.xx.xxx.xx'.  Created time:  '1589391810'

The goal of this article is to use that logging to generate alarms that a nas Auto-Operator profile then parses to reset the affected robot's device ID automatically.
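To make the structure of those log lines concrete, here is a minimal Lua sketch that pulls the action, robot path, IP, and created time out of one pruner line. The helper name parse_pruner_line is hypothetical and not part of UIM, and the field layout is assumed from the sample lines above only.

```lua
-- Sketch: extract the fields from one RobotWithDuplicateDeviceIdPruner line.
-- parse_pruner_line is a hypothetical helper, not part of UIM.
local function parse_pruner_line(line)
  -- Only lines written by the duplicate-device-ID pruner are of interest.
  if not line:find("RobotWithDuplicateDeviceIdPruner", 1, true) then
    return nil
  end
  -- Capture the action word and the single-quoted robot path after "- ".
  local action, robot = line:match("%- (%a+) robot '([^']+)'")
  if not action then
    return nil
  end
  return {
    action  = action,                               -- e.g. "Ignoring"
    robot   = robot,                                -- e.g. /MyDomain/MyHub/robot2
    ip      = line:match("IP: '([^']*)'"),          -- masked in the sample
    created = line:match("Created time:%s+'(%d+)'"), -- kept as a string
  }
end

local sample = "13 May 2020 15:27:17,654 [scanTask-1] WARN  com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot '/MyDomain/MyHub/robot2'.  IP: 'xx.xx.xxx.xx'.  Created time:  '1589391810'."
local entry = parse_pruner_line(sample)
print(entry.action, entry.robot, entry.ip, entry.created)
```

The single-quoted robot path is the part that matters for the rest of this procedure, because the nas script later identifies its target by the quotes in the alarm message.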

*** Important: implementing the nas Auto-Operator portion of this procedure causes each affected robot to be restarted as soon as its duplicate device ID is detected.

If you need to control when robots are restarted, skip the nas Auto-Operator portion and act on the alarms manually.

Environment

  • Release: UIM 8.51 or higher
  • Component: UIMDSC
  • discovery_server
  • discovery_agent

Cause

  • VM cloning

Resolution

Discovery Server setup

RDP or SSH into the Primary hub and access the discovery_server probe folder.

Windows:
default location is C:\Program Files (x86)\Nimsoft\probes\service\discovery_server

Linux:
default location is /opt/nimsoft/probes/service/discovery_server

Open log4j.xml in a text editor

Locate the "MaxFileSize" directive and increase the value from the default "5242880" (5 MB) to "15242880" (roughly 14.5 MB).
This gives logmon more time to capture the entries in discovery_server.log before the file rolls over.

You can reduce the "MaxBackupIndex" to store fewer rolled log files if disk space is a concern.

For example, here are the default values to be modified:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" threshold="null">

    <appender name="ProbeLogFile" class="org.apache.log4j.RollingFileAppender" >
        <param name="File" value="discovery_server.log"/>
        <param name="Append" value="true"/>
        <param name="MaxFileSize" value="5242880"/>
        <param name="MaxBackupIndex" value="5"/>
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{DATE} [%t] %-5p %c - %m%n"/>
        </layout>
    </appender>

Save the file.
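As a rough check on the disk-space trade-off between "MaxFileSize" and "MaxBackupIndex", the following Lua sketch estimates the worst-case space used by the log files. It assumes the usual log4j RollingFileAppender behavior of keeping the active file plus "MaxBackupIndex" rolled files. The function names and the MaxBackupIndex value of 2 are illustrative only.

```lua
-- Sketch: approximate worst-case disk use for the discovery_server logs.
-- Assumes the appender keeps the active file plus MaxBackupIndex backups.
local function max_log_bytes(max_file_size, max_backup_index)
  return max_file_size * (max_backup_index + 1)
end

local function to_mib(bytes)
  return bytes / (1024 * 1024)
end

local default_total = max_log_bytes(5242880, 5)   -- shipped defaults
local raised_total  = max_log_bytes(15242880, 5)  -- larger file, same backups
local trimmed_total = max_log_bytes(15242880, 2)  -- hypothetical MaxBackupIndex of 2

print(string.format("default: %.1f MiB", to_mib(default_total)))
print(string.format("raised:  %.1f MiB", to_mib(raised_total)))
print(string.format("trimmed: %.1f MiB", to_mib(trimmed_total)))
```

With the defaults the logs top out at about 30 MiB. Raising only "MaxFileSize" brings that to roughly 87 MiB, which is why lowering "MaxBackupIndex" can be worthwhile on a hub with limited disk space.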

Open Infrastructure Manager and open the discovery_server configuration in raw configure.

Change the loglevel to 5 and click OK
 

Logmon setup

From Infrastructure Manager, access your archive and download logmon if it is not already there.

Deploy logmon to your Primary hub

Open the logmon configuration UI

Right-click in the left pane and choose New

Give the profile a name like "RobotWithDuplicateDeviceId"

Click the check box next to the profile name to activate it.

On the General tab, browse to the location of your discovery_server.log.

In Windows the default location is C:\Program Files (x86)\Nimsoft\probes\service\discovery_server
In Linux the default location is /opt/nimsoft/probes/service/discovery_server

Uncheck 'Generate Quality of Service'


Click the Watcher Rules tab

Right-click in the left pane of this tab and choose New

Give the watcher a name of "Keeping"

Click the checkbox next to the Watcher name to activate it.

On the Standard tab,

Match Expression: *WARN ?com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Keeping robot*
Message to Send on Match: Keeping duplicate robot ${robot}

On the Variables tab

Right-click in the list section and choose New

Name: robot
Source FROM Position

Select Column and enter 10 for the value

Source TO Position

Select To Column and enter 10 for the value

Click OK


On the Alarm tab

Suppression Key: ${robot}

Now we need to create a second Watcher

Right-click in the left pane of the Watcher Rules tab again and choose New

Give the watcher a name of "Ignoring"

Click the checkbox next to the watcher name to activate it.

On the Standard tab,

Match Expression: *WARN ?com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot*
Message to Send on Match: Ignoring duplicate robot ${robot}


On the Variables tab

Right-click in the list section and choose New
Name: robot
Source FROM Position

Select Column and enter 10 for the value

Source TO Position

Select To Column and enter 10 for the value

Click OK


On the Alarm tab

Suppression Key: ${robot}

 

Click OK on the probe GUI to save and activate the new profile.
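To sanity-check the two match expressions before relying on them, the Lua sketch below approximates them offline. It assumes that in a logmon match expression '*' matches any run of characters and '?' matches exactly one character. The helper names are hypothetical, and logmon's own matcher may differ in detail, so treat this as an illustration and not as a reimplementation of the probe.

```lua
-- Sketch: approximate the two watcher match expressions with Lua patterns.
-- Assumption: '*' matches any run of characters, '?' exactly one character.
local function glob_to_pattern(glob)
  -- Escape Lua's magic characters, leaving the glob wildcards alone.
  local escaped = glob:gsub("[%^%$%(%)%%%.%[%]%+%-]", "%%%0")
  -- Translate the wildcards and anchor the pattern to the whole line.
  escaped = escaped:gsub("%*", ".*"):gsub("%?", ".")
  return "^" .. escaped .. "$"
end

local watchers = {
  Keeping  = "*WARN ?com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Keeping robot*",
  Ignoring = "*WARN ?com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot*",
}

-- Returns the name of the first watcher whose expression matches, or nil.
local function matching_watcher(line)
  for _, name in ipairs({ "Keeping", "Ignoring" }) do
    if line:match(glob_to_pattern(watchers[name])) then
      return name
    end
  end
  return nil
end

local line = "13 May 2020 15:27:17,654 [scanTask-1] WARN  com.nimsoft.discovery.server.nimbus.scan.RobotWithDuplicateDeviceIdPruner - Ignoring robot '/MyDomain/robot1'.  IP: 'xx.xx.xxx.x'.  Created time:  '1589391810'."
print(matching_watcher(line))
```

Under that assumption, the '?' after "WARN " absorbs the second of the two spaces that the log writes between the level and the class name, which is why the expressions are written as "WARN ?com" and not "WARN com".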

 

At this point, logmon will generate Informational alarms related to duplicate robot IDs.

For some customers, this may be sufficient to simply alert on which robots need further action.

To automatically generate a new robot device ID and close the alarms, continue with the following instructions.

Remember, this portion will cause the robots to restart automatically in order to generate new robot IDs.

 

Setting up the NAS AO profile

Download the ResetRobotDeviceId.zip file attached to this article. It contains the ResetRobotDeviceId Lua script.

Extract the ResetRobotDeviceId Lua script from the zip archive and save it on your primary UIM server in the nas\scripts directory.

On Windows the default location is C:\Program Files (x86)\Nimsoft\probes\service\nas\scripts

On Linux the default location is /opt/nimsoft/probes/service/nas/scripts

From Infrastructure Manager, open your nas configuration.

Click the Auto-Operator tab

Click the Profiles tab

Right-click in the nas Auto-Operator list and choose New

Action type: script
Script: ResetRobotDeviceId (please see attached zip file for the script)
Action mode: On message arrival
Severity Level: Informational
Message string: /*Keeping duplicate robot*/
Click OK
Enter new profile name: KeepingDuplicateRobot

Click OK

Right-click in the Auto-Operator list and choose New

Action type: script
Script: ResetRobotDeviceId
Action mode: On message arrival
Severity Level: Informational
Message string: /*Ignoring duplicate robot*/
Click OK
Enter new profile name: IgnoringDuplicateRobot
Click OK

Click OK to save the nas configuration and restart.

***Please find the attached ResetRobotDeviceId.zip file.***

Additional Information

ResetRobotDeviceId script contents:

===========================================================
-- Reset robot device ID

-- Caution: This will cause the robot service to restart

Alarm = alarm.get()
AlarmMessage = Alarm.message
TargetRobot = AlarmMessage:match("'(.*)'")

nimbus.request(TargetRobot, "_nis_cache_clean")
nimbus.request(TargetRobot, "_reset_device_id_and_restart")

action.close(Alarm.nimid)
============================================================
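The attached script assumes that every matching alarm message contains a quoted robot path. As a hedged sketch, and not a replacement for the attachment, a more defensive variant could check that a path was actually found before sending any request. It uses only the nas calls shown in the script above (alarm.get, nimbus.request, action.close). The extract_robot helper is hypothetical, and using print for the diagnostic message is an assumption about the nas script runtime.

```lua
-- Sketch of a more defensive ResetRobotDeviceId variant.
-- Caution: like the original, this restarts the target robot.

-- Returns the first single-quoted value in the alarm message, or nil.
local function extract_robot(message)
  if type(message) ~= "string" then
    return nil
  end
  -- Non-greedy by construction: stops at the first closing quote.
  return message:match("'([^']+)'")
end

-- alarm, nimbus and action only exist inside the nas script runtime,
-- so the reset itself is skipped when this file is run anywhere else.
if alarm and nimbus and action then
  local current = alarm.get()
  local target = current and extract_robot(current.message)
  if target then
    nimbus.request(target, "_nis_cache_clean")
    nimbus.request(target, "_reset_device_id_and_restart")
    action.close(current.nimid)
  else
    -- Leave the alarm open so an operator can review it.
    print("ResetRobotDeviceId: no robot path found in alarm message")
  end
end
```

The pattern '([^']+)' stops at the first closing quote, whereas '(.*)' in the original runs to the last quote in the message. The two behave the same for the alarm text produced by the watchers above and differ only if a message ever carries more than one quoted value.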
 
Clone Robot VM Images

Important!
We do not support cloning of robots. We recommend that you use the cloud installation option for the robot, which ensures that the robot is not started until the cloned VM instance is created, and then use a request.cfg file to install the required probes and probe configurations.

For further information, see "Install a Windows Robot" in the UIM documentation, which describes the cloud option as follows:

"Cloud option installs a robot onto a master image of a virtual machine (VM) for provisioning purposes. Using this method, you can monitor new VMs as they are deployed. Cloud installation leaves the installed robot in a latent state. The robot starts after a configurable number of host restarts."
 

Attachments

1568404408140__ResetRobotDeviceId.zip