Identifying robots that are not communicating

Article ID: 248861

Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

Is there a way to identify UIM robots which are not working properly?

This includes:

- robots which are down/red in IM
- robots which appear green in IM but are unresponsive or experiencing communication errors

 

Environment

Any release

Resolution

Attached is a Lua script (robot_status_script_save_to_file.lua) that helps you build a list of robots that are down or may need attention.

Steps:

1. Edit the script and change the value of the "uimdomain" variable to your UIM domain name.
2. Copy and paste the script into a new NAS script window (NAS->Auto-Operator tab->Scripts subtab).
3. Save the script with a name such as "checkrobots" and click OK to exit the script editor.
4. Create a new NAS Auto-Operator profile with the following settings:

- Type: script
- Script: the checkrobots script you just created
- Severity: select "informational" only
- On overdue age: 1 min
- Message filter: /.*robotcheck.*/

Save this profile and ensure it is enabled. The script will then run once, one minute after a matching alarm is received.
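As a rough illustration of which alarm messages the filter /.*robotcheck.*/ would select, the sketch below applies the same expression as a Lua pattern. This only approximates the behavior: NAS evaluates the filter itself, and the helper name message_matches and the sample messages are hypothetical.

```lua
-- Approximation only: NAS applies the /.../ message filter on its own;
-- this helper just shows that the expression matches any message
-- containing the text "robotcheck".
local function message_matches(message)
   return string.match(message, ".*robotcheck.*") ~= nil
end

-- Hypothetical sample messages.
print(message_matches("robotcheck"))              --> true
print(message_matches("nightly robotcheck run"))  --> true
print(message_matches("robot check"))             --> false
```

In other words, the profile reacts to any informational alarm whose message contains "robotcheck", so choose a different keyword if that text could appear in unrelated alarms.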

5. Send an alarm to trigger the profile. Go back to the Scripts tab, create a new, empty script, paste the following line, and execute it:

nimbus.alarm(1,"robotcheck",1,1)

Executing this creates an informational alarm (severity 1) whose message, "robotcheck", matches the Auto-Operator profile.

Leave this alarm alone: do not acknowledge, close, or assign it. If it is still untouched one minute after it is received, the Auto-Operator profile created in step 4 runs the script automatically.

After 1-15 minutes (depending on the size of the environment and the status of the robots), a file called "robotstatus.txt" will appear in the NAS folder on the primary hub:

- Windows: C:\Program Files (x86)\Nimsoft\probes\service\nas
- Linux: /opt/nimsoft/probes/service/nas

This file contains a list of robots that are in a problematic state.

Each robot is listed with one of the following status codes:

ATTN: The robot is probably red/down in IM. When the script retrieved the list of robots from the hub, the hub did not report an "OK" status for this robot.
ERROR: The hub lists the robot as OK, but the robot did not respond properly to the get_info callback. This usually means the robot is green in IM but not communicating, and it often represents a firewall issue (most commonly, robot-->hub communication is allowed but hub-->robot communication is blocked).
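The decision between the two status codes can be sketched as a small Lua function. This is not the attached script: classify_robot, its two boolean inputs, the "OK" label for healthy robots, and the sample robot names are all assumptions made for illustration. In the real script, the inputs come from the hub's robot list and from the get_info callback.

```lua
-- Sketch only: maps two observations about a robot to a status code.
-- hub_reports_ok     : the hub reported an "OK" status for the robot
-- get_info_succeeded : the robot answered the get_info callback
local function classify_robot(hub_reports_ok, get_info_succeeded)
   if not hub_reports_ok then
      return "ATTN"   -- hub did not report OK (probably red/down in IM)
   elseif not get_info_succeeded then
      return "ERROR"  -- hub says OK, but get_info failed (green yet unresponsive)
   end
   return "OK"        -- assumed label for a healthy robot
end

-- Hypothetical sample data, not taken from a real environment.
local samples = {
   { name = "robot-a", hub_ok = true,  info_ok = true  },
   { name = "robot-b", hub_ok = false, info_ok = false },
   { name = "robot-c", hub_ok = true,  info_ok = false },
}

for _, r in ipairs(samples) do
   print(r.name, classify_robot(r.hub_ok, r.info_ok))
end
```

The hub status is checked first because the callback result only carries meaning for a robot the hub already considers OK.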

 

Additional Information

NOTE: Remember that you will need to change the first line of the script to reflect the appropriate name for your UIM domain.

If you want a list of all robots (good and bad), look for the following line in the script directly below the domain entry:


   local loglevel = 0

Change to:

   local loglevel = 1

This will output all robots, not just those that indicated a problem.
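The effect of the loglevel setting can be pictured with the following sketch. It is an assumption about how such a filter might look, not the attached script's actual code: should_report, the "OK" label for healthy robots, and the sample entries are hypothetical.

```lua
local loglevel = 0   -- 0: problem robots only, 1: all robots

-- Sketch only: decides whether a robot with the given status is written out.
local function should_report(status, level)
   if level >= 1 then
      return true          -- report every robot, good or bad
   end
   return status ~= "OK"   -- otherwise report only problem robots
end

-- Hypothetical sample results.
local results = {
   { name = "robot-a", status = "OK"    },
   { name = "robot-b", status = "ATTN"  },
   { name = "robot-c", status = "ERROR" },
}

for _, r in ipairs(results) do
   if should_report(r.status, loglevel) then
      print(r.status .. ": " .. r.name)
   end
end
```

With loglevel set to 0, this sample prints only robot-b and robot-c. Setting it to 1 prints all three.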

Attachments

1663091991495__robot_status_script_save_to_file.zip