Automatically validate hdb and spooler probes via script


Article ID: 34372


Updated On:


DX Infrastructure Management NIMSOFT PROBES


The script below verifies any probe for which the controller has generated an alarm like the following:
Jun 3 10:59:48:951 [3086726848] Controller: Probe 'spooler' FAILED to start, file check determines changes in the probe?
This type of issue is usually seen following an IP address or hardware change in the environment.
Probes on a robot have a start-up order, and some of them depend on other probes.
Occasionally, the probes start out of order and cause issues like this. That is usually seen on systems that are short on resources.


Component: CAUIM


Step 1: Create script
In the nas Auto-Operator -> Scripts section, create a new script with the following contents:
-- Start of script
al = alarm.list() -- Get alarm list

re = "%p%a+%d*_*%a*%d*%p" -- Lua pattern matching a quoted probe name made of letters, digits, and underscores

if al ~= nil then
   for i = 1,#al do
      if al[i].prid == "controller" then -- First, keep only alarms from the controller probe

         if string.match(al[i].message,"FAILED to start") then -- Second, keep only controller alarms containing the text "FAILED to start"

            probe = string.gsub(string.match(al[i].message,re),"'","") -- Get the probe name from the alarm message and strip the quotes for use in the probe_verify callback
            --print(al[i].message.."! Probe-> "..probe) -- View alarms with probe names which failed to start

            addr = "/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller" -- Build Nimsoft address
            -- printf("/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller".."<->Probe="..al[i].prid) -- Print Nimsoft address(es)

            -- Now run the probe_verify callback on each probe which FAILED to start

            local args = pds.create()
            pds.putString(args,"name",probe) -- Probe to verify; the "name" key is an assumption, confirm it against the controller's probe_verify callback
            nimbus.request(addr,"probe_verify",args)
            pds.delete(args)
            sleep (100) -- A little delay (in milliseconds) between each probe callback
         end
      end
   end
end
-- End of script

Note: To troubleshoot the script, remove the leading '--' from the following two lines so that the script prints the robot address and probe name for each matching alarm:
--print(al[i].message.."! Probe-> "..probe) -- View alarms with probe names which failed to start
-- printf("/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller".."<->Probe="..al[i].prid) -- Print Nimsoft address(es)
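The probe-name extraction and address building can also be checked outside the nas, in a plain Lua interpreter. The following is a minimal sketch under stated assumptions, not part of the original script: the sample alarm values (domain, hub, robot) are invented for illustration, and extract_probe and build_address are hypothetical helper names.

```lua
-- Stand-alone sketch: runs in a plain Lua interpreter, no nas libraries needed.
-- All alarm values below are made up for illustration.
local re = "%p%a+%d*_*%a*%d*%p" -- same pattern as in the nas script

-- Hypothetical helper: return the probe name without quotes, or nil if nothing matches
local function extract_probe(message)
   local quoted = string.match(message, re)
   if quoted == nil then
      return nil
   end
   return (string.gsub(quoted, "'", "")) -- parentheses drop gsub's second return value
end

-- Hypothetical helper: build the controller address for the robot that raised the alarm
local function build_address(a)
   return "/" .. a.domain .. "/" .. a.hub .. "/" .. a.robot .. "/controller"
end

local sample = {
   prid = "controller",
   domain = "ExampleDomain",
   hub = "ExampleHub",
   robot = "examplerobot",
   message = "Probe 'spooler' FAILED to start, file check determines changes in the probe",
}

print(extract_probe(sample.message)) -- prints spooler
print(build_address(sample))         -- prints /ExampleDomain/ExampleHub/examplerobot/controller
```

Unlike the one-line form in the nas script, the helper returns nil when the message contains no quoted probe name, so a non-matching alarm does not raise an error in string.gsub.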
Step 2: Set up nas profile
Set up a nas Auto-Operator profile with the following settings:
a- Matching criteria:
severity = major
probe = controller
matching text = /.*Probe.*\sFAILED\sto\sstart.*/
b- Action type: script
Script = choose the script created in Step 1 from the drop-down list
c- Action mode: On overdue age, with your desired time settings
Click OK and Apply to save the changes and restart the nas probe. With these settings, the nas automatically verifies probes that failed to start due to checksum change detection.
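To sanity-check which alarm messages the matching text would catch, the expression can be approximated in a plain Lua interpreter. This is only a rough sketch: Lua patterns are not regular expressions, so the pattern below is an assumed translation of the profile's regex (with %s standing in for \s), would_match is a hypothetical helper name, and the second test message is invented.

```lua
-- Approximate Lua-pattern translation of /.*Probe.*\sFAILED\sto\sstart.*/
-- The leading and trailing .* are unnecessary because string.find searches anywhere in the text.
local pattern = "Probe.*%sFAILED%sto%sstart"

-- Hypothetical helper: true if the message would be picked up by the pattern
local function would_match(message)
   return string.find(message, pattern) ~= nil
end

print(would_match("Probe 'spooler' FAILED to start, file check determines changes in the probe")) -- prints true
print(would_match("Probe 'cdm' started")) -- invented message; prints false
```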