Automatically validate hdb and spooler probes via script
Updated On:15-08-2018 11:55
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM), NIMSOFT PROBES
The following script verifies any probe for which the controller has generated an alarm like this one:
Jun 3 10:59:48:951  Controller: Probe 'spooler' FAILED to start, file check determines changes in the probe?
This type of issue is usually seen following an IP address or hardware change in the environment. Probes on a robot have a start-up order, and some of them depend on other probes. Occasionally the probes start out of order and cause issues like this, usually on systems that are short on resources.
Release: Component: CAUIM
Step 1: Create script
In the nas Auto Operator -> Scripts section, create a new script with the following contents:
-- Start of script
al = alarm.list() -- Get alarm list
re = "%p%a+%d*_*%a*%d*%p" -- Lua pattern to match the quoted probe name (letters, digits and underscore)
if al ~= nil then
   for i = 1,#al do
      -- First, filter to get alarms from the controller probe only
      if al[i].prid == "controller" then
         -- Second, filter to get controller alarms with the specific text "FAILED to start"
         if string.match(al[i].message,"FAILED to start") then
            local quoted = string.match(al[i].message,re) -- Get the quoted probe name from the alarm message
            if quoted ~= nil then
               probe = string.gsub(quoted,"'","") -- Remove quotes from the probe name to use it in the probe_verify callback
               -- Address of the controller on the robot that raised the alarm
               addr = "/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller"
               --print(al[i].message.."! Probe-> "..probe) -- View alarms with probe names which failed to start
               -- Now run the probe_verify and probe_activate callbacks on each probe which FAILED to start
               local args = pds.create()
               pds.putString(args,"name",probe)
               nimbus.request(addr,"probe_verify",args)
               nimbus.request(addr,"probe_activate",args)
               pds.delete(args)
               sleep (100) -- A little delay between each probe callback
            end
         end
      end
   end
end
-- End of script
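To see what the pattern in the script extracts, the following is a minimal standalone Lua sketch that can be run outside nas. The helper name `extract_probe` is hypothetical and only used for illustration; the pattern and the quote removal are taken from the script.

```lua
-- Pattern copied from the script: a punctuation character, letters,
-- optional digits/underscores/letters/digits, then punctuation again.
local re = "%p%a+%d*_*%a*%d*%p"

-- Hypothetical helper: return the probe name quoted in a controller
-- alarm message, or nil when the pattern finds nothing.
local function extract_probe(message)
  local quoted = string.match(message, re)
  if quoted == nil then
    return nil
  end
  -- The extra parentheses drop gsub's second return value (the count).
  return (string.gsub(quoted, "'", ""))
end

local sample = "Controller: Probe 'spooler' FAILED to start, file check determines changes in the probe?"
print(extract_probe(sample))
print(extract_probe("Probe 'hdb' FAILED to start"))
```

The pattern allows for at most one run of underscores inside the name, so probe names with several underscores are not fully covered by it.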
Note: To troubleshoot the script, remove the leading '--' from the print line so that it prints each alarm message together with the probe name. The printf line below can also be placed inside the loop to print the Nimsoft address of each controller.
--print(al[i].message.."! Probe-> "..probe) -- View alarms with probe names which failed to start
-- printf("/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller".."<->Probe="..al[i].prid) -- Print Nimsoft address(es)
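The address printed by the troubleshooting line can also be previewed outside nas. The sketch below assumes an alarm entry is a table with `domain`, `hub` and `robot` fields, as used in the line above; the function name `controller_address` and the sample values are hypothetical.

```lua
-- Build the controller address in the same /domain/hub/robot/controller
-- form that the troubleshooting line prints.
local function controller_address(alarm)
  return "/" .. alarm.domain .. "/" .. alarm.hub .. "/" .. alarm.robot .. "/controller"
end

-- Hypothetical alarm entry; inside nas, real entries come from alarm.list().
local sample_alarm = {
  domain = "ExampleDomain",
  hub = "ExampleHub",
  robot = "examplerobot",
  prid = "controller",
  message = "Probe 'hdb' FAILED to start, file check determines changes in the probe",
}

print(controller_address(sample_alarm))
```

Checking the printed address against the robot shown in the alarm is a quick way to confirm that the callbacks would be sent to the intended controller.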
Step 2: Set up nas profile
Set up a nas Auto Operator profile with the following settings:
a- matching criteria:
severity = major
probe = controller
matching text = /.*Probe.*\sFAILED\sto\sstart.*/
b- Action type: script
Script = choose the script created in step 1 from the drop down list
c- Action Mode: On Overdue age with your desired time settings
Click OK and Apply to save the changes, then restart the nas probe. With these settings, nas automatically verifies probes that failed to start due to checksum change detection.
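The matching criteria of the profile can be approximated in plain Lua to check which alarm texts would be selected. This is only a sketch: nas evaluates the profile itself, the function name `matches_profile` is hypothetical, the `severity` field holding the text "major" is an assumption, and the Lua pattern is a hand translation of the regular expression above rather than the expression nas uses.

```lua
-- Rough stand-in for the profile criteria: severity major, probe controller,
-- and a message that matches the "FAILED to start" text.
local function matches_profile(alarm)
  if alarm.severity ~= "major" then
    return false
  end
  if alarm.prid ~= "controller" then
    return false
  end
  -- Lua pattern approximating /.*Probe.*\sFAILED\sto\sstart.*/
  return string.find(alarm.message, "Probe.*%sFAILED%sto%sstart") ~= nil
end

-- Hypothetical alarm used only to exercise the function.
local candidate = {
  severity = "major",
  prid = "controller",
  message = "Controller: Probe 'spooler' FAILED to start, file check determines changes in the probe?",
}

print(matches_profile(candidate))
```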