The following script verifies and reactivates any probe for which the controller has generated an alarm like the following:
Controller: Probe 'spooler' FAILED to start, file check determines changes in the probe
-- Start of script
local al = alarm.list() -- Get alarm list
local re = "%p%a+%d*_*%a*%d*%p" -- Lua pattern to match a quoted probe name made of letters, digits and underscore
if al ~= nil then
   for i = 1,#al do
      if al[i].prid == "controller" then -- First, keep alarms from the controller probe only
         if string.match(al[i].message,"FAILED to start") then -- Second, keep controller alarms with the specific text, i.e. "FAILED to start"
            local quoted = string.match(al[i].message,re) -- Get quoted probe name from alarm message (nil if nothing matches)
            if quoted ~= nil then
               local probe = string.gsub(quoted,"'","") -- Remove quotes from probe name to use in probe_verify callback
               --print(al[i].message.."! Probe-> "..probe) -- View alarms with probe names which failed to start
               local addr = "/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller" -- Build Nimsoft address
               -- printf("/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller".."<->Probe="..al[i].prid) -- Print Nimsoft address(es)
               -- Now run the probe_verify and probe_activate callbacks on each probe which FAILED to start
               local args = pds.create()
               pds.putString(args,"name",probe)
               nimbus.request(addr,"probe_verify",args)
               nimbus.request(addr,"probe_activate",args)
               pds.delete(args)
               sleep(100) -- A little delay between each probe callback
            end
         end
      end
   end
end
-- End of script
Note: To troubleshoot the script, remove the leading '--' from the two lines below so that the script prints the robot address and probe name for each matching alarm.
--print(al[i].message.."! Probe-> "..probe) -- View alarms with probe names which failed to start
-- printf("/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller".."<->Probe="..al[i].prid) -- Print Nimsoft address(es)
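The extraction and address-building steps can be tried outside NAS in a plain Lua interpreter. The following is a minimal sketch, not part of the original script: the helper names, the domain/hub/robot values, and the second, looser pattern `'([%w_]+)'` are assumptions added for illustration. It suggests that the original pattern handles a name like 'spooler' but may cut short a name shaped like 'e2e_appmon', because `%p` also matches an underscore.

```lua
-- Standalone sketch: plain Lua, no NAS functions (alarm.list, nimbus.request) needed.
local PATTERN = "%p%a+%d*_*%a*%d*%p"  -- same pattern as in the script above

-- Returns the probe name without quotes, or nil when the pattern finds nothing.
local function extract_probe(message)
  local quoted = string.match(message, PATTERN)
  if quoted == nil then
    return nil
  end
  return (string.gsub(quoted, "'", ""))  -- parentheses drop gsub's second return value
end

-- Hypothetical alternative: capture everything made of letters, digits and
-- underscores between two single quotes.
local function extract_probe_loose(message)
  return string.match(message, "'([%w_]+)'")
end

-- Builds the controller address from the alarm's domain, hub and robot fields.
local function controller_address(alarm)
  return "/" .. alarm.domain .. "/" .. alarm.hub .. "/" .. alarm.robot .. "/controller"
end

local sample = {
  domain = "MyDomain", hub = "MyHub", robot = "myrobot",  -- hypothetical values
  message = "Controller: Probe 'spooler' FAILED to start, file check determines changes in the probe",
}

print(extract_probe(sample.message))   -- spooler
print(controller_address(sample))      -- /MyDomain/MyHub/myrobot/controller

-- A name with digits between letters shows the difference between the two patterns.
local tricky = "Controller: Probe 'e2e_appmon' FAILED to start"
print(extract_probe(tricky))           -- e2e_
print(extract_probe_loose(tricky))     -- e2e_appmon
```

If probe names of that shape exist in your environment, it may be worth testing the pattern against real alarm messages before relying on the script.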
IMPORTANT:
These alarms are generated only on the initial failure, so once they are acknowledged they are no longer available and are not regenerated. The script relies on these alarms as its trigger for each probe in an error state on the robot(s). If you used a probe package to reconfigure your robots to point to new hubs, you can regenerate the alarms by redeploying the probe/robot_update package that caused them to the subset of robots raising the "Controller: Probe '<probe_name>' FAILED to start, file check determines changes in the probe" alarm.
Tip: If you enable the Group node in IM via Tools->Options and create an Infrastructure Group that includes the robots where some probes are raising the alarm mentioned above, you can deploy your robot_update package only to those robots. This forces the Probe FAILED... alarms to regenerate if you had already acknowledged them by accident. The script then picks up those alarms and repairs the probes, and the alarms clear within about 5 minutes.
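To decide which robots belong in such a group, it can help to list the distinct robots that currently carry the alarm. The following is a minimal sketch in plain Lua, assuming an alarm table with the same `prid`, `robot` and `message` fields that the script above reads. The function name and the sample entries are hypothetical. Inside NAS, the table would come from alarm.list() instead.

```lua
-- Standalone sketch: collects the distinct robots that have a controller
-- "FAILED to start" alarm. The sample alarm list below is made up.
local function robots_with_failed_probes(alarms)
  local seen, robots = {}, {}
  for _, a in ipairs(alarms) do
    -- The fourth argument (true) makes string.find do a plain-text search.
    if a.prid == "controller" and string.find(a.message, "FAILED to start", 1, true) then
      if not seen[a.robot] then
        seen[a.robot] = true
        robots[#robots + 1] = a.robot
      end
    end
  end
  table.sort(robots)  -- stable, readable order for the group list
  return robots
end

local sample_alarms = {  -- hypothetical data
  { prid = "controller", robot = "robot-b",
    message = "Controller: Probe 'spooler' FAILED to start, file check determines changes in the probe" },
  { prid = "controller", robot = "robot-a",
    message = "Controller: Probe 'cdm' FAILED to start, file check determines changes in the probe" },
  { prid = "controller", robot = "robot-b",
    message = "Controller: Probe 'hdb' FAILED to start, file check determines changes in the probe" },
  { prid = "cdm", robot = "robot-c", message = "CPU usage is high" },
}

local result = robots_with_failed_probes(sample_alarms)
print(table.concat(result, ", "))  -- robot-a, robot-b
```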
The script stub below is an alternate, simplified version of this script. It can be put in place on an Auto-Operator, e.g. "On arrival" or (for more reliability) "Overdue age 5m", and automatically responds to such an alarm by extracting the robot and probe name and validating the probe: