In our production environment, we are seeing some strange behavior.
We are monitoring numerous services on remote devices using the RSP probe.
On some of these devices, the monitored services are currently down and stay down. RSP is configured to raise an alarm if a service is down, and the alarms are being raised (good!).
The problem is that these alarms are always cleared automatically by the robot after about 30-35 minutes (see how the robot icon is already green?).
If we wait another 5 minutes, a new alarm is raised for the same issue, and it is again cleared automatically by a clear alarm after roughly 30 minutes. (We are NOT clearing this alarm ourselves; check the attached spreadsheet for proof of the clear alarm issued by the robot.)
Attached is an Excel spreadsheet with the NAS_TRANSACTION_SUMMARY and the NAS_TRANSACTION_LOG for just this service on this device.
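For anyone who wants to check the raise/clear cycle in the exported log, the intervals could be measured with a short script along these lines. This is only a sketch: the row layout (a timestamp plus a "new" or "clear" event) and the sample values are hypothetical placeholders, not the real NAS_TRANSACTION_LOG schema and not the data from our spreadsheet.

```python
from datetime import datetime

# Hypothetical rows: (timestamp, event), where event is "new" or "clear".
# Illustrative placeholders only, not taken from the attached spreadsheet.
SAMPLE_ROWS = [
    ("2020-01-01 10:00:00", "new"),
    ("2020-01-01 10:32:00", "clear"),
    ("2020-01-01 10:37:00", "new"),
    ("2020-01-01 11:08:00", "clear"),
]


def cycle_minutes(rows):
    """Return (open_durations, gaps) in minutes.

    open_durations: time from each raise to the clear that follows it.
    gaps: time from each clear to the next raise.
    Rows are assumed to be sorted by timestamp.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    open_durations = []
    gaps = []
    raised_at = None
    cleared_at = None
    for stamp, event in rows:
        when = datetime.strptime(stamp, fmt)
        if event == "new":
            if cleared_at is not None:
                gaps.append((when - cleared_at).total_seconds() / 60)
            raised_at = when
        elif event == "clear" and raised_at is not None:
            open_durations.append((when - raised_at).total_seconds() / 60)
            cleared_at = when
            raised_at = None
    return open_durations, gaps


open_durations, gaps = cycle_minutes(SAMPLE_ROWS)
print("minutes open before clear:", open_durations)
print("minutes until re-raise:", gaps)
```

With real data, open durations clustering around 30-35 minutes and gaps around 5 minutes would match the pattern described above.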
Please note: This problem is not limited to this one service; it also appears for other services on the same device. Nor is it limited to this device; we observe it on other devices too.
We are running:
robot version 9.20HF13
rsp version 5.35
Please advise on how to resolve this issue.
This is a massive problem for us because each new alarm creates a new ticket in our Incident Management system, so far more tickets are being created than necessary.