The spectrumgtw probe (version 8.68 and later) intermittently stops synchronizing alarms. OneClick shows Contact Lost, and the probe times out when communicating with the ems probe.
Restarting the spectrumgtw probe resolves the issue only temporarily.
On the Spectrum side:
Contact lost with CA UIMs Spectrum Gateway Probe (Spectrumgtw)
In the spectrumgtw/logs/alarm.log file:
[EmsServiceTherad] [awaitingResponse] - Timeout happened while calling EMS Service, no response from EMS Service
[EmsServiceTherad] [getClosedAlarms] - getClosedAlarms - EMS connection failedCount : ###
[EmsServiceTherad] [awaitingResponse] - Ems timeout interval : 15 seconds
Note: Use the timestamps of the messages above to pinpoint when the problem first starts. This helps determine whether the problem coincides with an OS-level event, such as a snapshot or backup of the probe machine.
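A minimal Python sketch for finding the first occurrence of these messages, assuming alarm.log is plain text with one message per line. The marker strings are copied from the excerpts above; the function name first_ems_failure is hypothetical. It returns the whole matching line, so whatever timestamp prefix your log uses is included without assuming any particular timestamp format.

```python
from typing import Iterable, Optional, Tuple

# Substrings copied from the alarm.log messages quoted in this article.
EMS_FAILURE_MARKERS = (
    "Timeout happened while calling EMS Service",
    "EMS connection failedCount",
)


def first_ems_failure(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, line text) of the first EMS failure message.

    Returns None when no line contains any of the markers.
    """
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in EMS_FAILURE_MARKERS):
            return number, line.rstrip("\n")
    return None
```

For example, open alarm.log with open(path, encoding="utf-8", errors="replace") and pass the file object directly to first_ems_failure; the file is read line by line, so large logs are not loaded into memory.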
Release : Any
Component : ems, spectrumgtw
Cause:
- The EMS timeout setting is too low
- An OS-level event on the probe machine (for example, a snapshot or backup)
Update the following setting in the spectrumgtw probe:
Setup -> alarm -> Ems_Alarm_Wait_Timeout_Interval = 30
Note: Ems_Alarm_Wait_Timeout_Interval defaults to 15 seconds in the spectrumgtw probe.
You may need a higher value, depending on how long getClosedAlarms calls take in ..spectrumgtw/logs/alarm.log. Look for messages such as:
[UimAlarmConsumer] [logMethodTime] - getClosedAlarms(long) : took 51.68s
If getClosedAlarms(long) takes a long time to complete, this may indicate an excessive number of alarms in the database.
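To see how getClosedAlarms durations compare with the timeout, the logMethodTime messages can be summarized with a short Python sketch. It assumes every duration is logged in the "took <seconds>s" form shown above; the function names and the summary fields are hypothetical, and the default of 15 seconds mirrors the documented default for Ems_Alarm_Wait_Timeout_Interval.

```python
import re
from typing import Dict, Iterable, List

# Matches the logMethodTime message quoted above, e.g.
# "[UimAlarmConsumer] [logMethodTime] - getClosedAlarms(long) : took 51.68s"
# Assumption: durations are always reported in seconds with an "s" suffix.
TOOK_PATTERN = re.compile(r"getClosedAlarms\(long\) : took (\d+(?:\.\d+)?)s")

DEFAULT_TIMEOUT_SECONDS = 15  # documented spectrumgtw default


def closed_alarm_durations(lines: Iterable[str]) -> List[float]:
    """Extract every getClosedAlarms duration (in seconds) from alarm.log lines."""
    durations = []
    for line in lines:
        match = TOOK_PATTERN.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def summarize(durations: List[float], timeout: float = DEFAULT_TIMEOUT_SECONDS) -> Dict[str, float]:
    """Report the number of calls, the slowest call, and how many exceeded the timeout."""
    return {
        "calls": len(durations),
        "max_seconds": max(durations, default=0.0),
        "over_timeout": sum(1 for d in durations if d > timeout),
    }
```

A call that took 51.68s, as in the example message, exceeds both the 15-second default and a 30-second setting, so in that situation a value above 52 seconds would be needed to avoid the timeout. How much headroom to add is a judgment call; addressing a large alarm backlog in the database may be more effective than raising the timeout indefinitely.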
When the problem occurs, review ems.log for errors. In some cases where alarm flow has stopped, the ems probe may also need to be restarted.
Changes in alarm load can be reviewed with the following queries (Microsoft SQL Server syntax) against the NAS_TRANSACTION_SUMMARY table:
Total closed (by month):
SELECT
FORMAT(closed, 'yyyy-MM') AS Closed_Month,
COUNT(*) AS Total_Alarms
FROM NAS_TRANSACTION_SUMMARY
WHERE closed IS NOT NULL
GROUP BY FORMAT(closed, 'yyyy-MM')
ORDER BY Closed_Month DESC;
Total closed (by week):
SELECT
DATEADD(DAY, (DATEDIFF(DAY, 0, closed) / 7) * 7, 0) AS Week_Starting_Monday,
COUNT(*) AS Total_Alarms
FROM NAS_TRANSACTION_SUMMARY
WHERE closed IS NOT NULL
GROUP BY DATEADD(DAY, (DATEDIFF(DAY, 0, closed) / 7) * 7, 0)
ORDER BY Week_Starting_Monday DESC;
Total closed (by robot):
SELECT
robot,
hub,
COUNT(*) AS Total_Closed_Alarms
FROM NAS_TRANSACTION_SUMMARY
WHERE closed IS NOT NULL
GROUP BY robot, hub
ORDER BY Total_Closed_Alarms DESC;
Total closed (by robot/probe):
SELECT
hub,
robot,
prid AS Probe,
COUNT(*) AS Total_Closed_Alarms
FROM NAS_TRANSACTION_SUMMARY
WHERE closed IS NOT NULL
GROUP BY hub, robot, prid
ORDER BY Total_Closed_Alarms DESC;
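To spot sudden jumps in alarm load, the weekly query's output can be checked with a small Python sketch. It assumes the result is exported as CSV with a header row containing Week_Starting_Monday and Total_Alarms, and that the week values are ISO-style date strings (so they sort chronologically as text). The function name weekly_spikes and the default factor of 2.0 are hypothetical choices for illustration, not thresholds from the product.

```python
import csv
from typing import Iterable, List, Tuple


def weekly_spikes(csv_lines: Iterable[str], factor: float = 2.0) -> List[Tuple[str, int, int]]:
    """Return (week, previous_total, total) where total >= factor * previous_total.

    Expects the weekly query result exported as CSV with a header row
    containing Week_Starting_Monday and Total_Alarms.
    """
    rows = [
        (row["Week_Starting_Monday"], int(row["Total_Alarms"]))
        for row in csv.DictReader(csv_lines)
    ]
    # The query orders newest first; sort oldest first so each row is compared
    # with the row before it. Weeks with no closed alarms do not appear in the
    # query output, so adjacent rows are not guaranteed to be adjacent weeks.
    rows.sort(key=lambda r: r[0])
    spikes = []
    for (_, prev_total), (week, total) in zip(rows, rows[1:]):
        if prev_total > 0 and total >= factor * prev_total:
            spikes.append((week, prev_total, total))
    return spikes
```

Any week it flags can then be cross-checked against the robot and robot/probe queries to find which sources contributed the extra alarms.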