The spectrumgtw probe, version 8.68 and later, intermittently stops syncing alarms. OneClick shows Contact Lost and a timeout with the ems probe.
Restarting the spectrumgtw probe fixes the issue for a short period of time.
On the Spectrum side:
Contact lost with CA UIMs Spectrum Gateway Probe (Spectrumgtw)
In the spectrumgtw/logs/alarm.log file:
[EmsServiceTherad] [awaitingResponse] - Timeout happened while calling EMS Service, no response from EMS Service
[EmsServiceTherad] [getClosedAlarms] - getClosedAlarms - EMS connection failedCount : ###
[EmsServiceTherad] [awaitingResponse] - Ems timeout interval : 15 seconds
Note: The messages above can be used to pinpoint when the problem first started. This helps determine whether the problem is related to an OS-level event, such as a snapshot or backup of the probe machine.
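As a rough sketch of that check, the Python snippet below returns the first line of alarm.log that contains the timeout message, so its timestamp can be compared with OS-level events. The function name and the sample lines, including the timestamp prefix, are assumptions made for illustration; the actual log line layout may differ.

```python
# Text taken from the timeout message shown in alarm.log above.
TIMEOUT_MARKER = "Timeout happened while calling EMS Service"


def first_timeout_line(lines):
    """Return (line_number, text) of the first timeout message, or None."""
    for number, text in enumerate(lines, start=1):
        if TIMEOUT_MARKER in text:
            return number, text.rstrip("\n")
    return None


# Hypothetical sample lines; the timestamp prefix is an assumed format.
sample = [
    "2024-01-01 02:00:00 [UimAlarmConsumer] [logMethodTime] - getClosedAlarms(long) : took 51.68s\n",
    "2024-01-01 02:00:15 [EmsServiceTherad] [awaitingResponse] - Timeout happened while calling EMS Service, no response from EMS Service\n",
    "2024-01-01 02:00:30 [EmsServiceTherad] [awaitingResponse] - Timeout happened while calling EMS Service, no response from EMS Service\n",
]

result = first_timeout_line(sample)
if result is not None:
    print(f"First timeout at line {result[0]}: {result[1]}")
```

To run it against a real log, pass an open file object instead of the sample list, for example `first_timeout_line(open(path))`, where `path` points to the alarm.log file.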
Release : Any
Component : ems, spectrumgtw
- timeout setting
- OS level event
Update the following setting in the spectrumgtw probe:
Setup -> alarm -> Ems_Alarm_Wait_Timeout_Interval = 30
Note: The spectrumgtw Ems_Alarm_Wait_Timeout_Interval is set to 15 seconds by default.
You may need to increase this value, depending on the durations logged in ..spectrumgtw/logs/alarm.log. Look for entries such as:
[UimAlarmConsumer] [logMethodTime] - getClosedAlarms(long) : took 51.68s
If getClosedAlarms(long) takes a long time to finish, this can indicate that there are excessive alarms in the database.
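One way to review these entries is sketched below in Python: it extracts the getClosedAlarms(long) durations from log lines and lists those that exceed a given timeout. The function names are hypothetical, and the pattern assumes durations are always logged in seconds in the "took 51.68s" form shown above; other formats would need a different pattern.

```python
import re

# Matches entries such as "getClosedAlarms(long) : took 51.68s".
# Assumption: the duration is always reported in seconds with an "s" suffix.
TOOK_PATTERN = re.compile(r"getClosedAlarms\(long\) : took (\d+(?:\.\d+)?)s")


def closed_alarm_durations(lines):
    """Return the getClosedAlarms(long) durations, in seconds, found in lines."""
    durations = []
    for text in lines:
        match = TOOK_PATTERN.search(text)
        if match:
            durations.append(float(match.group(1)))
    return durations


def durations_over_timeout(lines, timeout_seconds=15):
    """Return the durations that are longer than the configured timeout."""
    return [d for d in closed_alarm_durations(lines) if d > timeout_seconds]


# Hypothetical sample lines for illustration.
sample = [
    "[UimAlarmConsumer] [logMethodTime] - getClosedAlarms(long) : took 51.68s",
    "[UimAlarmConsumer] [logMethodTime] - getClosedAlarms(long) : took 3.2s",
    "[EmsServiceTherad] [awaitingResponse] - Ems timeout interval : 15 seconds",
]

print(durations_over_timeout(sample, timeout_seconds=30))
```

Durations that regularly exceed the configured Ems_Alarm_Wait_Timeout_Interval suggest that the value is still too low or that the alarm volume in the database needs attention.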
When the problem occurs, review ems.log for errors. In some cases where the alarm flow has stopped, a restart of the ems probe may be needed.