spectrumgtw stops syncing alarms intermittently - OneClick shows Contact Lost and timeout in ems

Article ID: 205016

Products

- DX Unified Infrastructure Management (Nimsoft / UIM)
- Unified Infrastructure Management for Mainframe
- CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

The spectrumgtw probe, version 8.68 and later, intermittently stops syncing alarms. OneClick shows Contact Lost, and the probe logs timeouts when calling the ems probe.

Restarting the spectrumgtw probe fixes the issue for a short period of time.

On the Spectrum side:

Contact lost with CA UIMs Spectrum Gateway Probe (Spectrumgtw)

In the spectrumgtw/logs/alarm.log file:

[EmsServiceTherad] [awaitingResponse] - Timeout happened while calling EMS Service, no response from EMS Service
[EmsServiceTherad] [getClosedAlarms] - getClosedAlarms - EMS connection failedCount : ###
[EmsServiceTherad] [awaitingResponse] - Ems timeout interval : 15 seconds

Note: The messages listed above can be used to pinpoint when the problem first started. This can help determine whether the problem is related to an OS-level event, such as a snapshot or backup of the probe machine.
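As a rough aid for pinpointing that first occurrence, the following Python sketch scans a copy of alarm.log for the timeout message quoted above and reports the first matching line and the total number of matches. The function name first_timeout and the command-line usage are illustrative assumptions, not part of the probe; the sketch makes no assumption about the timestamp format and simply prints the matching line as logged.

```python
import sys
from typing import Iterable, Optional, Tuple

# Substring copied from the alarm.log message quoted in this article.
TIMEOUT_MARKER = "Timeout happened while calling EMS Service"


def first_timeout(lines: Iterable[str]) -> Tuple[Optional[str], int]:
    """Return (first line containing the marker, total number of matching lines)."""
    first = None
    count = 0
    for line in lines:
        if TIMEOUT_MARKER in line:
            count += 1
            if first is None:
                first = line.rstrip("\n")
    return first, count


# Hypothetical usage: python find_timeout.py <path to alarm.log>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        first_line, total = first_timeout(fh)
    print(f"timeout entries: {total}")
    print(f"first occurrence: {first_line}")
```

The first matching line can then be compared against the schedule of snapshots, backups, or other OS-level jobs on the probe machine.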

Environment

Release : Any
Component : ems, spectrumgtw

Cause

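- The EMS timeout setting (15 seconds by default) is too low for the environment
- An OS-level event, such as a snapshot or backup of the probe machine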

Resolution

Update the following setting in the spectrumgtw probe:

   Setup -> alarm -> Ems_Alarm_Wait_Timeout_Interval = 30

Note: The spectrumgtw Ems_Alarm_Wait_Timeout_Interval is set to 15 seconds by default.

You may need to increase this value further, depending on how long getClosedAlarms takes according to ..spectrumgtw/logs/alarm.log. Look for entries such as:

   [UimAlarmConsumer] [logMethodTime] - getClosedAlarms(long) : took 51.68s

To see these entries in the alarm.log file, you will need to turn on debug logging temporarily:

- Edit ..spectrumgtw/log4j2.properties and update the following:
 
#alarmLogger is for alarm.log
logger.alarmLogger.level = debug
 
- Save the file, then deactivate and reactivate the probe

You may want to run the probe with alarm debug logging enabled for a day or two to see whether the time lags are higher at certain times of day. Higher lags at particular times may indicate that another process running during that window is contributing to the issue.
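To get a quick overview of the getClosedAlarms timings once debug logging has been collected, a small Python sketch along these lines could be used. It assumes the entries look exactly like the sample line shown in this article (a duration in seconds followed by "s"); the function names and the summary fields are illustrative and not part of the probe.

```python
import re
from typing import Dict, Iterable, List

# Pattern based on the sample entry in this article:
#   getClosedAlarms(long) : took 51.68s
# Assumption: durations are always logged in seconds with an "s" suffix.
TOOK_RE = re.compile(r"getClosedAlarms\(long\)\s*:\s*took\s+([0-9]+(?:\.[0-9]+)?)s")


def closed_alarm_durations(lines: Iterable[str]) -> List[float]:
    """Extract getClosedAlarms durations (in seconds) from log lines."""
    durations = []
    for line in lines:
        match = TOOK_RE.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def summarize(durations: List[float], timeout_seconds: float) -> Dict[str, float]:
    """Report sample count, longest call, and how many calls exceeded the timeout."""
    return {
        "samples": len(durations),
        "max_seconds": max(durations, default=0.0),
        "over_timeout": sum(1 for d in durations if d > timeout_seconds),
    }
```

For example, calling summarize(closed_alarm_durations(open("alarm.log")), 30) on a copy of the log shows how many calls would still exceed a 30-second Ems_Alarm_Wait_Timeout_Interval and what the longest observed call was.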

Additional Information

If getClosedAlarms(long) takes a long time to finish, this could indicate that there are excessive alarms in the database.

When the problem happens, it may be necessary to review the ems.log for errors. In some cases where the alarm flow has stopped, a restart of ems may be needed.