OneClick is missing "DEVICE HAS STOPPED RESPONDING TO POLLS" alarms in the Alarm Tab

Article ID: 45109

Products

CA Spectrum DX NetOps

Issue/Introduction

We have noticed that Spectrum intermittently fails to display the "DEVICE HAS STOPPED RESPONDING TO POLLS" alarm for some of our most important routers. The device model is in a critical state, and the device shows a "Contact Status" of Lost. All indications are that an alarm should be displayed for the device model; however, it is not seen in the Alarm Tab of OneClick.

Environment

Release: Any

Cause

If you are using the Enterprise VPN Manager or VPN Manager applications in OneClick, you will likely see this behavior. With these applications, Spectrum's fault isolation intelligence makes any device-level Contact Lost alarms symptoms of the alarm on the associated Provider_Cloud model when it detects that the Provider_Cloud itself is at fault.

Resolution

Spectrum uses port polling and BGP peer session polling to update the status of the Provider_Cloud model. By default, Spectrum asserts a minor alarm against the Provider_Cloud model if it detects a 1% failure rate for connected customer edge (CE) devices/interfaces, a major alarm if 3% of the connected CE devices/interfaces are down, and a critical alarm if 5% or more are down. When an alarm is raised against the Provider_Cloud model, Spectrum's fault isolation makes the associated "DEVICE HAS STOPPED RESPONDING TO POLLS" alarms symptoms of the Provider_Cloud alarm, since the condition of the Provider_Cloud is typically the root cause.
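The default thresholds described above can be illustrated with a short Python sketch. This is not Spectrum's implementation; the function name and the "greater than or equal" comparison are assumptions made only to show how the 1% / 3% / 5% defaults translate into a severity.

```python
# Default thresholds quoted in this article: percent of connected
# CE devices/interfaces that are down.
DEFAULT_THRESHOLDS = {"minor": 1.0, "major": 3.0, "critical": 5.0}


def provider_cloud_severity(down, total, thresholds=DEFAULT_THRESHOLDS):
    """Return the highest severity whose threshold is met, or None.

    Hypothetical helper: assumes a threshold is met when the
    percentage down is greater than or equal to it.
    """
    if total <= 0:
        return None
    pct_down = 100.0 * down / total
    severity = None
    # Walk from the lowest to the highest severity so the last match wins.
    for name in ("minor", "major", "critical"):
        if pct_down >= thresholds[name]:
            severity = name
    return severity
```

Under these assumptions, 6 of 200 CE devices down (3%) evaluates to "major", and 10 of 200 (5%) evaluates to "critical".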

 

To determine whether this is why your alarm is not seen in OneClick, select the device model that is missing the alarm. Navigate to the Event tab and filter the events to the time you expected the "DEVICE HAS STOPPED RESPONDING TO POLLS" alarm to appear. If you see an event similar to the following, then the "DEVICE HAS STOPPED RESPONDING TO POLLS" alarm (Probable Cause 0x10009) has been made a symptom of the "PROVIDER CONDITION" alarm, and has therefore been hidden from the Alarm Tab.

 

Alarm xxxx (critical, probable cause id 0x10009) is now being hidden because it is caused by alarm yyyy (critical, probable cause id 0x5180407) on model 0xnnnnnn).

 

There are three out-of-the-box PROVIDER CONDITION alarms:

  • Probable Cause ID 0x5180405 - PROVIDER CONDITION IS MINOR
  • Probable Cause ID 0x5180406 - PROVIDER CONDITION IS MAJOR
  • Probable Cause ID 0x5180407 - PROVIDER CONDITION IS CRITICAL
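If event text is exported for review, a check like the following could flag which events show a Contact Lost alarm hidden by a PROVIDER CONDITION alarm. This is a minimal sketch: the regular expression is modeled only on the sample event shown in this article, the function name is hypothetical, and the alarm IDs and model handle in the usage example are made up for illustration.

```python
import re

# Probable cause IDs listed in this article.
CONTACT_LOST_CAUSE = 0x10009
PROVIDER_CONDITION_CAUSES = {
    0x5180405: "PROVIDER CONDITION IS MINOR",
    0x5180406: "PROVIDER CONDITION IS MAJOR",
    0x5180407: "PROVIDER CONDITION IS CRITICAL",
}

# Pattern modeled on the sample event text in this article (an assumption;
# the verb before "now being hidden" is matched loosely).
HIDDEN_EVENT_RE = re.compile(
    r"Alarm\s+(?P<symptom_id>\S+)\s+"
    r"\((?P<symptom_sev>\w+),\s*probable cause id\s+(?P<symptom_cause>0x[0-9a-fA-F]+)\)"
    r"\s+\w+ now being hidden because it is caused by alarm\s+(?P<root_id>\S+)\s+"
    r"\((?P<root_sev>\w+),\s*probable cause id\s+(?P<root_cause>0x[0-9a-fA-F]+)\)"
    r"\s+on model\s+(?P<model>0x[0-9a-fA-F]+)"
)


def explain_hidden_alarm(event_text):
    """Return the PROVIDER CONDITION alarm name that hid a Contact Lost
    alarm, or None if the event does not describe that situation."""
    match = HIDDEN_EVENT_RE.search(event_text)
    if match is None:
        return None
    symptom_cause = int(match.group("symptom_cause"), 16)
    root_cause = int(match.group("root_cause"), 16)
    if symptom_cause != CONTACT_LOST_CAUSE:
        return None
    return PROVIDER_CONDITION_CAUSES.get(root_cause)


# Usage with made-up alarm IDs and model handle:
sample = (
    "Alarm 1234 (critical, probable cause id 0x10009) is now being hidden "
    "because it is caused by alarm 5678 (critical, probable cause id "
    "0x5180407) on model 0x100abc)."
)
result = explain_hidden_alarm(sample)
```

A return value of None means the event either does not match the sample format or the root cause is not one of the three PROVIDER CONDITION alarms.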

 

If these "hidden" alarms are becoming a problem for you, you may want to review the "Service Monitoring" section of the documentation and set the thresholds to values better suited to your environment, so that a more reasonable number of CE devices must be down before Spectrum marks the Provider_Cloud as being at fault.
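To judge whether a threshold is reasonable, it can help to work out how many CE devices/interfaces must be down before it is reached. The sketch below is plain arithmetic, not a Spectrum API; the function name is hypothetical, and it assumes a threshold is reached when the percentage down is greater than or equal to it.

```python
import math


def min_down_to_trigger(total_ce, threshold_pct):
    """Smallest number of down CE devices/interfaces that reaches
    threshold_pct percent of total_ce (never less than one)."""
    return max(1, math.ceil(total_ce * threshold_pct / 100.0))
```

With 40 connected CE devices, for example, a single device down already reaches the default 1% minor threshold, and two devices down reach the default 5% critical threshold. Raising a threshold to 25% would require 10 devices down.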

 

You may also want to adjust the Alarm Filter options for your users so that symptom alarms are shown in the Alarm Tab. In OneClick, open the Alarm Filter dialog window, select the State tab, and set "Symptoms" to "Show Symptoms".

 

You could also change the alarm mapping in Event Configuration so that the minor and major events (0x5180405, 0x5180406) generate a critical alarm, if that is what you want.