CPU and Memory Utilization alarms set and clear within seconds in CA Spectrum

Article ID: 21370

Products

CA Spectrum

Issue/Introduction

When you monitor CPU or memory utilization in CA Spectrum, CPU and/or memory violation alarms may be generated and cleared within the same second, or cleared a few seconds after they are generated.

    Environment

    DX NetOps 20.2 or later Spectrum

    Cause

    The polling interval and the Duration value are set to exactly the same number of seconds, so the duration timer expires just as the next poll begins.

    Resolution

    Increase the Duration value so that it exceeds the device polling interval by at least 5 to 10 seconds. This gives CA Spectrum enough time to poll the device and process the result before the alarm is generated.

    If the issue persists, set Duration to a larger value. When choosing a value much higher than the default, ensure that it is NOT evenly divisible by the polling interval. For example, if the polling interval is 300 seconds (the default), set Duration to 610 or 615. This prevents a poll from falling exactly at the end of the duration period.
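The rule above can be checked with a few lines of Python. This is a minimal sketch, not part of CA Spectrum: the function names `is_safe_duration` and `suggest_duration` and the default 10-second margin are assumptions chosen for illustration.

```python
def is_safe_duration(duration, poll_interval):
    """True if duration exceeds the polling interval and is not an exact multiple of it."""
    return duration > poll_interval and duration % poll_interval != 0


def suggest_duration(poll_interval, polls=1, margin=10):
    """Return a duration spanning `polls` polling intervals plus a small margin.

    The margin (hypothetical default: 10 seconds) keeps the end of the
    duration from landing exactly on a poll.
    """
    if not 0 < margin < poll_interval:
        raise ValueError("margin must be between 0 and the polling interval")
    return poll_interval * polls + margin


if __name__ == "__main__":
    for candidate in (300, 310, 600, 610, 615):
        print(candidate, is_safe_duration(candidate, 300))
    print(suggest_duration(300, polls=2))
```

With a 300-second polling interval, values such as 300 and 600 are rejected, while 310, 610, and 615 pass the check.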

    Additional Information

    When CA Spectrum polls a host and determines that CPU utilization is above the threshold, it generates an event (0x10f07) and starts a timer whose length is set by the Duration attribute (0x12bce). If the normal event (0x10f08) is not received within that duration, an alarm is generated. In this case, the timer expires, and the alarm is generated, just as the next poll is initiated. That poll then finds that CPU utilization is back below the threshold (normal), so the clear event (0x10f08) is generated within seconds of the alarm.
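The timing described above can be reproduced with a small simulation. This is a simplified model written for illustration, not Spectrum's implementation: the function `simulate`, the single threshold used for both set and reset, and the rule that a timer expiring at the same instant as a poll is handled before the poll result are all assumptions, the last one chosen to match the observed behavior.

```python
def simulate(poll_interval, duration, readings, threshold=85):
    """Return (time_in_seconds, event) tuples for a simplified event-pair timer.

    `readings` holds one CPU utilization value per poll. A timer still
    pending after the last poll is not reported.
    """
    events = []
    timer_expiry = None  # time at which the pending duration timer fires
    violating = False    # True between the high event and the normal event

    for i, cpu in enumerate(readings):
        now = i * poll_interval
        # Assumption: a timer expiring at or before this poll is handled first.
        if timer_expiry is not None and timer_expiry <= now:
            events.append((timer_expiry, "alarm 0x10f09"))
            timer_expiry = None
        if cpu > threshold and not violating:
            events.append((now, "high 0x10f07"))
            violating = True
            timer_expiry = now + duration
        elif cpu <= threshold and violating:
            # The normal event cancels any pending timer and clears the alarm.
            events.append((now, "normal 0x10f08"))
            violating = False
            timer_expiry = None
    return events


if __name__ == "__main__":
    print(simulate(300, 300, [89, 80]))      # duration equals the polling interval
    print(simulate(300, 310, [89, 80]))      # duration 10 seconds longer
    print(simulate(300, 310, [89, 90, 80]))  # sustained violation
```

In this model, a duration equal to the polling interval produces an alarm and a clear at the same timestamp. A duration of 310 seconds produces no alarm for a single high reading, and still produces one when the violation lasts across two polls.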

    The EventDisp entries define the process:

    0x00010f07 R Aprisma.EventPairTimeAttr, 0x00010f08, "0x00010f09 -:-", 0x12bce
    • 0x10f07 is generated when the CPU threshold is crossed. If the reset event, 0x10f08, is not received within the time specified in the Duration attribute (0x12bce), the alarm event, 0x10f09, is generated.
    0x00010f09 E 50 A 2,0x00010f09,N
    0x00010f08 E 50 C 0x00010f09

    In this example, the Duration attribute (0x12bce) is set to 300 seconds, the same as the default polling interval.

    The actual event sequence is shown below. The event indicating that CPU utilization is above the threshold (0x10f07) is generated at 6:58:29. At 7:03:29, 5 minutes later, the alarm event (0x10f09) is generated. In the same second, but after the 0x10f09 event has been processed, the reset event (0x10f08) is generated, clearing the alarm.

    1. Event Time: Sep 19, 2011 6:58:29 AM PDT
      Model Name: host01.ca.com
      Event Message: High Aggregate CPU Utilization.
      The average CPU Utilization of 89% for all CPU instances exceeds the 85% threshold on model host01.ca.com
      Event Type: 0x10f07

    2. Severity: Major
      Event Time: Sep 19, 2011 7:03:29 AM PDT
      Clear Time: Sep 19, 2011 7:03:29 AM PDT
      Model Name: host01.ca.com
      Event Message: High Aggregate CPU Utilization.
      The average CPU Utilization of 89% for all CPU instances has exceeded the 85% threshold on model host01.ca.com for more than the acceptable time period.
      Event Type: 0x10f09

    3. Event Time: Sep 19, 2011 7:03:29 AM PDT
      Model Name: host01.ca.com
      Event Message: Normal Aggregate CPU Utilization.
      The average CPU Utilization for all CPU instances is now below the % reset threshold for model host01.ca.com
      Event Type: 0x10f08
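The arithmetic behind this sequence can be worked through with the timestamps from the events above. This is a sketch under stated assumptions: the helper names `alarm_time` and `alarm_would_fire` are hypothetical, and the model assumes a timer expiring at the same second as the clear event still raises the alarm, as the log shows.

```python
from datetime import datetime, timedelta

# Timestamps taken from the event sequence above (PDT, time zone omitted).
high_event = datetime(2011, 9, 19, 6, 58, 29)   # 0x10f07
clear_event = datetime(2011, 9, 19, 7, 3, 29)   # 0x10f08, raised by the next poll


def alarm_time(duration_seconds):
    """Time at which the duration timer started by 0x10f07 would expire."""
    return high_event + timedelta(seconds=duration_seconds)


def alarm_would_fire(duration_seconds):
    """True if the timer expires no later than the clear event arrives."""
    return alarm_time(duration_seconds) <= clear_event


if __name__ == "__main__":
    for duration in (300, 310):
        print(duration, alarm_time(duration).time(), alarm_would_fire(duration))
```

With Duration set to 300, the timer expires at 7:03:29, the same second as the clear event, so the alarm is raised and cleared immediately. With Duration set to 310, the timer would not expire until 7:03:39, after the clear event has already arrived.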