Threshold profile setting [Event rules]
Metric family: Data Aggregator rollup calculation time
Metric: number of completed rollup processes
Duration (seconds): 300
Window (seconds): 300
Importance: Major
[Conditions for Violation]
Event type: Violation
Metric: number of completed rollup processes
Threshold: <= 0.0 (less than or equal to)
Condition type: fixed value
[Conditions for clearing the violation]
Event type: Clear
Metric: number of completed rollup processes
Threshold: > 0.0 (greater than)
Condition type: fixed value
However, the rule doesn't seem to work: no events or alarms are generated.
DX NetOps CAPM, all currently supported releases
This is working as intended.
Rollups are run hourly, so the data has hourly timestamps. The system is running between 21 and 40 rollups an hour. All of those values are > 0.0, so no violation events are created, which is why no alarms are generated.
Thresholding only works off the metric data that is collected. Rollup self-monitoring is metric data: when rollups are run, CAPM marks them in the DB to indicate they were run. However, if it doesn't run rollups for a Metric Family (MF), it doesn't record self-monitoring data for that MF. That is the case here: a missed rollup produces no data point at all rather than a value of 0, so the <= 0.0 condition is never evaluated.
CAPM doesn't cycle through all metric families every rollup period to see whether it has data to roll up. Instead, it relies on end-of-cycle messages in the AMQ Rollup_<MF> queues. So at this time, there is no real way to determine from a monitoring point of view whether rollups are not running for an MF.
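The behavior described above can be illustrated with a toy model. This is only a sketch of the reasoning, not CAPM's actual threshold engine; the function name and the sample dictionary are hypothetical.

```python
def violations(samples, threshold=0.0):
    """Return the hours whose recorded rollup count is <= threshold."""
    return [hour for hour, count in samples.items() if count <= threshold]

# Hours 0-2 ran rollups. In hour 3 no rollup ran, so no sample was recorded at all.
samples = {0: 21, 1: 40, 2: 33}

# Hour 3 is absent rather than 0, so the "<= 0.0" rule has nothing to evaluate.
print(violations(samples))  # prints []
```

A violation would only appear if a sample with the value 0 were actually written, which, per the explanation above, does not happen when rollups are skipped.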
To do this, you would probably need a script that runs a vsql query against the DR (Vertica) every hour and checks the _ltd (hourly) or _eqd (daily) tables for each MF. If the latest timestamp is more than, say, 2 to 3 hours old, the script sends an email or some other notification to administrators.
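A minimal sketch of such a check is shown here, assuming Python is available on a host that can run vsql. The table name, the timestamp column name (assumed to hold epoch seconds), the host, user, and database values are all hypothetical placeholders and must be replaced with values from your own DR schema. Authentication and the actual notification step are left out.

```python
import subprocess
from datetime import datetime, timedelta, timezone

# Hypothetical placeholders: substitute real _ltd / _eqd table names from your DR.
ROLLUP_TABLES = ["example_mf_ltd"]
TSTAMP_COLUMN = "tstamp"          # assumed column holding epoch seconds
MAX_AGE = timedelta(hours=3)      # "2 to 3 hours old" from the text above


def build_vsql_command(table, column=TSTAMP_COLUMN,
                       host="dr-host", user="dradmin", database="drdata"):
    """Build a vsql argument list that prints only MAX(column) for one table."""
    query = f"SELECT MAX({column}) FROM {table};"
    # -A: unaligned output, -t: tuples only, -c: run a single command
    return ["vsql", "-h", host, "-U", user, "-d", database,
            "-A", "-t", "-c", query]


def latest_timestamp(table):
    """Run vsql and return the newest rollup timestamp, or None if the table is empty."""
    result = subprocess.run(build_vsql_command(table),
                            capture_output=True, text=True, check=True)
    value = result.stdout.strip()
    return datetime.fromtimestamp(int(value), tz=timezone.utc) if value else None


def find_stale(latest_by_table, now, max_age=MAX_AGE):
    """Return tables with no rows, or whose newest row is older than max_age."""
    return sorted(table for table, ts in latest_by_table.items()
                  if ts is None or now - ts > max_age)


def main():
    now = datetime.now(timezone.utc)
    latest = {table: latest_timestamp(table) for table in ROLLUP_TABLES}
    stale = find_stale(latest, now)
    if stale:
        # Replace this print with an email or other notification.
        print("Rollups look stale for:", ", ".join(stale))
```

Calling main() once an hour from a scheduler such as cron would match the hourly check described above. For daily (_eqd) tables, a larger MAX_AGE would likely be needed.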
You can view the Health Monitoring data as described in:
TechDocs : DX NetOps CAPM 24.3 : View Health Monitoring Information
There, view the last 24 hours' worth of data and check the rollups.