Where in the logs can we see when the Thresholding State changes to Degraded or Shutdown?
NetOps Performance Management : All releases
The Data Aggregator karaf.log (default location /opt/IMDataAggregator/apache-karaf-4.3.8/data/log) records a message each time the Thresholding state changes to Degraded or Shutdown. An event is also generated for the state change.
For example:
WARN | ory-thread-34559 | 2024-07-02T02:29:30,174 | onitoringProcessLimitManagerImpl | onitoringProcessLimitManagerImpl 98 | .ca.im.aggregator.loader | | Threshold Monitoring processing took too long. The system will shut that feature down in 15 minutes if the threshold monitoring continues to exceed capcacity
ERROR | Watcher-thread-8 | 2024-07-02T02:44:30,175 | onitoringProcessLimitManagerImpl | onitoringProcessLimitManagerImpl 133 | .ca.im.aggregator.loader | | Threshold Monitoring has exceeded capacity for too long, shutting it down...
INFO | ory-thread-34566 | 2024-07-02T02:44:30,199 | AlarmStarter | gregator.alarm.impl.AlarmStarter 33 | .ca.im.aggregator.loader | | Disabling Threshold Evaluation globally
INFO | ory-thread-34564 | 2024-07-02T02:44:30,200 | dMonitoringSystemLogNotifierImpl | dMonitoringSystemLogNotifierImpl 48 | .ca.im.aggregator.loader | | Event 4.StateSystemShutdown was generated
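To list the times of these state changes without reading the whole log, a small script can filter karaf.log for the message texts shown above. The following is a minimal sketch, not a supported tool. It assumes the message wording matches the sample lines (other releases may word them differently), and the mapping of the WARN message to "Degraded" is an assumption made for illustration. The names `find_threshold_state_changes` and `scan_log` are hypothetical helpers.

```python
import re
from pathlib import Path

# Message fragments copied from the sample log lines above.
# The state labels are assumptions for illustration: the WARN message is
# treated as the Degraded transition and the other two as Shutdown.
PATTERNS = {
    "Threshold Monitoring processing took too long": "Degraded",
    "Threshold Monitoring has exceeded capacity for too long": "Shutdown",
    "Disabling Threshold Evaluation globally": "Shutdown",
}

# Timestamp format as it appears in the sample lines, e.g. 2024-07-02T02:29:30,174
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}")


def find_threshold_state_changes(lines):
    """Return (timestamp, state, matched_text) for each matching log line."""
    results = []
    for line in lines:
        for text, state in PATTERNS.items():
            if text in line:
                match = TIMESTAMP.search(line)
                timestamp = match.group(0) if match else None
                results.append((timestamp, state, text))
                break  # one result per log line
    return results


def scan_log(path):
    """Read a karaf.log file and return its threshold state changes."""
    with Path(path).open(errors="replace") as handle:
        return find_threshold_state_changes(handle)
```

For example, calling `scan_log("/opt/IMDataAggregator/apache-karaf-4.3.8/data/log/karaf.log")` on the Data Aggregator host returns one tuple per matching line. Adjust the path if the Data Aggregator is not installed in the default location, and repeat the call for rotated log files if the time of interest is older than the current log.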
Threshold Evaluations can be resumed via REST by following the steps in the documentation:
Resume Threshold Evaluations
You can also use the steps from this KB article:
Resume threshold monitoring evaluations from the command line