Performance Management is experiencing a high level of stress in the Threshold Monitoring space. The Threshold Monitoring engine has been degraded a few times, and the limiter function had to be turned off to maintain Event functionality.
The system is polling a large number of items at both 1 minute and 5 minute rates.
The System Health Event reports show high values for the "Percentage of Poll Cycle to Complete Event Processing" metric.
All supported Performance Management releases
Low CPU speeds with many cores on the Data Repository nodes.
Event Threshold Evaluations take place on the Data Repository database server. This work is sensitive to CPU clock speed.
In this instance, fewer CPU cores with higher speeds are required rather than more CPU cores with lower speeds.
The CPU speeds in this scenario were also below the requirements stated in the Performance Management Sizing tool, which lists a minimum speed of 2.6 GHz for a Data Repository server.
CPU speeds in this scenario were 2.3 GHz.
Raising the CPU speeds to 2.6 GHz resolved the problem.
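As a rough way to compare a Linux Data Repository host against the 2.6 GHz figure cited above, the following sketch parses `/proc/cpuinfo`-style text and flags logical CPUs that report a lower speed. It is not part of Performance Management; the function names are hypothetical, and it assumes the host exposes a `cpu MHz` field. That field reflects the current frequency, which can vary with frequency scaling, so treat the result as an indication rather than a definitive measurement.

```python
import re

# Minimum Data Repository CPU speed cited for this scenario (from the sizing tool).
MIN_GHZ = 2.6


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every logical CPU in cpuinfo-style text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)
    return [float(value) for value in matches]


def cores_below_minimum(mhz_values, min_ghz=MIN_GHZ):
    """Return indexes of logical CPUs whose reported speed is under min_ghz."""
    return [i for i, mhz in enumerate(mhz_values) if mhz / 1000.0 < min_ghz]


def read_local_speeds(path="/proc/cpuinfo"):
    """Hypothetical helper: read speeds from the local host (Linux only)."""
    with open(path) as handle:
        return parse_cpu_mhz(handle.read())
```

On a host, `cores_below_minimum(read_local_speeds())` would list the logical CPUs reporting less than 2.6 GHz. The minimum for a given deployment should still come from the sizing tool, since it adjusts with the numbers entered.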
To determine the correct CPU requirements, perform a sizing for the system using the Performance Management Sizing tool. The speed and core count requirements will adjust as needed based on the numbers entered into the sizing tool.
The Performance Management Sizing tool can be found here: