Description:
This article covers Enterprise Manager (EM) performance and the harvest cycle.
Solution:
Q: What work does the harvesting process do with respect to the MOM? Is high harvest capacity usage causing calculators to run with a high harvest time?
A: It is the reverse: high calculator harvest time leads to a high harvest duration, which in turn uses up much of the harvest capacity. The target is for Harvest Duration (and SmartStor Duration, too) to stay well below 3.5 seconds so that ad-hoc queries can still be served.
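The 3.5-second target can be checked mechanically against exported metric values. The following is a minimal sketch only: the sample data, the (label, seconds) layout, and the function name are assumptions made for illustration, not an EM API or export format.

```python
from typing import Iterable, List, Tuple

TARGET_SECONDS = 3.5  # target ceiling quoted in the answer above


def slices_over_target(
    samples: Iterable[Tuple[str, float]],
    target_seconds: float = TARGET_SECONDS,
) -> List[Tuple[str, float]]:
    """Return the (label, duration) samples that reach or exceed the target."""
    return [(label, secs) for label, secs in samples if secs >= target_seconds]


# Hypothetical Harvest Duration samples: (time slice label, duration in seconds).
harvest_samples = [
    ("10:00:00", 0.8),
    ("10:00:15", 1.2),
    ("10:00:30", 10.4),
    ("10:00:45", 3.5),
]

flagged = slices_over_target(harvest_samples)
print(flagged)
```

The comparison uses "at or above" because the stated goal is to stay well below the target, so a sample sitting exactly on 3.5 seconds is already worth a look. The same check could be applied to SmartStor Duration samples.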
The harvest cycle is a complex real-time loop with many dependencies: Management Module calculators, JavaScript calculators, built-in calculators, alert processing, etc.
Some of this work is broken down in the metrics under Internal and Internal|Harvest. A single JavaScript calculator can raise the harvest time above 10 seconds, resulting in many aggregated time slices and a totally unresponsive EM cluster. To check for this, remove all JavaScript calculators from the scripts directory. If harvest and calculator times then drop significantly, add the calculators back one by one to identify the culprit(s). The cause could also be too many queries, e.g. from integrations.
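The remove-all-then-add-back procedure can be sketched as a simple loop. This is an illustration under stated assumptions, not a supported tool: measure_harvest_seconds is a hypothetical callback standing in for the manual step of deploying a set of calculators and reading the resulting harvest duration, and the 1-second tolerance is an arbitrary choice.

```python
from typing import Callable, List, Sequence


def find_slow_calculators(
    calculators: Sequence[str],
    measure_harvest_seconds: Callable[[List[str]], float],
    max_increase_seconds: float = 1.0,  # assumed tolerance, tune as needed
) -> List[str]:
    """Re-enable calculators one at a time and flag those that raise harvest time."""
    enabled: List[str] = []
    culprits: List[str] = []
    # Baseline: harvest duration with every JavaScript calculator removed.
    baseline = measure_harvest_seconds(enabled)
    for name in calculators:
        trial = enabled + [name]
        observed = measure_harvest_seconds(trial)
        if observed - baseline > max_increase_seconds:
            # Large jump: flag it and keep it disabled so later tests stay clean.
            culprits.append(name)
        else:
            enabled = trial
            baseline = observed
    return culprits


# Hypothetical per-calculator cost in seconds, used only to simulate measurements.
fake_costs = {"a.js": 0.1, "b.js": 9.0, "c.js": 0.2}


def fake_measure(enabled: List[str]) -> float:
    return 0.5 + sum(fake_costs[name] for name in enabled)


culprits = find_slow_calculators(list(fake_costs), fake_measure)
print(culprits)
```

Keeping a flagged calculator disabled while the remaining ones are tested makes each measurement attributable to a single change. In practice each measurement would need to span several harvest cycles before it is trusted.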
Below are probable causes of EM performance degradation and the corresponding indicator metrics:
Agents sending too many metrics and/or leaks.
Calculators/alerts matching too many metrics.
Too many ongoing queries or Transaction Trace events.
Too many broad historical queries.