Question:
How can I check whether missing CEM data is due only to an Hourly Stats Aggregation problem, and how can I minimise the data loss?
Answers:
How to check if only hourly aggregation has a problem:
1. New CEM data has stopped appearing in Reports/Analysis Graphs in the UI, e.g. the Custom Hour and Today timeframes show no data, but longer timeframes do.
2. The TIM Collection Service (TCS) EM log may contain "Select failed" SQL Exception messages for ts_st_* relations (Postgres) or ts_st* partitions (Oracle error ORA-02149) that do not exist. However, this symptom is not always present, because processing of the TIM stats files, which happens before the aggregation step, may have stopped for an unknown reason.
3. The Stats Aggregation Service (SAS) EM log contains no "Select failed" SQL Exception messages, so Daily Aggregation has no partition problems.
4. If the problem started before today, or if the SAS EM has been restarted today, the SAS EM log may show a series of messages like this one:
12:02:01.161 AM EST [INFO] [DailyAggregation.Thread1] [Manager.com.timestock.tess.services.processors.StatsAggregator] Last hourly aggregation is still running. Waiting 120 secs before next check. Daily aggregation won't start before hourly aggregation is completed ...
ending with:
04:00:03.700 AM EST [WARN] [DailyAggregation.Thread1] [Manager.com.timestock.tess.services.processors.StatsAggregator] Last hourly aggregation did not complete after 14400 seconds. Daily aggregation will retry tomorrow
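On a large EM log, checks 2 to 4 come down to a text search. The Python sketch below counts the lines that contain the message fragments quoted above. The function name `scan_em_log`, the `LogFindings` fields and the assumption that the EM log is a plain-text file are hypothetical and not part of the product; the exact message wording may also differ in other releases.

```python
from dataclasses import dataclass


@dataclass
class LogFindings:
    select_failed: int = 0         # "Select failed" SQL exceptions (checks 2 and 3)
    ora_02149: int = 0             # Oracle: specified partition does not exist
    hourly_still_running: int = 0  # Daily Aggregation waiting on hourly (check 4)
    hourly_timed_out: int = 0      # Daily Aggregation gave up until tomorrow


# Message fragments copied from this article. Wording may differ between
# releases (assumption), so adjust them to match your own log.
FRAGMENTS = {
    "select_failed": "Select failed",
    "ora_02149": "ORA-02149",
    "hourly_still_running": "Last hourly aggregation is still running",
    "hourly_timed_out": "Last hourly aggregation did not complete",
}


def scan_em_log(lines):
    """Count how many lines contain each symptom fragment."""
    findings = LogFindings()
    for line in lines:
        for field, fragment in FRAGMENTS.items():
            if fragment in line:
                setattr(findings, field, getattr(findings, field) + 1)
    return findings
```

Run it separately against each log, e.g. `with open(path, errors="replace") as log: print(scan_em_log(log))`, where `path` is your site-specific log location. A non-zero `select_failed` count in the TCS EM log matches check 2, a zero count in the SAS EM log matches check 3, and `hourly_still_running`/`hourly_timed_out` in the SAS EM log match check 4. With the 120-second interval and 14400-second limit in those messages, Daily Aggregation waits up to four hours (at most about 120 checks) before giving up.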
CA Application Performance Management 10.7
How to minimise data loss:
1. If symptom 2 above is confirmed, follow the steps in KB Article KB48152 ("Missing data in CEM Analysis Graphs and Reports for Timeframe "Today", "Custom Hour", "Previous Hour", but other timeframes based on Day, Week, Month do show data. ERROR: Select failed for ts_st_ts_us_int").
2. Otherwise, restart the TIM Collection Service EM and see whether that alone resolves the problem.
In either case, DO NOT restart the Stats Aggregation Service (SAS) EM: that causes Daily Aggregation to run immediately, possibly before the backlog of TIM stats files has been processed, which results in a loss of aggregated data. Instead, let the Stats Aggregation Service EM run until its next scheduled Daily Aggregation just after midnight.
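The decision in this section can be summarised as a small Python sketch. The names `Action`, `recommend` and `SAS_WARNING` are hypothetical; the boolean inputs are the answers to the numbered checks in the first list, gathered manually or from a log search.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Remediation steps described in this article."""
    FOLLOW_KB48152 = "Follow KB48152 to repair the missing ts_st_* relations/partitions"
    RESTART_TCS_EM = "Restart the TIM Collection Service EM"


# Applies in every case: restarting the SAS EM would trigger Daily
# Aggregation immediately, possibly before the TIM stats backlog is processed.
SAS_WARNING = (
    "Do NOT restart the Stats Aggregation Service EM; let it run to its next "
    "scheduled Daily Aggregation just after midnight."
)


def recommend(
    short_timeframes_empty: bool,  # check 1: Custom Hour/Today show no data
    long_timeframes_ok: bool,      # check 1: longer timeframes still show data
    tcs_select_failed: bool,       # check 2: "Select failed" in the TCS EM log
    sas_select_failed: bool,       # check 3: "Select failed" in the SAS EM log
) -> Optional[Action]:
    """Return the remediation step, or None if the symptoms do not point to
    a problem limited to hourly aggregation."""
    if not (short_timeframes_empty and long_timeframes_ok) or sas_select_failed:
        return None  # outside the scope of this article
    return Action.FOLLOW_KB48152 if tcs_select_failed else Action.RESTART_TCS_EM
```

Returning `None` when the SAS EM log also shows "Select failed" reflects check 3: in that case Daily Aggregation has its own partition problems, so the issue is not limited to hourly aggregation.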