Symptoms:
After time-zone change on APM EM cluster:
Cause:
Changing time-zone on the APM EMs caused Daily Aggregation to fail on the EM running Stats Aggregation Service.
The data that is still available is provided by Hourly Aggregation which is run as part of the TIM Collection Service
There will be 2 types of SQL error in the Stats Aggregation EM log:
1. "Select failed for ts_st_ts_us_int partition (ts_st_ts_us_int_yyyymmdd)
java.sql.SQLSyntaxErrorException: ORA-02149: Specified partition does not exist
...
Unexpected exception in daily aggregation
com.timestock.common.exceptions.TsUnexpectedDatabaseErrorException: Unexpected database error: could not execute query"
2. "Insert/updates failed for table com.timestock.tess.data.objects.StatsTranSetUserMonthly
java.sql.SQLException: ORA-14401: inserted partition key is outside specified partition"
Resolution:
The symptoms are a variation of the Last Aggregated Row (LAR)/Daylight Savings (DST) problem (ORA-14074) documented in this existing KB Article TEC610521 ("A Guide to Solving Common Stats and Defects Aggregation Problems"). In this case the Daily Aggregation is failing because of 2 problems:
1. The LAR dates on some of the tables are older than the date of the oldest available base partition
2. The partitions for month mm (yyyymmdd) in the ORA-02149 message have been created at the start of the previous month and so they have the previous time-zone set for their HIGH_VALUE timestamp. As a result of the time-zone change data is now being inserted which is newer than that old timestamp.
The Steps to resolve are:
1. Stop the EM running the Stats Aggregation Service
2. Using the SQL in TEC610521, update the LAR date for tables where the LAR date is older than the date of the oldest available base partition.
3. There are 9 monthly tables
'TS_ST_TS_ALL_MLY'
'TS_ST_TS_US_MLY'
'TS_ST_TS_USGRP_MLY'
'TS_ST_TSGRP_ALL_MLY'
'TS_ST_TSGRP_US_MLY'
'TS_ST_TSGRP_USGRP_MLY'
'TS_ST_TU_ALL_MLY'
'TS_ST_TU_US_MLY'
'TS_ST_TU_USGRP_MLY'
4. Each of those table's partitions for the month mm (yyyymmdd) in the ORA-02149 message will have the old time-zone set for the HIGH_VALUE column which can be verified with this SQL:
select partition_name, high_value, partition_position from user_tab_partitions where table_name = 'TS_ST_TS_ALL_MLY' order by partition_position;
5. As an example if yyyymmdd is 20151210 then we are dealing with December 2015 monthly partitions. Check that the December _mly partitions and any later partitions have no data e.g. if the latest _mly partition is March check the counts for all monthly partitions between December & March for each table e.g.
select count (*) from TS_ST_TS_ALL_MLY partition (TS_ST_TS_ALL_MLY_20151201);
select count (*) from TS_ST_TS_ALL_MLY partition (TS_ST_TS_ALL_MLY_20160101);
select count (*) from TS_ST_TS_ALL_MLY partition (TS_ST_TS_ALL_MLY_20160201);
select count (*) from TS_ST_TS_ALL_MLY partition (TS_ST_TS_ALL_MLY_20160301);
5. Assuming there is no data in any of the _mly partitions between those months drop all of them. In the above example the 4 partitions between December & March need to be dropped for each _mly table (in this case 36 partitions in total).
6. The Stats Aggrenation EM will recreate the dropped partitions after restart
7. The Daily Aggregation should run with this log message
"Daily aggregation will now start"
and eventually complete successfully with this log message:
"Daily aggregation completed"
Daily Aggregation could take a long time to run because of the data catch-up and you may see some non-fatal ORA- ERRORS while it is running but they are to be expected and only if this message is observed does that indicate it has failed to complete:
"Unexpected exception in daily aggregation"
Additional Information:
Due to the complexity of this problem assistance may be needed from CA Support in which case please raise a new support case.