After a large number of investments are updated, e.g. 20K are updated, the subsequent incremental Load DWH job runs 3x longer.
A full Load DWH run can be done, but this disrupts the reporting as a full run truncates the tables.
Although a full run is faster, the unavailability/repopulation of DWH tables reduce/delay visibility into the data.
STEPS TO REPRODUCE:
1. On a large dataset, simulate the update of 20K investments.
2. Run the incremental Load DWH job.
3. Run the full Load DWH job.
4. Compare the runtimes of the incremental vs the full run.
Actual: The incremental run can take 3x as long as the full run.
Expected: The incremental should not take much more time.
In reviewing the incremental run, the Data Warehouse portlet was reviewed and sorted by Duration.
The most time was spent on the following tables:
DWH_INV_INVESTMENT
DWH_INV_INVESTMENT_LN
This is reported as DE116305 and it was due to the Rally Integration.
The reason for the long runtime of the incremental Load DWH job is due to:
a. Two investment attributes (hig_invcategory,hig_rallyproject)
hig_invcategory
uses Lookup=Rally Investment Categories
hig_rallyproject
uses Lookup=Rally Projects HIG
b. When these 2 are deactivated, the job runs in only ~10 minutes vs ~60 minutes.
Do not run the Load DWH incrementally.