After the Load Data Warehouse runs, we found that some data is missing randomly. We noticed it happens more after a performance issue in our system that caused the Full load to process for several hours at a time. Can happen on Incremental as well.
We see missing data in the DWH tables, and the missing data is different each time. What could cause this behavior?
Missing assignments, time entries, tasks, projects, it all falls within the same root cause
Release : 15.x
Component : CA PPM DATA WAREHOUSE
Missing data in Data Warehouse can happen if the data was updated in the meantime whilst the job was running. This is due to the fact that last_updated_date is used to move all updated fields to DWH. When the job starts it takes the job start date, and compares to last_updated_date on the data it imports. Anything that's updated after the job start date will not get imported. This is by design.
So the scenario you are mentioned is possible if the Full was running very long time and someone modified a project = this project will not be in DWH. It could happen on a task, timesheet (any other data).
We have a feature created with Product Management to improve this behavior. To vote for it, please bring it up on the Innovation calls, this will help prioritize it.
To rectify this, the DWH job has to be run again and it would pick up that record, unless it was updated in the meantime again. Also possibilities to reduce the occurrence of this is to run the job outside of working hours to avoid updates and make sure you don't have any custom jobs or processes that update the objects and their last_updated_date exactly during the Load DWH job run. You may want to reschedule those at any other time.
If you are using DWH for custom reports and experience this missing data discrepancy often:
If you need real time data you can use the views which are on the Clarity database side, used to copy data from. Those will be all the views starting with DWH. I.e. for Assignments you may want to look in:
For tasks, projects, time entries: DWH_TASK_V, DWH_PROJECT_V, DWH_TIMEENTRY_V
When you have missing data check in those views and you should see that last_updated_date is during the job's run.
You have the following options: