DWH Full and Incremental causes missing data when data is updated while job is running

Article ID: 207222

Products

Clarity PPM SaaS, Clarity PPM On Premise

Issue/Introduction

After the Load Data Warehouse (DWH) job runs, some data is missing, seemingly at random. This happens more often after a system performance issue causes the Full load to run for several hours at a time. The issue can happen on Incremental loads as well.

Data is missing from the DWH tables, and the missing records are different each time.

Environment

Release : All Supported

Component : CLARITY DATA WAREHOUSE

Cause

The following symptoms all share the same root cause:

  • Missing assignments
  • Missing time entries
  • Missing tasks
  • Missing projects
  • Other missing data

Data can be missing from the Data Warehouse if it was updated while the job was running. The job uses last_updated_date to decide which updated data to move to the DWH. When the job starts, it records the job start date and compares it to the last_updated_date on the data it imports. Anything updated after the job start date is not imported. This is by design.

This scenario is therefore possible when the Full load runs for a very long time and someone modifies a project during the run: that project will not be in the DWH. The same can happen to a task, a timesheet, or any other data.
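The cutoff behavior described above can be illustrated with a minimal sketch. This is a simplified model for reasoning about the symptom, not the job's actual implementation: the function name, the record layout, the sample dates, and the inclusive comparison at the boundary are all assumptions made for the example.

```python
from datetime import datetime

def select_for_load(records, job_start):
    """Keep only records last updated at or before the job start.

    Simplified model of the cutoff described in this article; the
    inclusive boundary (<=) is an assumption.
    """
    return [r for r in records if r["last_updated_date"] <= job_start]

# Hypothetical run: the job starts at 01:00 and runs for several hours.
job_start = datetime(2024, 1, 1, 1, 0)

# Hypothetical source records.
records = [
    {"id": "PRJ-1", "last_updated_date": datetime(2023, 12, 31, 18, 0)},
    # Edited at 03:30, while the job was still running.
    {"id": "PRJ-2", "last_updated_date": datetime(2024, 1, 1, 3, 30)},
]

loaded = select_for_load(records, job_start)
loaded_ids = [r["id"] for r in loaded]
missing_ids = [r["id"] for r in records if r["id"] not in loaded_ids]

print("loaded:", loaded_ids)
print("missing:", missing_ids)
```

In this model the record edited mid-run is the one that does not reach the DWH, which matches the symptom of different records being missing after each run.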

Resolution

Run the Load DWH job again to pick up the missing records.

To reduce the occurrence of this issue:

  1. Run the job outside of working hours to avoid updates during the run.
  2. Make sure that no custom jobs or processes update the objects (and therefore their last_updated_date) while the Load DWH job is running. Reschedule any that do to a different time.
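One way to check the second point is to compare the scheduled window of each custom job with the Load DWH window. The sketch below is only an illustration: the job names and times are hypothetical, the windows are entered by hand rather than read from Clarity, and it assumes no window crosses midnight.

```python
from datetime import time

def overlaps(start_a, end_a, start_b, end_b):
    """Return True if two same-day time windows overlap."""
    return start_a < end_b and start_b < end_a

# Hypothetical Load DWH window: 01:00 to 05:00.
dwh_start, dwh_end = time(1, 0), time(5, 0)

# Hypothetical custom jobs or processes that update objects.
custom_jobs = {
    "Nightly cost sync": (time(4, 30), time(4, 45)),
    "Morning status update": (time(7, 0), time(7, 10)),
}

# Jobs whose window overlaps the Load DWH run should be rescheduled.
conflicts = [
    name
    for name, (start, end) in custom_jobs.items()
    if overlaps(start, end, dwh_start, dwh_end)
]

print(conflicts)
```

Any job listed in `conflicts` is a candidate for rescheduling.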

Note: To request a change to this design, you can raise it on the monthly innovation calls with product management or submit an enhancement request. See Enhancement Requests for Clarity.

If you use the DWH for custom reports and often run into this missing data discrepancy, consider the following.

If you need real-time data, you can query the views on the Clarity database side that the job copies data from. These are all the views whose names start with DWH. Examples:

  • For Assignments, you may want to look in: DWH_ASSIGNMENT_V
  • For tasks, projects, and time entries: DWH_TASK_V, DWH_PROJECT_V, DWH_TIMEENTRY_V

When data is missing, check those views. You should see that the record's last_updated_date falls within the job's run.
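The check above can be expressed as a small helper. This is a sketch under assumptions: the rows stand in for (key, last_updated_date) pairs you have already fetched from one of the DWH views with your own database client, and the keys, timestamps, and job start and end times are hypothetical.

```python
from datetime import datetime

def updated_during_run(rows, job_start, job_end):
    """Return keys of rows whose last_updated_date lies inside the run window."""
    return [
        key
        for key, last_updated in rows
        if job_start < last_updated <= job_end
    ]

# Hypothetical run window, taken from the job's start and finish times.
job_start = datetime(2024, 1, 1, 1, 0)
job_end = datetime(2024, 1, 1, 5, 0)

# Hypothetical (key, last_updated_date) pairs read from a DWH view.
rows = [
    ("TASK-10", datetime(2023, 12, 30, 9, 0)),  # updated before the run
    ("TASK-11", datetime(2024, 1, 1, 2, 15)),   # updated during the run
    ("TASK-12", datetime(2024, 1, 1, 6, 0)),    # updated after the run ended
]

suspects = updated_during_run(rows, job_start, job_end)
print(suspects)
```

Records returned by the helper are the ones most likely to be missing from the DWH until the job runs again.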

You have the following options:

  • Be aware that data might be missing and leave things as is
  • Use the Clarity views to report on your data instead of DWH

Additional Information

See also Load DWH job frequently reported issues