Looking for information regarding Replication Job Timeouts and Back off and retry of Replication Jobs
Replication Job Timeouts
When hierarchy replication rules run, they are split into replication jobs. A replication job times out and is aborted once the specified timeout limit is reached.
There are two types of timeout, global and per individual rule:
- The global timeout exists within the CoreSettings.config file
It is called ‘ReplicationMaxJobTimeout’ and has a default value of ‘172800’ seconds (48 hours).
- The individual rule timeout exists within the XML of each replication rule
The XML contains the tag <maximumRunTime>172800</maximumRunTime>, which also defaults to 172800 seconds (48 hours).
Note: If the two timeout types are configured to different values, the timeout with the lesser value will be enforced.
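The rule in the note above can be illustrated with a minimal Python sketch. It is not part of the product: the function name effective_timeout, the <rule> wrapper element in the sample XML, and the fallback to 172800 when the tag is missing are all assumptions made for illustration. Only the <maximumRunTime> tag and the default of 172800 come from this article.

```python
import xml.etree.ElementTree as ET

# Default quoted in this article: 172800 seconds = 48 hours.
DEFAULT_TIMEOUT_SECONDS = 172800


def effective_timeout(global_timeout: int, rule_xml: str) -> int:
    """Return the timeout (in seconds) that would be enforced.

    Hypothetical helper: reads <maximumRunTime> from a rule's XML and
    returns the lesser of it and the global timeout.
    """
    root = ET.fromstring(rule_xml)
    node = root.find(".//maximumRunTime")
    # Assumption: fall back to the documented default if the tag is absent.
    if node is not None and node.text:
        rule_timeout = int(node.text)
    else:
        rule_timeout = DEFAULT_TIMEOUT_SECONDS
    return min(global_timeout, rule_timeout)


# The <rule> wrapper is a made-up stand-in for a real replication rule.
sample_rule = "<rule><maximumRunTime>86400</maximumRunTime></rule>"
print(effective_timeout(172800, sample_rule))
```

With a global timeout of 172800 and a rule timeout of 86400, the sketch returns the rule value, because it is the lesser of the two.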
Back off and retry of Replication Jobs
When a replication job fails to run for any reason, it enters a back off retry mode. The job retries after 1 minute, then 2 minutes, then 4 minutes and so on, doubling the previous retry interval each time up to the maximum back off time of 1024 minutes. This incremental back off is controlled by the core setting "ReplicationJobBlackoutMultiplier".
If the replication job has been retrying for 24 hours, or reaches the 1024 minute retry attempt, the replication is aborted.
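The doubling sequence described above can be sketched in Python to show how the wait times accumulate. This is an illustration only: the function backoff_schedule and its parameters are hypothetical, the multiplier of 2 is inferred from the "doubling" behaviour rather than read from "ReplicationJobBlackoutMultiplier", and the time each retry attempt itself takes is ignored.

```python
def backoff_schedule(initial_minutes=1, multiplier=2, max_minutes=1024):
    """Return the list of retry delays in minutes, from the first delay
    up to and including the maximum back off time."""
    if multiplier <= 1:
        # A multiplier of 1 or less would never reach the maximum.
        raise ValueError("multiplier must be greater than 1")
    delays = []
    delay = initial_minutes
    while delay <= max_minutes:
        delays.append(delay)
        delay *= multiplier
    return delays


elapsed = 0
for attempt, delay in enumerate(backoff_schedule(), start=1):
    elapsed += delay
    print(f"retry {attempt}: wait {delay} min (cumulative wait {elapsed} min)")
```

Under these assumptions the waits of 1 through 512 minutes add up to 1023 minutes (about 17 hours), so the 24 hour mark (1440 minutes) falls within the final 1024 minute interval.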