"Assuming task failed, because it's status did not update in a timely fashion" error when configuring replications in Cloud Director Availability 4.x

Products

VMware Cloud Director

Issue/Introduction

Symptoms:

When configuring a new outgoing replication, you see the following error in the Replications Tasks view of the Cloud Director Availability Portal:

Assuming task 'b2763219-0fb8-####-####-########c69' failed, because it's status did not update in a timely fashion.

In /opt/vmware/h4/cloud/log/cloud.log on the Cloud Director Replication Management Appliance on the recovery site, you see a similar message:

2019-04-11 02:13:24.884 DEBUG - [UI_/plugins/Vk13YXJl/h4/outgoing-replications/Provider_Site/vapp_f57cb497-ddda-####-####-########fdd_K1_4t] [job-3] com.vmware.h4.jobengine.JobEngine        : Suspending execution for task 9a347713-2860-####-####-########73c
2019-04-11 02:13:24.887 DEBUG - [UI_/plugins/Vk13YXJl/h4/outgoing-replications/Provider_Site/vapp_f57cb497-ddda-####-####-########fdd_K1_4t] [job-3] com.vmware.h4.jobengine.JobEngine        : Suspending execution for task 379f7132-b613-431e-a9a7-ce4e2c32e5d0
2019-04-11 02:13:24.978 WARN - [192edc41-034e-####-####-########18b] [c4-scheduler-2] com.vmware.task.rest.client.TaskMonitor : Task b2763219-0fb8-####-####-########c69 has timed out (it hasn't been updated in 60000 msec)
2019-04-11 02:13:24.983 ERROR - [UI_/plugins/Vk13YXJl/h4/outgoing-replications/Provider_Site/vapp_f57cb497-ddda-####-####-########fdd_K1_4t] [c4-scheduler-2] com.vmware.h4.jobengine.JobExecution     : Task 9a347713-2860-####-####-########273c (WorkflowInfo{type='start', resourceType='vmReplication', resourceId='C4-0c68766f-9ad5-####-####-########c03', isPrivate=false, resourceName='null'}) has failed
com.vmware.vdr.error.exceptions.TaskMonitoringTimeOutException: Assuming task 'b2763219-0fb8-408f-b3a7-1cbf4f110c69' failed, because it's status did not update in a timely fashion.
        at sun.reflect.GeneratedConstructorAccessor146.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.vmware.h4.exceptions.GenericServerExceptionProvider.get(GenericServerExceptionProvider.java:70)
        at com.vmware.h4.exceptions.GenericServerExceptionProvider.get(GenericServerExceptionProvider.java:47)
        at com.vmware.h4.common.task.H4ApiTaskToTaskConverter.toTask(H4ApiTaskToTaskConverter.java:33)
        at com.vmware.task.rest.client.TaskMonitor.lambda$workImpl$0(TaskMonitor.java:189)
        at com.vmware.task.rest.client.TaskMonitor.notifyListener(TaskMonitor.java:211)
        at com.vmware.task.rest.client.TaskMonitor.workImpl(TaskMonitor.java:188)
        at com.vmware.task.rest.client.TaskMonitor.work(TaskMonitor.java:120)
        at com.vmware.task.rest.client.TaskMonitorService.lambda$taskMonitor$0(TaskMonitorService.java:65)
        at com.vmware.h4.common.mdc.MDCRunnableWrapper.run(MDCRunnableWrapper.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values such as task IDs and site names vary depending on your environment.
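To locate the TaskMonitor timeout warnings shown in the excerpt above inside a large cloud.log, a short script can help. The following is a minimal sketch, not a VMware-provided tool. The function name find_task_timeouts and the regular expression are illustrative assumptions based only on the sample WARN line in this article, so the pattern may need adjusting if the log format differs in your version.

```python
import re

# Pattern derived from the sample WARN line in this article (an assumption,
# not an official log format specification).
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) WARN .*"
    r"TaskMonitor\s*: Task (?P<task>\S+) has timed out "
    r"\(it hasn't been updated in (?P<msec>\d+) msec\)"
)


def find_task_timeouts(lines):
    """Yield (timestamp, task_id, timeout_msec) for each TaskMonitor timeout warning."""
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            yield match.group("ts"), match.group("task"), int(match.group("msec"))


# Example usage on the recovery site appliance (path taken from this article):
#
#   with open("/opt/vmware/h4/cloud/log/cloud.log", errors="replace") as log:
#       for ts, task, msec in find_task_timeouts(log):
#           print(ts, task, msec)
```

Each result gives the timestamp of the warning, the ID of the task that timed out, and the timeout in milliseconds, which can then be matched against the failed task shown in the portal.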

Environment

VMware Cloud Director Availability 4.x

Cause

This issue can occur when there is a time drift between the Cloud Director Availability components on the protected and recovery sites.

Resolution

To resolve this issue, configure the following time settings across all sites:

In each cloud site, ensure the Tunnel Appliance, Cloud Director Replication Management Appliance, Replicator Appliance(s), Cloud Director cells, vCenter Server(s), Platform Services Controller, and ESXi hosts synchronize their time with the same NTP source.
In each on-premises site, ensure the Cloud Director Availability On-Premises Appliance, vCenter Server, Platform Services Controller, and ESXi hosts synchronize their time with the same NTP source.
Ensure that all sites are in sync with each other, either by synchronizing to the same NTP source or by having each site's own NTP server synchronize to the same upstream NTP source.
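To check whether a machine's clock agrees with the chosen NTP source, the offset can be measured with a basic SNTP query. The following is a minimal sketch under stated assumptions: it uses only the Python standard library, the function names (parse_transmit_time, clock_offset, max_drift) are illustrative, and it measures the clock of the machine it runs on. This article does not state whether Python is available on the appliances, so treat that as something to verify. The result is an approximation and does not replace the appliance's own time configuration.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def parse_transmit_time(packet: bytes) -> float:
    """Return the server transmit timestamp of an NTP reply as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP reply must be at least 48 bytes")
    # Bytes 40-47 hold the transmit timestamp: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server: str, timeout: float = 5.0) -> float:
    """Approximate offset in seconds (server time minus local time)."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
        received = time.time()
    # Compare against the round-trip midpoint to reduce network delay error.
    return parse_transmit_time(reply) - (sent + received) / 2


def max_drift(offsets: dict) -> float:
    """Largest difference between any two measured offsets, in seconds."""
    values = list(offsets.values())
    return max(values) - min(values) if values else 0.0
```

For example, calling clock_offset with the hostname of your NTP server on each machine and passing the collected values to max_drift, keyed by component name, gives a rough figure for the drift between components and sites.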

Additional Information

This error can also occur when multiple Replicator appliances are offline or inaccessible, or when API calls to the destination vSphere environment are slow to respond or to be processed.

For more information, see: