Migration and replication tasks fail because the task status did not update in a timely fashion in Cloud Director Availability 4.x

Article ID: 315159

Products

VMware Cloud Director

Issue/Introduction

Symptoms:

  • Tasks for managing existing or creating new replications and migrations fail with an error similar to:
Assuming task 'Task ID' failed, because its status did not update in a timely fashion.
  • This issue occurs when one or more remote Replicators are offline or inaccessible.
  • In the /opt/vmware/h4/cloud/log/cloud.log file on the Cloud Replication Management Appliance of the destination site, you see entries similar to:
2020-07-12 01:07:51.174 ERROR - [UI-ID] [task-poller-4] com.vmware.h4.jobengine.JobExecution     : Task ID (WorkflowInfo{type='migrate', resourceType='vmReplication', resourceId='C4-ID', isPrivate=false, resourceName='MyApplication'}) has failed.

com.vmware.vdr.error.exceptions.TaskMonitoringTimeOutException: Assuming task 'c4-ID' failed, because its status did not update in a timely fashion.
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at com.vmware.h4.exceptions.GenericServerExceptionProvider.get(GenericServerExceptionProvider.java:120)
    at com.vmware.h4.exceptions.GenericServerExceptionProvider.get(GenericServerExceptionProvider.java:97)
    at com.vmware.h4.common.task.H4ApiTaskToTaskConverter.toTask(H4ApiTaskToTaskConverter.java:31)
    at com.vmware.task.rest.client.TaskMonitor.lambda$workImpl$0(TaskMonitor.java:191)
    at com.vmware.task.rest.client.TaskMonitor.notifyListener(TaskMonitor.java:213)
    at com.vmware.task.rest.client.TaskMonitor.workImpl(TaskMonitor.java:190)
    at com.vmware.task.rest.client.TaskMonitor.work(TaskMonitor.java:122)
    at com.vmware.h4.cloud.service.ManagerTaskMonitorService.lambda$taskMonitor$0(ManagerTaskMonitorService.java:107)
    at com.vmware.h4.common.mdc.MDCRunnableWrapper.run(MDCRunnableWrapper.java:30)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
 
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
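
To confirm whether the symptom is present, the log can be searched for the exception shown above. The following is a minimal sketch, not a supported tool: it assumes the exception line has the same wording as the excerpt, and the function names find_timed_out_tasks and scan_log are hypothetical helpers introduced only for illustration.

```python
import re

# Matches the exception line from the excerpt and captures the task ID
# that appears between the single quotes.
TIMEOUT_RE = re.compile(
    r"TaskMonitoringTimeOutException: Assuming task '([^']+)' failed"
)


def find_timed_out_tasks(log_text):
    """Return task IDs from TaskMonitoringTimeOutException lines.

    IDs are returned in first-seen order, without duplicates.
    """
    seen = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def scan_log(path="/opt/vmware/h4/cloud/log/cloud.log"):
    """Read the log file (default path taken from this article) and scan it."""
    with open(path, errors="replace") as handle:
        return find_timed_out_tasks(handle.read())
```

Calling scan_log() on the Cloud Replication Management Appliance of the destination site would return the affected task IDs, or an empty list if the exception does not appear in the current log file.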

 

Environment

VMware Cloud Director Availability 4.x

Cause

This issue occurs when the threads used by the scheduler become saturated while it waits for responses from offline or inaccessible Replicators. With no free threads, the scheduler cannot update task status within the expected timeframe, so the task is assumed to have failed.
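
The effect of thread saturation can be illustrated with a small simulation. This is only a sketch of the general mechanism, not the product's scheduler: the pool size, the delay, and the names run_demo, call_offline_replicator, and update_task_status are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_demo(pool_size=2, blocked_calls=2, block_seconds=0.4):
    """Return how long a quick status update waited in the queue, in seconds."""

    def call_offline_replicator():
        # Stands in for a request that hangs until its timeout expires.
        time.sleep(block_seconds)

    def update_task_status(submitted_at):
        # The update itself is instant; only the queueing delay is measured.
        return time.monotonic() - submitted_at

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Occupy worker threads with calls that block on offline Replicators.
        for _ in range(blocked_calls):
            pool.submit(call_offline_replicator)
        # The status update can only start once a worker thread is free.
        future = pool.submit(update_task_status, time.monotonic())
        return future.result()
```

With as many blocked calls as worker threads, the status update waits for roughly block_seconds before it runs. With blocked_calls=0 it runs almost immediately. A monitor that expects updates sooner than that delay would treat the task as failed.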

Resolution

  • To resolve this issue, if a remote site is not currently needed for active protections or migrations, either leave it running to maintain cross-site connectivity, or unpair it before powering it down.
  • For more information, see the Unpair Paired Sites section of Unpair paired sites from the Cloud Director site.
  • If a site is needed for active protections or migrations, then all Replicators in that site should be online and accessible.
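
A basic reachability check for the Replicators in a site could be sketched as follows. This only tests whether a TCP connection can be opened. It does not prove that the Replicator service is healthy. The host names and port passed in are placeholders that must be replaced with the values for your environment, and is_reachable and unreachable_replicators are hypothetical helper names.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_replicators(endpoints, timeout=3.0):
    """Return the (host, port) pairs from endpoints that could not be reached."""
    return [
        (host, port)
        for host, port in endpoints
        if not is_reachable(host, port, timeout)
    ]
```

For example, unreachable_replicators([("replicator-1.example.com", port)]) returns an empty list when the placeholder endpoint accepts connections, where port is the Replicator service port configured in your environment.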
