Test Recovery takes a long time to complete and fails with the error: Operation timed out: 900 seconds
The VM fails at the 'Change recovery site storage to writable' step and displays: Operation timed out: 900 seconds
Re-running the Test Recovery plan fails with the same error.
The failure is seen only on VMs with large disks.
When Test Recovery is initiated without selecting the option "Replicate recent changes to recovery site", the error does not occur and the plan completes successfully.
VMware Live Recovery 9.0.2
The affected VMs contain multiple virtual disks. During the recovery phase, an incremental sync is initiated, which requires merging the associated hbrdisks (replication logs) into the parent disks. Because of the volume of data, this consolidation takes longer than the 900-second timeout allows, which triggers the 'Operation timed out' errors.
####-##-##T##:##:09.060Z verbose vmware-dr[01385] [SRM@6876 sub=Replication opID=########-####-####-####-############-failover:b1a1:f93f:c124:2957] HandleEntityCompletion: Queuing callback for completion of protected VM Id=[dr.replication.ProtectedVm:########-####-####-####-############:protected-vm-######]
####-##-##T##:##:09.061Z error vmware-dr[39179] [SRM@6876 sub=Replication.VmRecoveryInterface ctxID=b9a6720f opID=########-####-####-####-############-failover:####:####:####] Protection VM 'dr.replication.ProtectedVm:protected-vm-######' failed operation 'Recover'! There are '0' warnings for this Protection Group. Failure Reason: N2Dr16TimeoutExceptionE Operation timed out after '900.000591' seconds
-->
[context]zKq7AVECAAQAAL/UWwEJdm13YXJlLWRyAADM6xtsaWJ2bWFjb3JlLnNvAAFZCw9saWJjb25uZWN0aW9uLWJhc2Uuc28AApg6A2xpYmZ1bmN0aW9uYWwuc28AAIKvQQDeSDUA4mE1ALCLSgOujgBsaWJwdGhyZWFkLnNvLjAABC/eD2xpYmMuc28uNgA=[/context]
####-##-##T18:45:06.155Z info hbrsrv[01629] [Originator@6876 sub=StatsLog groupID=GID-########-####-####-####-############ opID=hsl-246a8da0] HbrEvent: {"eventID":"consolidateProgress","groupID":"GID-########-####-####-####-############","percentCompleted":75,"ETA":"####-##-##T01:45:47.155268Z","serverID":"########-####-####-####-############","hbrEvent":1}
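The consolidateProgress event in the hbrsrv.log line above carries a JSON payload after the "HbrEvent:" marker. The following is a minimal sketch, not a supported tool, of how that payload could be read to track consolidation progress. It assumes every such event follows the same layout as the line shown here; the function names are hypothetical and chosen only for illustration.

```python
import json
import re

# Captures the JSON payload that follows "HbrEvent:" at the end of a log line.
# Assumption: the payload is a single JSON object on one line, as in the excerpt.
HBR_EVENT_RE = re.compile(r"HbrEvent:\s*(\{.*\})\s*$")


def parse_consolidate_progress(line):
    """Return (percentCompleted, ETA) for a consolidateProgress event, else None."""
    match = HBR_EVENT_RE.search(line)
    if match is None:
        return None
    try:
        event = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if event.get("eventID") != "consolidateProgress":
        return None
    return event.get("percentCompleted"), event.get("ETA")


def latest_progress(lines):
    """Return the last consolidateProgress tuple found in an iterable of log lines."""
    latest = None
    for line in lines:
        parsed = parse_consolidate_progress(line)
        if parsed is not None:
            latest = parsed
    return latest
```

To use it, open a copy of hbrsrv.log from the target vSphere Replication appliance and pass the file object to latest_progress. The sketch only reports the most recent progress event it finds; it does not by itself confirm that consolidation has finished.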
Follow the steps below:
Run a manual sync (Sync Now) for the replicated VM, allow time for the consolidation process to complete, and then run the Test Recovery plan again; it should then succeed. You can monitor hbrsrv.log on the target vSphere Replication appliances to find out when the consolidation task completes.
Alternatively, run the Test Recovery plan without selecting the option "Replicate recent changes to recovery site". The plan completes successfully because consolidation is not triggered, so the recovery task finishes quickly.
For a permanent resolution, increase the timeout settings listed below on both SRM sites to a value appropriate for your environment.
Change Remote Manager Settings
Configure the maximum time to wait for a remote operation to complete. The default value is 900 seconds. Parameter - remoteManager.defaultTimeout
Configure an additional timeout period for tasks to complete on the remote site. The default value is 900 seconds. Parameter - remoteManager.taskDefaultTimeout
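One possible way to choose a value for these two parameters is to start from how long consolidation actually took in your environment and add headroom. The sketch below only illustrates that arithmetic. The function name and the 1.5 safety factor are assumptions made for the example, not VMware recommendations, and the 900-second floor is the default stated above.

```python
import math


def suggest_timeout_seconds(observed_consolidation_seconds,
                            safety_factor=1.5,
                            default_timeout=900):
    """Suggest a timeout in seconds from an observed consolidation duration.

    safety_factor is an illustrative assumption, not a vendor-recommended value.
    The result never drops below default_timeout (900 seconds by default).
    """
    if observed_consolidation_seconds < 0:
        raise ValueError("observed duration must not be negative")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    padded = math.ceil(observed_consolidation_seconds * safety_factor)
    return max(default_timeout, padded)


# Example: a consolidation observed to take one hour (3600 seconds).
print(suggest_timeout_seconds(3600))
```

The observed duration would have to come from your own measurements, for example from the consolidation progress events in hbrsrv.log. The resulting number is then entered manually for remoteManager.defaultTimeout and remoteManager.taskDefaultTimeout on both sites.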