An instance with id 'replica-########-####-####-####-########d8e9' was not found.
In the /opt/vmware/h4/replicator/log/replicator.log file on the destination Replicator Appliance, you see entries similar to:
DEBUG - [########-####-####-####-########94f7] [pc-task-monitor-1] c.v.h.r.monitoring.hbr.HbrTaskListener : The HBR task failed (hbr.replica.TaskInfo) {
dynamicType = null,
dynamicProperty = null,
task = ManagedObjectReference: type = HbrReplicaTask, value = Hbr.Replica.Task.########-####-####-####-########0ffe, serverGuid = null,
operation = groupCreateFailoverImage,
method = HbrReplicationGroupCreateFailoverImage_Task,
object = ManagedObjectReference: type = HbrReplicationGroup, value = Hbr.Replica.Group.H4-########-####-####-####-########4fde, serverGuid = null,
extraInfo = null,
state = error,
canceled = false,
cancelable = false,
error = (hbr.replica.fault.InstanceNotFound) {
faultCause = null,
faultMessage = null,
groupId = H4-########-####-####-####-########4fde,
instanceId = replica-########-####-####-####-########d8e9
},
result = null,
progress = 100,
totalTransferSizeKb = null
}
ERROR - [########-####-####-####-########5658-Gn-OCh-bJ-i1] [pc-task-monitor-1] com.vmware.h4.jobengine.JobExecution : Task ########-####-####-####-########5748 (WorkflowInfo{type='failover', resourceType='replication', resourceId='H4-########-####-####-####-########4fde', isPrivate=false, resourceName='null'}) has failed
com.vmware.h4.replicator.api.exceptions.InstanceNotFound: An instance with id 'replica-########-####-####-####-########d8e9' was not found.
at com.vmware.h4.replicator.converters.HbrExceptionConverter.lambda$static$9(HbrExceptionConverter.java:62)
at com.vmware.h4.common.error.ExceptionConversionService.convert(ExceptionConversionService.java:100)
at com.vmware.h4.replicator.replication.SuspendableJob.lambda$waitForHbrTaskAndResume$2(SuspendableJob.java:292)
...
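To confirm which replication and instance are affected, the fault blocks can be pulled out of the log programmatically. The following is a minimal sketch in Python, assuming the log layout matches the excerpt above; the function name find_instance_not_found and both regular expressions are illustrative and not part of the product.

```python
import re

# Matches an InstanceNotFound fault block as it appears in the excerpt above.
# The field layout is assumed from that excerpt and may differ between versions.
FAULT_RE = re.compile(
    r"hbr\.replica\.fault\.InstanceNotFound\)\s*\{(?P<body>.*?)\}",
    re.DOTALL,
)
# Captures the two identifiers of interest inside a fault block.
FIELD_RE = re.compile(r"(groupId|instanceId)\s*=\s*([^,\s]+)")


def find_instance_not_found(log_text):
    """Return one dict per fault block, holding the groupId and instanceId found in it."""
    results = []
    for match in FAULT_RE.finditer(log_text):
        results.append(dict(FIELD_RE.findall(match.group("body"))))
    return results
```

Reading the contents of replicator.log into a string and passing it to find_instance_not_found returns the replication group and the collapsed instance for each failed task, which can then be compared with the instance that was selected for the failover.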
VMware Cloud Director Availability 4.x
This behaviour occurs because the instance selected for the failover operation has been collapsed by retention rule enforcement. This happens when a synchronization starts after the failover instance was selected but before the actual failover of the VM replication begins.
This behaviour can occur for any kind of replication, but is more likely to occur when leveraging small RPO windows and advanced retention rules.
Because this behaviour occurs only while the replication is active and the source VMs are still available, it cannot occur during a disaster event in which the source VMs are no longer available.
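The ordering problem described above can be illustrated with a toy model. This is only a sketch in Python under simplifying assumptions: the class ToyReplication, its retain limit, and the "keep the newest N instances" rule are hypothetical stand-ins and do not reflect how the product actually stores instances or enforces retention rules.

```python
class InstanceNotFound(Exception):
    """Raised when the requested instance no longer exists (toy equivalent of the fault)."""


class ToyReplication:
    """Hypothetical model: keeps at most `retain` instances, oldest first."""

    def __init__(self, retain):
        self.retain = retain
        self.instances = []
        self._counter = 0

    def sync(self):
        """Create a new instance, then collapse the oldest ones beyond the limit."""
        self._counter += 1
        self.instances.append(f"replica-{self._counter}")
        while len(self.instances) > self.retain:
            self.instances.pop(0)
        return self.instances[-1]

    def failover(self, instance_id):
        """Fail over to the given instance, or raise if it has been collapsed."""
        if instance_id not in self.instances:
            raise InstanceNotFound(
                f"An instance with id '{instance_id}' was not found."
            )
        return instance_id


def demo():
    repl = ToyReplication(retain=2)
    repl.sync()
    repl.sync()
    selected = repl.instances[0]  # the operator selects the oldest instance
    repl.sync()                   # a sync runs before the failover starts
    try:
        repl.failover(selected)
    except InstanceNotFound as exc:
        return str(exc)
    return "failover succeeded"
```

In this model, the failover fails only because a synchronization ran between selecting the instance and starting the failover. With the source VMs unavailable, no further synchronization would run, so the selected instance would remain in place.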
To minimise the chance of encountering this behaviour when failing over replications while the source VMs are still active and available, fail over your replications using one of the following methods: