Error: "An instance with id 'replica-########-####-####-####-############' was not found." when performing a failover of a vApp replication

Article ID: 384638

Products

VMware Cloud Director

Issue/Introduction

  • When you perform a failover of a multi-VM vApp replication in VMware Cloud Director Availability (VCDA), some of the VM replications fail over as expected but others fail with an error similar to:

    An instance with id 'replica-########-####-####-####-########d8e9' was not found.

  • In the /opt/vmware/h4/replicator/log/replicator.log file on the destination Replicator Appliance, you see entries similar to:

    DEBUG - [########-####-####-####-########94f7] [pc-task-monitor-1] c.v.h.r.monitoring.hbr.HbrTaskListener   : The HBR task failed (hbr.replica.TaskInfo) {
       dynamicType = null,
       dynamicProperty = null,
       task = ManagedObjectReference: type = HbrReplicaTask, value = Hbr.Replica.Task.########-####-####-####-########0ffe, serverGuid = null,
       operation = groupCreateFailoverImage,
       method = HbrReplicationGroupCreateFailoverImage_Task,
       object = ManagedObjectReference: type = HbrReplicationGroup, value = Hbr.Replica.Group.H4-########-####-####-####-########4fde, serverGuid = null,
       extraInfo = null,
       state = error,
       canceled = false,
       cancelable = false,
       error = (hbr.replica.fault.InstanceNotFound) {
          faultCause = null,
          faultMessage = null,
          groupId = H4-########-####-####-####-########4fde,
          instanceId = replica-########-####-####-####-########d8e9
       },
       result = null,
       progress = 100,
       totalTransferSizeKb = null
    }
    ERROR - [########-####-####-####-########5658-Gn-OCh-bJ-i1] [pc-task-monitor-1] com.vmware.h4.jobengine.JobExecution     : Task ########-####-####-####-########5748 (WorkflowInfo{type='failover', resourceType='replication', resourceId='H4-########-####-####-####-########4fde', isPrivate=false, resourceName='null'}) has failed

    com.vmware.h4.replicator.api.exceptions.InstanceNotFound: An instance with id 'replica-########-####-####-####-########d8e9' was not found.
            at com.vmware.h4.replicator.converters.HbrExceptionConverter.lambda$static$9(HbrExceptionConverter.java:62)
            at com.vmware.h4.common.error.ExceptionConversionService.convert(ExceptionConversionService.java:100)
            at com.vmware.h4.replicator.replication.SuspendableJob.lambda$waitForHbrTaskAndResume$2(SuspendableJob.java:292)
            ...

  • This behaviour occurs when you perform a failover of a vApp replication while the replication is active and the source VMs are still available.
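
To confirm which replications are affected, the fault blocks shown above can be pulled out of the Replicator log. The following is a minimal sketch, assuming the log text follows the layout of the excerpt in this article; the function name `find_missing_instances` and both regular expressions are illustrative and are not part of VCDA.

```python
import re

# Matches the body of an InstanceNotFound fault block as laid out in the
# log excerpt above. The layout is assumed from that excerpt only.
FAULT_RE = re.compile(
    r"hbr\.replica\.fault\.InstanceNotFound\)\s*\{(?P<body>.*?)\}",
    re.DOTALL,
)
# Captures the two identifiers of interest inside a fault block.
FIELD_RE = re.compile(r"(groupId|instanceId)\s*=\s*([^,\s]+)")


def find_missing_instances(log_text):
    """Return one (groupId, instanceId) pair per InstanceNotFound fault."""
    results = []
    for match in FAULT_RE.finditer(log_text):
        fields = dict(FIELD_RE.findall(match.group("body")))
        results.append((fields.get("groupId"), fields.get("instanceId")))
    return results
```

Passing the contents of /opt/vmware/h4/replicator/log/replicator.log to `find_missing_instances` returns the replication group and the instance that could not be found for each failed task, which helps to match log entries to the VM replications that failed in the UI.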

Environment

VMware Cloud Director Availability 4.x

Cause

This behaviour occurs because the instance selected for the failover operation has been collapsed by retention rule enforcement. This happens when a synchronization starts after the failover instance is selected but before the failover of that VM replication actually begins.

This behaviour can occur for any kind of replication, but is more likely to occur when leveraging small RPO windows and advanced retention rules.

Because this behaviour occurs only while the replication is active and the source VMs are still available, it cannot occur during a disaster event in which the source VMs are no longer available.
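
The timing problem can be illustrated with a toy model. This is a sketch under simplifying assumptions and not how VCDA implements retention: it assumes retention simply keeps the newest `keep` instances, and the names `enforce_retention`, `failover` and `simulate` are hypothetical.

```python
def enforce_retention(instances, keep):
    """Toy retention rule: keep only the newest `keep` instances."""
    return instances[-keep:]


def failover(instances, selected):
    """Fail over to `selected`, or raise if it no longer exists."""
    if selected not in instances:
        raise LookupError(f"An instance with id '{selected}' was not found.")
    return selected


def simulate(sync_during_failover):
    """Select an instance, optionally run a sync, then attempt the failover."""
    instances = ["replica-1", "replica-2", "replica-3"]  # oldest first
    selected = instances[0]  # the operator selects the oldest instance
    if sync_during_failover:
        # A synchronization adds a new instance, and retention enforcement
        # then collapses the oldest one, which is the selected instance.
        instances = enforce_retention(instances + ["replica-4"], keep=3)
    try:
        return failover(instances, selected)
    except LookupError as exc:
        return str(exc)
```

In this model, `simulate(False)` fails over to the selected instance, while `simulate(True)` returns the "was not found" message because the selected instance was collapsed between selection and failover.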

Resolution

To minimise the chance of encountering this behaviour when you fail over replications while the source VMs are still active and available, use one of the following methods:

  • Use the test failover feature to test the DR capability of VCDA for the vApp replications you wish to test. For more information, see Test failover a replication.
  • If an actual failover is required, perform a migrate instead. The migrate workflow includes a synchronization, which ensures that the data on the recovered VMs is up to date and consistent with the source VMs. For more information, see Migrate a replication.
  • If an actual failover is required and you need to select the instance to fail over to, ensure that no synchronization is currently active for the protected VMs, then power off the source VMs before performing the failover. This ensures that no new deltas are transferred and no new instances are created while the failover is running. For more information, see Failover a replication.