Enabling Replication for a virtual machine may fail due to stale replication group GIDs in the VRM database

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

Attempting to configure replication for a virtual machine may fail with this error message in the vSphere Replication (VR) GUI:

Call "HmsGroup.CurrentSpec" for object "" on Server "<IP-address>" failed. An unknown error has occurred.
In the hms.log file on the primary vSphere Replication Management (VRM) server, located in /opt/vmware/hms/logs/, you see entries similar to:

2013-04-22 06:35:45.323 ERROR hms.replication [hms-vlsi-server-thread-36] (..hms.replication.AbstractGroup) opID=ed179018-54dc-43fb-81ca-8d9717649945 | Error while retrieving configuration Spec for group GID-67fc61da-2964-4d34-b7ff-b1d22938461b
java.util.concurrent.ExecutionException: com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound:
obj = com.vmware.vim.binding.vmodl.ManagedObjectReference@93db716
inherited from com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound: Managed object of type 'HmsGroup' with id 'GID-67fc61da-2964-4d34-b7ff-b1d22938461b' not found
at com.vmware.vim.vmomi.core.impl.BlockingFuture.get(BlockingFuture.java:70)
at com.vmware.hms.replication.PrimaryGroupImpl.getTargetHbrServer(PrimaryGroupImpl.java:3757)
at com.vmware.hms.replication.AbstractGroup.currentSpec(AbstractGroup.java:316)
at com.vmware.hms.replication.AbstractGroup.currentSpec(AbstractGroup.java:339)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.vmware.vim.vmomi.server.impl.InvocationTask.run(InvocationTask.java:61)
at com.vmware.vim.vmomi.server.common.impl.RunnableWrapper$1.run(RunnableWrapper.java:48)
at com.vmware.jvsl.sessions.net.impl.SessionExecutor$1.run(SessionExecutor.java:60)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound:
obj = com.vmware.vim.binding.vmodl.ManagedObjectReference@93db716
inherited from com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound: Managed object of type 'HmsGroup' with id 'GID-67fc61da-2964-4d34-b7ff-b1d22938461b' not found

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
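To find which GID to remove, you can pull it out of the ManagedObjectNotFound lines in hms.log. The following is a minimal sketch that assumes the error text matches the excerpt above and that a GID is "GID-" followed by a standard UUID. The function names are hypothetical and are not part of vSphere Replication. Python availability on the appliance is not assumed, so you can run the script against a copy of hms.log on any machine.

```python
import re
from pathlib import Path

# Matches the ManagedObjectNotFound text from the hms.log excerpt and captures
# the replication group GID ("GID-" followed by a UUID). Pattern is an assumption
# based on the example log, not a documented log format.
MISSING_GROUP_RE = re.compile(
    r"Managed object of type 'HmsGroup' with id "
    r"'(GID-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})' not found"
)


def find_missing_group_gids(log_text: str) -> list[str]:
    """Return each GID reported as not found, once, in first-seen order."""
    # A single failure logs the same GID several times (exception and "Caused by"),
    # so dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(MISSING_GROUP_RE.findall(log_text)))


def find_missing_group_gids_in_file(path: str = "/opt/vmware/hms/logs/hms.log") -> list[str]:
    """Read an hms.log file (default path taken from this article) and scan it."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_missing_group_gids(text)
```

Each GID returned is a candidate for the MOB cleanup described under Resolution. Confirm that the GID belongs to the affected virtual machine before destroying it.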

Environment

VMware vSphere Replication 8.x

Cause

This issue is caused by network connectivity problems between the VRM servers: the secondary VRM server intermittently loses its connection to, or cannot communicate with, the primary VRM server.

If a Remove Replication operation for a virtual machine is initiated on an affected VRM server, the operation completes on the secondary VRM server. However, it may not complete on the primary VRM server if the network connectivity problem exists between the two VRM servers. Because of this synchronization failure, stale replication group records for the virtual machine are likely to remain in the primary VRM server's database, under a GID similar to:

GID-67fc61da-2964-4d34-b7ff-b1d22938461b

The virtual machine replication state shown in the VM view of the VR user interface is drawn from the secondary VRM server. Because the replication information for this virtual machine has already been removed from the secondary VRM server, the virtual machine is not shown as replicated in the VR GUI.

When you attempt to configure replication for this virtual machine, the primary VRM server reports errors because a stale replication record for the virtual machine is still in its database. The primary VRM server queries the secondary VRM server for this stale GID, which no longer exists there. As a result, the primary VRM server logs the ManagedObjectNotFound error.

Resolution

To resolve this issue, remove the stale entries for this virtual machine using the Managed Object Browser (MOB).

VRMS 8.x and higher

Use the MOB on the primary site VRM server to remove the stale GID entry for the affected virtual machine. The GID to remove is the one reported in the ManagedObjectNotFound error in hms.log.

1. Open the MOB page for the stale replication group. For example, if the MOID is GID-d0cf6528-e20d-47cc-b94a-9175276afd3c, use this URL:

   https://IP_address_of_primary_VRM_server:8043/mob/?moid=GID-d0cf6528-e20d-47cc-b94a-9175276afd3c&vmodl=1
2. Log in with your vCenter Server username and password. The account must have full administrative privileges.
3. Under Methods, click Destroy. A new window appears.
4. In the new window, click Invoke Method.
5. Clean up leftover files at the target replication destination datastore.

   Note: This is the replication destination datastore previously configured for this virtual machine.
6. Reconfigure replication for the virtual machine.
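If you have several stale GIDs, building the MOB URL from each one avoids copy-paste mistakes. The following minimal sketch only reproduces the URL pattern shown in step 1 (port 8043, path /mob/, parameters moid and vmodl=1). It assumes the host is an IPv4 address or hostname. It does not call the Destroy method, which you still invoke in the browser. The function name and the example address are hypothetical.

```python
import re
from urllib.parse import urlencode

# A replication group GID as it appears in hms.log: "GID-" followed by a UUID.
GID_RE = re.compile(
    r"GID-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def mob_url_for_group(vrm_host: str, gid: str, port: int = 8043) -> str:
    """Build the MOB URL for a replication group on the primary VRM server."""
    # Reject anything that is not a GID so a mistyped value never reaches the MOB.
    if not GID_RE.fullmatch(gid):
        raise ValueError(f"not a replication group GID: {gid!r}")
    # urlencode leaves "-" unescaped, so the GID appears in the URL unchanged.
    query = urlencode({"moid": gid, "vmodl": "1"})
    return f"https://{vrm_host}:{port}/mob/?{query}"
```

For example, `mob_url_for_group("192.0.2.10", "GID-d0cf6528-e20d-47cc-b94a-9175276afd3c")` produces the same form of URL as step 1, with the placeholder replaced by that address.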

Notes:

The vCenter Server AD domain name, username, and password are case-sensitive.
The Destroy method is also available in the Next Generation Client (NGC) Web UI when using VR 5.5.

Additional Information

For information on replication job creation, see VMware vSphere Replication Documentation.
