Recovering VM template with RDM is not supported by SRM

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

1. A planned migration finishes with the status Incomplete Recovery.
2. VMs cannot be recovered at the target site.
3. The following error is reported: "Recovering VM template with RDM is not supported by SRM. Some virtual machines in the protection group 'ABC' could not be recovered"

vmware-dr.log:

2023-11-11T19:56:39.123+01:00 error vmware-dr[09529] [SRM@6876 sub=StorageProvider opID=4cc8dc73-d207-4948-82f7-f776c2f23f36-failover:ede8:0088:cbe0:c41f:ffb2] GetTargetInfoForRecoveredVm: Cannot recover RDM for VM template 'protected-vm-9292'
2023-11-11T19:56:39.123+01:00 error vmware-dr[09529] [SRM@6876 sub=AbrRecoveryEngine opID=4cc8dc73-d207-4948-82f7-f776c2f23f36-failover:ede8:0088:cbe0:c41f:ffb2] ResolveRdms: error getting target infor VM: protected-vm-9292
--> (dr.storageProvider.fault.VmTemplateWithRdmNotSupported) {
-->  faultCause = (vmodl.MethodFault) null,
-->  faultMessage = <unset>
-->  msg = ""
--> }
-->
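To see which subsystems reported errors for the same failed operation, the entries can be grouped by opID. The following is a minimal sketch, assuming every entry header follows the layout of the excerpt above (timestamp, level, process[pid], then a bracketed block with sub= and opID=). Continuation lines starting with "-->" and any line with a different layout are ignored. The names LINE_RE and errors_by_op_id are illustrative and not part of SRM.

```python
import re
from collections import defaultdict

# Header layout inferred from the excerpt above; other layouts are not matched.
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>\w+)\s+(?P<process>[\w-]+)\[(?P<pid>\d+)\]\s+"
    r"\[(?P<origin>\S+)\s+sub=(?P<sub>\S+)\s+opID=(?P<op_id>[^\]]+)\]\s+(?P<message>.*)$"
)


def errors_by_op_id(lines):
    """Group (sub, message) pairs of error-level entries by their opID."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        # Lines that do not look like an entry header (e.g. '-->' lines) are skipped.
        if match and match["level"] == "error":
            grouped[match["op_id"]].append((match["sub"], match["message"]))
    return dict(grouped)
```

Passing an open log file to errors_by_op_id returns a dictionary keyed by opID, so the StorageProvider and AbrRecoveryEngine errors of one failover attempt appear together.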

Environment

VMware Site Recovery Manager 8.x

Cause

On the secondary site, during the RelocatePhVmLocation operation, SRM checks whether the requested new resource pool (RP) or folder locations for the placeholder (PH) VM are the same as the existing ones. This can happen, for example, when the mappings point to the same recovery target. In that case, SRM clears the requested change from the operation context. However, instead of unsetting the std::optional, the code resets the MoRef object to 'null', which causes the issue.
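The difference between "no change requested" and "change to a null reference" can be modeled as follows. This is an illustrative sketch only, not SRM code: the SRM implementation uses std::optional in C++, and the classes MoRef and RelocateRequest and the functions below are hypothetical stand-ins. Here a field value of None plays the role of an unset std::optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MoRef:
    """Hypothetical stand-in for a managed object reference."""
    kind: str
    value: Optional[str]  # None models the 'null' MoRef described above


@dataclass
class RelocateRequest:
    """Hypothetical operation context; None means 'no change requested'."""
    resource_pool: Optional[MoRef] = None


def clear_buggy(request):
    # The field stays set, but now points at a null reference.
    request.resource_pool = MoRef("vim.ResourcePool", None)


def clear_correct(request):
    # The field is unset, so the operation requests no change.
    request.resource_pool = None


def apply_request(current, request):
    """Return the placeholder VM's resource pool after the operation."""
    if request.resource_pool is None:
        return current
    return request.resource_pool
```

With clear_correct, apply_request leaves the existing resource pool untouched. With clear_buggy, the existing location is overwritten by a reference whose value is null, which corresponds to the 'vim.ResourcePool:(null)' state seen in the logs.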

The problem is rare because, in most cases, the problematic 'null' RP/folder data is corrected right away by an update from the PH VM VC inventory monitoring, which reflects VM migration operations. This update effectively masks the faulty corner case in RelocatePhVmLocation.

In some cases, however, a race condition occurs: the PH VM inventory update from VC arrives before the internal SRM operation completes, so the 'null' value is stored afterwards and is not corrected.

Why can VMs that are not templates end up with unknown resource pool locations?
To provide dynamic VMPG protection, SRM monitors the location of production VMs in the VC inventory and reacts to any changes. When it detects a folder or resource pool location change for a protected VM, SRM checks the current inventory mappings configuration and relocates the placeholder VM if an update is required.

How to recognize the problem in advance?
In the SRM logs, search for the pattern "Resource pool 'vim.ResourcePool:(null)' autoConfigured" to find VMs whose Resource Pool location may be set incorrectly. From the matching log message, take the protected VM ID (e.g. 'protected-vm-10777') and find the name of the VM in other related log entries.

Then, in the DR UI, check the currently known PH VM location for this VM in the Protection Group's Virtual Machine View. If the problem is present, no data is shown for Resource Pool.
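The log search described above can be sketched as a short script. It assumes that the protected VM ID appears on the same line as the pattern and has the form 'protected-vm-<digits>', as in the example ID. The function names are illustrative, and the log file path must be supplied by the caller.

```python
import re
from pathlib import Path

# Literal pattern quoted in this article.
NULL_POOL_PATTERN = "Resource pool 'vim.ResourcePool:(null)' autoConfigured"
# Assumption: protected VM ids look like 'protected-vm-<digits>'.
VM_ID_RE = re.compile(r"protected-vm-\d+")


def find_null_pool_vm_ids(lines):
    """Return the sorted, unique protected VM ids found on matching lines."""
    ids = set()
    for line in lines:
        if NULL_POOL_PATTERN in line:
            ids.update(VM_ID_RE.findall(line))
    return sorted(ids)


def scan_log_file(path):
    """Scan one SRM log file; undecodable bytes are replaced, not fatal."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_null_pool_vm_ids(handle)
```

Each ID returned by scan_log_file is a candidate only. Confirm it in the DR UI as described above before taking action.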

Resolution

Workaround:

- Restart the SRM server at the recovery site to force new VC inventory updates for all PH VMs and correct the wrongly set Resource Pool locations.

- To reliably avoid the problem until a fixed SRM version is installed, disable the Dynamic VMPG protection feature in the advanced settings. This forces the static protection behavior known from SRM 8.7 and earlier. The advanced setting is "replication.updateVmProtectionOnPlacementChange". After changing it on both SRM sites, restart the SRM servers to correct any VMs left in a problematic state before the setting was changed.


Fix:

This issue is resolved in SRM 8.8.0.3 and later.