VLR N:1 Resource Mapping - Solving the SRM Failback Bottleneck

Article ID: 437472

Updated On:

Products

VMware Live Recovery

Issue/Introduction

This document clarifies a common point of confusion with VMware Live Recovery (formerly SRM/VLSR) in Many-to-One (N:1) resource cluster mappings. In environments where multiple production clusters (e.g., Clusters A, B, and C) are mapped to a single recovery cluster (e.g., Cluster D), failover is typically straightforward, but challenges often surface during Reprotect and Failback (Planned Migration).

The key issue is SRM’s reverse mapping behavior after a failover. Once the recovery site becomes the source, SRM must determine how to map workloads back to their original clusters, but in an N:1 design this introduces mapping ambiguity. As a result, administrators frequently need to manually adjust cluster/resource mappings to ensure placeholder VMs and replication workflows align with the correct original destination cluster and to prevent reprotect failures.



The sections below outline why this ambiguity occurs, how it impacts reprotect operations, and what to expect when planning failback in an N:1 SRM architecture.



Environment

VMware Live Recovery
VMware Live Site Recovery 
VMware Site Recovery Manager

Resolution

Production site ESXi host clusters

Cluster A
Cluster B
Cluster C

Recovery site ESXi host cluster 

Cluster D

VMware Live Site Recovery (SRM) requires manual reconfiguration of cluster mappings for N:1 Resource Cluster Mapping when failing back VMs to their respective clusters. This means that after a failover, you'll need to manually update the resource mappings to return the VMs to their original clusters (A, B & C). 

In a Many-to-One (N:1) configuration, you consolidate multiple protected sites/clusters (Clusters A, B, and C) into a single large recovery site (Cluster D). While the failover can be automated and smooth, the Reprotect and Failback (Planned Migration) steps are where the complexity occurs.


The "Reverse Mapping" Conflict

When you perform a Failover, SRM follows your mappings from Cluster A > D, Cluster B > D, and Cluster C > D. However, when you click REPROTECT to prepare for failback, SRM essentially flips the logic.

Because Cluster D, the only cluster at the recovery site, is now the source, SRM looks at its resource mappings to decide where to replicate and fail back VMs. Since Cluster D is mapped to multiple target clusters (A, B, and C), SRM cannot determine which VM belongs to which original cluster. It sees this as a 1-to-Many conflict.


Mapping Ambiguity: 

Because SRM resource mappings are typically defined at the Cluster or Resource Pool level, an N:1 design creates a mapping ambiguity. If Cluster D is mapped to Clusters A, B, and C, SRM's reverse mapping logic will often default to the first mapping it finds or leave the object unmapped, so the mapping must be reconfigured manually.

During the REPROTECT operation, placeholder VMs are created at the original cluster. If the reverse mapping isn't manually pointed back to the specific original cluster (for example, recovery site Cluster D > original site Cluster A), the reprotect process will fail. This is a logical bottleneck of SRM's N:1 architecture.
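The ambiguity described above can be illustrated with a minimal Python sketch. It models resource mappings as a plain dictionary and is not SRM's actual implementation or API; the function names `reverse_mappings` and `ambiguous_sources` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Forward (failover) mappings: each production cluster maps to one recovery cluster.
# This is an illustrative model only, not data read from SRM.
forward = {
    "Cluster A": "Cluster D",
    "Cluster B": "Cluster D",
    "Cluster C": "Cluster D",
}


def reverse_mappings(forward_map):
    """Invert the forward mappings, collecting every candidate target per source."""
    reverse = defaultdict(list)
    for production, recovery in forward_map.items():
        reverse[recovery].append(production)
    return dict(reverse)


def ambiguous_sources(reverse_map):
    """Return recovery-side objects that have more than one candidate target."""
    return {src: targets for src, targets in reverse_map.items() if len(targets) > 1}


reverse = reverse_mappings(forward)
print(ambiguous_sources(reverse))
```

Inverting three forward mappings that share one target leaves Cluster D with three candidate destinations, which is the 1-to-Many conflict that has to be resolved manually before reprotect.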

What is Reprotect? It is the process of configuring protection in the reverse direction. (This step replaces the VMs (VMX) at the original site with placeholder VMs.)

 

How do I fail back in this scenario after I have failed over?

Instant Fix 

1. Fail back one cluster at a time from recovery site Cluster D to the source site clusters (A, B & C).

This means you won't be able to fail back all the recovery plans from recovery site Cluster D to the source site clusters (A, B & C) in a single event. Instead, you'll have to do a phased migration.

The "Mapping Swap" Workflow
To perform a failback to Production site Cluster A, you must:

1. Go to Site Pair > Configure > Resource Mappings.
2. Map Cluster D ➡️ Production-Cluster A.
3. Run the Reprotect for the VMs belonging to Cluster A at Production Site.
4. Execute the Failback (Recovery Plan).

NOTE: The above steps presume that each Recovery Plan (and its Protection Groups) includes VMs belonging to a single production cluster (Cluster A, B, or C), and that no plan mixes VMs from different production clusters.
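One way to check the assumption in this note before starting a phased failback is to compare each protection group's VMs against their production clusters. The sketch below is a hedged illustration: the inventory dictionary, the group and VM names, and the helper `cross_cluster_groups` are all hypothetical, and in practice the data would have to be exported from your own environment.

```python
# Hypothetical inventory: protection group -> {VM name: production cluster}.
# The names below are placeholders, not objects from a real SRM deployment.
protection_groups = {
    "PG-Cluster-A": {"vm-a1": "Cluster A", "vm-a2": "Cluster A"},
    "PG-Mixed": {"vm-b1": "Cluster B", "vm-c1": "Cluster C"},
}


def cross_cluster_groups(groups):
    """Return groups whose VMs come from more than one production cluster."""
    result = {}
    for name, vms in groups.items():
        clusters = sorted(set(vms.values()))
        if len(clusters) > 1:
            result[name] = clusters
    return result


for group, clusters in cross_cluster_groups(protection_groups).items():
    print(f"{group} spans multiple clusters: {', '.join(clusters)}")
```

Any group reported here would need to be split per cluster (or handled separately) for the one-cluster-at-a-time workflow to apply cleanly.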

After finishing Cluster A, you can move on to the remaining clusters:

Go back to Mappings and change the mapping from Cluster D ➡️ Production-Cluster A to Cluster D ➡️ Production-Cluster B.

Repeat the Reprotect/Failback for the Cluster B group, and so on for the remaining clusters.

💡TIP: Always ensure the Reprotect finishes successfully for one group before changing the mappings for the next group.
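The ordering of the "Mapping Swap" workflow can be summarized as a small simulation. This is only a sketch of the sequence under stated assumptions: it does not call any SRM API, the `reprotect` callable is a hypothetical stand-in for the manual Reprotect step, and the plan names are placeholders.

```python
def phased_failback(groups_by_cluster, reprotect):
    """Process one production cluster at a time and record the ordered steps.

    groups_by_cluster: production cluster -> list of recovery plan names.
    reprotect: callable(plan, target_cluster) -> bool, a stand-in for the
               manual Reprotect step; it returns True on success.
    """
    log = []
    for cluster, plans in groups_by_cluster.items():
        # Only one reverse mapping for Cluster D exists at any time.
        mapping = {"Cluster D": cluster}
        log.append(f"map Cluster D -> {cluster}")
        for plan in plans:
            if not reprotect(plan, mapping["Cluster D"]):
                # Do not touch the mappings for the next cluster after a failure.
                log.append(f"reprotect failed for {plan}; stopping")
                return log
            log.append(f"reprotect {plan}")
            log.append(f"failback {plan}")
    return log


plans = {"Cluster A": ["Plan-A"], "Cluster B": ["Plan-B"], "Cluster C": ["Plan-C"]}
steps = phased_failback(plans, reprotect=lambda plan, target: True)
for step in steps:
    print(step)
```

The early return mirrors the tip above: the mapping for the next cluster is changed only after the current group's reprotect has succeeded.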


OR 


Recommended Workaround  

Resource pools can be a viable workaround for N:1 cluster mappings in SRM. By mapping recovery site resource pools to production site resource pools, you can simplify failback and reduce manual reconfiguration.

Resource pools offer more granular control over the protection of VMs. You can include a desired list of VMs in the specific resource pool and protect them in the designated Protection Group.

Steps to Implement:

1. Create dummy Resource Pools: Set up resource pools as required at both the protected and recovery sites.

A dummy resource pool is an object created solely to organize VMs for administrative purposes; it is not used to configure resource settings (shares, limits, reservations).

If you want to avoid changing mappings every time you switch sites/clusters, you can use Resource Pools as "Logical Proxies." This is the only way to simulate a 1-to-Many mapping effectively. At the recovery site, under Cluster D, create three Resource Pools:

Cluster D - Production-Cluster A 
Cluster D - Production-Cluster B 
Cluster D - Production-Cluster C 

These resource pools can then be used under resource mapping to map to the respective Cluster (A, B & C) at the Production Site. 

How this works: Since the mappings are now tied to the Resource Pools (which are unique) rather than the Cluster (which is shared), you don't have to manually reconfigure the mappings before every reprotect/failback. SRM will see that a VM in Cluster D - Production-Cluster A has a direct, dedicated path back to Cluster A.
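The effect of the resource pool proxies can be shown with the same kind of dictionary model. This is a minimal sketch, not SRM's internal logic: it assumes each production cluster is mapped to its own resource pool under Cluster D and checks that inverting those mappings yields exactly one target per pool.

```python
# Forward mappings now point at per-cluster resource pools under Cluster D.
# This is an illustrative model only, not data read from SRM.
forward = {
    "Cluster A": "Cluster D - Production-Cluster A",
    "Cluster B": "Cluster D - Production-Cluster B",
    "Cluster C": "Cluster D - Production-Cluster C",
}

reverse = {}
for production, pool in forward.items():
    if pool in reverse:
        # A shared pool would reintroduce the 1-to-Many ambiguity.
        raise ValueError(f"{pool} is mapped from more than one cluster")
    reverse[pool] = production

for pool, production in reverse.items():
    print(f"{pool} -> {production}")
```

Because every pool appears only once on the recovery side, the reverse lookup is unambiguous and no mapping swap is needed between clusters.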