A Test Failover operation in Site Recovery Manager fails with the SRA message: Failed to create LUN snapshot
Article ID: 304526

Products

VMware Live Recovery

Issue/Introduction

Symptoms:
In a datacenter where the ESX servers at the production and disaster recovery sites are connected to EMC Clariion CX storage arrays, and the production site is protected by VMware Site Recovery Manager (SRM) with the EMC MirrorView SRA (Storage Replication Adapter), an SRM Test Failover operation may fail while preparing storage.
The logs from SRM (default location C:\Documents and Settings\All Users\Application Data\VMware\VMware vCenter Site Recovery Manager\Logs) and the SRA show the message Failed to create lun snapshots, as in this excerpt:
[2009-07-16 18:27:19.874 'SecondarySanProvider' 968 trivia] 'testFailover' returned <Response>
[#3] <ReturnCode>0</ReturnCode>
[#3] </Response>

[2009-07-16 18:27:19.874 'SecondarySanProvider' 968 info] Return code for testFailover: 0
[2009-07-16 18:27:19.874 'SecondarySanProvider' 968 trivia] 'Prepare storage for group 'sharecrop' for recovery' took 56.749 seconds
[2009-07-16 18:27:19.874 'BeginImageTest-Task' 968 verbose] Error set to (dr.san.fault.ExecutionError) {
[#3] dynamicType = <unset>,
[#3] errorMessage = "Failed to create lun snapshots",
[#3] msg = ""
[#3] }

[2009-07-16 18:27:19.874 'BeginImageTest-Task' 968 info] State set to error
Note: The highlighted sections indicate the response from the MirrorView SRA and are not from SRM.
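To confirm that a given log file contains this symptom, you can search it for the error text. The following is a minimal Python sketch, not a VMware or EMC tool. The function name find_snapshot_errors is hypothetical, and the match is case-insensitive because the message appears with varying capitalization.

```python
import re
from typing import Iterable, List, Tuple

# Matches "Failed to create lun snapshot" or "... snapshots", in any letter case.
SNAPSHOT_ERROR = re.compile(r"failed to create lun snapshots?", re.IGNORECASE)


def find_snapshot_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that contains the SRA error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SNAPSHOT_ERROR.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Because an open file object yields its lines when iterated, you can pass one directly to find_snapshot_errors. The sketch only locates the message; it does not tell you which of the scenarios described under Resolution applies.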


Resolution

During a Test Failover operation, SRM asks the SRA to perform a test failover of the datastore LUNs, that is, the replicated LUNs that were discovered during the array discovery and LUN discovery stages. In a Test Failover, failing over a LUN means creating a snapshot of its replica on the DR (Disaster Recovery) site. If creating these snapshots fails, the SRM Test Failover operation fails.
The error message Failed to create lun snapshots reported by the MirrorView SRA can occur in two different but related scenarios. Both depend on two parameters: the number of LUNs for which SRM requests the failover, and the Consistency Groups (CGs) that the SAN administrator defined as part of the LUN replication configuration.
Scenario 1:
The consistency group defined by the SAN administrator contains n LUNs, and SRM requests the failover of fewer than n of them. The MirrorView SRA requires that LUN snapshot requests (the LUN failover requests made by SRM) cover all the LUNs in a consistency group.
Scenario 2:
SRM makes a failover request (i.e. snapshot request) for a group of LUNs that are in the same consistency group (CG) and the number of LUNs in this group exceeds a certain maximum allowed by the storage array. For example, in arrays such as Clariion CX4-140, this limit is 8 and in some other CX-arrays, the limit is 16.
In this scenario, EMC Solutions Enabler reports this message:

A Multi-Lun Consistent operation was attempted that specified more devices than allowed. The limit is either 8 or 16 for this array type, so please inspect your Navisphere documentation on what the correct numbers should be. Please retry the operation using the correct number of devices.

Note: This message is not propagated to the MirrorView SRA. Therefore, you do not see this message in the SRM/SRA logs.
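To resolve this issue, keep the number of LUNs in each consistency group at or below the maximum that the array allows, which is the largest number of LUNs in a CG for which the array can take simultaneous snapshots. For Scenario 1, also make sure that the failover request covers every LUN in the consistency group, as the MirrorView SRA requires. For more information, see the EMC SAN documentation.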
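The two scenarios can be expressed as a simple pre-check on your replication layout. The following Python sketch is illustrative only and calls no SRM, SRA, or Navisphere API. The function name, the dictionary layout, and the max_luns_per_group parameter are all assumptions; you supply the consistency group membership and the array limit (for example 8 or 16) from your own SAN configuration.

```python
from typing import Dict, List, Set


def check_failover_request(
    consistency_groups: Dict[str, Set[str]],
    requested_luns: Set[str],
    max_luns_per_group: int,
) -> List[str]:
    """Report consistency groups that match Scenario 1 or Scenario 2.

    consistency_groups maps a CG name to the LUN identifiers it contains.
    requested_luns holds the LUNs for which failover is requested.
    max_luns_per_group is the array's limit, taken from the EMC documentation.
    """
    problems = []
    for name, members in sorted(consistency_groups.items()):
        if not members & requested_luns:
            continue  # This CG is not part of the failover request.
        missing = members - requested_luns
        if missing:
            # Scenario 1: the request covers only part of the CG.
            problems.append(f"{name}: scenario 1, request omits {sorted(missing)}")
        if len(members) > max_luns_per_group:
            # Scenario 2: the CG is larger than the array allows.
            problems.append(
                f"{name}: scenario 2, {len(members)} LUNs exceeds limit {max_luns_per_group}"
            )
    return problems
```

An empty result means that neither scenario applies to the groups you described. It does not guarantee that the Test Failover will succeed.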