VLSR - No hosts with hardware version which are powered on and not in maintenance mode are available
Article ID: 312728

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  1. Unable to perform recovery tasks (Test, Failover, Failback)

  2. SRM cannot register & power ON VMs because the datastores aren't mounted

  3. Recovery task does not complete, preventing a reprotect.



Environment


VMware Site Recovery Manager 8.x
VMware Live Site Recovery 9.x

Cause

 

  1. LUN presentation to hosts is incorrect or incomplete.
  2. Placeholder datastore mapping is wrong.
  3. Resource mapping is wrong.
  4. VMs are replicated to the wrong target vSAN datastore.
  5. The vCenter license key has expired.

Resolution


The first thing to check when you see this error is whether the datastore in question (or any host in the recovery cluster) is in maintenance mode. If so, exit maintenance mode and run the recovery task again to check the result.

This error can be caused by five different components:

1. LUN presentation
2. Resource Pools
3. vSAN
4. Virtual Volumes (vVols)
5. Expired vCenter license
 

1. LUN presentation 

The LUNs must be presented to the hosts/clusters on the target site where you want the VMs to recover. You could encounter the following problems in this scenario.

1. LUNs are not presented to the recovery hosts/clusters.
2. LUNs are presented to the wrong recovery hosts/clusters.
3. The placeholder (protected) VMs appear in the wrong cluster as a result of incorrect resource mappings.
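
As a quick sanity check, you can confirm LUN and datastore visibility directly on a recovery host. A minimal sketch using ESXi shell commands (the device and datastore names in the output are specific to your environment):

```shell
# Rescan all HBAs on this recovery host for newly presented LUNs
esxcli storage core adapter rescan --all

# List the detected SCSI devices; the replicated LUN's NAA ID should appear here
esxcli storage core device list | grep -i "naa."

# Confirm the datastore backed by the LUN is visible and mounted on this host
esxcli storage filesystem list
```

If the LUN does not appear after a rescan, the problem is at the array/zoning layer, not in SRM.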

Fixing TEST recovery

If you have run a TEST recovery, fixing this is easy. When a TEST recovery is performed, SRM instructs the storage array or vSphere Replication to take snapshots, which can easily be cleaned up by running a CLEANUP task from SRM. Perform a CLEANUP and do the following.

1. Present the LUNs to the correct recovery hosts/clusters.
2. If the LUNs are presented to the correct cluster but the placeholder VMs are displayed in a different cluster, check the following:

A. Check if the placeholder datastore is mounted to this cluster. If it is not mounted, check whether the LUN backing this datastore is mapped to this cluster, then mount the placeholder datastore to all the hosts.
B. Check the resource mappings and update them to point to the correct recovery cluster.
C. Unprotect all VMs in the protection group and RESTORE ALL PLACEHOLDER VMs. The protected VMs should now be displayed in the correct recovery cluster.
D. Review the SRM mappings overall, update them if you find any problems, and proceed with the TEST recovery.
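
Step A can be verified from the ESXi shell of a host in the recovery cluster; a sketch, with <placeholder-ds> standing in for your placeholder datastore's volume label:

```shell
# Check whether the placeholder datastore is currently visible/mounted on this host
esxcli storage filesystem list | grep -i "<placeholder-ds>"

# If the volume is present but unmounted, mount it by its label
esxcli storage filesystem mount -l "<placeholder-ds>"
```

Repeat on each host in the recovery cluster so the placeholder datastore is mounted everywhere.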
 

Fixing PLANNED MIGRATION

When a PLANNED MIGRATION is run, you cannot revert the operation until the workflow completes. Whether you use VR or ABR, SRM attempts to shut down the protected virtual machines gracefully if VMware Tools is installed; if not, they are powered OFF. This is a critical situation for the customer, as they are in the midst of moving their workloads from site A > site B within a specified Recovery Time Objective (RTO). When a planned migration halts and throws this error, there is no way forward other than performing a manual failback, because the workflow cannot complete until all issues are fixed at the recovery site.

When a recovery using VR fails, the replication of the virtual machines reverts to the replication state before the attempted recovery, making a manual failback much easier than with ABR.

vSphere Replication manual recovery steps

1. Note down the list of VMs in the recovery plan.
2. vCenter will not allow you to power ON these VMs, because it knows that a recovery or failover was triggered (the power ON option is grayed out). Note down the names of the datastores where the VMs are located, then remove the VMs from the vCenter inventory at the production site and re-register them from their respective datastores.
3. Power ON the VMs and make sure the data is intact.
4. Delete the protection group and recovery plan, as they are damaged. They cannot be used because we failed back manually without completing the recovery workflow.
5. Follow the steps mentioned under Fixing TEST recovery.
6. Recreate a new protection group and recovery plan.
7. Run a TEST and a PLANNED MIGRATION to check that everything works as expected.
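
Re-registering and powering on the VMs can also be done from the ESXi shell if the vSphere Client is not cooperating; a sketch with placeholder paths and IDs:

```shell
# Register the VM from its .vmx file on the datastore (path is a placeholder)
vim-cmd solo/registervm /vmfs/volumes/<datastore>/<vm-name>/<vm-name>.vmx

# Find the VM ID assigned at registration
vim-cmd vmsvc/getallvms

# Power the VM on using that ID
vim-cmd vmsvc/power.on <vmid>
```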

CAUTION: Intervening in an SRM recovery plan to make manual changes outside its workflow, or performing actions or tasks on the array while SRM is managing it, will break the SRM protection group and recovery plan.

Storage based replication manual recovery steps

If you are using array-based replication, the LUNs are failed over to the other site and mounted to the respective hosts/clusters where they are presented, thus requiring a manual failback of all the VMs in the recovery plan.

Promoting or demoting a LUN comes into play when a LUN is set up for replication. When we initiate a PLANNED MIGRATION (failover) from a site, the replicating LUN is demoted at the source site and promoted at the target site, enabling read/write access to that LUN and allowing it to be mounted as a datastore.

LUN Promoted - When a LUN failover is triggered, the target LUN is marked as read/write by the array, whereas the source LUN becomes read-only.
LUN Demoted - When a LUN failover is triggered, the source LUN is marked as read-only and does not respond to reads or writes from the hosts it is connected to.

This process is completely reversible, either through SRM or externally through the array. Since SRM requires the recovery workflow to complete successfully, it will not allow a REPROTECT until the recovery task completes. And since we cannot fix this error mid-workflow, we have to make one of the two choices below to fail back.

A. Reverse the replication of LUNs from target to source
B. Promote the source LUN 

A. Reverse the replication of LUNs from target to source

If the consistency of the data is not in question, you can choose this option. Check with the customer before proceeding; otherwise you might end up replicating corrupted blocks or bad data to the source LUN. This operation must be performed by a storage administrator or by the storage vendor; we DO NOT perform this part. This option is not required for the error in this article, because the VMs are never recovered onto the datastore and powered ON, which is the problem we are working on. It is useful when VMs are powered ON on the datastore and in use by users, leading to data changes. It is mentioned here because it falls within the context of what we are discussing.

RECOVERY SITE

1. Power OFF the VMs on the replicated datastores that you plan to fail back manually.
2. Take a screenshot of the VMs in the recovery plan.
3. Remove these VMs from the vCenter inventory.
4. Unmount the datastores from all hosts.
5. Detach the LUNs from all the hosts connected to them.
6. Rescan storage at the cluster level.
7. Reverse replicate the LUNs from the target/destination array to the source array. We highly recommend involving your storage vendor if you are not familiar with performing this operation correctly. Breaking the replication pair relationship and re-establishing it may be required, depending on the scenario, array type, etc.
8. After LUN replication is complete, prepare for failback.
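
Steps 4-6 can be performed per host from the ESXi shell; a sketch, with the volume label and NAA device ID as placeholders:

```shell
# Unmount the datastore by its volume label (repeat on every host that mounts it)
esxcli storage filesystem unmount -l "<datastore-label>"

# Detach the underlying LUN so the host stops using it
esxcli storage core device set --state=off -d naa.<device-id>

# Rescan all adapters so the host picks up the change
esxcli storage core adapter rescan --all
```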

PREPARING FOR FAILBACK 

1. Unprotect the VMs in the protection group if the recovery plan was partially recovered.
2. Remove all VMs that are part of the failed recovery plan from the production vCenter inventory.
3. Dissociate the protection group from the recovery plan by editing it.
4. Delete the protection group and recovery plan.
5. Recreate the protection group and recovery plan, pointing from the recovery site to the production site.
6. Perform a PLANNED MIGRATION.
7. Run a REPROTECT.

B. Promote the source LUN

The customer may not want to replicate the LUN back to the source site for various reasons, such as:

1. The storage vendor is unavailable or has not joined the VMware call yet, and nobody from their internal storage team knows how to reverse replicate LUNs on the array.
2. They want to bring up the VMs at the production site ASAP.
3. They no longer trust the consistency of the data on the target LUN and do not want to use it, etc.
 

NOTE: Before following the steps below, complete steps 1-6 under RECOVERY SITE (skip any steps that are not applicable). Promote all LUNs that were failed over to the recovery site back to RW mode at the production site. We highly recommend involving the storage vendor if you are not familiar with performing this operation correctly. After this operation, breaking the replication pair relationship and re-establishing it may be required, depending on the array.

PRODUCTION SITE 

1. Promote the LUNs belonging to the recovery plan that was failed over.
2. Remove the powered OFF VMs that are part of the recovery plan from the vCenter inventory.
3. Attach all the LUNs corresponding to the detached datastores that were failed over to the recovery site.
4. Rescan storage -> Scan for new devices & VMFS volumes at the cluster level.
5. Mount all the datastores on one of the hosts in the cluster by assigning a new signature (resignature).
6. Rescan storage -> Scan for new devices & VMFS volumes at the cluster level; this mounts the resignatured datastores on the remaining hosts in the cluster.
7. Register the VMs and power them ON.
8. Delete the protection group and recovery plan, as they are damaged and can no longer be used.
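
Steps 4-6 rely on VMFS resignaturing: because the promoted LUNs are replica copies, ESXi detects their volumes as snapshots. A sketch of the resignature flow from the ESXi shell (the volume label is a placeholder):

```shell
# List VMFS volumes that ESXi has detected as snapshots/replicas
esxcli storage vmfs snapshot list

# Resignature the replica volume so it can be mounted as a new datastore
esxcli storage vmfs snapshot resignature -l "<volume-label>"

# Rescan so the remaining hosts in the cluster see the resignatured datastore
esxcli storage core adapter rescan --all
```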

Because we failed back manually, we have to clean up the replication configuration on the array belonging to the consistency group connected to the protection group we are working on. This task must be performed on the array, and the replication must be reversed in the opposite direction, pointing to the target site, before performing the steps below.

9. Verify the steps mentioned under Fixing TEST recovery.
10. Recreate a new protection group and recovery plan. The replicated datastore should show up when creating the new PG if the replication was properly reversed.
11. Run a TEST and PLANNED MIGRATION to check that everything works as expected.

RESOURCE POOLS

Placeholder Create Error: No hosts with hardware version '18' and datastore(s) '"XX-XXX"' which are powered on and not in maintenance mode are available.

You can get this error when creating placeholders for protected VMs or when performing failovers. Check whether there are resource pools with the same names under different clusters or datacenters at the same site; this can lead to the problem. Rename the resource pools that share a name with resource pools under a different cluster/datacenter so that each name is unique. Example: Resource pool-old

Renaming the resource pools fixes this error.
 

vSAN 

When replicating VMs to a recovery site consisting of many vSAN clusters/datastores, you must pay attention to the vSAN datastore you are replicating to and to the resource mappings. You will encounter this error if:

1. The target vSAN datastore you are replicating to belongs to a different host cluster than the one added to the resource mappings.
2. The resource mappings are wrong, causing the placeholder VMs to be created on a different vSAN cluster that is not connected to the vSAN datastore containing the replicas.

TEST recovery

When a TEST recovery has been performed, perform the following steps:

1. Run a cleanup.
2. Gracefully remove the VMs from the replication tab. DO NOT retain seeds, as the replica seeds are sitting on the wrong vSAN datastore.
3. Reconfigure the VMs for replication and point them to the correct vSAN datastore.
4. Check and update the resource mappings.
5. Unprotect the VMs in the protection group and restore all placeholder VMs.
6. Run a TEST and PLANNED MIGRATION to check that everything works as expected.

To manually recover VMs from a PLANNED MIGRATION, please refer to the steps under vSphere Replication manual recovery steps
 

VIRTUAL VOLUMES (vVols) 

When performing a failback on Virtual Volumes datastores, the operation might fail with the following error: No hosts with hardware version '19' and datastore(s) '"xxx"' which are powered on and not in maintenance mode are available.

1. Make sure the vVol datastore is mounted to the host cluster you are recovering to.
2. If the vVol datastore cannot be mounted because the protocol endpoints (PEs) are not visible, continue with vVol troubleshooting. (The case should be moved to a Storage SME.)
3. This error is also seen when you mount vVol datastores with the same name in multiple datacenters.
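
Checks 1 and 2 can be verified from the ESXi shell on a host in the recovery cluster; a sketch:

```shell
# List the vVol storage containers (datastores) known to this host
esxcli storage vvol storagecontainer list

# List the protocol endpoints (PEs); if none are shown, the vVol datastore cannot mount
esxcli storage vvol protocolendpoint list
```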

This issue is fixed in Site Recovery Manager 8.7.0.3 | 05 SEP 2023 | Build 22359471
 

EXPIRED vCENTER LICENSE  

You could also encounter this error when the vCenter license expires, leaving the hosts and datastores in a disconnected state. Replace the vCenter license with an active one.

 

Additional Information

Impact/Risks:

PLANNED MIGRATIONS can have production impact on customers. Customers also perform TEST recoveries to ensure everything is working before performing planned migrations. If TEST recoveries are failing and planned migrations are scheduled within a few hours or a day, it is still considered a potential production impact that must be fixed at the earliest.