No hosts with hardware version which are powered ON and not in maintenance mode are available


Article ID: 312728


Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  1. Unable to perform recovery tasks (Test, Failover, Failback)

  2. SRM cannot register and power ON VMs because the datastores aren't mounted

  3. The recovery task does not complete, which prevents running a reprotect.



Environment

VMware Site Recovery Manager 8.x
VMware Live Site Recovery 9.x
VMware Live Recovery 9.x

Cause

1. LUN presentation to hosts is incorrect or incomplete.

2. Placeholder datastore mapping is wrong

3. Resource mapping is wrong

4. VMs are replicated to the wrong target vSAN datastore

5. vCenter license key has expired 

Resolution

First, check whether the datastore is in maintenance mode. If it is, take it out of maintenance mode and rerun the recovery task.
 
This error can be caused by five different components:

1. LUN Presentation

2. Resource Pools

3. Replicating to the wrong vSAN Cluster (Datastore)

4. Virtual Volumes (vVols)

5. Expired vCenter License
 

1. LUN Presentation 

LUNs must be presented to the target-site recovery hosts/clusters for VM recovery. Common problems include:
  1. LUNs aren’t presented to the recovery hosts/clusters.
  2. LUNs are presented to the wrong recovery hosts/clusters.
  3. Placeholder (protected) VMs appear in the wrong cluster due to incorrect resource mappings.


Fixing TEST recovery

If the error occurred during a TEST recovery, the fix is easy. When a TEST recovery is performed, SRM instructs the storage array or vSphere Replication to take snapshots, which can easily be cleaned up by running a CLEANUP task from SRM. Simply perform a CLEANUP, then do the following.

After CLEANUP:
  1. Present the LUNs to the correct recovery hosts/clusters.

  2. If the LUNs are already presented but placeholder VMs appear in the wrong cluster, verify:

    • A. Placeholder datastore: Ensure it’s mounted on the cluster. If not, map the datastore LUN to the cluster and mount the datastore on all hosts.

    • B. Resource mappings: Update them to the correct recovery cluster.

    • C. Placeholders: Unprotect all VMs in the protection group, then Restore All Placeholder VMs.

    • D. SRM mappings: Review and correct any mapping issues, then rerun the TEST recovery.

 

Fixing PLANNED MIGRATION

When a PLANNED MIGRATION is run, the operation cannot be reverted until the workflow completes. Whether you use VR or ABR, SRM attempts to shut down the protected virtual machines gracefully if VMware Tools is installed; otherwise they are powered OFF. This is a critical situation for the customer, who is in the midst of moving workloads from site A to site B within a specified Recovery Time Objective (RTO). When a planned migration halts and throws this error, there is no way forward other than performing a manual failback, because the workflow cannot complete until all issues are fixed at the recovery site.

When a recovery using VR fails, replication of the virtual machines reverts to the replication state before the attempted recovery, making a manual failback much easier than with ABR.

vSphere Replication manual recovery steps

  1. Note down the list of VMs in the recovery plan.

  2. Since vCenter blocks powering ON these VMs after a recovery/failover, remove them from the production vCenter inventory and re-register them from their datastores. Before removal, record the datastore names.

  3. Power ON the VMs and verify data integrity.

  4. Delete the damaged protection group and recovery plan; they can’t be reused because failback was done manually without completing the recovery workflow.

  5. Follow the steps under Fixing TEST recovery.

  6. Create a new protection group and recovery plan.

  7. Run a recovery TEST and PLANNED MIGRATION to confirm everything works as expected.
CAUTION: Intervening in an SRM recovery plan to make manual changes during workflow orchestration, or performing tasks directly on the array while SRM is managing it, will break the SRM protection group and recovery plan.
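The re-registration in steps 2 and 3 can be done from the ESXi shell with vim-cmd. The sketch below only builds the command strings from the list of VMs you recorded; the datastore, folder, and .vmx names are placeholder examples, and the VM ID needed for power-on must be read from the getallvms output.

```python
# Sketch: build the ESXi shell commands for re-registering VMs from
# their datastores after they were removed from inventory.
# Datastore/VM names below are placeholders, not real values.

def reregister_commands(vms):
    """vms: list of (datastore, vm_folder, vmx_file) tuples recorded
    before the VMs were removed from the vCenter inventory."""
    cmds = []
    for datastore, folder, vmx in vms:
        path = "/vmfs/volumes/{}/{}/{}".format(datastore, folder, vmx)
        # Register the VM directly on the host.
        cmds.append("vim-cmd solo/registervm {}".format(path))
    # List registered VMs to note each VM's ID, then power on with
    # "vim-cmd vmsvc/power.on <vmid>".
    cmds.append("vim-cmd vmsvc/getallvms")
    return cmds

for c in reregister_commands([("DS-PROD-01", "app01", "app01.vmx")]):
    print(c)
```

Run the generated commands on a host that can see the datastores; power-on is then done per VM ID.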


Storage based replication manual recovery steps (ABR)

If you are using array based replication, the LUNs are failed over to the other site and mounted to the respective hosts/clusters where they are presented, thus requiring a manual failback of all the VMs in the recovery plan.

The process of promoting or demoting a LUN arises when a LUN is set up for replication. When we initiate a PLANNED MIGRATION (failover) from a site, the replicating LUN is demoted at the source site and promoted at the target site, enabling read/write access to that LUN and allowing it to be mounted as a datastore.

LUN Promoted - When a LUN failover is triggered, the target LUN is marked as read/write by the array, whereas the source LUN becomes read-only.
LUN Demoted - When a LUN failover is triggered, the source LUN is marked as read-only and does not respond to reads or writes from the hosts it is connected to.
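The promote/demote behavior can be illustrated with a minimal state model. This is a conceptual sketch only; the real transition is performed on the storage array by its own management tools, not in software you run yourself.

```python
# Conceptual sketch of LUN promote/demote during a failover.
# Models the state transitions only; actual promotion/demotion is an
# array-side operation.

class ReplicatedLunPair:
    def __init__(self):
        self.source = "read-write"   # production LUN serves host I/O
        self.target = "read-only"    # replica LUN only receives replication

    def failover(self):
        # Demote the source, promote the target: the replica becomes
        # writable and can be mounted as a datastore at the target site.
        self.source = "read-only"
        self.target = "read-write"

    def failback(self):
        # The reverse operation, via SRM reprotect/recovery or
        # externally on the array.
        self.source, self.target = "read-write", "read-only"

pair = ReplicatedLunPair()
pair.failover()
print(pair.source, pair.target)  # read-only read-write
```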

This process is completely reversible, either through SRM or externally through the array. Because SRM requires the recovery workflow to complete successfully, it won't allow a REPROTECT until the recovery task completes. Since we can't fix this error mid-workflow, we have to make one of the two choices below to fail back.

A. Reverse the replication of LUNs from target to source
B. Promote the source LUN 

A. Reverse the replication of LUNs from target to source

If the consistency of the data is not in question, you can choose this option. Confirm with the customer before proceeding; otherwise you might replicate corrupted blocks or bad data back to the source LUN. This operation must be performed by a storage administrator or by the storage vendor; VMware by Broadcom does not perform it. This option usually isn't required for this specific error, because the VMs were never recovered onto the datastore and powered ON (that is the very problem being worked on). It is useful when VMs were powered ON on the datastore and in use by users, producing data changes. It is mentioned here because it falls within the context of this discussion.

RECOVERY SITE

  1. Power OFF VMs on the replicated datastores you’ll fail back manually from the array.

  2. Screenshot the VMs in the recovery plan.

  3. Remove the VMs from the datastore inventory.

  4. Unmount the datastores from all hosts.

  5. Detach the LUNs from all connected hosts.

  6. Rescan storage at the cluster level.

  7. Reverse-replicate the LUNs from the target/destination array to the source array. If you’re not fully comfortable with this process, we recommend involving your storage vendor to do this task. Alternatively, you may need to break and re-establish replication pairs depending on the array and scenario.

  8. After LUN replication completes, prepare for failback.
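Steps 4-6 can also be performed per host from the ESXi shell. The sketch below only builds the esxcli command strings; the datastore label and device identifier are placeholders, and the commands must be run on each connected host (or the equivalent actions done from vCenter at the cluster level).

```python
# Sketch: ESXi shell equivalents for unmounting datastores, detaching
# LUNs, and rescanning. Labels and NAA IDs are placeholder examples.

def teardown_commands(datastores, devices):
    cmds = []
    for label in datastores:
        # Unmount the VMFS datastore by label.
        cmds.append("esxcli storage filesystem unmount -l {}".format(label))
    for naa in devices:
        # Detach the underlying device so the host stops using it.
        cmds.append("esxcli storage core device set -d {} --state=off".format(naa))
    # Rescan all adapters afterwards.
    cmds.append("esxcli storage core adapter rescan --all")
    return cmds

for c in teardown_commands(["DS-DR-01"], ["naa.600000000000000000000001"]):
    print(c)
```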

PREPARING FOR FAILBACK 

  1. If the recovery plan partially completed, unprotect the VMs in the protection group.

  2. Remove all VMs in the failed recovery plan from the production vCenter inventory.

  3. Edit the recovery plan to dissociate it from the protection group.

  4. Delete the protection group and recovery plan.

  5. Recreate the protection group and recovery plan, pointing from recovery to production.

  6. Perform a PLANNED MIGRATION.

  7. Run REPROTECT.

Promoting the source LUN(s)

If you don't want to replicate the LUN back to the source site, for reasons such as:

  1. The storage vendor is unavailable or hasn't yet joined VMware by Broadcom on a meeting, and nobody on your internal storage team knows how to reverse-replicate LUNs on the array.

  2. You want to bring up the VMs at the production site ASAP.

  3. You no longer trust the data consistency on the target LUN and don't want to use it.
     
NOTE: Before proceeding, complete steps 1–6 under RECOVERY SITE (skip any that don’t apply). At the PRODUCTION SITE, promote all LUNs that were failed over to the recovery site to RW mode. If you’re not confident performing this task correctly, VMware by Broadcom recommends involving the storage vendor. Depending on the array, you may then need to break and re-establish the replication pairs.


PRODUCTION SITE 

  1. Promote LUNs for the failed-over recovery plan.

  2. Remove powered-off VMs in the recovery plan from vCenter inventory.

  3. Reattach LUNs for the detached datastores that were failed over to the recovery site.

  4. Rescan storage at the cluster level (scan for new devices and VMFS volumes).

  5. Mount datastores by assigning a new signature to one of the hosts in the cluster.

  6. Rescan storage at the cluster level (scan for new devices and VMFS volumes); this mounts the datastores you just resignatured on the remaining hosts in the cluster.

  7. Register the VMs and power them ON.

  8. Delete the damaged protection group and recovery plan.
Since the failback was manual, the replication artifacts on the array belonging to the consistency group connected to this protection group must be cleaned up. This task must be performed on the array, and replication must be reversed to point back at the target site before continuing.
  1. Verify the steps under Fixing TEST recovery.

  2. Recreate the protection group and recovery plan. The replicated datastore should appear during PG creation if replication was reversed correctly.

  3. Run a TEST and PLANNED MIGRATION to confirm everything works as expected.
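The resignature and rescan in steps 4-6 of the PRODUCTION SITE list can be done per host from the ESXi shell. The sketch below only builds the esxcli command strings; the VMFS label is a placeholder.

```python
# Sketch: ESXi shell commands for rescanning, resignaturing a VMFS
# snapshot volume, and rescanning again so the remaining hosts mount
# the resignatured datastore. The label is a placeholder.

def resignature_commands(label):
    return [
        # Scan for the newly attached devices.
        "esxcli storage core adapter rescan --all",
        # List unresolved VMFS snapshot volumes to confirm the copy is seen.
        "esxcli storage vmfs snapshot list",
        # Assign a new signature so the volume mounts as a new datastore.
        "esxcli storage vmfs snapshot resignature -l {}".format(label),
        # Rescan once more (or from vCenter at cluster level) so the
        # other hosts in the cluster pick up the resignatured datastore.
        "esxcli storage core adapter rescan --all",
    ]

for c in resignature_commands("DS-PROD-01"):
    print(c)
```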



2. Resource Pools

Placeholder Create Error: No hosts with hardware version '18' and datastore(s) '"XX-XXX"' which are powered on and not in maintenance mode are available.

This can occur when creating a placeholder for protected VMs or during a failover. It may be caused by resource pools with the same name under different clusters or datacenters at the same site. Renaming the duplicate resource pools (for example, to Resource pool-old) resolves the error.

3. vSAN

When replicating VMs to a recovery site that contains multiple vSAN clusters/datastores, pay attention to the vSAN datastore you are replicating to and to the resource mappings. You will encounter this error if -

1. The target vSAN datastore you are replicating to belongs to a different host cluster than the one added to resource mappings.

2. The resource mappings are wrong, causing the placeholder VMs to be created on a different vSAN cluster that is not connected to the vSAN datastore containing the replicas.

TEST Recovery

If the error occurred during a TEST recovery, perform the following steps -

1. Run a cleanup

2. Gracefully remove the VMs from the Replication tab. DO NOT retain seeds, as the replica seeds are sitting on the wrong vSAN datastore.

3. Reconfigure VMs for replication and point them to the correct vSAN datastore

4. Check and update the correct resource mappings

5. Unprotect the VMs from the protection group and restore all placeholder VMs

6. Run a TEST and PLANNED MIGRATION to confirm everything works as expected.

To manually recover VMs from a PLANNED MIGRATION, refer to the steps under vSphere Replication manual recovery steps.
 

4. Virtual Volumes (vVols)

Failback on Virtual Volumes (vVol) datastores may fail with this error:

No hosts with hardware version '19' and datastore(s) '"xxx"' which are powered on and not in maintenance mode are available.

To resolve:
  1. Verify the vVol datastore is mounted to the target host cluster you are recovering to.

  2. If it can’t be mounted because the PEs aren’t visible, continue vVol troubleshooting and escalate the case to a Storage SME.

  3. Check for duplicate vVol datastore names mounted across multiple datacenters, which can also trigger this error.

This issue is fixed in Site Recovery Manager 8.7.0.3 | 05 SEP 2023 | Build 22359471.
 

5. Expired vCenter License

You may see this error if your vCenter license has expired, which can disconnect hosts and datastores. Replace the license with an active one.

Additional Information