HCX - Bulk Migration may fail due to "Invalid Power State" of VM

Article ID: 321613

Products

VMware HCX

Issue/Introduction

This article explains how to identify an HCX Bulk Migration workflow failure caused by an "Invalid Power State" error and describes its cause.

Symptoms:
In certain environments, the Bulk Migration workflow may fail during the switchover process with the following error:

Power off task failed due to(vim.fault.InvalidPowerState) { faultCause = null, faultMessage = null, requestedState = poweredOn, existingState = poweredOff }

A similar error can be found in app.log (/common/logs/admin/app.log):

2021-10-02 09:19:27.821 UTC [ReplicationTransferService_SvcThread-36921, Ent: HybridityAdmin, , TxId: <txID>] INFO  c.v.h.s.r.j.VirtualmachineOperationJob- waiting for shutdown
2021-10-02 09:19:27.825 UTC [ReplicationTransferService_SvcThread-36921, Ent: HybridityAdmin, , TxId: <txID>] INFO  c.v.h.s.r.j.VirtualmachineOperationJob- Waiting for guest shutdown, Retry count: 1
2021-10-02 09:19:48.100 UTC [ReplicationTransferService_SvcThread-36922, Ent: HybridityAdmin, , TxId: <txID>] INFO  c.v.h.s.r.j.VirtualmachineOperationJob- Waiting for guest shutdown, Retry count: 2
2021-10-02 09:20:08.376 UTC [ReplicationTransferService_SvcThread-36914, Ent: HybridityAdmin, , TxId: <txID>] INFO  c.v.h.s.r.j.VirtualmachineOperationJob- Waiting for guest shutdown, Retry count: 3
2021-10-02 09:20:28.661 UTC [ReplicationTransferService_SvcThread-36930, Ent: HybridityAdmin, , TxId: <txID>] INFO  c.v.h.s.r.j.VirtualmachineOperationJob- Waiting for guest shutdown, Retry count: 4
2021-10-02 09:20:48.948 UTC [ReplicationTransferService_SvcThread-36914, Ent: HybridityAdmin, , TxId: <txID>] INFO  c.v.h.s.r.j.VirtualmachineOperationJob- Waiting for guest shutdown, Retry count: 5
2021-10-02 09:21:19.631 UTC [ReplicationTransferService_SvcThread-36931, Ent: HybridityAdmin, , TxId: <txID>] ERROR c.v.h.s.r.j.VirtualmachineOperationJob- Job (########-####-####-####-##########a2) failed with exception Power off task failed due to(vim.fault.InvalidPowerState) {
   faultCause = null,
   faultMessage = null,
   requestedState = poweredOn,
   existingState = poweredOff
}
java.lang.RuntimeException: Power off task failed due to(vim.fault.InvalidPowerState) {
   faultCause = null,
   faultMessage = null,
   requestedState = poweredOn,
   existingState = poweredOff
}
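
To check whether a given app.log contains this signature, the excerpt above can be matched with a short script. This is a minimal sketch: the two patterns are copied from the log lines shown in this article, while the function names are illustrative and not part of HCX.

```python
import re

# Patterns taken from the log excerpt in this article.
RETRY_RE = re.compile(r"Waiting for guest shutdown, Retry count: (\d+)")
FAULT_RE = re.compile(
    r"ERROR .*Power off task failed due to\(vim\.fault\.InvalidPowerState\)"
)


def summarize_power_off_failures(lines):
    """Return (highest retry count seen, list of matching ERROR lines)."""
    max_retry = 0
    failures = []
    for line in lines:
        retry = RETRY_RE.search(line)
        if retry:
            max_retry = max(max_retry, int(retry.group(1)))
        elif FAULT_RE.search(line):
            failures.append(line.rstrip("\n"))
    return max_retry, failures


def scan_file(path="/common/logs/admin/app.log"):
    """Scan a log file; the default path is the one named in this article."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize_power_off_failures(handle)
```

A retry count of 5 followed by a matching ERROR line corresponds to the failure pattern shown above. The Java stack trace line that repeats the message is not counted, because it carries no ERROR log level.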



Cause

  • During Bulk Migration switchover, the graceful power-off task did not complete, even with 'forcePowerOff' enabled, because a delayed guest shutdown had already powered off the VM. The migration failed as a result.
  • The guest OS shutdown took longer than 100 seconds (five retry attempts at 20 seconds each).
  • Because of this timing, the guest OS shutdown completed at the same moment that HCX initiated the power-off task. vCenter then returned an error because it cannot power off a VM that is already powered off.
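
The timing described above can be illustrated with a small simulation. This is a toy model, not HCX code: the class and function names, the simulated clock, and the one-second delay before the power-off call are all assumptions made for illustration. Only the five retries at 20-second intervals and the InvalidPowerState fault come from this article.

```python
class InvalidPowerState(Exception):
    """Stand-in for vim.fault.InvalidPowerState in this model."""


class SimulatedVM:
    """Toy VM whose guest OS finishes shutting down at a fixed time."""

    def __init__(self, shutdown_completes_at):
        self.shutdown_completes_at = shutdown_completes_at

    def is_powered_off(self, now):
        return now >= self.shutdown_completes_at

    def power_off(self, now):
        # Powering off a VM that is already powered off is rejected.
        if self.is_powered_off(now):
            raise InvalidPowerState("existingState = poweredOff")


def switchover(vm, retries=5, interval=20, power_off_delay=1):
    """Poll for guest shutdown, then fall back to a power-off task."""
    now = 0
    for _ in range(retries):
        now += interval
        if vm.is_powered_off(now):
            return "guest shutdown completed"
    # No poll saw the VM powered off, so a power-off task is issued
    # a moment later (the delay is a hypothetical value).
    now += power_off_delay
    try:
        vm.power_off(now)
    except InvalidPowerState:
        return "failed: InvalidPowerState"
    return "forced power off"
```

In this model, a guest that shuts down within 60 seconds is detected by a poll, and a guest that is still running well after the last poll is powered off normally. The failure occurs only when the shutdown completes in the short gap between the last poll and the power-off call, for example `switchover(SimulatedVM(100.5))`.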

A related cause, in which the guest OS cannot be powered off during switchover when 'forcePowerOff' is not set, is described in KB 82269.

Resolution

From the HCX 4.6.1 release onwards, if the virtual machine does not power off within 100 seconds during the Bulk Migration cutover stage, HCX waits an additional 100 seconds to handle any delay in synchronizing with the vCenter Server before failing the migration workflow. Refer to the HCX Release Notes.
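
As a rough worked example of the wait windows, the sketch below treats the two waits as a single time budget. The constant and function names are hypothetical, and the model assumes the additional wait simply extends the total budget to 200 seconds. How HCX polls during that time is not described in this article.

```python
BASE_WAIT_SECONDS = 100   # initial wait described in this article
EXTRA_WAIT_SECONDS = 100  # additional wait from HCX 4.6.1 onwards


def shutdown_seen_in_time(shutdown_seconds, extended_wait):
    """Return True if the guest shutdown fits inside the modeled wait budget."""
    budget = BASE_WAIT_SECONDS
    if extended_wait:
        budget += EXTRA_WAIT_SECONDS
    return shutdown_seconds <= budget
```

Under these assumptions, a guest that needs 130 seconds to shut down falls outside the 100-second window but inside the extended one. A guest that needs 250 seconds falls outside both.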

Workaround:

  • If other VMs take longer to shut down or cannot be shut down gracefully from the guest OS, enable "Force Power Off" when scheduling the migrations.
  • Use "Seed Checkpoint" for Bulk Migrations, which is available from the HCX 4.1.0 release onwards. If the workflow fails and rolls back at the cutover stage, the rescheduled migration tries to reuse the data already copied in the previous attempt.

Note: Do not perform the cleanup operation on the failed job, because cleanup removes the seed data.

Additional Information

Impact/Risks:
This issue affects only the HCX Bulk Migration workflow. Other migration profiles, such as vMotion, RAV, and Cold Migration, are not affected.