Recovery plan stuck in “Recovery Required” state after initiating a planned migration.

Article ID: 390538

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  • A planned migration was initiated, and all virtual machines in the recovery plan failed over successfully.
  • Despite the successful failover, the recovery plan status remains stuck in “Recovery Required.”
  • The Reprotect option is greyed out, preventing further actions.
  • The history report indicates that the planned migration completed successfully.
  • Virtual machines are failed over and powered on at the DR site.

Environment

VMware Live Site Recovery 8.x

VMware Live Site Recovery 9.x


Cause

This issue occurs because the peer group monitor did not update the protection group state as expected. After a planned migration, the protection group state should transition to deactivated. Because this transition did not occur, the recovery plan entered the failedOverSplit state, which is displayed as “Recovery Required.”

The exact reason for the delay in updating the group state is unknown. However, once the group state is eventually updated, the recovery plan status also updates accordingly.
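
For reference, the two state sequences below summarize the expected behavior versus what was observed, reconstructed from the log excerpts in the next section:

Expected:  failoverInitiated -> failoverInProgress -> failedOver        (peer group state transitions to deactivated)
Observed:  failoverInitiated -> failoverInProgress -> failedOverSplit   (peer group state remains active)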

Cause Validation:

From the /opt/vmware/support/logs/srm/vmware-dr.log file on the recovery site SRM appliance, we can see that the state of the recovery plan changed from failoverInProgress to failedOverSplit instead of failedOver:

2025-03-01T04:10:49.673Z info vmware-dr[28586] [SRM@6876 sub=Recovery opID=d8a95864-74f9-4d6e-a14b-36bf93bacd18-failover:f1ba] [52f4b] Starting failover workflow for plan dr.recovery.RecoveryPlan:044cf05b-f434-40cc-9a10-2fe214ceef65:2fc4e937-5e15-40a9-946e-607bd87c4c96 'SISPROD01' (planned)
2025-03-01T04:10:49.685Z verbose vmware-dr[03280] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=d8a95864-74f9-4d6e-a14b-36bf93bacd18-failover:f1ba] [52f4b] Changing info.stateInfo.state : failoverInitiated -> failoverInProgress
2025-03-01T04:17:19.950Z info vmware-dr[27213] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=d8a95864-74f9-4d6e-a14b-36bf93bacd18-failover:f1ba] CalculatePlanState: [2fc4e937-5e15-40a9-946e-607bd87c4c96] Plan has an override state: failoverInProgress
2025-03-01T04:17:19.950Z verbose vmware-dr[27213] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=d8a95864-74f9-4d6e-a14b-36bf93bacd18-failover:f1ba] CalculatePlanState: split detail: startedProtectionVmsPowerOff=1, allProtectionVmsPoweredOff=1, active
2025-03-01T04:17:19.950Z verbose vmware-dr[27213] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=d8a95864-74f9-4d6e-a14b-36bf93bacd18-failover:f1ba] CalculatePlanState: [2fc4e937-5e15-40a9-946e-607bd87c4c96] Local state is failedOverSplit (peer state not accounted for)
2025-03-01T04:17:19.950Z verbose vmware-dr[27213] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=d8a95864-74f9-4d6e-a14b-36bf93bacd18-failover:f1ba] CalculatePlanState: [2fc4e937-5e15-40a9-946e-607bd87c4c96]: Remote peer state is peerFailedOver
2025-03-01T04:17:19.950Z info vmware-dr[27213] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=d8a95864-74f9-4d6e-a14b-36bf93bacd18-failover:f1ba] CalculatePlanState: [2fc4e937-5e15-40a9-946e-607bd87c4c96] Plan state is failedOverSplit
2025-03-01T04:17:19.950Z verbose vmware-dr[27213] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=d8a95864-74f9-4d6e-a14b-36bf93bacd18-failover:f1ba] Changing info.stateInfo.state : failoverInProgress -> failedOverSplit

After some time, we can see that the peer state changed from active to deactivated, after which the plan state changed from failedOverSplit to failedOver:

2025-03-01T05:43:49.940Z info vmware-dr[03280] [SRM@6876 sub=Replication opID=2f48728a] Transitioning peer state of group 'vm-protection-group-1830:dr.replication.VmProtectionGroup:044cf05b-f434-40cc-9a10-2fe214ceef65' from 'active' to 'deactivated'
2025-03-01T05:43:49.948Z verbose vmware-dr[28422] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=b3e7a8f5] Recalculating state for plan: [dr.recovery.RecoveryPlan:044cf05b-f434-40cc-9a10-2fe214ceef65:2fc4e937-5e15-40a9-946e-607bd87c4c96]
2025-03-01T05:43:49.948Z verbose vmware-dr[28422] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=b3e7a8f5] CalculatePlanState: [2fc4e937-5e15-40a9-946e-607bd87c4c96] Local state is failedOver (peer state not accounted for)
2025-03-01T05:43:49.948Z verbose vmware-dr[28422] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=b3e7a8f5] CalculatePlanState: [2fc4e937-5e15-40a9-946e-607bd87c4c96]: Remote peer state is peerFailedOver
2025-03-01T05:43:49.948Z info vmware-dr[28422] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=b3e7a8f5] CalculatePlanState: [2fc4e937-5e15-40a9-946e-607bd87c4c96] Plan state is failedOver
2025-03-01T05:43:49.948Z verbose vmware-dr[28422] [SRM@6876 sub=Recovery ctxID=9f8215c9 opID=b3e7a8f5] Changing info.stateInfo.state : failedOverSplit -> failedOver
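
To check for these transitions in your own environment, the relevant entries can be filtered from the same log file. A minimal example, using the log path from this article and the match strings that appear in the excerpts above:

grep -E "Changing info.stateInfo.state|Transitioning peer state" /opt/vmware/support/logs/srm/vmware-dr.log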


Resolution

As mentioned in the Cause section, the reason for this behavior is unknown. The group state updates automatically after some time, at which point the recovery plan status corrects itself and the issue is resolved.
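
To watch for this automatic update as it happens, the peer-state transition can be followed live in the same log file, for example:

tail -f /opt/vmware/support/logs/srm/vmware-dr.log | grep "Transitioning peer state"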

Workaround:

If immediate resolution is needed, restart the SRM services to force an update of the group state. This will correct the recovery plan status and restore normal functionality.
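
The services can be restarted from the appliance management interface or from an SSH session on the SRM appliance. A minimal command-line sketch follows; the srm-server unit name is an assumption based on recent appliance builds, so confirm the exact unit name on your system before restarting:

# Confirm the SRM service unit name on this appliance (assumption: srm-server)
systemctl list-units --type=service | grep -i srm

# Restart the SRM service to force the group state to be re-evaluated
systemctl restart srm-server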