Device stuck in 'Failover in progress' state even after the failover and reprotect are complete

Article ID: 406175

Updated On:

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  • NetApp array-based replication is in use with Site Recovery Manager (SRM)

  • A planned migration is initiated for a recovery plan, and the recovery plan fails at the presynchronize storage phase

  • The planned migration is initiated again and the failover completes successfully, after which a reprotect is performed

  • The protection group associated with this recovery plan protects multiple datastores, and one of those datastores is missing from the protection group

  • The datastore is missing from the vCenter server inventory as well

  • The NetApp vendor is engaged and the device is presented to the ESXi hosts again. After rescanning the storage replication adapters and running Discover Devices, the device status is reported as 'Failover in progress'

  • On the NetApp storage, the device is verified to be in an OK state and actively replicating

Environment

VMware Live Site Recovery 8.x

VMware Live Site Recovery 9.x

Cause

The device status is stuck in 'Failover in progress' because the device was disconnected when the planned migration was first initiated. When the planned migration was initiated again after the first failure, the device was no longer mapped to the ESXi hosts and was no longer part of the recovery plan. As a result, when a reprotect was performed after the failover, the entry for this device was not removed from the devices.txt file on the NetApp server. When the device was mapped to the ESXi hosts again, the stale entry still existed in devices.txt, so the device is reported as 'Failover in progress'.

Resolution

To resolve this issue, follow the steps below to remove the stale device entry from the devices.txt file on the NetApp server. Engage the NetApp vendor if you do not have access to your OTV appliance.

  1. Enable SSH access to the OTV appliance using the following steps:

    1. Log in to the vSphere Client.

    2. Navigate to the OTV VM.

    3. Open the OTV VM console.

    4. Log in to the maintenance console with the username maint and its associated password.

    5. Choose option: 2 ) System Configuration.

    6. Choose option: 6 ) Enable SSH access. Note: If option 6 is displayed as 6 ) Disable SSH access, then SSH access is already enabled. Proceed to step 7.

    7. Choose option: b ) Back.

    8. Choose option: 4 ) Support and Diagnostics.

    9. Choose option: 3 ) Enable remote diagnostic access.

    10. Set the one-time-use password.

    11. SSH to the OTV IP with your preferred SSH method.

    12. Log in with the username diag and the password set in step 10.

  2. Log in to the OTV appliance using the provided credentials (you may need to check the OTV appliances at both the protected and recovery sites)

  3. Back up the devices.txt file on the affected VSC/OTV appliance

    diag@vscserver:~$ cd /opt/netapp/vpserver/conf
    diag@vscserver:~$ sudo su
    diag@vscserver:~$ cp devices.txt devices.txt.backup

  4. Manually remove the stale device entries from devices.txt for the devices that show the 'Failover in progress' status in SRM

  5. Return to the Array Pairs window in SRM and re-run Discover Devices on the relationships showing 'Failover in progress'

  6. After the discovery completes, the devices will display the correct status
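
The manual edit in steps 3 and 4 can be made easier to review with two small read-only helpers. The sketch below is illustrative only and is not part of the NetApp or VMware tooling. The function names find_device_lines and show_edit are hypothetical, and the sketch assumes the stale device can be located in devices.txt by searching for its identifier as plain text. The actual layout of devices.txt is not described in this article, so confirm it on the appliance before relying on the output.

```shell
# Hypothetical helper: list the lines of a file that mention a device.
# Usage: find_device_lines <file> <device-identifier>
# Assumption: the identifier appears as plain text in the file.
find_device_lines() {
    file=$1
    device=$2

    # Stop early if the file cannot be read.
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi

    # -n prints line numbers, -F matches the identifier as a literal string.
    # grep exits non-zero when nothing matches; print a message instead.
    grep -n -F -- "$device" "$file" || echo "no lines mention $device"
}

# Hypothetical helper: show what changed between the backup and the edited file.
# Usage: show_edit <backup-file> <edited-file>
# diff exits non-zero when the files differ, which is expected after the edit,
# so its exit status is deliberately ignored.
show_edit() {
    diff "$1" "$2" || true
}
```

For example, running find_device_lines devices.txt with the name of the affected device (a placeholder you supply) before step 4 shows which lines to inspect, and running show_edit devices.txt.backup devices.txt afterwards confirms that only the intended entry was removed. Neither helper modifies any file.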