Recovering vSAN iSCSI Objects After a Site Failure in a vSAN Stretched Cluster

Article ID: 439973

Updated On:

Products

VMware vSAN

Issue/Introduction

In a vSAN stretched cluster, if vSAN iSCSI objects are configured with site failure tolerance and one site goes down together with the witness node, the objects lose quorum and become inaccessible. If the failed site or the witness node is confirmed unlikely to be restored, the surviving site may need to perform a forced takeover. Currently, the vSAN site force takeover feature does not support recovery of vSAN iSCSI objects. This article describes the steps required to recover vSAN iSCSI objects in this site failure scenario.

Note: Ensure that the hosts in the failed site and the witness host do not rejoin the cluster before performing the resolution steps provided in this article.

Environment

vSAN 9.1 or above

Resolution

4.1. Overall Workflow
  1. Run the /usr/lib/vmware/vsan/bin/site-takeover script on one of the ESXi hosts in the surviving site and identify the vSAN iSCSI objects listed in the output file.
  2. Log out the iSCSI sessions from all initiators.
  3. On all hosts in the surviving site, stop the vSAN iSCSI service daemon by running the following command: /etc/init.d/vitd io_stop
  4. Reboot all iSCSI initiators.
  5. Recover all vSAN iSCSI objects identified in Step 1 by using the procedure provided in Section 4.2.
  6. On all hosts in the surviving site, start the vSAN iSCSI service daemon by running the following command: /etc/init.d/vitd start
  7. Restore the iSCSI connections on the initiators disconnected in Step 2.
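Steps 3 and 6 must be repeated on every host in the surviving site. The following is a minimal sketch of one way to do that from an administration machine, assuming SSH access to the hosts is enabled. The host names in HOSTS and the overridable SSH variable are hypothetical placeholders and are not part of the documented procedure; only the /etc/init.d/vitd commands come from the steps above.

```shell
#!/bin/sh
# Sketch only: run one vitd action on every surviving-site host over SSH.
# HOSTS and SSH are hypothetical placeholders; replace HOSTS with the
# actual hosts of the surviving site.
HOSTS="${HOSTS:-esx-a1.example.com esx-a2.example.com}"
SSH="${SSH:-ssh}"

vitd_on_all_hosts() {
    action=$1   # io_stop (Step 3) or start (Step 6)
    failed=0
    for host in $HOSTS; do
        # Run the init script remotely; keep going so every host is attempted.
        if ! "$SSH" "root@$host" "/etc/init.d/vitd $action"; then
            echo "vitd $action failed on $host" >&2
            failed=1
        fi
    done
    # Non-zero if the action failed on at least one host.
    return $failed
}
```

With this function loaded, vitd_on_all_hosts io_stop would correspond to Step 3 and vitd_on_all_hosts start to Step 6. Continuing past a failed host is a design choice of the sketch, so that one unreachable host does not leave the daemon state unknown on the rest.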
4.2. Recover Inaccessible vSAN iSCSI Objects in the Surviving Site
  1. Run the following commands to force the recovery of vSAN iSCSI objects during the site takeover process:
vsish -e set /config/VSAN/intOpts/ClomEnableRecoveryOfSkippedObjs 1
/usr/lib/vmware/vsan/bin/site-takeover
 
  2. After the site takeover operation completes, restore the option to its default value by running the following command:
vsish -e set /config/VSAN/intOpts/ClomEnableRecoveryOfSkippedObjs 0
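The two steps in Section 4.2 can be wrapped so that the option is reset to 0 even when the takeover script exits with an error. This is a minimal sketch under that assumption, not a supported tool. The function name and the overridable VSISH and SITE_TAKEOVER variables are introduced here for illustration only; the option path and the script path are taken from the steps above.

```shell
#!/bin/sh
# Sketch only: enable the recovery option, run the takeover, then always
# restore the option to its default value of 0.
# VSISH and SITE_TAKEOVER are overridable for dry runs (an assumption of
# this sketch, not part of the documented procedure).
VSISH="${VSISH:-vsish}"
SITE_TAKEOVER="${SITE_TAKEOVER:-/usr/lib/vmware/vsan/bin/site-takeover}"
OPT=/config/VSAN/intOpts/ClomEnableRecoveryOfSkippedObjs

recover_iscsi_objects() {
    # Step 1a: allow recovery of objects that site takeover would skip.
    "$VSISH" -e set "$OPT" 1 || return 1
    # Step 1b: run the takeover and remember its exit status.
    "$SITE_TAKEOVER"
    rc=$?
    # Step 2: restore the default regardless of the takeover result.
    "$VSISH" -e set "$OPT" 0 || echo "WARNING: could not reset $OPT to 0" >&2
    return $rc
}
```

Calling recover_iscsi_objects on an ESXi host in the surviving site would run the sequence and return the exit status of the takeover script, so a failed takeover remains visible to the caller while the option is still restored.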