Accidentally deleting cluster with vSAN and crashing vCenter

Article ID: 397791

Products

VMware vSAN

Issue/Introduction

vSAN, virtual machines (VMs), and vCenter are down after an attempt to remove an unneeded datastore.

When removing anything in vCenter, make sure the delete action targets the intended object. In the screenshot below, the Datastore tab is highlighted and a datastore is checked, but clicking Delete would instead issue the delete command against the vSAN-ESA cluster and begin dismantling it. Always check the object named to the left of the Actions field (in the red box in this case) to confirm what the command will be issued against.

 

When a vSAN cluster is deleted in this way, the operation begins removing nodes from the cluster and from the unicast tables of the hosts. The hosts become partitioned from one another, and the VMs eventually crash because of the partitions.
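
One way to confirm from the command line whether a host has been partitioned is to compare the member count it reports with the number of hosts the cluster should contain. The following is a minimal sketch and not part of the original procedure: it assumes the output of `esxcli vsan cluster get` includes a `Sub-Cluster Member Count` line, and the function names are hypothetical helpers introduced only for illustration.

```shell
# Print the member count found in `esxcli vsan cluster get` output (read from stdin).
# Assumes a line of the form "Sub-Cluster Member Count: N".
vsan_member_count() {
    awk -F': *' '/Sub-Cluster Member Count/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Compare the reported member count with the expected number of hosts.
# $1 = number of hosts the cluster should have (supplied by the operator)
# Returns non-zero when the host sees fewer (or more) members than expected.
check_vsan_partition() {
    expected="$1"
    actual="$(vsan_member_count)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: host sees $actual of $expected members"
    else
        echo "PARTITIONED: host sees ${actual:-0} of $expected members"
        return 1
    fi
}
```

For example, on a host that belongs to a four-node cluster, the check could be run as `esxcli vsan cluster get | check_vsan_partition 4`.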

Resolution

If the cluster is accidentally deleted and vCenter has crashed because it was running on the vSAN cluster, follow these steps to restore it.

 

1. SSH into each node and rebuild the unicast tables so that the hosts form the cluster again. See: Configuring vSAN Unicast networking from the command line

 

2. Once the cluster has re-formed, set each host to ignore member list updates from vCenter so the cluster does not re-partition when vCenter is powered back on:

esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates

3. Locate the vCenter VM and power it back on.

4. Once vCenter is powered on, add the removed hosts back into the cluster by right-clicking the cluster and clicking 'add host'.

5. Once all hosts are re-added, undo the earlier setting by returning to the SSH sessions and running the following command on each host:

esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates

6. Navigate to the Cluster > Monitor tab > Skyline Health under vSAN. An alarm will report that vCenter is not authoritative. Click 'troubleshoot' on that alert, then click 'remediate' to clear the alarm and let vCenter regain control of the unicast tables. See: vSAN Skyline Health Service - Cluster health – vCenter state is authoritative
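
The command-line parts of steps 1, 2, and 5 can be sketched as follows. This is an illustrative dry run under stated assumptions, not the procedure from the linked article: it assumes each host's node UUID and vSAN IP address have already been collected (for instance from `esxcli vsan cluster get` and the vSAN VMkernel interface), that the default unicast port 12321 is in use, and that `esxcfg-advcfg -g` prints a line of the form "Value of <option> is <n>". The function names are hypothetical. The first function only prints the `esxcli vsan cluster unicastagent add` commands so they can be reviewed before anything is run.

```shell
# Print (do not run) the unicast agent entries one host needs for its peers.
# stdin: one "NODE_UUID VSAN_IP" pair per line, covering every host in the cluster
# $1:    node UUID of the host the commands are intended for
unicast_add_cmds() {
    local_uuid="$1"
    while read -r uuid ip; do
        # Skip blank lines and the host's own entry; a host does not list itself.
        [ -z "$uuid" ] && continue
        [ "$uuid" = "$local_uuid" ] && continue
        # Port 12321 is assumed to be the unicast port in use.
        echo "esxcli vsan cluster unicastagent add -t node -u $uuid -U true -a $ip -p 12321"
    done
}

# Extract the numeric value from `esxcfg-advcfg -g` output read on stdin,
# assuming the form "Value of <option> is <n>".
advcfg_value() {
    sed -n 's/^Value of .* is \([0-9][0-9]*\)$/\1/p'
}
```

Feeding the collected pairs to `unicast_add_cmds <local node UUID>` on each host produces the list of commands for that host; after review they can be pasted into its SSH session and checked with `esxcli vsan cluster unicastagent list`. Piping `esxcfg-advcfg -g` for the option set in step 2 through `advcfg_value` gives a quick way to confirm that every host reports 1 before vCenter is powered on, and 0 again after step 5.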