How to handle inconsistent cluster power status in vSAN shutdown workflow

Article ID: 315520


Products

VMware vSAN

Issue/Introduction

Symptoms:

When shutting down a vSAN cluster from the vCenter UI, the cluster power status can become inconsistent in the following scenarios:


Scenario 1

  1. vCenter runs in the vSAN cluster.

  2. At least one shutdown and start cluster operation has been performed.

  3. After the cluster is successfully restarted, restarting the vSAN health service returns the cluster to powered-off status, and "Restart cluster" becomes available again.



Scenario 2

  1. vCenter doesn't run in the vSAN cluster. 

  2. The vSAN cluster is successfully shut down.

  3. Powering on the cluster fails with the error "General vSAN error. Exception in GetNextPowerStatus None".



Scenario 3

  1. vCenter doesn't run in the vSAN cluster.

  2. The vSAN cluster is successfully shut down.

  3. vCenter is restarted externally, or the vSAN health service is restarted.

  4. The cluster reverts to powered-on status and "Restart cluster" is not available.





Environment

VMware vSAN 7.0.x

Resolution

Upgrade vCenter and ESXi to 7.0U3d or later.

Workaround:

Workaround for scenario 1

  1. Clear the config.
     
    • In the /etc/vmware-vsan-health/config.conf file on the vCenter appliance, remove the cluster's state line from the [PowerSystem] section:
      • e.g. state_for_domain-c47 = vcVMPoweredOff
         
    • Restart the vSAN health service:
      • $ ssh root@<vc-ip>
      • $ vmon-cli -r vsan-health
         
  2. The vSAN cluster will be back to normal.
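
The edit in step 1 can be scripted. The following is a minimal sketch, not an official procedure: the function name remove_power_state is hypothetical, and it assumes config.conf uses the INI-style layout shown in the example above (a [PowerSystem] header followed by "key = value" lines). It keeps a copy of the original file with a .bak suffix before changing anything.

```shell
# Hypothetical helper: remove the stale power-state entry for one cluster.
#   $1 = path to config.conf, $2 = cluster ID such as domain-c47
remove_power_state() {
    conf="$1"
    cluster_id="$2"

    # Keep a backup copy so the change can be undone.
    cp -p "$conf" "$conf.bak" || return 1

    # Within the [PowerSystem] section only (up to the next "[" header),
    # delete the state_for_<cluster_id> line. The result is written back
    # into the original file so its ownership and permissions are kept.
    sed "/^\[PowerSystem\]/,/^\[/{
/^state_for_${cluster_id}[[:space:]]*=/d
}" "$conf.bak" > "$conf"
}

# Example use on the vCenter appliance (commented out on purpose):
# remove_power_state /etc/vmware-vsan-health/config.conf domain-c47
# vmon-cli -r vsan-health
```

Comparing the file with its .bak copy (for example with diff) before restarting the vSAN health service confirms that only the intended line was removed.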

Workaround for scenario 2

  1. Clear the config (see step 1 of the workaround for scenario 1).

  2. Manually set the cluster power status to 'clusterPoweredOff'.

    • Open the vSAN MOB link, e.g. https://vcenterIp/vsan/mob/?moid=vsan-cluster-power-system&method=updateClusterPowerStatus

      • If the MOB link is inaccessible, enable it first:
        1. Log in to the vCenter appliance via SSH.
        2. Run 'rvc administrator@<domain name>@localhost' and provide the credentials.
        3. Run 'vsan.debug.mob --start 1'.
        4. Run 'quit'.

    • Log in with administrator@<domain name> and its password.

    • Enter the cluster ID and the power status.

      • To get the cluster ID, select the affected vSAN cluster and copy the cluster domain ID from the browser URL. Use only the 'domain-c<number>' part, not the entire string, as outlined in red in the example below:



    • For example, enter <cluster type="ComputeResource">domain-c8</cluster> as the cluster, set the status to 'clusterPoweredOff', and then press "Invoke Method".

  3. The vSAN cluster returns to the powered-off status, ready to be powered on, and "Restart cluster" becomes available.
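
The two values needed in step 2, the cluster ID and the MOB link, can also be derived on the command line. This is a minimal sketch with hypothetical helper names (extract_cluster_id, mob_url). The sample vSphere Client URL in the usage lines is an assumed illustration of the URL format and may differ in your environment.

```shell
# Hypothetical helper: pull the first 'domain-c<number>' token out of a
# URL copied from the browser address bar.
extract_cluster_id() {
    printf '%s\n' "$1" | grep -o 'domain-c[0-9][0-9]*' | head -n 1
}

# Hypothetical helper: build the vSAN MOB link from this article for a
# given vCenter address ($1 = vCenter IP or FQDN).
mob_url() {
    printf 'https://%s/vsan/mob/?moid=vsan-cluster-power-system&method=updateClusterPowerStatus\n' "$1"
}

# Example use (the URL and address are placeholders, not real systems):
# extract_cluster_id 'https://vc.example.com/ui/app/cluster;nav=h/urn:vmomi:ClusterComputeResource:domain-c8:0a1b/summary'
# mob_url vc.example.com
```

The extracted ID is the value that goes inside the <cluster type="ComputeResource"> element on the MOB page.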

Workaround for scenario 3

  1. Manually set the cluster power status to 'clusterPoweredOff' (see step 2 of the workaround for scenario 2).
  2. The vSAN cluster returns to the powered-off status, ready to be powered on, and "Restart cluster" becomes available.

If vCenter is unavailable, the configuration changes made by the Shutdown Cluster Wizard can be reverted by running the following commands via SSH on every ESXi node in the cluster:

# esxcfg-advcfg -s 0 /VSAN/DOMPauseAllCCPs
# esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates
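
When the cluster has many nodes, the two commands can be applied in a loop from an admin workstation. This is a minimal sketch that assumes SSH is enabled on each ESXi host and root login is permitted. The HOSTS and RUN variables and the revert_host function are hypothetical names introduced for illustration. Setting RUN=echo prints the commands instead of running them.

```shell
# Space-separated list of ESXi hosts (hypothetical variable; empty by default,
# so nothing is contacted unless HOSTS is set explicitly).
HOSTS="${HOSTS:-}"

# Apply both advanced-setting resets from this article on one host.
#   $1 = ESXi host name or IP
# With RUN=echo the ssh command line is printed rather than executed.
revert_host() {
    ${RUN:-} ssh "root@$1" \
        'esxcfg-advcfg -s 0 /VSAN/DOMPauseAllCCPs && esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates'
}

for h in $HOSTS; do
    revert_host "$h" || echo "revert failed on $h" >&2
done

# Example use (host names are placeholders):
# HOSTS="esx01.example.com esx02.example.com" RUN=echo sh revert.sh   # dry run
```

A dry run first makes it easy to check the host list before any setting is changed.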