vSAN objects unhealthy due to DECOM state

Article ID: 317237

Products

VMware vSAN

Issue/Introduction

Impact/Risks:
 

Any host stuck in decom state 6 does not contribute storage or compute resources to the cluster, because vSAN believes the host is still in Maintenance Mode.
 
Symptoms:
  • Virtual Machines become invalid, inaccessible or orphaned in vSphere
  • The vSAN objects may be shown as reduced availability
  • You may see the error "There are currently X usable fault domains. The operation requires Y more usable fault domains" while creating VMs or snapshots
  • All the drives in all hosts are mounted in cmmds, and none of the hosts are network partitioned.
  • In vSphere, no hosts appear to be in Maintenance Mode.
  • Running "# esxcli system maintenanceMode get" on every host from the command line returns "Disabled"
  • However, running "# localcli vsan cluster get" on the affected host shows the Maintenance Mode State as enabled or "ON" (example below)

    Cluster Information:
      Enabled: true
      Current Local Time: 2018-09-14T18:51:57Z
      Local Node UUID: ########-####-####-####-########a270
      Local Node Type: NORMAL
      Local Node State: AGENT
      Local Node Health State: HEALTHY
      Sub-Cluster Master UUID: ########-####-####-####-########a4d0
      Sub-Cluster Backup UUID: ########-####-####-####-########a390
      Sub-Cluster UUID: ########-####-####-####-########f8c7
      Sub-Cluster Membership Entry Revision: 3
      Sub-Cluster Member Count: 4
      Sub-Cluster Member UUIDs: ########-####-####-####-########a390, ########-####-####-####-########a4d0, ########-####-####-####-########c7b0, ########-####-####-####-########a270
      Sub-Cluster Membership UUID: ########-####-####-####-########a4d0
      Unicast Mode Enabled: true
      Maintenance Mode State: ON  <<---This node is in Decom state according to vSAN.
      Config Generation: ########-####-####-####-########112e 5 2018-09-14T17:21:34.629


    This means the host is in vSAN decom state: maintenance mode was not cancelled or exited cleanly, so vSAN still considers the host to be in maintenance mode.
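
    To compare vSAN's view with the "Disabled" result from esxcli on several hosts, it can help to pull out just the one field. The following is a minimal sketch, not part of the original procedure: vsan_mm_state is a hypothetical helper name, and it assumes the output format shown in the example above.

```shell
# vsan_mm_state (hypothetical helper): reads `localcli vsan cluster get`
# output on stdin and prints only the value of "Maintenance Mode State".
vsan_mm_state() {
    # Fields are "Name: value"; keep the first word of the value so that any
    # trailing annotation on the line is ignored.
    awk -F': ' '/Maintenance Mode State/ { split($2, v, " "); print v[1]; exit }'
}

# Intended use on an ESXi host (not executed here):
#   localcli vsan cluster get | vsan_mm_state
```

    A host that prints ON here while "esxcli system maintenanceMode get" reports "Disabled" matches the symptom described above.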

 

  • Use the following script to see which host(s) in the cluster are in decom state (indicated by a value of "decomState": 6)

    • echo "hostname,decomState,decomJobType";for host in $(cmmds-tool find -t HOSTNAME -f json |grep -B2 Healthy|grep uuid|awk -F \" '{print $4}');do hostName=$(cmmds-tool find -t HOSTNAME -f json -u $host|grep content|awk -F \" '{print $6}');decomInfo=$(cmmds-tool find -t NODE_DECOM_STATE -f json -u $host |grep content|awk '{print $3 $5}'|sed 's/,$//');echo "$hostName,$decomInfo";done|sort

      hostname,decomState,decomJobType
      esxi1,0,0
      esxi2,0,0
      esxi3,0,0
      esxi4,6,0   >> DECOM STATE 6
      esxi5,0,0
      esxi6,0,0

      The decommission states have the following meanings:

      Decommission State   Meaning
      0                    None - the node is not decommissioned
      1                    The decommissioning process has been started
      3                    The decommissioning process is underway
      6                    The node has been decommissioned
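
      In a larger cluster the CSV output can be long, so it may be convenient to show only the hosts with a non-zero decomState. The sketch below is an illustration under assumptions: decom_hosts is a hypothetical helper name, it expects the "hostname,decomState,decomJobType" lines produced by the script above on stdin, and it labels only the four states listed in the table.

```shell
# decom_hosts (hypothetical helper): reads "hostname,decomState,decomJobType"
# lines on stdin and prints every host whose decomState is not 0.
decom_hosts() {
    awk -F',' '
        NR == 1 && $1 == "hostname" { next }   # skip the CSV header line
        $2 != "" && $2 != 0 {
            label = "unknown"                  # states not listed in the table
            if ($2 == 1) label = "decommissioning started"
            if ($2 == 3) label = "decommissioning underway"
            if ($2 == 6) label = "decommissioned"
            printf "%s decomState=%s (%s)\n", $1, $2, label
        }'
}

# Intended use (not executed here): pipe the output of the script above into it.
#   <script above> | decom_hosts
```

      No output means every listed host reported decomState 0.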

 

  • You can also identify the node in decom state by running the commands below:
    • cmmds-tool find -t NODE_DECOM_STATE
    • cmmds-tool find -t HOSTNAME -u <UUID returned by the previous command>


Environment

VMware vSAN 7.x
VMware vSAN 8.x

Cause

  • This issue is frequently the result of issuing a maintenance mode task in vCenter, quickly followed by a cancellation of the maintenance mode task.
  • This issue can also occur if the vSAN cluster was not shut down properly before a planned shutdown or maintenance activity.
  • In either case, the host may end up stuck in vSAN decom state 6, in which vSAN considers the host to still be in Maintenance Mode.

Resolution

To clear the decom state of the ESXi host(s), follow the steps below:
 
1) Place the affected host into maintenance mode with "No Action" or "No Data Migration", depending on the version.
2) Remove the host from maintenance mode.
3) Verify object health.
4) Sometimes the host will not exit the decom state even after a reboot. In this situation, shut down the host completely. Check for dependencies beforehand with the debug precheck command, for example: esxcli vsan debug evacuation precheck -e "HOSTNAME/UUID". Then power off the host, wait 20 seconds, and power it on again.
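
Steps 1 and 2 can also be performed from the ESXi shell of the affected host. The following is a sketch under assumptions rather than the documented procedure: clear_decom_state and the DRY_RUN variable are hypothetical names, the "noAction" vSAN mode is assumed to correspond to "No Action" / "No Data Migration" in the vSphere Client, and the host is assumed to have no running VMs that would block entering maintenance mode.

```shell
# clear_decom_state (hypothetical wrapper): enters and then exits maintenance
# mode on the local host. With DRY_RUN=1 the commands are only printed.
clear_decom_state() {
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # Step 1: enter maintenance mode without migrating vSAN data.
    $run esxcli system maintenanceMode set -e true -m noAction || return 1
    # Step 2: exit maintenance mode so vSAN can clear its decom state.
    $run esxcli system maintenanceMode set -e false || return 1
    # Check vSAN's own view; "Maintenance Mode State" should no longer be ON.
    $run localcli vsan cluster get
}

# Review the commands first, then run them for real on the affected host:
#   DRY_RUN=1; clear_decom_state
#   DRY_RUN=0; clear_decom_state
```

Object health (step 3) still needs to be verified afterwards, and step 4 remains the fallback if the state does not clear.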

Additional Information