Inaccessible VMs Due To vSAN Node Decom State

Article ID: 317237

Products

VMware vSAN

Issue/Introduction

Impact/Risks:
Any host stuck in DecomState: 6 will not be contributing storage or compute resources to the cluster because vSAN believes the host to still be in Maintenance Mode.
 
Symptoms:

Virtual Machines become inaccessible or orphaned in vSphere.
All the drives in all hosts are mounted in cmmds, and none of the hosts are network partitioned.
In vSphere, no hosts appear to be in Maintenance Mode.
Checking every host from the command line with "# esxcli system maintenanceMode get" shows "Disabled" on all of them.
However, running "# localcli vsan cluster get" on the affected host shows "Maintenance Mode State: ON" (example below).

Cluster Information:
  Enabled: true
  Current Local Time: 2018-09-14T18:51:57Z
  Local Node UUID: ########-####-####-####-########a270
  Local Node Type: NORMAL
  Local Node State: AGENT
  Local Node Health State: HEALTHY
  Sub-Cluster Master UUID: ########-####-####-####-########a4dcool…0
  Sub-Cluster Backup UUID: ########-####-####-####-########a390
  Sub-Cluster UUID: ########-####-####-####-########f8c7
  Sub-Cluster Membership Entry Revision: 3
  Sub-Cluster Member Count: 4
  Sub-Cluster Member UUIDs: ########-####-####-####-########a390, ########-####-####-####-########a4d0, ########-####-####-####-########c7b0, ########-####-####-####-########a270
  Sub-Cluster Membership UUID: ########-####-####-####-########a4d0
  Unicast Mode Enabled: true
  Maintenance Mode State: ON  <<---This node is in Decom state according to vSAN.
  Config Generation: ########-####-####-####-########112e 5 2018-09-14T17:21:34.629


This means the host is stuck in the vSAN decom state: maintenance mode did not cancel or exit cleanly, so vSAN still considers the host to be in maintenance mode.
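The mismatch described above (ESXi reports "Disabled" while vSAN reports "ON") can be checked with a small helper. This is a minimal sketch, not part of the original article: the function name mm_mismatch is hypothetical, and it assumes the two command outputs look like the examples shown in this article.

```shell
# Hypothetical helper: compare the ESXi view and the vSAN view of maintenance mode.
# Both outputs are passed in as arguments so the logic can be tried off-host.
mm_mismatch() {
    esx_state=$1      # output of: esxcli system maintenanceMode get
    vsan_output=$2    # output of: localcli vsan cluster get

    # Extract the first word after "Maintenance Mode State:" (for example ON).
    vsan_state=$(printf '%s\n' "$vsan_output" |
        sed -n 's/^ *Maintenance Mode State: *\([A-Za-z]*\).*/\1/p' |
        head -n 1)

    if [ "$esx_state" = "Disabled" ] && [ "$vsan_state" = "ON" ]; then
        echo "MISMATCH"   # ESXi says not in maintenance mode, vSAN says it is
    else
        echo "OK"
    fi
}
```

On a host, it could be called as: mm_mismatch "$(esxcli system maintenanceMode get)" "$(localcli vsan cluster get)"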

Environment

VMware vSAN (All Versions)

Cause

This issue is frequently the result of issuing a maintenance mode task in vCenter, quickly followed by a cancellation of the maintenance mode task.
The host may end up stuck in vSAN Decom State: 6 where vSAN considers the host to still be in Maintenance Mode.

Resolution

Use the following command to see which hosts are in this state (indicated by a "decomState" value of 6):

# echo "hostname,decomState,decomJobType";for host in $(cmmds-tool find -t HOSTNAME -f json |grep -B2 Healthy|grep uuid|awk -F \" '{print $4}');do hostName=$(cmmds-tool find -t HOSTNAME -f json -u $host|grep content|awk -F \" '{print $6}');decomInfo=$(cmmds-tool find -t NODE_DECOM_STATE -f json -u $host |grep content|awk '{print $3 $5}'|sed 's/,$//');echo "$hostName,$decomInfo";done|sort
hostname,decomState,decomJobType
esxi1,0,0
esxi2,0,0
esxi3,0,0
esxi4,6,0
esxi5,0,0
esxi6,0,0
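In a larger cluster, the output above can be narrowed down to only the affected hosts. The filter below is a minimal sketch under the assumption that the output keeps the three-column hostname,decomState,decomJobType layout shown above; the function name stuck_hosts is hypothetical.

```shell
# Hypothetical filter: read the CSV produced by the command above on stdin
# and print only the hostnames whose decomState column is 6.
stuck_hosts() {
    # NR > 1 skips the header line; $2 is the decomState column.
    awk -F',' 'NR > 1 && $2 == "6" {print $1}'
}
```

For the sample output above, piping it through stuck_hosts prints esxi4.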

To clear this state do the following:
1) Place the affected host into maintenance mode with No Action or No Data Migration depending on the version
2) Remove the host from maintenance mode
3) Verify Object health
4) Sometimes the host does not exit the DECOM_STATE even after a reboot. In this situation, shut down the host completely. Check the impact beforehand with the debug precheck command, for example: esxcli vsan debug evacuation precheck -e "HOSTNAME/UUID". Then power off the host, wait 20 seconds, and power it on again.
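
Steps 1 to 3 can also be run from the ESXi shell of the affected host. The article does not say whether these steps are meant to be done in vCenter or on the command line, so treat the following as a sketch of one possible approach. The wrapper functions run and clear_decom_state are hypothetical, and the sketch assumes the noAction vSAN mode corresponds to "No Action" / "No Data Migration". By default it only prints the commands.

```shell
# DRY_RUN=1 (the default) only prints each command.
# Set DRY_RUN=0 on the affected ESXi host to actually execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

clear_decom_state() {
    # Step 1: enter maintenance mode without migrating any vSAN data.
    run esxcli system maintenanceMode set -e true -m noAction
    # Step 2: exit maintenance mode again.
    run esxcli system maintenanceMode set -e false
    # Step 3: review vSAN object health afterwards.
    run esxcli vsan debug object health summary get
}
```

Calling clear_decom_state with the default settings lists the three commands without changing anything on the host, which allows them to be reviewed first.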

Additional Information