Partitioned vSAN cluster after upgrading vCenter Server to version 6.6 or later on a vSAN stretched cluster

Article ID: 326602


Products

VMware vSAN

Issue/Introduction

Symptoms:
  • The vSAN cluster becomes partitioned after the upgrade.
  • The pre-upgrade version did not support unicast.
  • The upgraded version supports unicast.
  • The vCenter Server VM runs on the same vSAN datastore of the cluster it is managing.


Environment

VMware vSAN 6.x

Cause

This issue occurs while vCenter Server pushes the updated unicast configuration (with "Supports Unicast" = true) to all upgraded nodes. Because the witness node is the last node to be upgraded, its unicast configuration on the data nodes is updated last (for example, flipping the witness node's unicast support from 'False' to 'True' on all data nodes). While these witness unicast configuration updates are in progress, a partition can occur in one of two ways, depending on the upgrade path.

Case 1

vCenter Server is upgraded to a release that supports unicast but is earlier than 6.7 U3, and ESXi is upgraded to any compatible release that supports unicast.

After the witness node has been upgraded, vCenter Server updates the witness unicast configuration on all data nodes by removing the old entry and adding the new one. Once told to remove the old witness unicast entry, a node whose other unicast entries are already updated leaves the existing multicast cluster, transitions to the unicast channel, and joins the new unicast cluster formed by the nodes that left the multicast cluster earlier.

This sequential transition of nodes from the existing multicast cluster to the new unicast cluster can make the vCenter Server disk objects inaccessible, preventing the unicast configuration updates from completing and leaving the cluster partitioned.

This issue can only happen when vCenter Server is deployed on the same vSAN cluster it is managing. To identify whether this issue has occurred, compare the sample unicast configurations below, taken from two nodes that ended up in the two different partitions.

The following node is running in unicast mode; all listed unicast agents have Supports Unicast set to true, and the witness unicast agent entry is missing.

[root@ESX1:~] esxcli vsan cluster get
Cluster Information
 Enabled: true
 Current Local Time: 2020-03-10T22:22:24Z
 Local Node UUID: 5b04dac0-822d-0dee-a35a-506b4b3b7301
 Local Node Type: NORMAL
 Local Node State: AGENT
 Local Node Health State: HEALTHY
 Sub-Cluster Master UUID: 5b04ed4a-7206-b308-0379-506b4b1adf01
 Sub-Cluster Backup UUID: 5b083773-9885-b33c-e768-506b4b3b7b02
 Sub-Cluster UUID: 52ab4afa-f4e5-f7a4-aac0-73188846b401
 Sub-Cluster Membership Entry Revision: 11
 Sub-Cluster Member Count: 4
 Sub-Cluster Member UUIDs: 5b04dac0-822d-0dee-a35a-506b4b3b7301,5b083773-9885-b33c-e768-506b4b3b7b02,5b04c9e9-3833-ca3e-7f3c-506b4b3b7b03,5b04d090-f426-310e-d5b1-506b4b3b7f04
 Sub-Cluster Member HostNames: ......
 Sub-Cluster Membership UUID: 53fb675e-886e-7835-959e-506b4b1adf01
 Unicast Mode Enabled: true
 Maintenance Mode State: OFF
 Config Generation: 278d30db-ce85-41d2-80a4-0205b9dd1d01 3 2020-03-10T20:40:36.357
 
[root@ESX1:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address  Port   Iface Name  Cert Thumbprint
------------------------------------  ---------  ----------------  ----------  -----  ----------  ---------------
5b04dac0-822d-0dee-a35a-506b4b3b7301  0          true              10.10.16.1  12321  ......
5b083773-9885-b33c-e768-506b4b3b7b02  0          true              10.10.16.2  12321  ......
5b04c9e9-3833-ca3e-7f3c-506b4b3b7b03  0          true              10.10.16.3  12321  ......
5b04d090-f426-310e-d5b1-506b4b3b7f04  0          true              10.10.16.4  12321  ......


The output below is from a host in the multicast cluster partition. Note that this node is running in multicast mode, and its witness unicast agent entry has Supports Unicast set to false.

[root@ESX5:~] esxcli vsan cluster get
Cluster Information
 Enabled: true
 Current Local Time: 2020-03-10T22:23:24Z
 Local Node UUID: 5b04dac0-822d-0dee-a35a-506b4b3b7305
 Local Node Type: NORMAL
 Local Node State: AGENT
 Local Node Health State: HEALTHY
 Sub-Cluster Master UUID: 5b04ed4a-7206-b308-0379-506b4b1adf05
 Sub-Cluster Backup UUID: 5b083773-9885-b33c-e768-506b4b3b7b06
 Sub-Cluster UUID: 52ab4afa-f4e5-f7a4-aac0-73188846b401
 Sub-Cluster Membership Entry Revision: 11
 Sub-Cluster Member Count: 4
 Sub-Cluster Member UUIDs: 5b04dac0-822d-0dee-a35a-506b4b3b7305,5b083773-9885-b33c-e768-506b4b3b7b06,5b04c9e9-3833-ca3e-7f3c-506b4b3b7b07,5b04d090-f426-310e-d5b1-506b4b3b7f08
 Sub-Cluster Member HostNames: ......
 Sub-Cluster Membership UUID: 53fb675e-886e-7835-959e-506b4b1adf02
 Unicast Mode Enabled: true
 Maintenance Mode State: OFF
 Config Generation: 278d30db-ce85-41d2-80a4-0205b9dd1d01 3 2020-03-10T20:40:36.357
 
[root@ESX5:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address  Port   Iface Name  Cert Thumbprint
------------------------------------  ---------  ----------------  ----------  -----  ----------  ---------------
00000000-0000-0000-0000-000000000000  1          false             10.10.20.1  12321
5b04dac0-822d-0dee-a35a-506b4b3b7305  0          true              10.10.16.5  12321  ......
5b083773-9885-b33c-e768-506b4b3b7b06  0          true              10.10.16.6  12321  ......
5b04c9e9-3833-ca3e-7f3c-506b4b3b7b07  0          true              10.10.16.7  12321  ......
5b04d090-f426-310e-d5b1-506b4b3b7f08  0          true              10.10.16.8  12321  ......

Case 2

vCenter Server is upgraded to release 6.7 U3 or later, and ESXi is upgraded to any release that supports unicast.

The software upgrade sequence is the same as in Case 1, except for the last step that updates all data nodes with the new witness unicast configuration. vCenter Server 6.7 U3 or later does not update the data nodes with the new witness unicast configuration right after the witness node is upgraded, so the cluster does not get partitioned at this point and keeps using the multicast channel for network communication. The witness unicast configuration update takes place only when a cluster remediation event happens, for example a forced remediation through the health check UI, a RemediateVsanCluster API invocation through a script, or a disk format conversion (DFC).

In the initial phase of cluster remediation, vCenter Server first updates the witness unicast configuration on all data nodes, just as in Case 1. This can lead to the same issue and is subject to the same conditions and diagnostics.

Resolution

Healing a partitioned cluster

If a partition occurs, it can be healed either through a series of manual steps or with a script that runs the commands on all hosts in parallel.

Manually updating the cluster

To resolve this issue, the witness unicast agent entry with Supports Unicast = false must be manually replaced with an entry that has Supports Unicast = true.
Note: The automated approach explained below is highly recommended over the manual steps here.

Note: These steps need to be completed on each host.

  1. List the unicast config with this command: esxcli vsan cluster unicastagent list

 

NodeUuid                              IsWitness  Supports Unicast  IP Address  Port   Iface Name  Cert Thumbprint
------------------------------------  ---------  ----------------  ----------  -----  ----------  ---------------
00000000-0000-0000-0000-000000000000  1          false             10.10.20.1  12321
5b04dac0-822d-0dee-a35a-506b4b3b7305  0          true              10.10.16.5  12321  ......
5b083773-9885-b33c-e768-506b4b3b7b06  0          true              10.10.16.6  12321  ......
5b04c9e9-3833-ca3e-7f3c-506b4b3b7b07  0          true              10.10.16.7  12321  ......
5b04d090-f426-310e-d5b1-506b4b3b7f08  0          true              10.10.16.8  12321  ......

 

  2. Remove any unicast agent entry that has Supports Unicast = false: esxcli vsan cluster unicastagent remove -a xx.xx.xx.x

 

For example: esxcli vsan cluster unicastagent remove -a 10.10.20.1

  3. Add the witness unicast agent entry back with Supports Unicast = true. This step is also required on hosts where the witness entry is missing entirely (as on the unicast-mode node shown earlier):
esxcli vsan cluster unicastagent add -a xxx.xxx.xxx.xxx -u NodeUUID -U 1 -t witness

For example:
esxcli vsan cluster unicastagent add -a 10.10.20.1 -u 5b04dac0-822d-0dee-a35a-506b4b3b7310 -U 1 -t witness

Using the Script

  1. Download the attached remediate_witness.py to a machine that has access to all ESXi hosts and is not part of the vSAN cluster.
  2. Run this script with this command: python remediate_witness.py -h host1[,host2][,...] -u [user] -p [password] -a [WITNESS_IP] -U [WITNESS_UUID] [-D]

 

Note:

  • Depending on the environment, the full path to the Python interpreter may be required to run the script, for example:
  • # python remediate_witness.py -h host1[,host2][,...] -u [user] -p [password] -a [WITNESS_IP] -U [WITNESS_UUID] [-D]
  • The script requires the Python paramiko library; how this library is installed depends on the environment.
  • Appending the -D option performs a dry run, showing what the script would do without making any changes.


Example:
 
# python remediate_witness.py -h 10.184.100.218,10.184.107.87,10.184.105.206,10.184.102.27 -a 1.2.3.4 -U 5eb379df-df1c-cb9d-cf70-02000422ea91
DEBUG: Remediate hosts: ['10.184.100.218', '10.184.107.87', '10.184.105.206', '10.184.102.27']
10.184.100.218: esxcli vsan cluster unicastagent remove -a 1.2.3.4; esxcli vsan cluster unicastagent add -a 1.2.3.4 -u 5eb379df-df1c-cb9d-cf70-02000422ea91 -U 1 -t witness
10.184.107.87: esxcli vsan cluster unicastagent remove -a 1.2.3.4; esxcli vsan cluster unicastagent add -a 1.2.3.4 -u 5eb379df-df1c-cb9d-cf70-02000422ea91 -U 1 -t witness
10.184.102.27: esxcli vsan cluster unicastagent remove -a 1.2.3.4; esxcli vsan cluster unicastagent add -a 1.2.3.4 -u 5eb379df-df1c-cb9d-cf70-02000422ea91 -U 1 -t witness
10.184.105.206: esxcli vsan cluster unicastagent remove -a 1.2.3.4; esxcli vsan cluster unicastagent add -a 1.2.3.4 -u 5eb379df-df1c-cb9d-cf70-02000422ea91 -U 1 -t witness
DEBUG: Running command=esxcli vsan cluster unicastagent remove -a 1.2.3.4; esxcli vsan cluster unicastagent add -a 1.2.3.4 -u 5eb379df-df1c-cb9d-cf70-02000422ea91 -U 1 -t witness on host 10.184.100.218
DEBUG: Running command=esxcli vsan cluster unicastagent remove -a 1.2.3.4; esxcli vsan cluster unicastagent add -a 1.2.3.4 -u 5eb379df-df1c-cb9d-cf70-02000422ea91 -U 1 -t witness on host 10.184.102.27
DEBUG: Running command=esxcli vsan cluster unicastagent remove -a 1.2.3.4; esxcli vsan cluster unicastagent add -a 1.2.3.4 -u 5eb379df-df1c-cb9d-cf70-02000422ea91 -U 1 -t witness on host 10.184.105.206
DEBUG: Running command=esxcli vsan cluster unicastagent remove -a 1.2.3.4; esxcli vsan cluster unicastagent add -a 1.2.3.4 -u 5eb379df-df1c-cb9d-cf70-02000422ea91 -U 1 -t witness on host 10.184.107.87
DEBUG: Result rc=0 stdout= stderr= on host 10.184.107.87
DEBUG: Result rc=0 stdout= stderr= on host 10.184.102.27
DEBUG: Result rc=0 stdout= stderr= on host 10.184.105.206
DEBUG: Result rc=0 stdout= stderr= on host 10.184.100.218 
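
For reference, below is a minimal sketch of what such a parallel remediation script can look like. It is illustrative only and assumes Python 3, the paramiko library, root SSH access to every host, and placeholder values for the host list, credentials, witness IP, and witness UUID; the attached remediate_witness.py is the supported version and may differ in its details.

# remediate_witness_sketch.py -- illustrative sketch only; use the attached remediate_witness.py.
import threading
import paramiko

HOSTS = ["10.184.100.218", "10.184.107.87"]            # placeholder ESXi host list
USER, PASSWORD = "root", "password"                    # placeholder credentials
WITNESS_IP = "1.2.3.4"                                 # placeholder witness vSAN IP
WITNESS_UUID = "5eb379df-df1c-cb9d-cf70-02000422ea91"  # placeholder witness node UUID

# The same remove/add pair shown in the manual steps, executed on one host over SSH.
CMD = ("esxcli vsan cluster unicastagent remove -a {ip}; "
       "esxcli vsan cluster unicastagent add -a {ip} -u {uuid} -U 1 -t witness"
       ).format(ip=WITNESS_IP, uuid=WITNESS_UUID)

def remediate(host):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=USER, password=PASSWORD)
    _, stdout, stderr = client.exec_command(CMD)
    rc = stdout.channel.recv_exit_status()             # wait for the command to finish
    print("%s: rc=%d stderr=%s" % (host, rc, stderr.read().decode().strip()))
    client.close()

# Update all hosts in parallel so the whole cluster converges before the APD timeout.
threads = [threading.Thread(target=remediate, args=(h,)) for h in HOSTS]
for t in threads:
    t.start()
for t in threads:
    t.join()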

 

Preventing the cluster partitioning

To prevent the partition from occurring, vCenter Server must be on 6.7 U3 or later. As described in Case 2, upgrading ESXi to 6.5/6.7 does not by itself trigger the multicast-to-unicast change on the cluster (this can be confirmed by running "esxcli vsan cluster get" on each host).

The script described above can then be run proactively to fix the witness unicast configuration on all data nodes. The script executes the commands on all hosts in parallel, so that every host can be updated within 30 seconds and the APD timeout is avoided (which is why the manual steps are not recommended). This proactive step ensures that no partition occurs during cluster remediation that would make vCenter Server inaccessible. Run the script as soon as possible after the ESXi upgrade, and avoid making any configuration changes (see the list below) until the script has finished. Afterwards, verify that each host is in the expected state by running the esxcli vsan cluster get command (no partition, and running in unicast mode), as in the sketch that follows.
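
The per-host check can also be scripted. The snippet below is a sketch only, under the same assumptions as the earlier sketch (Python 3, paramiko, root SSH access, placeholder host list and credentials); it simply reports the unicast mode and sub-cluster membership fields from esxcli vsan cluster get for each host. Every host should report Unicast Mode Enabled: true and the same Sub-Cluster UUID and Sub-Cluster Member Count.

# check_vsan_cluster_state.py -- illustrative sketch only.
import paramiko

HOSTS = ["10.184.100.218", "10.184.107.87"]  # placeholder ESXi host list
USER, PASSWORD = "root", "password"          # placeholder credentials

for host in HOSTS:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=USER, password=PASSWORD)
    _, stdout, _ = client.exec_command("esxcli vsan cluster get")
    output = stdout.read().decode()
    client.close()
    # Print only the fields that show whether the host has joined the unicast cluster.
    for line in output.splitlines():
        if line.strip().startswith(("Unicast Mode Enabled",
                                    "Sub-Cluster UUID",
                                    "Sub-Cluster Member Count")):
            print("%s: %s" % (host, line.strip()))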

Note that the script causes a temporary partition (as hosts move from multicast to unicast mode) which heals within about 30 seconds. However, if the script runs into an issue that prevents it from fixing a host's unicast configuration, it will leave the cluster partitioned, and the partition will not heal without manual intervention. Verify the following before running the script:

  • Verify network connectivity from the jump host (Linux/Windows) that will run the script to all hosts in the cluster.
  • Verify that the root password is the same on all cluster hosts.
  • Do a dry run first.
  • Make sure the script is run from a jump host (Linux/Windows) that is not running on the vSAN datastore.

As described in Case 2, the witness unicast configuration update takes place automatically (which is what needs to be avoided) when a cluster remediation event happens. Below is a list of changes and user actions that can indirectly trigger a cluster remediation event. Avoid these changes and actions after upgrading the ESXi hosts and before running the script. Note that this is not an exhaustive list; avoid any change to the system after the upgrade, and run the script as soon as possible.

  • Network configuration changes
  • ESXi configuration changes
  • Forced remediation through the health check UI
  • RemediateVsanCluster API invocation through a script
  • Disk format conversion (DFC)
  • Management operations on the hosts and vCenter Server


Additional Information

Impact/Risks:
If the software upgrade causes the cluster to become partitioned, manual intervention is needed to fix the witness unicast configuration on all data nodes.

Some VMs (including the vCenter Server VM) may become inaccessible and their data may not be available.

Attachments

remediate_witness.py