During an upgrade of a vSAN cluster, one or more nodes become partitioned from the rest of the cluster, forming one or more cluster partitions.
There is no indication of network communication issues (for example, vmkping between nodes on the vSAN network succeeds).
The vmkernel.log on the partitioned ESXi hosts shows messages similar to the following, where X is the version being promoted to and Y is the version being promoted from:
WARNING: CMMDS: CMMDSPromoteFormatVersion:423: Failed to promote the node to a format version X beyond its software version Y
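The log check described above can be sketched as a small shell function. This is a sketch under stated assumptions: the function name find_promote_failures is hypothetical, the default path /var/log/vmkernel.log is the usual vmkernel.log location on an ESXi host, and the match relies on the message text shown above.

```shell
#!/bin/sh
# Hypothetical helper: print CMMDS format-promotion failures from a vmkernel log.
# Assumes the log lives at /var/log/vmkernel.log unless another file is passed.
find_promote_failures() {
    log="${1:-/var/log/vmkernel.log}"
    # Narrow to CMMDSPromoteFormatVersion lines, then keep only the failure text.
    # The pipeline exits non-zero when no matching line is found.
    grep -F "CMMDSPromoteFormatVersion" "$log" \
        | grep -F "Failed to promote the node to a format version"
}
```

Running `find_promote_failures` on each partitioned host (or `find_promote_failures /path/to/copied/vmkernel.log` on a collected log) prints any matching warnings; an empty result with a non-zero exit status means the message was not found in that file.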
vSAN 7.x, 8.x, 9.x
Recreating or adding disk groups that use an on-disk format (ODF) version higher than the rest of the cluster causes the CMMDS version on those nodes to be updated. These nodes are then incompatible with the nodes that have not been upgraded yet, because the non-upgraded nodes cannot use the later CMMDS version.
Removing the disk groups with the higher ODF version does not resolve the issue, because this does not revert the CMMDS version in use.
Setting the virsto version to the legacy format does not resolve the issue either, because this also does not revert the CMMDS version in use.
Whenever a new node is added to the cluster, or an existing node is removed from the cluster and re-added, minNodeMajorVersion must be the same on all ESXi hosts. If it is not, this cluster partition issue is triggered and VMs may become inaccessible.
minNodeMajorVersion can be checked from the ESXi command line with the following command:
/usr/lib/vmware/vsan/bin/clom-tool stats | grep "minNodeMajorVersion"
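Because the mismatch only shows up when hosts are compared with each other, the comparison step can be sketched as a helper that takes the minNodeMajorVersion value read from the clom-tool output on each host. This is a sketch only: the function name check_min_node_major_version and the host=version argument form are hypothetical, and the values are assumed to be collected by hand, since the exact output format of clom-tool stats is not described here.

```shell
#!/bin/sh
# Hypothetical helper: each argument is "host=version", where version is the
# minNodeMajorVersion value read from clom-tool stats on that host.
# Prints every host that differs from the first host; returns 1 on any mismatch.
check_min_node_major_version() {
    ref_set=0
    status=0
    for pair in "$@"; do
        host="${pair%%=*}"   # text before the first "="
        ver="${pair#*=}"     # text after the first "="
        if [ "$ref_set" -eq 0 ]; then
            # The first host is the reference for all later comparisons.
            ref="$ver"
            ref_host="$host"
            ref_set=1
        elif [ "$ver" != "$ref" ]; then
            echo "MISMATCH: $host has $ver, $ref_host has $ref"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_min_node_major_version host-a=10 host-b=10 host-c=9` (illustrative host names and values) prints `MISMATCH: host-c has 9, host-a has 10` and returns 1, which would indicate that host-c should be investigated before the cluster membership is changed.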
This issue occurs when nodes have incompatible CMMDS versions.
To avoid this issue, do not add, create, or re-create disk groups with a higher on-disk format until all hosts have been upgraded to the same ESXi build. If disk groups must be recreated during the upgrade, temporarily set the virsto version to the same ODF version used by all other hosts in the cluster, and revert this change once all hosts have been upgraded. See:
How to format vSAN Disk Groups with a legacy format version
Understanding vSAN on-disk format versions and compatibility
Workaround:
If the problem has already occurred, there are two options:
OR
/altbootbank/boot.cfg. If this option is chosen, the disk groups created with the higher ODF version must be removed before the rollback or reinstall so that they do not cause further issues.