Failure to promote CMMDS version causes vSAN cluster to become partitioned during upgrade

Article ID: 326888

Updated On: 03-05-2025

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • During an upgrade of a vSAN cluster, one or more nodes become partitioned from the rest of the cluster, forming one or more cluster partitions.

  • No indication of network communication issues (e.g. vmkping between nodes on the vSAN network succeeds).

  • In vmkernel.log on the host(s) in the cluster partition(s), messages similar to the following are observed, where X is the version being promoted to and Y is the version being promoted from:

  • WARNING: CMMDS: CMMDSPromoteFormatVersion:423: Failed to promote the node to a format version X beyond its software version Y 
  • Cluster partitioning may also occur due to a minNodeMajorVersion mismatch across the hosts in the vSAN cluster.
  • This issue can also be observed when a new host running a higher ESXi version than the existing nodes is added to the vSAN cluster, whether it still has a Disk-Group present or new Disk-Groups are created on it.
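To scan a host's log for the warning above, a minimal POSIX sh sketch follows. The function name `find_cmmds_promote_failures` is hypothetical; it reads a single plain-text log (by default /var/log/vmkernel.log, so rotated or compressed logs are not covered) and it assumes X and Y appear as plain integers in the message.

```shell
# Hypothetical helper: list CMMDS format-promotion failures found in a vmkernel log.
# Assumption: X and Y in the warning are plain integers.
# Usage: find_cmmds_promote_failures [path-to-log]
find_cmmds_promote_failures() {
  log="${1:-/var/log/vmkernel.log}"
  # Print only matching warning lines, rewritten as "format=X software=Y"
  # (X = format version being promoted to, Y = node software version).
  sed -n 's/.*CMMDSPromoteFormatVersion.*format version \([0-9][0-9]*\) beyond its software version \([0-9][0-9]*\).*/format=\1 software=\2/p' "$log"
}
```

Any output line means the node attempted a CMMDS format promotion it could not perform; no output means the warning was not found in that file.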
 



Environment

VMware vSAN 6.X
VMware vSAN 7.X
VMware vSAN 8.X

Cause

Recreating or adding Disk-Groups that use an on-disk format (ODF) version higher than the rest of the cluster causes the CMMDS version on these nodes to be updated. These nodes are then incompatible with the nodes that have not been upgraded yet, because those nodes are unable to use later versions of CMMDS.
Removing the higher-ODF Disk-Groups will not resolve the issue, as this does not revert the CMMDS version in use.
Setting the Virsto version to the legacy format will not resolve the issue, as this does not revert the CMMDS version in use.

Whenever a new node is added to the cluster, or an existing node is moved out of the cluster and re-added, verify that minNodeMajorVersion is the same on all hosts. If it is not, the cluster partition issue will be triggered and production VMs can become inaccessible.

minNodeMajorVersion can be checked from the CLI of each host using the following command:

/usr/lib/vmware/vsan/bin/clom-tool stats | grep "minNodeMajorVersion"
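To compare the result across hosts, the per-host output of the command above can be collected and checked for differences. The sketch below is a hypothetical POSIX sh/awk helper, `check_min_node_major_version`, that reads one line per host in the form `<hostname> <grep output>`. It does not parse the version number (no particular output format is assumed); it simply flags any host whose line differs from the others. The host names and SSH access in the commented example are assumptions.

```shell
# Hypothetical helper: report whether minNodeMajorVersion output matches on all hosts.
# Input (stdin): one line per host, "<hostname> <output of the grep command>".
# Exit status: 0 if all hosts report identical output, 1 on any mismatch.
#
# Example collection loop (hypothetical host names; assumes SSH is enabled on each host):
#   for h in esx01 esx02 esx03; do
#     printf '%s %s\n' "$h" "$(ssh root@"$h" '/usr/lib/vmware/vsan/bin/clom-tool stats | grep minNodeMajorVersion' | tr '\n' ' ')"
#   done | check_min_node_major_version
check_min_node_major_version() {
  awk '
    NF == 0 { next }           # skip blank lines
    {
      host = $1
      $1 = ""                  # drop the hostname field (rebuilds $0 with single spaces)
      sub(/^ +/, "")           # trim the space left behind
      v = ($0 == "" ? "<no output>" : $0)
      value[host] = v
      if (!(v in seen)) { seen[v] = 1; distinct++ }
    }
    END {
      for (h in value) printf "%s: %s\n", h, value[h]
      if (distinct > 1) { print "MISMATCH: minNodeMajorVersion output differs between hosts"; exit 1 }
      print "OK: minNodeMajorVersion output is identical on all hosts"
    }'
}
```

Comparing the whole line rather than extracting a number keeps the helper independent of the exact clom-tool output layout; a host that returns nothing is reported as `<no output>` and counts as a mismatch.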

Resolution

This issue occurs when nodes have incompatible CMMDS versions.

This issue can be avoided by not adding, creating, or re-creating Disk-Groups of a higher format until all hosts have been upgraded to the same ESXi build. If Disk-Groups have to be recreated during the upgrade, temporarily set Virsto to use the same ODF version as all other hosts in the cluster, and revert these changes once all hosts have been upgraded:

How to format vSAN Disk Groups with a legacy format version

Understanding vSAN on-disk format versions and compatibility

Workaround:

If you've already encountered this issue, you have two options to work around it:

  • Either move forward and update the remaining nodes in the cluster. Note that this may cause further temporary data inaccessibility: depending on how the cluster partitioned, an updated node may join the partition that does not have the majority of the data accessible, and after the update it will no longer be able to communicate with the lower-version nodes it was clustered with before updating.

OR

  • Roll back or re-install the previous version of ESXi on the nodes with the higher CMMDS version. Before considering the rollback option, verify that the lower build is actually available by checking the contents of /altbootbank/boot.cfg.

         If this option is chosen, the Disk-Group created on the higher ODF version must be removed before the rollback/re-install to avoid further issues.
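Before choosing the rollback option, the build recorded in each bootbank can be compared. The following is a minimal sketch for the ESXi shell, assuming that boot.cfg contains a `build=` line and that /bootbank is the active bootbank; the function names `boot_cfg_build` and `compare_bootbanks` are hypothetical. It only reports whether the alternate bootbank holds a different build, so confirm manually that it is the lower, pre-upgrade build.

```shell
# Hypothetical helper: print the value of the "build=" line from a boot.cfg file.
# Assumption: boot.cfg records the ESXi build as "build=<version>".
boot_cfg_build() {
  sed -n 's/^build=//p' "$1" 2>/dev/null
}

# Hypothetical helper: compare the builds in the active and alternate bootbanks.
# Returns 1 if the alternate bootbank has no build, or the same build as the active one.
compare_bootbanks() {
  active=$(boot_cfg_build "${1:-/bootbank/boot.cfg}")
  alt=$(boot_cfg_build "${2:-/altbootbank/boot.cfg}")
  echo "bootbank build:    ${active:-<not found>}"
  echo "altbootbank build: ${alt:-<not found>}"
  if [ -z "$alt" ] || [ "$alt" = "$active" ]; then
    echo "No different build found in the alternate bootbank; rollback is not available"
    return 1
  fi
  echo "Alternate bootbank holds a different build; confirm it is the previous (lower) build before rolling back"
}
```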