Failure to promote the CMMDS version causes a vSAN cluster to become partitioned during upgrade

Article ID: 326888

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • During an upgrade of a vSAN cluster, one or more nodes become partitioned from the rest of the cluster, forming one or more cluster partitions.

  • No indication of network communication issues (e.g. vmkping between nodes on the vSAN network succeeds).

  • In vmkernel.log on the hosts in the cluster partition(s), messages similar to the following are observed, where X is the version being promoted to and Y is the version being promoted from:

  • WARNING: CMMDS: CMMDSPromoteFormatVersion:423: Failed to promote the node to a format version X beyond its software version Y 
  • This issue can also be observed when a new host running a higher ESXi version than the existing nodes is added to the vSAN cluster, either with a Disk-Group still present or when new Disk-Groups are created on it.
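
The log symptom above can be checked for with a short script. The following is a minimal sketch, assuming the message text matches the example shown in this article. The function name find_promote_failures is hypothetical, and the source line number (423 in the example) is treated as variable.

```python
import re

# Pattern built from the example message in this article. The number after
# "CMMDSPromoteFormatVersion:" is a source line number and may differ
# between builds, so it is matched loosely.
PROMOTE_FAILURE = re.compile(
    r"CMMDSPromoteFormatVersion:\d+: Failed to promote the node to a "
    r"format version (\S+) beyond its software version (\S+)"
)


def find_promote_failures(lines):
    """Return (format_version, software_version) pairs found in log lines."""
    hits = []
    for line in lines:
        match = PROMOTE_FAILURE.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits
```

To use it, pass any iterable of log lines, for example an open file handle on a copy of vmkernel.log taken from an affected host. A non-empty result suggests the host matches the symptom described here.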
 



Environment

VMware vSAN 6.X
VMware vSAN 7.X
VMware vSAN 8.X

Cause

Recreating or adding Disk-Groups that use an on-disk format (ODF) version higher than the rest of the cluster causes the CMMDS version on those nodes to be updated. These nodes are then incompatible with the nodes that have not been upgraded yet, as the latter are unable to use later versions of CMMDS.
Removing the higher ODF Disk-Groups will not resolve the issue as this will not revert the CMMDS version in use.
Setting virsto version to legacy format will not resolve the issue as this will not revert the CMMDS version in use.

Resolution

This issue occurs when nodes have incompatible CMMDS versions.

This issue can be avoided by not adding, creating, or re-creating Disk-Groups of a higher format until all hosts have been upgraded to the same ESXi build. If Disk-Groups have to be recreated during the upgrade, temporarily set virsto to use the same ODF version as all other hosts in the cluster. These changes should be reverted once all hosts have been upgraded. See:

How to format vSAN Disk Groups with a legacy format version

Understanding vSAN on-disk format versions and compatibility
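
The avoidance rule above (no higher-format Disk-Groups until every host is on the same ESXi build) can be expressed as a simple pre-check. This is only an illustrative sketch: the function names, host names, and build strings are hypothetical, and how the per-host build inventory is collected is outside its scope.

```python
def hosts_not_on_build(host_builds, target_build):
    """Return the sorted names of hosts whose build differs from target_build."""
    return sorted(
        host for host, build in host_builds.items() if build != target_build
    )


def safe_to_create_higher_format(host_builds, target_build):
    """True only if the inventory is non-empty and every host is on target_build."""
    return bool(host_builds) and not hosts_not_on_build(host_builds, target_build)


# Hypothetical inventory: one host has not been upgraded yet.
inventory = {"esx-a": "8.0.2", "esx-b": "8.0.2", "esx-c": "7.0.3"}
print(hosts_not_on_build(inventory, "8.0.2"))
print(safe_to_create_higher_format(inventory, "8.0.2"))
```

While any host is still listed as lagging, new or recreated Disk-Groups should stay on the ODF version used by the rest of the cluster.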

Workaround:

If you've already encountered this issue, you have two options to work around it:

  • Either move forward and update the remaining nodes in the cluster. Note that this may cause further temporary data inaccessibility: depending on how the cluster partitioned, an updated node may join the cluster partition that does not have the majority of the data accessible, and after the update it will no longer be able to communicate with the lower-version nodes it was clustered with before updating.

OR

  • Roll back or re-install the previous version of ESXi on the nodes with the higher CMMDS version. Before considering the rollback option, validate that the lower build version is actually available by checking the contents of /altbootbank/boot.cfg.

         If this option is chosen, the Disk-Group created on the higher ODF version must be removed prior to rollback/re-install to avoid possible further issues.
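
The /altbootbank/boot.cfg check mentioned for the rollback option can be sketched as below. This assumes boot.cfg is a plain key=value file containing a build= entry. The function name read_boot_cfg_build and the sample contents are illustrative only and are not taken from a real host.

```python
def read_boot_cfg_build(text):
    """Return the value of the 'build' key from boot.cfg text, or None if absent."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "build":
            return value.strip()
    return None


# Illustrative boot.cfg contents (made up for this example).
sample_cfg = """bootstate=0
title=Loading VMware ESXi
build=7.0.3-0.35.19482537
updated=1
"""
print(read_boot_cfg_build(sample_cfg))
```

On a host, the text would be read from /altbootbank/boot.cfg. If the reported build is not the expected lower version, or no build entry is found, rollback is unlikely to be an option and a re-install would be needed instead.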