Failure to promote CMMDS version causes vSAN cluster to become partitioned during upgrade

Article ID: 326888

Products

VMware vSAN

Issue/Introduction

Symptoms:
  • During an upgrade of a vSAN cluster, one or more nodes become partitioned from the rest of the cluster, forming one or more new cluster partitions.
  • There is no indication of network communication issues (e.g. vmkping between nodes on the vSAN network succeeds).
  • In vmkernel.log on the host(s) in the new cluster partition(s), messages similar to the following are observed:

    WARNING: CMMDS: CMMDSPromoteFormatVersion:423: Failed to promote the node to a format version X beyond its software version Y
    where X is the version being promoted to and Y is the version being promoted from.

    This issue can be observed when upgrading a vSAN cluster to any later version for which the on-disk format version also requires an upgrade. For example:
    6.5U3 to 6.7U2
    6.7U2 to 7.0U2
    6.5U3 to 7.0U1
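
The log message quoted above can be searched for mechanically. The following is a minimal Python sketch, not a VMware-provided tool. It assumes the message format shown in this article, that the number after CMMDSPromoteFormatVersion may differ between builds, and that X and Y appear as whitespace-free tokens. The function name find_promote_failures and the version values in the sample line are hypothetical.

```python
import re

# Pattern derived from the vmkernel.log message quoted in this article.
# Assumption: the number after "CMMDSPromoteFormatVersion:" can vary by build.
PROMOTE_FAILURE = re.compile(
    r"CMMDSPromoteFormatVersion:\d+: Failed to promote the node to a "
    r"format version (\S+) beyond its software version (\S+)"
)


def find_promote_failures(lines):
    """Return (version_promoting_to, version_promoting_from) for each match."""
    failures = []
    for line in lines:
        match = PROMOTE_FAILURE.search(line)
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures


# Hypothetical sample line; the values 15 and 13 are placeholders only.
sample = [
    "WARNING: CMMDS: CMMDSPromoteFormatVersion:423: Failed to promote the "
    "node to a format version 15 beyond its software version 13",
    "an unrelated log line",
]
for target, current in find_promote_failures(sample):
    print(f"promotion to {target} refused; software version is {current}")
```

To scan a real log, pass an open file object for a copy of vmkernel.log instead of the sample list, since the function accepts any iterable of lines.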


Environment

VMware vSAN 7.0.x
VMware vSAN 6.7.x
VMware vSAN 8.0.x

Cause

Recreating or adding Disk-Groups that use an on-disk format version higher than the rest of the cluster causes the CMMDS version on those nodes to be updated. These nodes are then incompatible with the nodes that have not yet been upgraded, because the older nodes cannot use later versions of CMMDS. This can occur when recreation of a higher on-disk format Disk-Group fails.

Removing the higher version Disk-Groups will not resolve the issue as this will not revert the CMMDS version in use.

Setting virsto version to legacy format will not resolve the issue as this will not revert the CMMDS version in use.

Resolution

There is no resolution for this issue; it must either be avoided or worked around.

This issue can be avoided by not adding, creating, or re-creating Disk-Groups of a higher format until all hosts have been upgraded to the same ESXi build. If Disk-Groups need to be recreated mid-upgrade, temporarily setting virsto to use the highest version that all hosts are compatible with is a valid option. These changes should be reverted once all hosts have been upgraded. See:

How to format vSAN Disk Groups with a legacy format version (2146221)
Understanding vSAN on-disk format versions and compatibility (2145267)
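
The avoidance rule above (no new Disk-Groups until every host is on the same ESXi build) can be expressed as a simple pre-check. The following Python sketch is illustrative only. It assumes the host-to-build mapping has already been collected by some other means, and the host names, build labels, and function names are all hypothetical.

```python
from collections import defaultdict


def group_hosts_by_build(host_builds):
    """Map each build string to the sorted list of hosts reporting it."""
    groups = defaultdict(list)
    for host, build in host_builds.items():
        groups[build].append(host)
    return {build: sorted(hosts) for build, hosts in groups.items()}


def all_hosts_on_same_build(host_builds):
    """True only when there is at least one host and all report one build."""
    return len(set(host_builds.values())) == 1


# Hypothetical inventory: one host has been upgraded, two have not.
inventory = {
    "esx-01": "build-A",
    "esx-02": "build-A",
    "esx-03": "build-B",
}

if all_hosts_on_same_build(inventory):
    print("All hosts report the same build.")
else:
    print("Mixed builds; postpone Disk-Group creation:")
    for build, hosts in group_hosts_by_build(inventory).items():
        print(f"  {build}: {', '.join(hosts)}")
```

A matching build on every host is only the condition stated in this article; the sketch does not check on-disk format versions themselves.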

Workaround:
If you have already encountered this issue, there are two options to work around it:
  • Move forward and update the remaining nodes in the cluster. Note that this may cause further data inaccessibility: depending on how the cluster partitioned, an updated node may join a cluster partition that does not have the majority of the data accessible, and after the update it will no longer be able to communicate with the lower-version nodes it was clustered with before updating.
OR
  • Roll back or re-install the previous version of ESXi on the nodes with the higher version of CMMDS. Before considering the rollback option, validate that the lower build version is actually available by checking the contents of /altbootbank/boot.cfg.
    If this option is chosen, the Disk-Group created on the higher version must be removed prior to rollback/re-install to avoid possible further issues.
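
The /altbootbank/boot.cfg check can be sketched in Python as follows. This is an illustration rather than an official procedure. It assumes boot.cfg is a plain key=value text file that includes a build= line, which should be confirmed on the host in question. The function names read_build and altbootbank_build are hypothetical.

```python
from typing import Optional


def read_build(boot_cfg_text: str) -> Optional[str]:
    """Return the value of the 'build' key, or None if no such key exists.

    Assumption: the file consists of simple key=value lines.
    """
    for line in boot_cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "build":
            return value.strip()
    return None


def altbootbank_build(path: str = "/altbootbank/boot.cfg") -> Optional[str]:
    """Read the file at the path named in this article and extract its build."""
    with open(path, errors="replace") as cfg:
        return read_build(cfg.read())
```

Calling altbootbank_build() on the host returns the build string recorded in the alternate boot bank, or None if no build= line is present. Compare the result with the build the cluster was running before the upgrade to confirm that a rollback target exists.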