Upgrade prechecks fail with ESXi cluster precheck errors: vSAN disk

Article ID: 369423

Products

VMware SDDC Manager

Issue/Introduction

During SDDC Manager upgrade prechecks, the ESXi cluster precheck fails with two errors:

  • vSAN disk component
  • vSAN disk group mode

 

  • In /var/log/vmware/vcf/operationsmanager/operationsmanager.log, the following errors are logged:

2024-04-04T09:56:05.989+0000 DEBUG [vcf_om,7bc8743d140d8,1b3c] [c.v.v.b.p.updaters.PropertyUpdater,pool-2-thread-4] Executing updater method vsanPhysicalDiskComponentHealth of updater VsanPhysicalDiskHealthUpdater, updaterInfo is {"entityType":"cluster","entityName":"Cluster01","propertyName":"vsanPhysicalDiskComponentHealth","isMandatory":true}
2024-04-04T09:56:05.990+0000 ERROR [vcf_om,7bc8743d160d8,1b3c] [c.v.v.b.p.updaters.PropertyUpdater,pool-2-thread-4] Failed to execute updater method vsanPhysicalDiskComponentHealth on entity CLuster01 of type cluster from vcenter.vmware.com due to an exception {}
java.lang.reflect.InvocationTargetException: null

Caused by: java.lang.IllegalStateException: Failed to find group test with id com.vmware.vsan.health.test.componentmetadata in group with id com.vmware.vsan.health.test.physicaldisks

  • The precheck is reported as failed with a "severity" of "WARNING":

         {
                      "id": "physical-disk-component-health",
                      "constraintExpression": "vsanPhysicalDiskComponentHealth=='green' or vsanPhysicalDiskComponentHealth=='info'",
                      "description": "Checks whether vSAN has encountered an integrity issue of the metadata of a component on a disk for this cluster",
                      "name": "vSAN disk component",
                      "validationCode": "ClusterPerspectiveResourceConstraintsMessages.PHYSICAL_DISK_COMPONENT_HEALTH",
                      "validationSucceededMessage": "All vSAN components are healthy",
                      "validationFailedMessage": "vSAN has encountered an integrity issue of the metadata of a component on a disk for this cluster",
                      "remediationMessage": "This could be due to faulty drives, faulty controller or a misbehaving device driver, but could also originate from a problem in the vSAN software. The best course of action is to engage VMware Support",
                      "severity": "WARNING"

A similar exception is logged for disk group mode:

  • In /var/log/vmware/vcf/operationsmanager/operationsmanager.log, the following error is logged:

2024-04-04T09:56:06.029+0000 ERROR [vcf_om,7bc874d40d8,1b3c] [c.v.v.b.p.updaters.PropertyUpdater,pool-2-thread-4] Failed to execute updater method vsanControllerDiskGroupModeVmwareCertifiedStatus on entity CLuster01 of type cluster from vcenter.vmware.com due to an exception {}
java.lang.reflect.InvocationTargetException: null

Caused by: java.lang.IllegalStateException: Failed to find group test with id com.vmware.vsan.health.test.controllerdiskmode in group with id com.vmware.vsan.health.test.hcl
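
To check whether a failing precheck matches this signature, the log can be searched for the two missing health test IDs quoted above. The following is a minimal sketch, not an official tool: the function name find_esa_precheck_errors is hypothetical, and the default path assumes the log location shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that report a missing vSAN health test
# for either of the two test IDs quoted in this article.
# Argument 1 (optional): path to the log file; defaults to the path in this article.
find_esa_precheck_errors() {
    log_file="${1:-/var/log/vmware/vcf/operationsmanager/operationsmanager.log}"
    grep -E 'Failed to find group test with id com\.vmware\.vsan\.health\.test\.(componentmetadata|controllerdiskmode)' "$log_file"
}
```

Calling find_esa_precheck_errors with no argument on the SDDC Manager appliance searches the default log. If nothing is printed, the log does not contain this signature.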

Environment

VCF 5.x 

Cause

  • Since vSAN ESA support was introduced in SDDC Manager, the Operations Manager precheck assessment looks for the hcl.controllerdiskmode and physicaldisks.componentmetadata health tests in the vSAN health results. When these tests are missing from the results, the precheck reports them as failures.
  • An ESA cluster does not report controllerdiskmode or componentmetadata health, because vSAN OSA and ESA clusters use different sets of health checks.

Resolution

  • For ESA clusters, the controllerdiskmode and componentmetadata health check errors are not valid and can be ignored.
  • If the upgrade itself fails on the same checks, perform the following steps:
    1. Take a snapshot of SDDC Manager.
    2. Log in to SDDC Manager as the vcf user, then switch to root with su -
    3. Change into the following directory:
    cd /opt/vmware/vcf/lcm/lcm-app/conf/
    4. Edit the file:
    vi application-prod.properties
    5. Search for the term: vsan
    6. Change the values from true to false.
    Example changes:

    vsan.healthcheck.enabled=false
    vsan.hcl.update.enabled=false
    vsan.precheck.enabled=false
    7. Save the file.
    8. Restart the LCM service:
    systemctl restart lcm
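
The manual edit above can also be scripted. The following is a minimal sketch and not a supported procedure. It assumes the three properties appear in the file exactly as key=true with no spaces around the equals sign, and it touches only the three example keys listed above. The function name disable_vsan_prechecks and the backup file naming are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set the three example vsan properties from true to false.
# Argument 1 (optional): path to the properties file; defaults to the path in this article.
disable_vsan_prechecks() {
    conf_file="${1:-/opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties}"
    backup="$conf_file.bak.$(date +%Y%m%d%H%M%S)"

    # Keep a timestamped copy so the original values can be restored.
    cp -p "$conf_file" "$backup" || return 1

    # Read from the backup and write back into the original file, so the
    # original file keeps its ownership and permissions.
    sed -E 's/^(vsan\.(healthcheck|hcl\.update|precheck)\.enabled)=true[[:space:]]*$/\1=false/' \
        "$backup" > "$conf_file" || return 1

    # Print the resulting vsan settings for review.
    grep '^vsan\.' "$conf_file"
}
```

Run the function as root after taking the snapshot, review the printed values, and then restart the LCM service as described above. The function does not restart the service itself.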