Multiple Skyline Health errors after upgrading a vSAN host from vSphere 7.x to 8.x

Article ID: 428873

Products

VMware vSAN, VMware vSAN 8.x

Issue/Introduction

  • After upgrading one of your ESXi hosts in a vSAN cluster from vSphere 7.x to 8.x, you see errors in Skyline Health similar to:

    • High CPU/Memory Usage
    • vSAN data alarm 'vSAN object'
    • vSAN Build Recommendation alarm 'vSAN build Recommendation engine'
    • vSAN Performance Service alarm 'Stats DB object conflicts'
    • vSAN cluster alarm 'Disk format version'

Environment

  • VMware vSAN 8.x

Cause

  • These are commonly seen and expected errors. They should be temporary while the cluster upgrade is taking place:

    • High CPU/Memory Usage: One host is temporarily overloaded in terms of CPU or memory usage, because VMs are migrated to other hosts as you evacuate the hosts you are upgrading.
    • vSAN data alarm 'vSAN object': This occurs when a host is in maintenance mode using "ensure accessibility" and there are not enough remaining hosts to satisfy the VMs' policies. It may persist temporarily after the host exits maintenance mode while resync operations are taking place.
    • vSAN Build Recommendation alarm 'vSAN build Recommendation engine': The primary cause is older environments that still use VMware Update Manager (VUM) rather than vSphere Lifecycle Manager (vLCM). See KBE
    • vSAN Performance Service alarm 'Stats DB object conflicts': This is a known issue during upgrades.
    • vSAN cluster alarm 'Disk format version': This is expected in more significant updates where the vSAN on-disk format version is updated.

Resolution

  1. Depending on available CPU and memory resources, you may need to accept the high CPU/memory utilization until all upgrades are complete. If needed, you can manually vMotion some VMs to other hosts to balance the load.

  2. Complete the upgrade of vCenter and the remaining hosts in the cluster.

  3. Once the upgrades are complete, ensure that all hosts have been removed from maintenance mode, then allow any resync operations that are triggered to complete.

  4. If you encounter the "Performance Service - Stats DB object conflicts" alarm, see the following KB for information on how to resolve it: vSAN -- vSAN Health Service - Performance Service - Stats DB object conflicts

  5. Once all resyncs have been completed, verify that Object Health errors are resolved and that the alarm in Skyline Health has been cleared. See related KB: vSAN Health Service - Data Health – vSAN Object Health

  6. For the vSAN Build Recommendation Engine, see related KB:  vSAN Build Recommendation Engine errors caused by VMware web service migration

  7. After all of the other issues have been resolved, you may proceed with the on-disk format change. See KB: vSAN -- Health Service - Cluster - Disk format version

  8. If you have difficulty resolving the errors, especially if you have concerns about the object states or the objects are in an unexpected state, please open a case with Broadcom support. See KB: Creating and managing Broadcom support cases
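
The ordering of the steps above can be summarized as a small decision function. This is only an illustrative sketch in Python: the ClusterState fields and the next_step name are hypothetical, not part of any VMware API, and the values would have to be gathered by hand from vCenter and Skyline Health or by your own tooling.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of the cluster; none of these fields come from a VMware API."""
    vcenter_upgraded: bool
    hosts_pending_upgrade: int
    hosts_in_maintenance_mode: int
    resync_in_progress: bool
    open_health_alarms: int  # e.g. Object Health, Stats DB, Build Recommendation


def next_step(state: ClusterState) -> str:
    """Return the earliest step from the Resolution list that is not yet satisfied."""
    if not state.vcenter_upgraded or state.hosts_pending_upgrade > 0:
        return "Complete the upgrade of vCenter and the remaining hosts"
    if state.hosts_in_maintenance_mode > 0:
        return "Remove all hosts from maintenance mode"
    if state.resync_in_progress:
        return "Wait for resync operations to complete"
    if state.open_health_alarms > 0:
        return "Resolve the remaining Skyline Health alarms"
    # Only reached once every earlier condition has been cleared.
    return "Proceed with the on-disk format change"


if __name__ == "__main__":
    state = ClusterState(True, 0, 1, False, 2)
    print(next_step(state))
```

The point of the sketch is the ordering: the on-disk format change is the last action, and it is only suggested once upgrades, maintenance mode, resyncs, and the other alarms have all been dealt with.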

Additional Information

  • Some things to keep in mind: 
    • You should upgrade the vCenter first, before upgrading the hosts.

    • If DRS is enabled, it should keep CPU and memory balanced as your hosts enter and exit maintenance mode. If you see the high CPU/memory utilization alarms anyway with DRS enabled, the remaining hosts in the cluster likely do not have sufficient resources available to keep the alarm(s) from triggering.

    • Depending on the storage policy and the number of hosts in the cluster, you may only be able to place one host into maintenance mode at a time.
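
As a rough illustration of the last point, the arithmetic for a mirrored (RAID-1) storage policy can be sketched in Python. The sketch assumes the commonly documented rule that RAID-1 with n failures to tolerate needs at least 2n + 1 hosts. The function names are hypothetical, erasure-coding (RAID-5/6) policies have different minimums and are not modeled, and the result should be checked against the policies actually in use in your cluster.

```python
def raid1_min_hosts(failures_to_tolerate: int) -> int:
    """Minimum host count for a RAID-1 policy, assuming the 2n + 1 rule."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    return 2 * failures_to_tolerate + 1


def hosts_that_can_be_out(total_hosts: int, failures_to_tolerate: int) -> int:
    """Hosts that can be out of service while the policy can still be fully satisfied."""
    spare = total_hosts - raid1_min_hosts(failures_to_tolerate)
    return max(spare, 0)


if __name__ == "__main__":
    # (total hosts in the cluster, failures to tolerate) pairs chosen for illustration
    for total, ftt in [(3, 1), (4, 1), (6, 2)]:
        print(total, ftt, hosts_that_can_be_out(total, ftt))
```

Under these assumptions, a 3-host cluster with failures to tolerate set to 1 has no spare host, so placing any host into maintenance mode with "ensure accessibility" leaves objects unable to meet their policy until the host returns. A 4-host cluster with the same policy has one spare host.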