VOBs for vSAN SSD endurance alarm introduced in vSphere 8.0U2

Article ID: 326721

Products

VMware vSAN

Issue/Introduction

This article provides documentation on the vSAN SSD endurance alarms introduced in vSphere 8.0U2. These alarms are raised when an NVMe disk in a vSAN ESA cluster approaches the end of its endurance.

Applies to vCenter/ESXi running version 8.0U2 or higher.
The following vSphere warning and error events are raised on an ESXi host when an NVMe disk in a vSAN ESA cluster is approaching or has exceeded its estimated endurance.

Event ID: esx.problem.vsan.health.ssd.endurance.warning
  • VOB message: One of the disks exceeds 90% of its estimated endurance threshold.
  • Category: Warning
  • Purpose: Any NVMe disk in vSAN ESA triggers this event when it exceeds 90% of its estimated endurance threshold.
  • Release: vSphere 8.0 U2

Event ID: esx.problem.vsan.health.ssd.endurance.error
  • VOB message: One of the disks exceeds the estimated endurance threshold.
  • Category: Critical
  • Purpose: Any NVMe disk in vSAN ESA triggers this event when it exceeds 100% of its estimated endurance threshold.
  • Release: vSphere 8.0 U2

Event ID: esx.problem.vsan.health.ssd.endurance
  • VOB message: One or more disks exceed its/their warning usage of estimated endurance threshold.
  • Category: Info
  • Purpose: Lets users customize endurance thresholds for vCenter clusters, hosts, and disks.
  • Release: vSphere 8.0 U3
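
To check whether any of these events have already been recorded, a collected log can be searched for the event IDs listed above. The following is a minimal sketch in Python. It only does substring matching on the event IDs and makes no assumption about the log line format. The function name count_endurance_events is hypothetical and is not part of any VMware tool.

```python
from collections import Counter
from typing import Iterable

# Event IDs copied from the list above. The two suffixed IDs come first so
# that the bare ID does not also match lines carrying .error or .warning.
ENDURANCE_EVENT_IDS = (
    "esx.problem.vsan.health.ssd.endurance.error",
    "esx.problem.vsan.health.ssd.endurance.warning",
    "esx.problem.vsan.health.ssd.endurance",
)


def count_endurance_events(lines: Iterable[str]) -> Counter:
    """Count how many lines mention each vSAN SSD endurance event ID."""
    counts: Counter = Counter()
    for line in lines:
        for event_id in ENDURANCE_EVENT_IDS:
            if event_id in line:
                counts[event_id] += 1
                break  # count each line once, under the most specific ID
    return counts
```

The function accepts any iterable of strings, so an open file object can be passed directly. It is an assumption here that the events are written to the host's VOB log (typically /var/log/vobd.log on ESXi); verify the location for your environment before relying on it.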



Environment

VMware vSAN 8.0.x

Cause

In a vSAN ESA cluster, vSphere scans and checks NVMe disk endurance every 12 hours. If an NVMe disk reaches or exceeds 90% of its endurance threshold, a Warning event is triggered. If it reaches or exceeds the endurance threshold itself, a Critical event is triggered.
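
The threshold logic described above can be illustrated with a short Python sketch. This is not the vSAN implementation. The function name classify_endurance and its parameters are hypothetical, and the sketch assumes that usage and threshold are expressed in the same unit.

```python
from typing import Optional

WARNING_FRACTION = 0.90   # Warning at 90% of the endurance threshold
CRITICAL_FRACTION = 1.00  # Critical at 100% of the endurance threshold


def classify_endurance(used: float, threshold: float) -> Optional[str]:
    """Return "Critical", "Warning", or None for a disk's endurance usage.

    `used` and `threshold` are hypothetical inputs in the same unit
    (for example, terabytes written).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    ratio = used / threshold
    # Check the Critical level first, since it also satisfies the Warning level.
    if ratio >= CRITICAL_FRACTION:
        return "Critical"
    if ratio >= WARNING_FRACTION:
        return "Warning"
    return None
```

For example, a disk at 95 of 100 units would be classified as "Warning" and a disk at 100 of 100 units as "Critical", while a disk at 50 of 100 units would raise no event.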

Resolution

  • If a spare NVMe disk is available, claim it into the storage pool of the vSAN ESA cluster and safely evacuate the old disk from the storage pool.
  • If no spare NVMe disk is available, engage the hardware vendor to obtain a replacement so that the failing NVMe disk can be replaced.

How to configure the alarm with a customized endurance threshold

  1. Select the alarm named "vSAN Health Alarm for disk endurance check" from vCenter → Configuration → Alarm Definition.

  2. Edit the alarm and navigate to Alarm Rule. Rules can be configured at different levels: Cluster Name, Host Name, Disk Name, and Disk Vendor Name.

  3. Batch configuration is supported through the "starts with" and "ends with" operations.

  4. Check the alarm definition in the summary view.
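
To illustrate how the "starts with" and "ends with" operations in step 3 can cover many objects with a single rule, here is a minimal Python sketch of that kind of matching. It is only a model of the idea, not how vCenter evaluates alarm rules. The Rule class, the rule_matches function, and the example names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Rule:
    """A hypothetical model of one alarm rule condition."""
    field: str     # "Cluster Name", "Host Name", "Disk Name", or "Disk Vendor Name"
    operator: str  # "starts with" or "ends with"
    value: str


def rule_matches(rule: Rule, attributes: Dict[str, str]) -> bool:
    """Return True if the attribute named by the rule satisfies its operator."""
    actual = attributes.get(rule.field, "")
    if rule.operator == "starts with":
        return actual.startswith(rule.value)
    if rule.operator == "ends with":
        return actual.endswith(rule.value)
    raise ValueError(f"unsupported operator: {rule.operator!r}")


# Hypothetical example: one rule that covers every host whose name starts
# with "esa-prod-".
rule = Rule(field="Host Name", operator="starts with", value="esa-prod-")
hosts = [{"Host Name": "esa-prod-01"}, {"Host Name": "esa-test-01"}]
matched = [h["Host Name"] for h in hosts if rule_matches(rule, h)]
```

In this sketch a single prefix rule selects "esa-prod-01" but not "esa-test-01", which is the sense in which prefix and suffix operations allow batch configuration.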