VOBs for vSAN SSD endurance alarm introduced in vSphere 8.0U2
Article ID: 326721
Products: VMware vSAN
Issue/Introduction
This article provides documentation on the vSAN SSD endurance alarms introduced in vSphere 8.0U2. These alarms are raised when an NVMe disk in a vSAN ESA cluster approaches the end of its endurance.
Applies to vCenter Server and ESXi 8.0 U2 or later.
The following vSphere error and warning events are logged on an ESXi host when an NVMe disk in a vSAN ESA cluster is approaching the end of its endurance.
| Event ID | VOB message | Category | Purpose | Release |
|---|---|---|---|---|
| esx.problem.vsan.health.ssd.endurance.error | One of the disks exceeds 90% of its estimated endurance threshold. | Warning | Triggered by any NVMe disk in vSAN ESA when it exceeds 90% of its estimated endurance threshold. | vSphere 8.0 U2 |
| esx.problem.vsan.health.ssd.endurance.warning | One of the disks exceeds the estimated endurance threshold. | Critical | Triggered by any NVMe disk in vSAN ESA when it exceeds 100% of its estimated endurance threshold. | vSphere 8.0 U2 |
| esx.problem.vsan.health.ssd.endurance | One or more disks exceed its/their warning usage of estimated endurance threshold. | Info | Users can customize endurance thresholds for clusters, hosts, and disks in vCenter. | vSphere 8.0 U3 |
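When reviewing host events, the table can be turned into a lookup for triage. The sketch below is a minimal, hypothetical Python helper; the names CATEGORY_BY_EVENT, SEVERITY_RANK, and most_severe_category are illustrative and not part of any VMware API. It keys on the full event ID rather than parsing the suffix because, as listed in the table, the suffix does not match the category: the .error event is a Warning and the .warning event is Critical.

```python
from typing import Iterable, Optional

# Event IDs and categories copied from the table in this article.
CATEGORY_BY_EVENT = {
    "esx.problem.vsan.health.ssd.endurance.error": "Warning",
    "esx.problem.vsan.health.ssd.endurance.warning": "Critical",
    "esx.problem.vsan.health.ssd.endurance": "Info",
}

# Assumed ordering for triage: higher number = more severe.
SEVERITY_RANK = {"Info": 0, "Warning": 1, "Critical": 2}


def most_severe_category(event_ids: Iterable[str]) -> Optional[str]:
    """Return the most severe category among known endurance events, or None."""
    categories = [CATEGORY_BY_EVENT[e] for e in event_ids if e in CATEGORY_BY_EVENT]
    if not categories:
        return None
    return max(categories, key=SEVERITY_RANK.__getitem__)
```

For example, passing both the .error and .warning event IDs returns "Critical", and a list containing no endurance events returns None.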
Environment
VMware vSAN 8.0.x
Cause
In a vSAN ESA cluster, vSphere scans and checks the endurance of NVMe disks every 12 hours. If any NVMe disk reaches or exceeds its endurance threshold, a Critical event is triggered; if it reaches or exceeds 90% of its endurance threshold, a Warning event is triggered.
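The threshold logic described in this section can be sketched as follows. This is an illustration under stated assumptions, not the vSphere implementation: classify_endurance and next_scan are hypothetical names, and used_pct is assumed to be the consumed endurance expressed on the same percentage scale as the threshold (100 = the estimated endurance). Following the wording of this section, "reaches or exceeds" is modeled with >=.

```python
from datetime import datetime, timedelta
from typing import Optional

SCAN_INTERVAL = timedelta(hours=12)  # scan cadence stated in this article
WARNING_PERCENT = 90                 # Warning at 90% of the threshold


def classify_endurance(used_pct: float, threshold_pct: float = 100.0) -> Optional[str]:
    """Return "Critical", "Warning", or None for a single NVMe disk.

    used_pct: consumed endurance in percent (hypothetical input).
    threshold_pct: endurance threshold in percent; 100 by default,
    lower if a customized threshold is assumed.
    """
    if used_pct >= threshold_pct:
        return "Critical"
    # Multiply before dividing so common thresholds give exact values (e.g. 90.0).
    if used_pct >= threshold_pct * WARNING_PERCENT / 100:
        return "Warning"
    return None


def next_scan(last_scan: datetime) -> datetime:
    """Time of the following endurance scan, assuming a fixed 12-hour cadence."""
    return last_scan + SCAN_INTERVAL
```

For instance, classify_endurance(92) returns "Warning" and classify_endurance(100) returns "Critical"; with a customized threshold of 80, a disk at 72 already returns "Warning".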
Resolution
If a spare NVMe disk is available, claim it into the storage pool of the vSAN ESA cluster, then safely evacuate the worn disk from the storage pool.
If no spare NVMe disk is available, engage the hardware vendor for a replacement so the failing NVMe disk can be replaced.
How to configure the alarm with a customized endurance threshold (vSphere 8.0 U3 and later)
Select the alarm named "vSAN Health Alarm for disk endurance check" under vCenter → Configure → Alarm Definitions.
Edit the alarm and navigate to Alarm Rule. Rules can be configured at different levels: Cluster Name, Host Name, Disk Name, and Disk Vendor Name.
Batch configuration is supported with the "starts with" and "ends with" operators.
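To reason about which objects a batch rule covers, the "starts with" and "ends with" matching can be sketched as below. The Disk and RuleCondition types, their field names, and the case-sensitive comparison are assumptions made for illustration; the rule editor in vCenter may evaluate conditions differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Disk:
    # Hypothetical inventory record: one field per alarm rule level.
    cluster_name: str
    host_name: str
    disk_name: str
    vendor_name: str


@dataclass(frozen=True)
class RuleCondition:
    field: str     # one of the Disk field names above (assumed)
    operator: str  # "starts with" or "ends with"
    value: str


def matches(disk: Disk, cond: RuleCondition) -> bool:
    """Case-sensitive match (an assumption) of one disk against one condition."""
    actual = getattr(disk, cond.field)
    if cond.operator == "starts with":
        return actual.startswith(cond.value)
    if cond.operator == "ends with":
        return actual.endswith(cond.value)
    raise ValueError(f"unsupported operator: {cond.operator!r}")


def select_disks(disks: List[Disk], cond: RuleCondition) -> List[Disk]:
    """Return the disks that a batch condition would cover."""
    return [d for d in disks if matches(d, cond)]
```

With hypothetical names, RuleCondition("cluster_name", "starts with", "prod-") selects every disk in clusters whose names begin with "prod-", so one rule can cover many clusters without listing each one.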