VOBs for vSAN SSD endurance alarm introduced in vSphere 8.0U2

Article ID: 326721

Updated On:

Products

VMware vSAN

Issue/Introduction

This article provides documentation on the vSAN SSD endurance alarms introduced in vSphere 8.0U2. These alarms are raised when an NVMe disk in a vSAN ESA cluster approaches the end of its endurance.

Applies to vCenter and ESXi running version 8.0U2 or later.

The following vSphere error and warning events are observed on an ESXi host when an NVMe disk in a vSAN ESA cluster is close to exhausting its endurance.

Event ID	VOB message	Category	Purpose	Release
esx.problem.vsan.health.ssd.endurance.warning	One of the disks exceeds 90% of its estimated endurance threshold.	Warning	Any NVMe disk in vSAN ESA will trigger this event when it exceeds 90% of its estimated endurance threshold.	vSphere 8.0 U2
esx.problem.vsan.health.ssd.endurance.error	One of the disks exceeds the estimated endurance threshold.	Critical	Any NVMe disk in vSAN ESA will trigger this event when it exceeds 100% of its estimated endurance threshold.	vSphere 8.0 U2
esx.problem.vsan.health.ssd.endurance	One or more disks exceed its/their warning usage of estimated endurance threshold.	Info	Users can customize endurance thresholds for vCenter clusters, hosts, and disks	vSphere 8.0 U3

Environment

VMware vSAN 8.0.x

Cause

In a vSAN ESA cluster, vSphere scans and checks NVMe disk endurance every 12 hours. A disk that reaches or exceeds 90% of its endurance threshold triggers a Warning event. A disk that reaches or exceeds the endurance threshold itself triggers a Critical event.
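The threshold logic described above can be sketched as follows. This is only an illustrative model of the rule stated in this article, not vSAN's actual implementation; the function name, the `Severity` enum, and the input values are hypothetical, and `used` and `threshold` are assumed to be expressed in the same unit.

```python
from enum import Enum


class Severity(Enum):
    OK = "OK"
    WARNING = "Warning"
    CRITICAL = "Critical"


# Percentage of the endurance threshold at which a Warning is raised
# (90%, as stated in this article).
WARNING_PERCENT = 90


def classify_endurance(used: float, threshold: float) -> Severity:
    """Classify a disk by how much of its endurance threshold is consumed.

    `used` and `threshold` are hypothetical inputs in the same unit,
    for example the percentage of rated endurance consumed.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if used >= threshold:
        # Reached or exceeded the endurance threshold itself.
        return Severity.CRITICAL
    if used * 100 >= WARNING_PERCENT * threshold:
        # Reached or exceeded 90% of the endurance threshold.
        return Severity.WARNING
    return Severity.OK
```

For example, with a threshold of 100, a disk at 89 is classified as `OK`, a disk at 90 as `Warning`, and a disk at 100 or above as `Critical`.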

Resolution

If a spare NVMe disk is available, claim it into the storage pool of the vSAN ESA cluster, then safely evacuate the worn disk from the storage pool.
If no spare NVMe disk is available, engage the hardware vendor to obtain a replacement so that the failing NVMe disk can be replaced.

How to configure the alarm with a customized endurance threshold

Select the alarm named "vSAN Health Alarm for disk endurance check" under vCenter → Configuration → Alarm Definition.
Edit the alarm and navigate to Alarm Rule. Rules can be scoped at different levels: Cluster Name, Host Name, Disk Name, and Disk Vendor Name.
Batch configuration is supported through the "starts with" and "ends with" operations.
Check the alarm definition in the summary view.
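To illustrate how "starts with" and "ends with" rules can cover many objects at once, here is a minimal conceptual sketch. It does not call any vCenter API; the `Rule` class, the field keys (`cluster`, `host`, `disk`, `vendor`), and the sample names are all hypothetical and only mirror the four rule levels listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    field: str     # hypothetical key: "cluster", "host", "disk", or "vendor"
    operator: str  # "starts with" or "ends with"
    value: str


def rule_matches(rule: Rule, disk: dict) -> bool:
    """Return True if the disk's attribute satisfies the rule."""
    actual = disk.get(rule.field, "")
    if rule.operator == "starts with":
        return actual.startswith(rule.value)
    if rule.operator == "ends with":
        return actual.endswith(rule.value)
    raise ValueError(f"unsupported operator: {rule.operator}")


# Hypothetical inventory used only for illustration.
disks = [
    {"cluster": "prod-esa-01", "host": "esx01.lab.local", "disk": "nvme-a", "vendor": "VendorA"},
    {"cluster": "prod-esa-02", "host": "esx02.lab.local", "disk": "nvme-b", "vendor": "VendorB"},
    {"cluster": "test-esa-01", "host": "esx03.lab.local", "disk": "nvme-c", "vendor": "VendorA"},
]

# One rule covers every cluster whose name starts with "prod-".
prod_rule = Rule(field="cluster", operator="starts with", value="prod-")
matched = [d["disk"] for d in disks if rule_matches(prod_rule, d)]
```

In this sketch a single prefix rule selects the disks in both `prod-` clusters, which is the kind of batch scoping the alarm rule operations allow.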
