Enabling vSAN alerts for NVMe SMART data in vCenter

Article ID: 434866

Products

VMware vSAN, VMware vCenter Server

Issue/Introduction

This article helps users create alarms in vCenter based on SMART data for their NVMe devices, so that they can take proactive steps where possible to address drive health concerns before a potential drive failure.

VMware vSAN introduced a number of new features, including the ability to raise alarms on SMART data that drive vendors provide about the overall health of a disk.

The following five VOB alerts have been created to report statistics from SMART data for NVMe devices: 

| VOB Alert | Description |
| --- | --- |
| vob.vsan.lsom.temperaturenvmediskhealthcriticalwarning | Reports when an NVMe disk's available spare capacity has fallen below the critical threshold. |
| vob.vsan.lsom.temperaturenvmediskhealthcriticalwarning | Reports when an NVMe disk temperature is beyond threshold. |
| vob.vsan.lsom.reliabilitynvmediskhealthcriticalwarning | Reports when an NVMe disk has become unreliable. |
| vob.vsan.lsom.readonlynvmediskhealthcriticalwarning | Reports when an NVMe disk has become read-only. |
| vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning | Reports when an NVMe disk's volatile memory backup device has failed (if present). |
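
The alert IDs listed above share a common shape: the prefix `vob.vsan.lsom.`, a condition keyword, and the suffix `nvmediskhealthcriticalwarning`. As a minimal sketch, assuming you have exported event or log text to plain lines, that shape can be used to pick out the NVMe health warnings. The function name and the sample lines are hypothetical and used only for illustration; real log formats will differ.

```python
import re

# Shape shared by the alert IDs in this article:
#   vob.vsan.lsom.<condition>nvmediskhealthcriticalwarning
NVME_VOB_PATTERN = re.compile(
    r"vob\.vsan\.lsom\.([a-z]+?)nvmediskhealthcriticalwarning"
)


def find_nvme_health_vobs(lines):
    """Return (line_number, condition) pairs for lines that mention an NVMe health VOB."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = NVME_VOB_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# Hypothetical sample lines, not real vCenter or ESXi output.
sample = [
    "2024-01-01T00:00:00Z info: host connected",
    "2024-01-01T00:05:00Z [vob.vsan.lsom.readonlynvmediskhealthcriticalwarning] disk event",
    "2024-01-01T00:06:00Z [vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning] disk event",
]

print(find_nvme_health_vobs(sample))
# -> [(2, 'readonly'), (3, 'backupfailed')]
```

The match is case-sensitive because the IDs are written in lowercase in this article; adjust the pattern if your exported text uses a different casing.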

 

VMware by Broadcom highly recommends configuring these alerts in vCenter so that administrators are notified of NVMe SMART warnings that may indicate an impending hardware failure. As of this writing, vSAN does not take any proactive measures when these errors occur, because SMART data does not adhere to any industry standard and may vary between hardware vendors.

Environment

VMware vCenter Server 8.0U3 and higher
VMware vSAN (OSA & ESA) 8.0U3 and higher
NVMe devices 

Resolution

Steps for creating these custom alarms:

This example uses the event "NVMe critical health warning for disk. The disk's backup device has failed".

  1. In vCenter, navigate to "Alarm Definitions" by selecting your vCenter object > Configure > Alarm Definitions > Add.


  2. Set Alarm Name to [Action Required]<copy/paste the VOB Alert from the table above>, set Description to <copy/paste the corresponding description from the table above>, set Target Type to "Hosts", and then click Next.


  3. Type NVMe in the IF field under Alarm Rule 1 to list all the NVMe options. The five to look for correspond to the five alerts in the table above.


  4. Select "NVMe critical health warning for disk. The disk's backup device has failed" and set "Trigger the alarm and" to "Show as Critical". Click the toggle switch for "Send email notifications", fill in the email addresses of the administrators to be notified, and then click Next.


  5. Leave the reset rule set to its default, which is off, and click Next.


  6. Review the information, ensure "Enable this alarm" is toggled on, and then click "Create".


  7. Repeat for the other four NVMe alerts.

  8. If any of these alerts trigger, engage your hardware vendor for further investigation of the failures and possible drive replacement.
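
Because the same wizard values are entered for each alert, it can help to record them once before clicking through the steps above. The following is a minimal sketch of such a record in Python. It does not talk to vCenter and is not a vSphere API; the class name, function name, field names, and the example email address are all hypothetical. The values mirror steps 2 through 6 above.

```python
from dataclasses import dataclass, field


@dataclass
class NvmeAlarmDefinition:
    """Plain record of the values entered in the alarm wizard (hypothetical helper)."""
    name: str                          # step 2: "[Action Required]" + VOB alert ID
    description: str                   # step 2: description from the table
    event_label: str                   # step 4: option selected in the IF field
    target_type: str = "Hosts"         # step 2
    severity: str = "Critical"         # step 4: "Show as Critical"
    email_recipients: list = field(default_factory=list)  # step 4
    reset_rule_enabled: bool = False   # step 5: left at the default (off)
    enabled: bool = True               # step 6: "Enable this alarm"


def build_alarm_definition(vob_id, description, event_label, recipients):
    """Assemble the wizard values for one alert; at least one recipient is required."""
    if not recipients:
        raise ValueError("at least one email recipient is required")
    return NvmeAlarmDefinition(
        name=f"[Action Required]{vob_id}",
        description=description,
        event_label=event_label,
        email_recipients=list(recipients),
    )


# Values for the example alert used in this article; the address is a placeholder.
backup_failed = build_alarm_definition(
    vob_id="vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning",
    description="Reports when an NVMe disk's volatile memory backup device has failed (if present).",
    event_label="NVMe critical health warning for disk. The disk's backup device has failed",
    recipients=["storage-admins@example.com"],
)

print(backup_failed.name)
# -> [Action Required]vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning
```

Filling in one such record per alert gives a simple checklist for step 7 and makes it easier to confirm that all five alarms were created with the same target type, severity, and recipients.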

Additional Information

Create or Edit Alarms

vSAN NVMe disk report read only critical warning

vSAN (OSA/ESA) does not alert or fail out NVMe disk for certain NVMe SMART errors.