In ESXi 5.1, VMware added S.M.A.R.T. functionality to monitor hard drive health. The S.M.A.R.T. feature records various operation parameters from physical hard drives attached to a local controller. The feature is part of the firmware on the circuit board of a physical hard disk (HDD and SSD).
To read the current data from a disk:
- Open a console or SSH session to the ESXi host. For more information, see Using ESXi Shell in ESXi 5.x (2004746).
- Determine the device parameter to use by running the command:
# esxcli storage core device list
- The expected output is a list with all SCSI devices seen by the ESXi host. For example:
t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520
- Read the data from the device, where device is a value found in step 3:
# esxcli storage core device smart get -d device
Note: External FC/iSCSI LUNs or virtual disks from a RAID controller might not report a S.M.A.R.T. status.
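As a rough illustration of the steps above, the following Python sketch pulls device identifiers out of saved esxcli storage core device list output and builds the matching smart get command line for each one. It assumes that every identifier sits on an unindented line and that the per-device details beneath it are indented; the indented line in the sample is only a placeholder for those details. The function names are hypothetical and are not part of any VMware tool.

```python
def device_ids(list_output):
    """Return identifiers from saved `esxcli storage core device list` output.

    Assumption: each identifier is on an unindented line, and the detail
    lines that follow it are indented.
    """
    ids = []
    for line in list_output.splitlines():
        if line.strip() and not line[0].isspace():
            ids.append(line.strip())
    return ids


def smart_get_command(device):
    """Build the command from the article for one device identifier."""
    return "esxcli storage core device smart get -d " + device


# Sample text: the identifier comes from the article; the indented line
# stands in for the device details and is a placeholder.
sample = (
    "t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520\n"
    "   Display Name: (details omitted)\n"
)

for dev in device_ids(sample):
    print(smart_get_command(dev))
```

The sketch only prepares the command text; it does not run anything on the host.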
This table breaks down some example output:
Parameter | Value | Threshold | Worst |
Health Status | OK | N/A | N/A |
Media Wearout Indicator | 0 | 0 | 0 |
Write Error Count | N/A | N/A | N/A |
Read Error Count | 118 | 50 | 118 |
Power-on Hours | 0 | 0 | 0 |
Power Cycle Count | 100 | 0 | 100 |
Reallocated Sector Count | 100 | 3 | 100 |
Raw Read Error Rate | 118 | 50 | 118 |
Drive Temperature | 27 | 0 | 34 |
Driver Rated Max Temperature | N/A | N/A | N/A |
Write Sectors TOT Count | N/A | N/A | N/A |
Read Sectors TOT Count | N/A | N/A | N/A |
Initial Bad Block Count | N/A | N/A | N/A |
Note: A physical hard drive can have up to 30 different attributes (the example above supports only 13). For more information, see How does S.M.A.R.T. function of hard disks Work?
Note: The preceding link was correct as of September 2, 2014. If you find the link is broken, provide feedback and a VMware employee will update the link.
A raw value can have two possible results:
- A number from 0 to 253
- A word (for example, N/A or OK)
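To make the two kinds of raw value concrete, here is a minimal Python sketch that parses rows laid out like the example table and keeps numbers as integers and words (such as N/A or OK) as text. It assumes the pipe-delimited layout shown in this article, which may differ from what the command prints on a given host. The function names are hypothetical.

```python
def parse_value(text):
    """Return an int for numeric cells and the original word otherwise."""
    text = text.strip()
    return int(text) if text.isdigit() else text


def parse_smart_table(table_text):
    """Parse pipe-delimited rows laid out like the example table."""
    rows = []
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    # The first non-empty line is the header: Parameter, Value, Threshold, Worst.
    header = [cell.strip() for cell in lines[0].strip().strip("|").split("|")]
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        row = {header[0]: cells[0]}
        for name, cell in zip(header[1:], cells[1:]):
            row[name] = parse_value(cell)
        rows.append(row)
    return rows


# Two rows copied from the example table in this article.
sample = """\
Parameter | Value | Threshold | Worst |
Health Status | OK | N/A | N/A |
Drive Temperature | 27 | 0 | 34 |
"""

for row in parse_smart_table(sample):
    print(row)
```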
Column descriptions
Note: The values returned and their meaning for each of these columns can vary by manufacturer. For more information, please consult your hardware supplier.
- Parameter
This is a translation from the attribute ID to human-readable text. For example:
hex 0xE7 = decimal 231 = "Drive Temperature"
For more information, see the Known ATA S.M.A.R.T. attributes section of the S.M.A.R.T. Wikipedia article.
Note: The preceding link was correct as of September 2, 2014. If you find the link is broken, provide feedback and a VMware employee will update the link.
- Value
This is the raw value reported by the disk. As a simple illustration from the example above, the Drive Temperature is reported as 27, which means 27 degrees Celsius.
A Value can be either a number (0-253) or a word (for example, N/A or OK).
- Threshold
The (failure) limit for the attribute.
- Worst
The highest Value ever recorded for the parameter.
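The translation described under Parameter can be sketched in a few lines of Python. The lookup table below contains only the single mapping quoted in this article; real attribute tables are vendor-specific, and both the table and the function name are hypothetical.

```python
# Only the mapping quoted in the article; real tables vary by manufacturer.
ATTRIBUTE_NAMES = {231: "Drive Temperature"}


def attribute_name(hex_id):
    """Translate a hexadecimal attribute ID string into (decimal ID, text)."""
    decimal_id = int(hex_id, 16)  # accepts strings such as "0xE7" or "E7"
    return decimal_id, ATTRIBUTE_NAMES.get(decimal_id, "Unknown")


print(attribute_name("0xE7"))
```

An ID that is not in the table is reported as "Unknown" rather than guessed.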
smartd daemon
ESXi 5.1 also has the /sbin/smartd daemon installed in the DCUI. This tool has no command-line switches and does not interact with the console. If you run the command in the shell, a S.M.A.R.T. status is reported in the /var/log/syslog.log file.
For example:
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: below MEDIA WEAROUT threshold (0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: above TEMPERATURE threshold (27 > 0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520: above TEMPERATURE threshold (113 > 0)
Notes:
- You can stop the daemon by pressing Ctrl+C.
- Logged events should be viewed with caution. As can be seen in the example, all three warnings are irrelevant. The output can vary greatly between manufacturers and disk models.
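When reviewing many such entries, it can help to split each line into fields before judging whether a warning matters. The Python sketch below does this with a regular expression inferred only from the three sample lines above, so other ESXi builds or disk models may log in a different shape. The pattern and function name are hypothetical.

```python
import re

# Pattern inferred from the three sample lines; other builds may differ.
SMARTD_LINE = re.compile(
    r"^(?P<timestamp>\S+) smartd: \[(?P<level>\w+)\] "
    r"(?P<device>\S+): (?P<direction>above|below) "
    r"(?P<metric>.+?) threshold \((?P<detail>[^)]*)\)$"
)


def parse_smartd_line(line):
    """Return the fields of one smartd syslog line, or None if it does not match."""
    match = SMARTD_LINE.match(line.strip())
    return match.groupdict() if match else None


# One of the sample lines from this article, split only for readability.
line = (
    "XXXX-XX-28T10:15:12Z smartd: [warn] "
    "t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520: "
    "above TEMPERATURE threshold (113 > 0)"
)
print(parse_smartd_line(line))
```

The parsed fields make it easier to spot entries like those in the example, where a threshold of 0 produces a warning that carries no real information.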