Self-Monitoring, Analysis and Reporting Technology (SMART) is a technology that hard disk drive (HDD) manufacturers implemented in the 1990s.
SMART is intended to recognize conditions that indicate imminent drive failure and is designed to provide sufficient warning to allow a backup of the data before an actual failure occurs.
SMART is built into HDDs and continuously monitors many internal sensors so that the drive can assess its own health.
Attributes such as temperature and number of reallocated sectors are monitored against threshold limits.
If one or more of these attributes crosses its manufacturer-defined threshold, SMART is triggered and generates a warning, which SGOS then reports.
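The threshold comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how SGOS or any drive firmware implements it: it assumes the ATA-style convention in which each attribute has a normalized value that decreases as the drive degrades, so "crossing" the threshold means the value falls to or below it. The `SmartAttribute` class, the attribute names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One SMART attribute reading (ATA-style convention assumed)."""
    name: str
    value: int      # normalized current value; lower means more degraded
    threshold: int  # manufacturer-defined trip point


def tripped_attributes(attributes):
    """Return the names of attributes whose value has fallen to or below the threshold."""
    return [a.name for a in attributes if a.value <= a.threshold]


# Hypothetical readings for illustration only.
readings = [
    SmartAttribute("Temperature", value=70, threshold=45),
    SmartAttribute("Reallocated_Sector_Ct", value=30, threshold=36),
]

tripped = tripped_attributes(readings)
if tripped:
    print("SMART tripped by:", ", ".join(tripped))  # prints: SMART tripped by: Reallocated_Sector_Ct
```

Unlike this sketch, SGOS itself only reports that the drive is failing; it does not say which attribute tripped.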
There are two types of HDD failure: predictable and non-predictable.
- Predictable failures occur slowly over time, for example with a gradual increase in error rates or a decline in performance as the drive wears.
- Non-predictable failures occur suddenly and without warning, for example when a chip on the drive's integrated controller board fails.
NOTE: The drive's firmware monitors specific attributes for degradation over time, but it can't predict an instantaneous failure.
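To illustrate why only gradual degradation can be predicted, the following Python sketch flags a drive when a counter such as the reallocated-sector count trends steadily upward. It is a hypothetical monitoring heuristic, not part of SMART or SGOS: the least-squares slope test, the `max_slope` parameter and the sample readings are all assumptions made for illustration.

```python
def slope(samples):
    """Least-squares slope of evenly spaced samples (change per polling interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def is_degrading(samples, max_slope):
    """Flag a gradual upward trend; max_slope is a hypothetical tuning knob."""
    return slope(samples) > max_slope


# Hypothetical daily reallocated-sector counts from a wearing drive.
wearing = [0, 0, 2, 3, 5, 8, 12]
# A drive whose counter is stable.
stable = [4, 4, 4, 4, 4]

print(is_degrading(wearing, max_slope=1.0))  # True (slope is 55/28, about 1.96)
print(is_degrading(stable, max_slope=1.0))   # False (slope is 0)
```

A non-predictable failure, such as a controller chip failing, produces no such trend beforehand, so no heuristic of this kind can warn about it in advance.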
Starting with SGOS 6.x, if SMART is triggered on an HDD, the operating system reports the warning "status:present (failing)" for that drive.
This warning represents a predictable failure and indicates imminent drive failure.
Depending on which attribute caused SMART to trip, the drive may continue to function normally, or it may fail in a short period of time.
Unfortunately, there is no way to accurately predict how long the drive will keep working before it fails completely.
In this case, contact Symantec and initiate the Return Material Authorization (RMA) process for a replacement HDD.
NOTE: The warning status:offline (failing) means the same as status:present (failing), except that the drive has also been taken offline manually with the CLI command:
ProxySG# disk offline <disk number>
The following example shows SMART triggered on the drive in slot 4, with SGOS reporting the warning "status:present (failing)".
1. Management Console
Number of physical CPUs: 1
Number of cores: 2
CPU frequency: 2600 MHz
Storage: 4 drives
Disk in slot 1: 500 GB SEAGATE ST3500620SS , rev:0003 serial:9QM97YSW status:present
Disk in slot 2: 500 GB SEAGATE ST3500620SS , rev:0003 serial:9QM97Y6L status:present
Disk in slot 3: 500 GB SEAGATE ST3500620SS , rev:0003 serial:9QM97Y6P status:present
Disk in slot 4: 500 GB SEAGATE ST3500620SS , rev:0003 serial:9QMBNMVF status:present (failing)
Disk in slot 5: empty
3. Event log
2013-07-26 23:30:58-00:00UTC "Health Monitor (WARNING): Disk 4 Status is 'present (failing)'"
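When the disk listing shown above is captured as text (for example, copied from the Management Console), a short Python sketch can pick out the flagged drives. This is a hypothetical helper, not a Symantec tool: the `failing_disks` function and its regular expression assume the exact line layout in the example, and they treat any status followed by "(failing)", including status:offline (failing), as flagged, assuming an offline drive is listed in the same format.

```python
import re

# Matches lines such as:
# "Disk in slot 4: 500 GB SEAGATE ST3500620SS , rev:0003 serial:9QMBNMVF status:present (failing)"
DISK_LINE = re.compile(
    r"^Disk in slot (?P<slot>\d+):.*?"
    r"serial:(?P<serial>\S+)\s+"
    r"status:(?P<status>\w+)(?P<failing>\s+\(failing\))?\s*$"
)


def failing_disks(text):
    """Return (slot, serial, status) for every disk whose status ends in '(failing)'."""
    result = []
    for line in text.splitlines():
        m = DISK_LINE.match(line.strip())
        if m and m.group("failing"):
            result.append((int(m.group("slot")), m.group("serial"), m.group("status")))
    return result


sample = """Storage: 4 drives
Disk in slot 1: 500 GB SEAGATE ST3500620SS , rev:0003 serial:9QM97YSW status:present
Disk in slot 4: 500 GB SEAGATE ST3500620SS , rev:0003 serial:9QMBNMVF status:present (failing)
Disk in slot 5: empty"""

print(failing_disks(sample))  # [(4, '9QMBNMVF', 'present')]
```

Empty slots and healthy drives are skipped because they either lack a serial number or have no "(failing)" suffix.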
NOTE: The messages that SGOS reports regarding SMART triggers are not persistent across reboots.
For example, suppose SMART was triggered because the HDD's average temperature was too high, and SGOS reported the warning "status:present (failing)".
This problem could be corrected by checking the surrounding environment, making sure there is adequate ventilation, and verifying that all of the ProxySG's fans are working.
After the ProxySG is rebooted, SGOS polls the HDD again; because the temperature is now within an acceptable range, SMART no longer reports a trigger and the warning does not reappear.
Be aware that SGOS currently provides no way to determine which specific attribute or attributes caused SMART to trip.