Interpreting the smartctl command logged errors

Article ID: 168773

Products: XOS

Issue/Introduction

When troubleshooting a potential hard drive failure on a CPM or APM, the smartctl command can be useful for identifying logged errors. Errors such as the following may be observed:

Error 1 occurred at disk power-on lifetime: 1 hours (0 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 b0 47 fe 63

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 00 b1 46 fe 40 00      01:08:03.352  WRITE FPDMA QUEUED
  61 80 40 31 46 fe 40 00      01:08:03.335  WRITE FPDMA QUEUED
  61 00 38 b1 3c fe 40 00      01:08:03.335  WRITE FPDMA QUEUED
  61 00 30 31 40 fe 40 00      01:08:03.335  WRITE FPDMA QUEUED
  61 80 28 31 44 fe 40 00      01:08:03.335  WRITE FPDMA QUEUED

Resolution

To determine whether the errors shown in the smartctl output indicate a current hard drive failure, consider the following factors:

1.  Are the errors logged by smartctl current? smartctl can display errors that were logged only a few hours into the drive's power-on lifetime. Such errors are usually not relevant and do not indicate a current hard drive failure, for example:
Error 1 occurred at disk power-on lifetime: 1 hours (0 days + 1 hours)

Compare the drive's total power-on lifetime with the point in that lifetime at which the error was logged. In this example, the self-test log below reports a total power-on lifetime of 11601 hours, while the error was logged after only 1 hour. This error can therefore be considered irrelevant.
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11601         -
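The comparison in step 1 can be sketched in Python. This is a minimal, illustrative parser, not an Extreme Networks or smartmontools tool. It assumes the plain-text smartctl output layout shown above, treats the newest self-test LifeTime(hours) value as the drive's current power-on lifetime (only accurate if that test ran recently; the Power_On_Hours attribute in `smartctl -A` output is another common source), and uses a hypothetical 72-hour cut-off (`recent_hours`) to label an error as recent. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches e.g. "Error 1 occurred at disk power-on lifetime: 1 hours (0 days + 1 hours)"
ERROR_RE = re.compile(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours")

# Matches a self-test log row and captures the LifeTime(hours) column, e.g.
# "# 1  Short offline       Completed without error       00%     11601         -"
SELFTEST_RE = re.compile(
    r"^#[ \t]*\d+[ \t]+.*?[ \t]+\d+%[ \t]+(\d+)[ \t]+\S+[ \t]*$", re.MULTILINE
)


def latest_selftest_lifetime(output: str) -> Optional[int]:
    """Highest LifeTime(hours) value in the self-test log, or None if there is none."""
    hours = [int(h) for h in SELFTEST_RE.findall(output)]
    return max(hours) if hours else None


def classify_errors(
    output: str, current_poh: int, recent_hours: int = 72
) -> List[Tuple[int, int, int, bool]]:
    """Return (error_no, error_poh, age_hours, is_recent) for each logged error.

    recent_hours is an arbitrary, hypothetical cut-off, not a vendor value.
    """
    results = []
    for num, poh in ERROR_RE.findall(output):
        age = current_poh - int(poh)  # hours of power-on time since the error
        results.append((int(num), int(poh), age, age <= recent_hours))
    return results


# Example using the output excerpts from this article.
SAMPLE = """\
Error 1 occurred at disk power-on lifetime: 1 hours (0 days + 1 hours)
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11601         -
"""

current = latest_selftest_lifetime(SAMPLE)
print(current)                           # 11601
print(classify_errors(SAMPLE, current))  # [(1, 1, 11600, False)]
```

The error in this example is 11600 power-on hours old, so the sketch labels it as not recent, matching the reasoning above.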


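2.  Check whether any errors logged in /var/log/messages correlate with the time at which the smartctl error was logged. Because smartctl records that time in disk power-on hours rather than as a date, first estimate the corresponding wall-clock time.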
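The correlation in step 2 can be sketched in Python as well. Since the error time is known only in power-on hours, `estimate_error_time` subtracts the power-on hours elapsed since the error from the current time. Power-on hours advance only while the drive is powered, so the error occurred at or before that estimate. `messages_near` then keeps log lines within a window around it. This is a minimal sketch assuming classic syslog timestamps without a year (for example `Mar  5 09:30:12`). The function names, the two-hour window, and the sample clock time, power-on hours, and log lines are all hypothetical placeholders, not real XOS log content.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def estimate_error_time(now: datetime, current_poh: int, error_poh: int) -> datetime:
    """Latest wall-clock time at which the error could have occurred.

    Power-on hours only advance while the drive is powered, so the real time
    elapsed since the error is at least (current_poh - error_poh) hours.
    Resolution is about one hour because the log stores whole hours.
    """
    return now - timedelta(hours=current_poh - error_poh)


def messages_near(
    lines: Iterable[str], target: datetime, window: timedelta = timedelta(hours=2)
) -> List[str]:
    """Return lines whose leading syslog timestamp is within +/- window of target.

    Assumes the classic "Mmm dd HH:MM:SS" prefix, which carries no year,
    so target.year is assumed for every line.
    """
    hits = []
    for line in lines:
        try:
            ts = datetime.strptime(f"{target.year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # line does not start with a parsable syslog timestamp
        if abs(ts - target) <= window:
            hits.append(line)
    return hits


# Hypothetical example: made-up clock time, power-on hours, and log lines.
NOW = datetime(2024, 3, 5, 12, 0, 0)
LINES = [
    "Mar  5 09:30:12 host example: placeholder message near the error",
    "Mar  5 01:00:00 host example: placeholder message hours earlier",
    "not a syslog line",
]
estimate = estimate_error_time(NOW, current_poh=11601, error_poh=11598)
print(estimate)  # 2024-03-05 09:00:00
print(messages_near(LINES, estimate))
```

On a real system, the file object from something like `open("/var/log/messages", errors="replace")` could be passed as `lines`. If the drive was powered off for a while after the error, the true error time is earlier than the estimate, so a wider window may be needed; near a year boundary, the assumed year may also need adjusting.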