How to diagnose failed hard drives in older Security Analytics appliances using drive enclosures

Article ID: 168254

Products

Security Analytics

Issue/Introduction

The older r630/r730 Security Analytics (Solera) appliances come in one of two main configurations: a standalone appliance, or a head unit with one or more external storage enclosures.  This article helps you determine whether you have a drive problem and, if so, what information to gather before contacting Technical Support.

The newer systems use storage arrays, either the NetApp E5660 or the U144/U084 (also known as Dell ME4).  These units use a different architecture and are not covered here.  See:

https://knowledge.broadcom.com/external/article/238767/update-me4-storage-array-controllers-and.html



Symptoms that may indicate a hard drive problem include:

  • An email or syslog event indicating a hardware failure
  • A pink banner displayed in the UI indicating a disk failure
  • A predictive failure in the audit log
  • A loud audible alarm coming from the appliance
  • An amber or red indicator light on the hard drive

Resolution

Because the Security Analytics appliances use redundant RAID configurations (RAID 1 and RAID 5), a drive can often fail without any loss of functionality.  If a drive does fail, you may receive a syslog or email alert, depending on how you have configured the communication logging settings.  If you suspect a drive error, use the following methods to confirm:

1.  Use the Audit Log

From within the GUI, go to Settings and then Audit Log.  To show only the hardware messages, enter "category=Hardware Events" in the Advanced Filter section.  Then search for "Failure", "Degraded", "Failed", or "Predictive".  If you find any of these messages, Support will need the full text of the message.  Below is an example:

Adapter 0; seqNum: 0x0000603e; Time: Sat Nov 22 03:18:24 2014; Event Description: Predictive failure: PD 06(e0x20/s6); Device ID: 6; Enclosure Index: 32; Slot Number: 6;
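
If you export or copy audit log messages for review, a short script can pick out the drive-related ones and split them into fields.  The following is only a minimal sketch in Python, assuming every message follows the "key: value;" layout of the example above.  The function names is_drive_alert and parse_event are hypothetical helpers and are not part of the appliance software.

```python
import re

# Keywords the article suggests searching for in Hardware Events messages.
KEYWORDS = ("failure", "degraded", "failed", "predictive")

# The example message shown above, copied verbatim.
SAMPLE = ("Adapter 0; seqNum: 0x0000603e; Time: Sat Nov 22 03:18:24 2014; "
          "Event Description: Predictive failure: PD 06(e0x20/s6); "
          "Device ID: 6; Enclosure Index: 32; Slot Number: 6;")


def is_drive_alert(message):
    """Return True if the message contains one of the search keywords."""
    text = message.lower()
    return any(word in text for word in KEYWORDS)


def parse_event(message):
    """Split a 'key: value;' message into a dict of field name -> value."""
    fields = {}
    adapter = re.match(r"\s*Adapter (\d+)", message)
    if adapter:
        fields["Adapter"] = adapter.group(1)
    for part in message.split(";"):
        # Split on the first colon only, so values that contain colons
        # (the timestamp, the event description) stay intact.
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


event = parse_event(SAMPLE)
print(event["Enclosure Index"], event["Slot Number"])  # prints: 32 6
```

The Enclosure Index and Slot Number fields correspond to the E:S column in the lsi-show output.  In this sample, the "e0x20" in the event description appears to be the same enclosure written in hexadecimal (0x20 is 32).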

2.  Use 'lsi-show' to determine which HDD has failed.

The 'lsi-show' command queries the RAID controllers and provides a detailed listing of all physical and logical units, regardless of the current hardware state.  A sample lsi-show output appears later in this article.  If everything is healthy, all Logical Drives show a status of "Optimal" and all Physical Drives that belong to a virtual disk show as "Online".  If a drive has failed, it will show up in this report.  Be aware that lsi-show does not provide hard drive information for large storage arrays such as the Dell MD3860, the NetApp E5600, or the Dell Seagate storage arrays.

NOTE:  If you don't have root access from the CLI, you can SSH in as the 'admin' user and run 'sudo lsi-show'.  The lsi-show.txt file is also part of the CSR, which is generated by going to Settings > System > Download CSR.  Generating the CSR may take 5-15 minutes, and the file can range from 100MB to 1GB+ in size.

  • Drives marked Failed or "Unconfigured(bad)" will need to be replaced.
  • Drives marked Unconfigured(good) have not been allocated to a virtual disk.  They may be spun up (spinning at 7,200RPM) or spun down to save power.
  • Drives marked "Unconfigured(good), Spun Up    Foreign" have become disconnected and will need to be imported.  After the import, the virtual disk will become Optimal or Degraded.
  • Virtual disks, or Volumes, marked Offline have more than one failed disk and will need to be rebuilt.
  • Virtual disks marked Degraded are still running but are no longer redundant.  A disk has failed and should be replaced as soon as possible.
  • Virtual disks marked Optimal are redundant; if one disk fails, they will continue to function.
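
The status rules in the list above can be expressed as a small lookup.  This is an illustrative sketch in Python, not a Broadcom tool: the function names drive_action and volume_action and the short action labels they return are assumptions made for the example.

```python
def drive_action(status):
    """Map a physical drive status from lsi-show to the action in the list above.

    'status' may include the FC column text (for example 'Foreign'),
    as quoted in the list above.
    """
    s = status.lower()
    if "failed" in s or "unconfigured(bad)" in s:
        return "replace"
    if "unconfigured(good)" in s:
        # A foreign drive was disconnected and must be imported;
        # otherwise it is simply not allocated to a virtual disk.
        return "import" if "foreign" in s else "unallocated"
    if s.startswith("online"):
        return "ok"
    return "unknown"


def volume_action(status):
    """Map a virtual disk status from lsi-show to the action in the list above."""
    actions = {
        "optimal": "ok",
        "degraded": "replace failed disk",
        "offline": "rebuild",
    }
    return actions.get(status.strip().lower(), "unknown")


print(drive_action("Unconfigured(good), Spun Up    Foreign"))
print(volume_action("Degraded"))
```

Any status not covered by the list is reported as "unknown" rather than guessed at, so that it can be reviewed by hand or passed on to Support.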

The lsi-show output includes the following -

  • Controller number in the Controller Information paragraph, under the label Id.
    • Id a0 is adapter 0, which manages the drives internal to the sensor.  Adapters a1 and a2 manage the drives installed in JBOD enclosures external to the sensor.
  • Controller type under Product Name.
    • Controllers with the label PERC are Dell controllers.  All others are BlueCoat controllers in S500 servers.  There are typically two controllers/adapters for external drives and only one for internal drives.
  • Enclosure Information includes the enclosure Id and the drive slot numbers.
    • In the first example below, the enclosure Id is 32.  Each adapter may have from 0 to 3 enclosures attached.
  • Logical Drive Information includes the details for each logical/virtual disk.
    • They are numbered sequentially and include size, status, and other information.  A status of Optimal is the only healthy status.
  • Physical Drive Information includes the enclosure Id, drive slot, drive status, size, and other manufacturer information.

For example -

Controller Information
Id   Serial No        Firmware             Encl LD   PD   Product Name
a0   SERIAL_NUMBER    4.270.00-8168        1    2    26   PERC H730P Mini

Enclosure Information
Id   Drives
32   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25 

Logical Drive Information
Id   Size         Status       Stripe   Raid Level   Cache
0    5.456 TB     Optimal      64 KB    Primary-5    WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
  by id: 22,23,24,25
  by es: 32:22,32:23,32:24,32:25
1    34.557 TB    Optimal      64 KB    Primary-5    WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
  by id: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
  by es: 32:0,32:1,32:2,32:3,32:4,32:5,32:6,32:7,32:8,32:9,32:10,32:11,32:12,32:13,32:14,32:15,32:16,32:17,32:18,32:19

Physical Drive Information
Id   E:S      Status                         FC       Size           Inquiry
0    32:0     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
1    32:1     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
2    32:2     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
3    32:3     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
4    32:4     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
5    32:5     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
6    32:6     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
7    32:7     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
8    32:8     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
9    32:9     Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
10   32:10    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
11   32:11    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
12   32:12    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
13   32:13    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
14   32:14    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
15   32:15    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
16   32:16    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
17   32:17    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
18   32:18    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
19   32:19    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
20   32:20    Unconfigured(good), Spun Up    None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
21   32:21    Unconfigured(good), Spun Up    None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
22   32:22    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
23   32:23    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
24   32:24    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER
25   32:25    Online, Spun Up                None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER

Controller Information
Id   Serial No        Firmware             Encl LD   PD   Product Name
a1   SERIAL_NUMBER    4.270.00-8168        2    2    24   PERC H830 Adapter

Enclosure Information
Id   Drives
0    1   4   5   2   7   9   8   11  10  12  3   6  
13   17  18  16  15  23  21  20  19  24  25  14  22 

Logical Drive Information
Id   Size         Status       Stripe   Raid Level   Cache
0    40.019 TB    Optimal      64 KB    Primary-5    WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
  by id: 1,4,5,2,7,9,8,11,10,12,3,6
  by es: 0:1,0:2,0:3,0:4,0:5,0:6,0:7,0:8,0:9,0:10,0:11,0:12
1    40.019 TB    Optimal      64 KB    Primary-5    WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
  by id: 17,18,16,15,23,21,20,19,24,25,14,22
  by es: 13:1,13:2,13:3,13:4,13:5,13:6,13:7,13:8,13:9,13:10,13:11,13:12

Physical Drive Information
Id   E:S      Status                         FC       Size           Inquiry
1    0:1      Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
4    0:2      Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
5    0:3      Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
2    0:4      Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
7    0:5      Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
9    0:6      Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
8    0:7      Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
11   0:8      Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
10   0:9      Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
12   0:10     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
3    0:11     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
6    0:12     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
17   13:1     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
18   13:2     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
16   13:3     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
15   13:4     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
23   13:5     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
21   13:6     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
20   13:7     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
19   13:8     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
24   13:9     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
25   13:10    Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
14   13:11    Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
22   13:12    Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER

Controller Information
Id   Serial No        Firmware             Encl LD   PD   Product Name
a2   SERIAL_NUMBER    4.270.00-8168        2    2    24   PERC H830 Adapter

Enclosure Information
Id   Drives
0    1   2   4   5   7   8   9   10  11  12  3   6  
13   15  14  17  18  23  22  16  19  24  20  21  25 

Logical Drive Information
Id   Size         Status       Stripe   Raid Level   Cache
0    40.019 TB    Offline      64 KB    Primary-5    WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
  by id: 1,2,4,5,7,8,9,10,11,12,3,6
  by es: 0:1,0:2,0:3,0:4,0:5,0:6,0:7,0:8,0:9,0:10,0:11,0:12
1    40.019 TB    Optimal      64 KB    Primary-5    WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
  by id: 15,14,17,18,23,22,16,19,24,20,21,25
  by es: 13:1,13:2,13:3,13:4,13:5,13:6,13:7,13:8,13:9,13:10,13:11,13:12

Physical Drive Information
Id   E:S      Status                         FC       Size           Inquiry
1    0:1      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
2    0:2      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
4    0:3      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
5    0:4      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
7    0:5      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
8    0:6      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
9    0:7      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
10   0:8      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
11   0:9      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
12   0:10     Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
3    0:11     Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
6    0:12     Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
15   13:1     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
14   13:2     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
17   13:3     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
18   13:4     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
23   13:5     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
22   13:6     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
16   13:7     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
19   13:8     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
24   13:9     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
20   13:10    Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
21   13:11    Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
25   13:12    Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
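
In a long report like the one above, the unhealthy entries are easy to miss.  The following Python sketch scans saved lsi-show text (for example the lsi-show.txt file from the CSR) and lists every virtual disk that is not Optimal and every physical drive that is Failed, Unconfigured(bad), or Foreign.  It assumes the column layout shown in the sample above; find_problems is a hypothetical helper name, and the embedded text is a shortened excerpt of controller a2 from that sample.

```python
import re

# Row patterns, assuming the column layout of the sample output above.
LOGICAL = re.compile(r"^(\d+)\s+([\d.]+ [KMGT]B)\s+(\S+)\s")
PHYSICAL = re.compile(r"^(\d+)\s+(\d+:\d+)\s+(.+?)\s{2,}(\S+)\s+([\d.]+ [KMGT]B)\s")

# Shortened excerpt of controller a2 from the sample above.
EXCERPT = """\
Controller Information
Id   Serial No        Firmware             Encl LD   PD   Product Name
a2   SERIAL_NUMBER    4.270.00-8168        2    2    24   PERC H830 Adapter

Logical Drive Information
Id   Size         Status       Stripe   Raid Level   Cache
0    40.019 TB    Offline      64 KB    Primary-5    WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU
1    40.019 TB    Optimal      64 KB    Primary-5    WriteThrough, ReadAheadNone, Direct, Write Cache OK if Bad BBU

Physical Drive Information
Id   E:S      Status                         FC       Size           Inquiry
1    0:1      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
15   13:1     Online, Spun Up                None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER
"""


def find_problems(text):
    """Return (controller, kind, id, detail) tuples for unhealthy entries.

    kind is 'LD' (id = logical drive number) or 'PD' (id = enclosure:slot).
    """
    problems = []
    controller = None
    section = None
    for line in text.splitlines():
        line = line.rstrip()
        if line.endswith(" Information"):
            # Section header, e.g. 'Logical Drive Information'.
            section = line[: -len(" Information")]
            continue
        if section == "Controller":
            m = re.match(r"^(a\d+)\s", line)
            if m:
                controller = m.group(1)
        elif section == "Logical Drive":
            m = LOGICAL.match(line)
            if m and m.group(3) != "Optimal":
                problems.append((controller, "LD", m.group(1), m.group(3)))
        elif section == "Physical Drive":
            m = PHYSICAL.match(line)
            if m:
                status, fc = m.group(3), m.group(4)
                bad = status.startswith(("Failed", "Unconfigured(bad)"))
                if bad or fc == "Foreign":
                    detail = status if fc == "None" else f"{status} [{fc}]"
                    problems.append((controller, "PD", m.group(2), detail))
    return problems


for item in find_problems(EXCERPT):
    print(item)
```

Drives marked Unconfigured(good) are deliberately not reported, because they are healthy drives that are simply not allocated to a virtual disk.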

 

A few examples of drive status:

The drive is not being used and is not spinning -

37   35:2     Unconfigured(good), Spun down  None     3.638 TB       HGST    HUS726T4TAL5205 SERIAL_NUMBER

 

The drive is not being used and is spinning -

20   32:20    Unconfigured(good), Spun Up    None     1.819 TB       SEAGATE ST2000NX0453    SERIAL_NUMBER

The virtual disk has a failed component -

0    931.0 GB     Degraded     64 KB    Primary-1    WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU

One virtual disk has failed and is not functional.  It likely has two drives which have failed.

2    4.545 TB     Offline      64 KB    Primary-5    WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU

This drive is still recognized but has failed -

1    0:1      Failed                         None     3.638 TB       HGST    HUS726040AL5215 SERIAL_NUMBER

The drive has been detached and re-attached.  All drives on this controller will need to be imported, and the import is run per controller number.  For example, for controller a1:  megacli -cfgforeign -import -a1

74   77:0     Unconfigured(good), Spun Up    Foreign  3.638 TB       TOSHIBA MG04SCA40EN     SERIAL_NUMBER
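
Because the import is run per controller, it helps to know which controllers actually report Foreign drives.  The sketch below, in Python, reads lsi-show text and builds the import command from the example above for each such controller.  It only prints the commands and does not run them.  The helper name foreign_import_commands is hypothetical, and the embedded text pairs the Foreign drive line above with a controller a1 row purely for illustration.

```python
import re

# Illustrative text only: the Foreign drive line from this article placed
# under a controller a1 row so the example has a controller to report.
EXCERPT = """\
Controller Information
Id   Serial No        Firmware             Encl LD   PD   Product Name
a1   SERIAL_NUMBER    4.270.00-8168        2    2    24   PERC H830 Adapter

Physical Drive Information
Id   E:S      Status                         FC       Size           Inquiry
74   77:0     Unconfigured(good), Spun Up    Foreign  3.638 TB       TOSHIBA MG04SCA40EN     SERIAL_NUMBER
"""


def foreign_import_commands(text):
    """Return one import command per controller that lists a Foreign drive."""
    commands = []
    controller = None
    for line in text.splitlines():
        # Controller rows start with the adapter Id, e.g. 'a1'.
        m = re.match(r"^a(\d+)\s", line)
        if m:
            controller = m.group(1)
            continue
        # Physical drive rows start with 'Id  E:S'; look for Foreign after that.
        is_foreign = re.match(r"^\d+\s+\d+:\d+\s.*\sForeign\s", line)
        if controller is not None and is_foreign:
            cmd = f"megacli -cfgforeign -import -a{controller}"
            if cmd not in commands:
                commands.append(cmd)
    return commands


for command in foreign_import_commands(EXCERPT):
    print(command)  # prints: megacli -cfgforeign -import -a1
```

Review the full lsi-show output before running any import, and contact Technical Support if the state of the virtual disks is unclear.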

 

Have the lsi-show output and any pertinent audit log messages ready before calling Technical Support to submit an RMA for a replacement drive.  If you have Dell hardware, we will also need the Dell Service Tag, which can be found on the physical appliance itself or by running the following command from the CLI as the root user:   dmidecode -s system-serial-number

If the failed drive is in one of the Dell externally attached storage enclosures, we will need the Service Tag from that storage enclosure.  The serial numbers for all enclosures can be obtained with:

megacli -encinfo -aall | egrep "adapter|Device ID|Physical|Vendor Specific"




When opening a new case, be sure to include the shipping address with a contact name and phone number.  Also let us know whether you will replace the drive yourselves or want a field technician to do the work.