Error: "Controller encountered a fatal error and was reset"

Article ID: 171916

Products

Security Analytics

Issue/Introduction

The following error may appear in the messages log (/var/log/messages). To check for it, run:   grep reset /var/log/messages | grep fatal

Jun  6 04:00:01 localhost disk_subsystem[19706]: snlog: sn="192.0.2.1" id="DS" m="23" c="6" event="DISK_STATUS" category="HARDWARE" ip="192.0.2.1" model="R620" msg="Adapter 0; seqNum: 0x0005be9b; Time: Sat Jan  1 00:00:02 2000; Event Description: Controller encountered a fatal error and was reset; "
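When the log contains many such entries, a compact summary can be easier to read. The following is a minimal sketch, not part of the product: the function name list_fatal_resets is hypothetical, and it assumes the standard 15-character syslog timestamp and the msg="Adapter N; ..." layout shown in the sample line above.

```shell
# Sketch: print the syslog timestamp and adapter number for each
# "fatal error and was reset" event in a messages log.
# Assumption: each line starts with a 15-character syslog timestamp
# and contains msg="Adapter N; ...", as in the sample entry.
list_fatal_resets() {
    log="${1:-/var/log/messages}"   # default to the system messages log
    grep 'Controller encountered a fatal error and was reset' "$log" |
        sed -n 's/^\(.\{15\}\).*msg="Adapter \([0-9][0-9]*\);.*/\1 adapter=\2/p'
}
```

Call it as list_fatal_resets /var/log/messages, or pass the path of a rotated log file instead.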

Cause

The RAID controller firmware is not up to date.

Resolution

As root, capture the firmware terminal log from the RAID controller:    megacli -fwtermlog -dsply -a0 -nolog  > fwterm.out

Attach the fwterm.out file to the case.

In this example, the fwterm log showed:

T1: EVT#376474-T1:   1=Firmware version 3.130.05-2086
01/01/00  0:00:02: EVT#376475-01/01/00  0:00:02: 345=Controller encountered a fatal error and was reset
01/01/00  0:00:02: Initializing the Temperature Monitor
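To pull the relevant details out of a captured fwterm.out, something like the following could be used. This is a sketch under assumptions: the function names fw_version and count_resets are hypothetical, and the patterns are based only on the log lines shown above, so other firmware releases may format these entries differently.

```shell
# Sketch: summarize a captured fwterm.out file.
# Assumption: the log uses the "Firmware version X" and
# "Controller encountered a fatal error and was reset" wording shown above.

# Print the most recent firmware version reported in the log.
fw_version() {
    grep -o 'Firmware version [^ ]*' "${1:-fwterm.out}" | tail -n 1 | cut -d' ' -f3
}

# Print how many fatal-reset events the log contains.
count_resets() {
    grep -c 'Controller encountered a fatal error and was reset' "${1:-fwterm.out}"
}
```

Running fw_version fwterm.out before and after the update gives a quick way to confirm that the firmware version changed.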

Dell recommends upgrading the firmware for the internal H710 RAID controller.  The update takes 5-10 minutes to install, followed by a reboot of about 10 minutes.

Download the PERC RAID Controller firmware for your specific system on Dell's Support site.

Download the Red Hat Linux Update Package (.BIN) file to /home on the sensor.

To update the firmware, do the following:

  • Log in as root.
  • Shut down Security Analytics using:  scotus stop
  • Make the update package executable:
    • chmod 755 /home/SAS-RAID_Firmware_XXXXX_LN_VERSION_AXX.BIN
  • Run the package to update the firmware:
    • /home/SAS-RAID_Firmware_XXXXX_LN_VERSION_AXX.BIN

This will take a few minutes. The installer verifies that the firmware is correct for the hardware and displays extensive license details.  Once it is done, it reboots the system.

Security Analytics should come online after the reboot.
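The update steps above can be collected into a small wrapper that stops at the first failure. This is only a sketch: the function name run_firmware_update is hypothetical, the package path is passed in by the caller because the real .BIN file name depends on the download, and the root and file checks are added precautions rather than part of the documented procedure.

```shell
# Sketch: run the documented update steps in order, stopping on the first error.
# Usage: run_firmware_update /home/<downloaded package>.BIN  (use an absolute path)
run_firmware_update() {
    pkg="$1"

    # Added precaution: refuse to continue if the package is missing.
    [ -f "$pkg" ] || { echo "package not found: $pkg" >&2; return 1; }

    # Added precaution: the procedure must be run as root.
    [ "$(id -u)" -eq 0 ] || { echo "must be run as root" >&2; return 1; }

    scotus stop || return 1        # shut down Security Analytics
    chmod 755 "$pkg" || return 1   # make the update package executable
    "$pkg"                         # run the updater; it reboots the system when done
}
```

Because the updater reboots the system, nothing should be placed after the final command in the expectation that it will run.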