Volumes will not remain on preferred controller even after replacement

Products

Security Analytics

Issue/Introduction

The Security Analytics system UI repeatedly shows an error for failed disks. This happens because the storage array does not keep its volumes on the preferred path. Running "SMcli -n array2 -c 'reset storageArray volumeDistribution;'" clears the error temporarily, but within a few hours the volumes fail over again and leave the preferred path.

The lithium batteries may have reached a state where they need a deep discharge and recharge cycle performed. If the batteries are unusable, the cache cannot be maintained in the event of a power failure. For performance reasons, the controller with the failed battery will not support I/O and the volumes will be served through the alternate controller.

Environment

NetApp E5660 storage arrays

Resolution

NetApp has recommended a procedure to refresh the batteries in both controllers on the storage array. It should not affect data I/O and can be performed on a live system.

This is all done in SANtricity, so there is no need to log in as root. First redistribute the volumes, then run the battery relearn commands. Collect logs before and after the procedure to verify the state of the battery.

Procedure

Start the scheduled battery learn cycle from the SANtricity CLI (Command Line Interface):

Open SANtricity
In the Enterprise Management window, right-click the array
Select Execute Script
Paste this command into the script editor:

set storageArray learnCycleDate daysToNextLearnCycle=0 time=HH:MM;

SMcli -n array0 -c 'set storageArray learnCycleDate daysToNextLearnCycle=0 time=HH:MM;'

(Where HH:MM is the current time plus 2 minutes, in 24-hour format)

Select Tools at the top of the script editor window
Select Verify and Execute
Wait 2 minutes, then verify that the learn cycle has started by checking the Major Event Log for the message "Learn Cycle for battery started"
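
The HH:MM value is easy to mistype when working against the clock, so a small helper can compute it. The following is a minimal sketch, assuming a shell with GNU date (the -d option) is available on the management host. The function names and the ARRAY_NAME variable are made up for this example and are not part of SANtricity. The script only prints the SMcli command and does not run it.

```shell
#!/usr/bin/env bash
# Sketch: build the learn-cycle command with a start time two minutes from now.
# Assumes GNU date; ARRAY_NAME is a placeholder for the name of your array.

ARRAY_NAME="${ARRAY_NAME:-array0}"

# Print the local time OFFSET minutes from now as HH:MM (24-hour format).
learn_cycle_time() {
    local offset_minutes="${1:-2}"
    date -d "+${offset_minutes} minutes" +%H:%M
}

# Print the full SMcli invocation for a given HH:MM start time.
# Nothing is executed against the array.
build_learn_cycle_command() {
    local start_time="$1"
    printf "SMcli -n %s -c 'set storageArray learnCycleDate daysToNextLearnCycle=0 time=%s;'\n" \
        "$ARRAY_NAME" "$start_time"
}

build_learn_cycle_command "$(learn_cycle_time 2)"
```

Review the printed line, then copy it into a shell on the management host or paste only the quoted part into the script editor.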

Once this is done, the controllers need to be rebooted one at a time:

Steps
1. Select Hardware.
2. If the graphic shows the drives, click Show back of shelf.
  
  The graphic changes to show the controllers instead of the drives.
3. Click the controller that you want to reset. The controller’s context menu appears.
4. Select Reset, and confirm that you want to perform the operation.
5. Alternatively, run the reset from the command line: SMcli -n array0 -c 'reset controller [a];'
6. Redistribute the volumes and immediately run the reboot on the unused controller:
  1. In the Enterprise Management window, right-click the array
  2. Select Execute Script
  3. Type or paste the command:
     reset storageArray volumeDistribution;

-> SMcli -n array0 -c 'reset storageArray volumeDistribution;'

Once completed, wait around 10 minutes, then reboot controller B by repeating steps 1-5 above, changing a to b in step 5.

Volumes should remain on the preferred controller. Collect new logs and attach them to the case to allow support to investigate and verify.
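
The reboot sequence can also be written down as a script so that nothing is skipped or mistyped. The sketch below assumes the commands run in the order the steps above list them (reset controller a, redistribute, wait, reset controller b); confirm the order with NetApp support before using it. The function names and the ARRAY_NAME, WAIT_SECONDS and DRY_RUN variables are made up for this example. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the staggered reboot. DRY_RUN=1 (the default) only prints each
# command; set DRY_RUN=0 to run it through SMcli.
# ARRAY_NAME, WAIT_SECONDS and DRY_RUN are placeholders for this example.

ARRAY_NAME="${ARRAY_NAME:-array0}"
WAIT_SECONDS="${WAIT_SECONDS:-600}"   # "around 10 minutes" between reboots
DRY_RUN="${DRY_RUN:-1}"

# Run (or, in dry-run mode, print) one script command against the array.
run_array_command() {
    local script_command="$1"
    if [ "$DRY_RUN" = "1" ]; then
        printf "SMcli -n %s -c '%s'\n" "$ARRAY_NAME" "$script_command"
    else
        SMcli -n "$ARRAY_NAME" -c "$script_command"
    fi
}

# Pause between the two reboots. In dry-run mode the wait is printed,
# not performed.
wait_between_reboots() {
    if [ "$DRY_RUN" = "1" ]; then
        printf "sleep %s\n" "$WAIT_SECONDS"
    else
        sleep "$WAIT_SECONDS"
    fi
}

# Stop at the first failing command so the second controller is never
# reset after an error.
staggered_reboot() {
    run_array_command "reset controller [a];" &&
    run_array_command "reset storageArray volumeDistribution;" &&
    wait_between_reboots &&
    run_array_command "reset controller [b];"
}

staggered_reboot
```

Run it once in dry-run mode and compare the printed commands with the steps above before setting DRY_RUN=0.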

Additional Information

Instructions from NetApp

Running a learn cycle on the batteries:

https://kb.netapp.com/onprem/E-Series/Management_Apps/How_to_manually_start_a_battery_learn_cycle_using_SANtricity

Run a staggered reboot of the controllers:

https://docs.netapp.com/us-en/e-series-santricity-115/sm-hardware/reset-reboot-controller.html
Redistribute the volumes and immediately run the reboot on the unused controller.
Once completed, wait 5 minutes, then run the other reboot.
Volumes should remain on the preferred controller.
Collect new logs.