Malware Analysis Appliance (MAA) unresponsive

Article ID: 170144

Updated On:

Products

Malware Analysis Software - MA

Issue/Introduction

The Malware Analysis (MA) appliance becomes completely inaccessible:

  • No log output on the console connection
  • Appliance cannot be pinged or reached remotely
  • No access over SSH or HTTPS (WebUI)
  • The only way to recover is a power cycle (cold reboot): physically turning the appliance off and back on

After recovery, the logs collected following the power cycle show the following:

1. No logs were captured from the time the appliance became inaccessible:

Sep  6 20:39:01 mag2 CRON[29349]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Sep  6 20:40:01 mag2 CRON[29454]: (root) CMD (/opt/fpm/bin/i2c_fpd.sh)
---no logs--

2. After the power cycle, syslog resumes capturing logs, and some processes can be seen restarting:

Sep  7 12:17:11 mag2 mdadm[2219]: DeviceDisappeared event detected on md device /dev/md/0
Sep  7 12:17:11 mag2 mdadm[2219]: NewArray event detected on md device /dev/md127
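The silent period between the last entry before the hang and the first entry after the power cycle can be located mechanically. The following is a minimal sketch, not a vendor tool: `find_log_gaps` is a hypothetical helper name, the one-hour default threshold is arbitrary, and because syslog timestamps carry no year the script assumes a non-leap year and a log that does not cross a year boundary.

```shell
#!/bin/sh
# find_log_gaps FILE [MIN_GAP_SECONDS]
# Hypothetical helper: prints every pair of consecutive syslog timestamps
# that are at least MIN_GAP_SECONDS apart (default: 3600).
# Assumptions: "Mon DD HH:MM:SS" timestamps, non-leap year, no year rollover.
find_log_gaps() {
    awk -v min_gap="${2:-3600}" '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        split("0 31 59 90 120 151 181 212 243 273 304 334", offset, " ")
        # days_before[month] = number of days in the year before that month
        for (i = 1; i <= n; i++) days_before[names[i]] = offset[i]
    }
    # Only consider lines that start with a syslog-style timestamp.
    ($1 in days_before) && $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
        split($3, t, ":")
        stamp = $1 " " $2 " " $3
        now = (days_before[$1] + $2 - 1) * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
        if (seen && now - prev >= min_gap)
            printf "gap of %d s between \"%s\" and \"%s\"\n", now - prev, prev_stamp, stamp
        prev = now; prev_stamp = stamp; seen = 1
    }' "$1"
}
```

A possible invocation is `find_log_gaps /var/log/syslog 3600`; the log path is an assumption and may differ on the appliance.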

Cause

This issue occurs when the MAA completes its mdadm RAID array check, which runs as part of the autocheck cron job on the first Sunday of every month at 00:57. The check reports the disk health state, but the appliance then freezes. As a result, there is no console log output, no access to the WebUI over HTTPS or to the terminal over SSH, and the appliance does not respond to ping. Syslog captures nothing during the event, and the only way to recover from this state is a cold reboot (power cycle) of the appliance.

According to engineering, the issue is due to a bug in the RAID1 kernel module running on MA, which causes a disk I/O deadlock in certain situations.
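Because the freeze is tied to the RAID check, it can be useful to see whether a check is currently running on any md array. The sketch below assumes the MA kernel exposes the standard Linux md sysfs attribute `sync_action` under `/sys/block/md*/md/`, which has not been confirmed for the appliance. `md_sync_status` is a hypothetical helper name, and its optional argument exists only so the function can be pointed at a different directory tree.

```shell
#!/bin/sh
# md_sync_status [SYSFS_ROOT]
# Hypothetical helper: prints "<array>: <sync_action>" for each md array.
# On a standard Linux md setup the value is "idle" when nothing is running
# and "check" while a redundancy check is in progress.
md_sync_status() {
    sysfs_root="${1:-/sys/block}"
    for action_file in "$sysfs_root"/md*/md/sync_action; do
        # Skip the unexpanded glob when no md arrays are present.
        [ -r "$action_file" ] || continue
        array=${action_file#"$sysfs_root"/}   # strip the leading root path
        array=${array%%/*}                    # keep only the array name, e.g. md127
        printf '%s: %s\n' "$array" "$(cat "$action_file")"
    done
}
```

Running `md_sync_status` with no arguments reads from `/sys/block`. `cat /proc/mdstat` shows similar information, including progress, on systems that use the Linux md driver.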

Resolution

  • Workaround: power cycle (cold reboot) the appliance to regain access and resume syslog logging
  • Temporary fix: disable the RAID disk check, which by default runs on the first Sunday of every month. The following steps disable the RAID check on MA 4.x and preserve the change:

[email protected]:~$ sudo sed -i 's/^AUTOCHECK=.*/AUTOCHECK=false/' /etc/default/mdadm

* This changes the AUTOCHECK setting to false

[email protected](none):/etc/default# cat mdadm
# mdadm Debian configuration
#
# You can run 'dpkg-reconfigure mdadm' to modify the values in this file, if
# you want. You can also change the values here and changes will be preserved.
# Do note that only the values are preserved; the rest of the file is
# rewritten.
#

# AUTOCHECK:
# should mdadm run periodic redundancy checks over your arrays? See
# /etc/cron.d/mdadm.
AUTOCHECK=false   <---- updated value

<truncated for brevity>
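The edit and the verification can also be combined into one step that keeps a backup copy. This is a minimal sketch under stated assumptions rather than a vendor-supplied script: `set_autocheck` is a hypothetical helper name, the `.bak` suffix is arbitrary, and the function must be run with sufficient privileges to write to `/etc/default/mdadm`. It uses plain `sed` reading from the backup instead of `sed -i`, which gives the same result for this substitution.

```shell
#!/bin/sh
# set_autocheck true|false [FILE]
# Hypothetical helper: sets AUTOCHECK in the mdadm defaults file, keeping
# the previous version as FILE.bak, and returns non-zero if the new value
# is not present afterwards.
set_autocheck() {
    value="$1"
    conf="${2:-/etc/default/mdadm}"
    case "$value" in
        true|false) ;;
        *) echo "usage: set_autocheck true|false [FILE]" >&2; return 2 ;;
    esac
    # Keep a copy of the current file before changing it.
    cp -p "$conf" "$conf.bak" || return 1
    # Same substitution as the sed command above, written from the backup
    # into the original file so the original keeps its ownership and mode.
    sed "s/^AUTOCHECK=.*/AUTOCHECK=$value/" "$conf.bak" > "$conf" || return 1
    # Verify that the file now contains exactly the requested setting.
    grep -qx "AUTOCHECK=$value" "$conf"
}
```

For example, `set_autocheck false` disables the check and `set_autocheck true` restores the default. Only the `AUTOCHECK` line is changed; all other lines are copied through unmodified.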

==============================

To roll back (revert) the change:

[email protected]:~$ sudo sed -i 's/^AUTOCHECK=.*/AUTOCHECK=true/' /etc/default/mdadm

* This changes the AUTOCHECK setting back to true

[email protected](none):/etc/default# cat mdadm
# mdadm Debian configuration
#
# You can run 'dpkg-reconfigure mdadm' to modify the values in this file, if
# you want. You can also change the values here and changes will be preserved.
# Do note that only the values are preserved; the rest of the file is
# rewritten.
#

# AUTOCHECK:
# should mdadm run periodic redundancy checks over your arrays? See
# /etc/cron.d/mdadm.
AUTOCHECK=true   <---- restored value

<truncated for brevity>

  • Engineering has identified a set of patches that resolve this issue. These kernel patches are expected to be included in MA 4.2.12 (no official release date yet).