Error "Data insert queue size exceeding threshold"

Article ID: 170046

Products

Malware Analysis Software - MA

Issue/Introduction

The Malware Analysis Appliance (MAA) reports a yellow health state with the error "Data insert queue size exceeding threshold".

  • New samples cannot be submitted, and samples sent by upstream devices (CAS/ASG and Security Analytics) are not being analyzed.
  • The system status is yellow, with the Health Status error "Data insert queue size exceeding threshold".

Cause

The bulk-event-insert queue is the post-processing queue. Once a task finishes, the MAA runs post-processing: it matches the results against patterns and extracts the data from the intelliVM. A high value for this queue means that post-processing has either stopped or cannot keep up with the load. The error "Data insert queue size exceeding threshold" is raised once the post-processing queue backs up. Possible causes include DNS issues, over-driving the system, and an unexpected surge in file submissions.

Environment

  1. Check the mq-consume-events logs for an entry like the following, which indicates that a SQL error is preventing the queue from emptying:

mag2 mq-consume-events-01[2823]: ' EXCEPTION CLASS :sqlalchemy.exc.IntegrityError'

  2. Check the queues and look for a non-zero value for bulk-event-insert in the output of https://<maa-IP>/rapi/system/queues

"bulk-event-insert": {
"messages": 82194, ---->
"messages_ready": 82192, ---->
"messages_unacknowledged": 2

  3. Confirm the same from the rabbitmqctl list_queues output:

[email protected]:/opt/mag2# sudo rabbitmqctl list_queues
Listing queues ...
bulk-event-insert 475872 ----------->
sandbox-tasks-medium 0
ivm-tasks-medium 0
ivm-tasks-high 0
sandbox-tasks-low 0
droid-tasks-medium 0
droid-tasks-high 0
droid-tasks-low 0
sandbox-tasks-high 0
ivm-tasks-low 0
...done.
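
The two queue checks above can also be scripted. The following is a minimal Python sketch, not a tool shipped with the product. It assumes that the /rapi/system/queues response is a JSON object keyed by queue name, with a "messages" count per queue (as in the fragment in step 2), and that rabbitmqctl list_queues prints one "name count" pair per line (as in step 3). The function names and the threshold of 1000 are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical threshold for illustration; the appliance's real
# health-check limit is not stated in this article.
BACKLOG_THRESHOLD = 1000


def backlog_from_rapi(payload, threshold=BACKLOG_THRESHOLD):
    """Return {queue: messages} for queues above the threshold.

    Assumes the JSON body maps each queue name to an object that
    contains a "messages" field, as in the fragment from step 2.
    """
    queues = json.loads(payload)
    return {
        name: stats["messages"]
        for name, stats in queues.items()
        if isinstance(stats, dict) and stats.get("messages", 0) > threshold
    }


def backlog_from_rabbitmqctl(text, threshold=BACKLOG_THRESHOLD):
    """Return {queue: messages} parsed from `rabbitmqctl list_queues` output."""
    backlog = {}
    for line in text.splitlines():
        parts = line.split()
        # Data lines look like "<queue-name> <count>". Banner lines such as
        # "Listing queues ..." and "...done." have no numeric second field.
        if len(parts) >= 2 and parts[1].isdigit():
            count = int(parts[1])
            if count > threshold:
                backlog[parts[0]] = count
    return backlog
```

Passing the saved output of either check to the matching function returns only the queues whose message count exceeds the threshold; an empty result means no queue is backed up by that measure.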

Resolution

1. Restart the mq-consume-events processes:
[email protected]:/opt/mag2# sudo supervisorctl restart mq-consume-events:*

2. If restarting mq-consume-events does not resolve the issue, clear the queue that has filled up. Use the following procedure:

i. Log into the device via SSH as the 'g2' user
ii. Check the currently open firewall ports. Defaults are SSH, HTTP, and HTTPS
[email protected]:~$ df-config-mgr --dump | grep external
network.external_ports=22/tcp, 80/tcp, 443/tcp

iii. Add the rabbitmq service to the open ports:
[email protected]:~$ df-config-mgr -w network.external_ports "22/tcp, 80/tcp,443/tcp, 55672/tcp"

iv. Open a browser and go to http://<maa>:55672/
Log in with user name 'guest' and password 'guest'.
The RabbitMQ web GUI should be displayed.

v. Browse to the Queues tab
vi. Click on the bulk-event-insert queue
vii. Click on Delete/purge at or near the bottom of the page. Click on the Purge button on the right side of the page. A green 'Queue purged' message should display; click Close
viii. Close the browser window
ix. Remove the rabbitmq service from the open network ports:

[email protected]:~$ df-config-mgr -w network.external_ports "22/tcp, 80/tcp,443/tcp"

x. Reboot the device

3. To avoid similar events, leave 'detailed event capture' disabled by default, and monitor file submissions from upstream devices (CAS/ASG and SA) for task-processing load that could drive the queue too high.
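
Steps iii and ix of the purge procedure edit the network.external_ports value by hand, which makes it easy to drop an existing port. As a hedged illustration, the Python sketch below builds the value for each step from the current setting. The helper names are hypothetical, the code only manipulates strings and does not call df-config-mgr, and it assumes the tool accepts a comma-separated list with a space after each comma, as in the dump output in step ii.

```python
def _split_ports(ports):
    """Split a comma-separated port list into trimmed, non-empty entries."""
    return [p.strip() for p in ports.split(",") if p.strip()]


def with_port(ports, port):
    """Return the port list with `port` appended if it is not already present."""
    items = _split_ports(ports)
    if port not in items:
        items.append(port)
    return ", ".join(items)


def without_port(ports, port):
    """Return the port list with every occurrence of `port` removed."""
    return ", ".join(p for p in _split_ports(ports) if p != port)
```

For example, calling with_port on the default value from step ii with "55672/tcp" gives the string to write in step iii, and calling without_port on that result with "55672/tcp" gives the string to restore in step ix.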