SSIM Event Services queue is showing red and 100% full

Article ID: 156633

Products

Security Information Manager

Issue/Introduction

On all SSIM appliances, the Event Services queue shows red and is 100% full.

Cause

There are several possible causes:

1.  Poor network connectivity to the Directory SSIM

2.  Upstream SSIM is too busy to accept more events due to:

  • The ICE Service is having problems with an incident in the queue
  • Correlator is overwhelmed due to a poorly configured Correlation Rule or due to an Incident Forwarding Rule pointing back to itself
  • Large number of Open Incidents causing slowness on the Correlator


3.  Summarizers are enabled on the Archiver

4.  The Event Service is overwhelmed due to incorrectly parsed events

5.  Degraded RAID array due to a failed drive, causing slow writes to disk for archiving

 

Resolution

Solution for Cause #1

Check the Network Interface for an excessive number of dropped packets or errors with the ifconfig command, and check its Speed and Duplex with the ethtool command.

To check for dropped packets on the Directory SSIM:

  1. Connect and login with an SSH Client and switch users to root, or login to the console as root.
  2. At the command prompt, run the command ifconfig

If there is an excessive number of dropped packets or errors, make sure the Speed and Duplex settings are correct. Also check the network cable and the health and configuration of the switch or router the SSIM is plugged into.
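As a hedged sketch, the dropped-packet check can be scripted in Python. It assumes a Linux appliance and reads /proc/net/dev, which holds the same per-interface error and drop counters that ifconfig prints, in a fixed column layout that is easier to parse than ifconfig's output. The function names and the zero threshold are illustrative choices, not part of SSIM.

```python
# Sketch: flag interfaces with receive/transmit errors or drops.
# Assumption: Linux /proc/net/dev layout (the counters ifconfig reports).


def parse_proc_net_dev(text):
    """Return {iface: counters} parsed from the text of /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(f) for f in data.split()]
        # Receive columns: bytes packets errs drop fifo frame compressed multicast
        # Transmit columns start at index 8 with the same first four fields.
        stats[name.strip()] = {
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
        }
    return stats


def read_interface_stats(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())


def suspicious(stats, threshold=0):
    """Interfaces with any error/drop counter above threshold (hypothetical cut-off)."""
    return sorted(
        name for name, s in stats.items() if any(v > threshold for v in s.values())
    )
```

On the Directory SSIM, `suspicious(read_interface_stats())` would list interfaces with non-zero counters. The counters accumulate since boot, so comparing two readings taken a few minutes apart says more than a single snapshot.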

To check the speed and duplex on the Directory SSIM:

  1. Connect and login with an SSH Client and switch users to root, or login to the console as root. 
  2. At the command prompt, run the command ethtool <interface name>
    Example: ethtool eth0

The Duplex must always be Full. If it displays Half, make sure the switch port the SSIM is plugged into uses the same Speed and Duplex settings as the SSIM's Network Interface.
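The ethtool check can likewise be sketched in Python, assuming the usual ethtool output lines such as `Speed: 1000Mb/s` and `Duplex: Full`. The helper names are hypothetical, and the switch side still has to be compared by hand.

```python
import re

# Sketch: pull Speed and Duplex out of `ethtool <interface>` output.
# Assumption: ethtool prints lines like "\tSpeed: 1000Mb/s" and "\tDuplex: Full".
SPEED_RE = re.compile(r"^\s*Speed:\s*(\S+)", re.MULTILINE)
DUPLEX_RE = re.compile(r"^\s*Duplex:\s*(\S+)", re.MULTILINE)


def parse_ethtool(text):
    """Return (speed, duplex); either may be None if the line is missing."""
    speed = SPEED_RE.search(text)
    duplex = DUPLEX_RE.search(text)
    return (speed.group(1) if speed else None, duplex.group(1) if duplex else None)


def duplex_problem(text):
    """True unless the interface reports Full duplex."""
    return parse_ethtool(text)[1] != "Full"
```

Anchoring the patterns at the start of a line keeps them from matching link-mode entries such as `10baseT/Full` elsewhere in the output.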


Solution for Cause #2

The ICE Service is having problems:

To clear the ICE queues, follow the instructions in the KB article "Troubleshooting Correlation service issues and why correlation engine sometimes stops creating new incidents" under Technical Information.

Correlator is overwhelmed due to a poorly configured Correlation Rule or Incident Forwarding Rule:

  • Rules that trigger too often need to be refined so they trigger less frequently and are more useful.
    Rules that create too many incidents are not useful, not only because of the performance impact, but also because of the number of incidents that then need to be worked through.
  • Rules that reference non-indexed fields containing a lot of information, such as the Description field, in the Criteria or as Tracking Fields.
  • Using too many Lookup Tables in rules. For a short list of items, it is recommended to build the list into the rule's Criteria rather than using a Lookup Table.
  • Incident Forwarding Rules are only used in Service Provider environments.
    If no SSIM in the environment is set up as a Service Provider, in the SSIM Client go to System > Server Configurations, expand the Correlator > Incident Forwarding Rules, and remove any Incident Forwarding Rules.

Large number of Open Incidents causing slowness on the Correlator:

  1. Log into the SSIM Client as an Administrator.
  2. In the bottom left corner, right-click the Incident counter.
  3. Click Count All Open Incidents and make a note of the number given.
  4. Right-click the Incident counter again.
  5. Click Count all Open Alerts and make a note of the number.
  6. Add the two numbers together to get the total number of Open Incidents on this SSIM.

If the total number of Open Incidents is close to 25,000 or more, the system memory they use is likely depriving other services of memory. This impairs those services' ability to function and degrades performance.
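The arithmetic from the steps above can be expressed as a small Python sketch. The 25,000 figure comes from this article; treating "close to" as 90% of that figure is an assumption made only for illustration, and the function names are hypothetical.

```python
# Sketch of the open-incident check described above.
# The 25,000 figure is from this article; the 0.9 margin for "close to" is an assumption.
OPEN_INCIDENT_LIMIT = 25000


def total_open(open_incidents, open_alerts):
    """Steps 3 to 6: add the two counts read from the Incident counter."""
    return open_incidents + open_alerts


def memory_pressure_likely(open_incidents, open_alerts,
                           limit=OPEN_INCIDENT_LIMIT, margin=0.9):
    """True when the total is close to (>= margin * limit) or above the limit."""
    return total_open(open_incidents, open_alerts) >= limit * margin
```

For example, 18,000 open incidents plus 6,500 open alerts gives 24,500, which this sketch would flag.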


Solution for Cause #3

Summarizers are legacy functionality from earlier versions of SSIM, before State Collectors and the Trending Query function existed. Most point products whose state-type information would once have been gathered with Summarizers now have State Collectors.

In SSIM 4.7, Symantec introduced Trending Queries, which allow queries to span time ranges and provide trending information for the data specified in the criteria.

It is recommended to disable all Summarizers, because they can degrade performance and because Trending Queries and State Collectors now cover what they were used for.


Solution for Cause #4

Check the eventservice.log file for a significant number of lines reporting data in a core field as invalid, which indicates that events are being parsed incorrectly. The most commonly reported field is the IP Address field.

A couple of examples:

2012-03-28 10:42:49,228 1897211183 [Normalizer] WARN  com.symantec.sim.eventservice.util.Networks - Invalid IP address: ResolveIP(BackRef(source_host_name))

2012-04-04 20:15:24,829 636118481 [Normalizer] WARN  com.symantec.sim.eventservice.util.Networks - Invalid IP address: ::ffff:10.200.7.39
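To gauge how many such warnings there are, a short Python sketch can tally them per field and per value. The regular expression is written against the two example lines above only; other "Invalid ..." messages are assumed to follow the same pattern, which may not hold. The log location is not given here, so the function takes the lines as input, and its name is hypothetical.

```python
import re
from collections import Counter

# Matches Normalizer warnings shaped like the examples above, e.g.
# "... [Normalizer] WARN  com.symantec...Networks - Invalid IP address: ::ffff:10.200.7.39"
INVALID_RE = re.compile(
    r"\[Normalizer\]\s+WARN\s+\S+\s+-\s+Invalid (?P<field>[^:]+):\s*(?P<value>.*)$"
)


def count_invalid(lines):
    """Return (per-field Counter, per-value Counter) for 'Invalid <field>:' warnings."""
    fields, values = Counter(), Counter()
    for line in lines:
        m = INVALID_RE.search(line.rstrip("\n"))
        if m:
            fields[m.group("field")] += 1
            values[m.group("value").strip()] += 1
    return fields, values
```

Passing an open eventservice.log file object to `count_invalid` and looking at `values.most_common(10)` may help identify which data is arriving in a form the Normalizer rejects.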


Solution for Cause #5

If the RAID array on the SSIM or on a configured External Storage Device is not optimal, it is degraded, and a disk may need to be replaced or reseated and rebuilt into the array. A degraded RAID array cannot write as fast, and depending on the typical event rate in the SSIM environment, this may cause queues to fill up.

Find the reason for the degraded RAID array and resolve it.