Endpoint Servers exhausting Aggregator memory.
Error:
Message: Stack array is empty. The following exception does not have a proper stack trace.
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
at com.symantec.dlp.communications.common.activitylogging.ConnectionLogger.getThrottler(ConnectionLogger.java:553)
at com.symantec.dlp.communications.common.activitylogging.ConnectionLogger.shouldSuppressHSL(ConnectionLogger.java:506)
at com.symantec.dlp.communications.common.activitylogging.ConnectionLogger.writeToLogFileIfNeeded(ConnectionLogger.java:473)
at com.symantec.dlp.communications.common.activitylogging.ConnectionLogger.writeToLogs(ConnectionLogger.java:459)
at com.symantec.dlp.communications.common.activitylogging.ConnectionLogger.onReplicatorException(ConnectionLogger.java:1161)
at com.symantec.dlp.communications.common.activitylogging.AsynchronousConnectionLogger$ReplicatorExceptionTask.run(AsynchronousConnectionLogger.java:2414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Java heap space
Or
java.lang.OutOfMemoryError: GC overhead limit exceeded
All
There can be several causes:
1. Policy complexity (typically applicable to 15.8 and earlier).
2. Immature policy editing practices.
3. Bad load balancer configuration.
4. Bad agent comm layer settings.
Review the FileReader logs and look for:
com.vontu.policy.loader.execution.ExecutionMatrixGenerator sizeInRows
Consider tuning any policy whose execution matrix exceeds 10,000 rows.
https://knowledge.broadcom.com/external/article/174430/high-memory-or-cpu-usage-of-the-dlp-agen.html has important tips on how to avoid policies with too many rows in 15.8 and earlier.
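The sizeInRows check described above can be partly automated. The following is a minimal Python sketch, not a supported tool. It assumes the row count appears as an integer after "sizeInRows" on the same log line and that the log files match the glob FileReader*.log; both are assumptions about the log layout, so adjust the pattern to match the actual entries. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the row count is the first integer following "sizeInRows"
# on the same line. Adjust the pattern if the real log layout differs.
ROW_PATTERN = re.compile(r"ExecutionMatrixGenerator.*?sizeInRows\D*(\d[\d,]*)")
ROW_THRESHOLD = 10_000  # threshold taken from the guidance above


def find_large_matrices(lines, threshold=ROW_THRESHOLD):
    """Return (line_number, row_count) pairs for entries above the threshold."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ROW_PATTERN.search(line)
        if match:
            rows = int(match.group(1).replace(",", ""))
            if rows > threshold:
                hits.append((number, rows))
    return hits


def scan_log_directory(log_dir, glob="FileReader*.log"):
    """Print every oversized matrix found in the matching log files.

    The glob is an assumed file-naming pattern, not a documented one.
    """
    for path in sorted(Path(log_dir).glob(glob)):
        with path.open(errors="replace") as handle:
            for number, rows in find_large_matrices(handle):
                print(f"{path.name}:{number}: {rows} rows")
```

Calling scan_log_directory with the Endpoint Server log directory lists each log line that reports more than 10,000 rows, which helps identify the policies to tune first.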
In addition to policy complexity, it is a good idea to have controls in place that limit the quantity and frequency of policy updates. Although this process is more efficient in 16.0 and later, it is important to know that agents start receiving new policies the moment you click Save on a policy in a policy group applied to Endpoint Servers. Because of this, when a dozen policy changes are made back to back, the process is restarted each time, which is expensive and taxing on CPU and memory for both agents and servers.
DLP agents that connect through a load balancer to the Endpoint Server(s) need that load balancer configured for source IP persistence. For reference see
About using load balancers in an endpoint deployment
Neglecting IP persistence (also called IP stickiness) can cause Endpoint Servers to frequently not see agents for many hours, which triggers the Endpoint Server to report to Enforce that the agent is not reporting. This is based on the 'Configuring Agent Connection Status "Not Reporting" after' setting. This leads to several things happening.
These things all make communicating with agents more expensive and therefore consume more resources. This is avoided by using IP persistence on the load balancer.
Load balancers are also often tasked with performing health checks on the endpoint server. As agent connections to the endpoint server are not persistent, it is not necessary to have health check frequency measured in milliseconds. Having health checks kick off dozens of times a second can have a negative impact on Endpoint Server stability and performance.
In general, advanced agent settings beginning with 'CommLayer' or 'ServerCommunicator' should be left at their default values. Many of them are interdependent, and changing one without properly adjusting the related settings can negatively affect agent and server communication.
Common mistakes made in relation to agent comm layer settings.
If all else fails and you are still encountering
'java.lang.OutOfMemoryError: GC overhead limit exceeded' or 'java.lang.Exception: java.lang.OutOfMemoryError: Java heap space'
within the Aggregator#.log files, it may simply be time to increase the memory available to the Endpoint Server component.
Within the Advanced Server Settings
For 16.0.x and earlier
Find BoxMonitor.EndpointServerMemory
Increase the value of the -Xmx setting to a size appropriate for the available physical memory on each Endpoint Detection Server.
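As an illustration of the -Xmx change, the following Python sketch rewrites only the -Xmx token in a JVM option string and leaves the other options untouched. The option string in the usage note is a hypothetical example, not necessarily the default value of BoxMonitor.EndpointServerMemory, and the function name is an assumption. The setting itself is still edited in the Advanced Server Settings.

```python
import re

# Matches a standard JVM max-heap option such as -Xmx4096M or -Xmx4g.
XMX_PATTERN = re.compile(r"-Xmx\d+[kKmMgG]?")
SIZE_PATTERN = re.compile(r"\d+[kKmMgG]?")


def set_max_heap(jvm_options, new_size):
    """Return jvm_options with its -Xmx value replaced by new_size.

    Raises ValueError if new_size is malformed or no -Xmx option exists,
    so an unexpected setting value is never silently altered.
    """
    if not SIZE_PATTERN.fullmatch(new_size):
        raise ValueError(f"invalid heap size: {new_size!r}")
    updated, count = XMX_PATTERN.subn(f"-Xmx{new_size}", jvm_options)
    if count == 0:
        raise ValueError("no -Xmx option found in the setting value")
    return updated
```

For example, with the hypothetical value "-Xrs -Xms300M -Xmx4096M", set_max_heap(value, "8192M") returns "-Xrs -Xms300M -Xmx8192M". Choose the new size based on the physical memory actually available on the server.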
For 16.1 and later
Find UDS.Detector.MaxMemory
Increase the value to a size appropriate for the available physical memory on each Endpoint Detector.
Continue to monitor the Aggregator to confirm that the service stays running and is stable.
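One way to support this monitoring is to count how often the two error signatures from this article appear in the Aggregator logs over time. The following Python sketch assumes the log files match the glob Aggregator*.log in a directory you supply. The function names are hypothetical, and this is a rough aid rather than a supported tool.

```python
from collections import Counter
from pathlib import Path

# The two error signatures quoted in this article.
OOM_SIGNATURES = (
    "java.lang.OutOfMemoryError: Java heap space",
    "java.lang.OutOfMemoryError: GC overhead limit exceeded",
)


def count_oom_errors(lines):
    """Count log lines containing each out-of-memory signature.

    This counts matching lines, not distinct events: a single stack trace
    can mention the same error on more than one line.
    """
    counts = Counter()
    for line in lines:
        for signature in OOM_SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts


def summarize_aggregator_logs(log_dir, glob="Aggregator*.log"):
    """Print per-file counts; the glob is an assumed naming pattern."""
    for path in sorted(Path(log_dir).glob(glob)):
        with path.open(errors="replace") as handle:
            counts = count_oom_errors(handle)
        for signature in OOM_SIGNATURES:
            print(f"{path.name}: {counts[signature]} x {signature}")
```

If the counts stop growing after the memory increase, that suggests the change helped. If they keep growing, revisit the policy, load balancer, and comm layer causes listed earlier.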