Seeing a variety of issues with Summary MOM:
- Out of Memory (OOM) messages when adding a 4th sender
- Load balancing issues: differences in the number of agents and metrics across collectors
- Dashboard sluggish and draws right to left slowly
- General EM Cluster Performance issues
- Seeing data gaps in dashboard graphs
- Search from Investigator in live mode returns AgentNotFoundException.
- Live graph not showing data.
- Collector hangs after restart.
- GC Heap and Load Balancer Settings. (General cluster performance issues.)
- Sending queries that returned more metrics than supported. (Summary MOM Capacity issue.)
- Sending 1 data point across 2 harvest durations. This resulted in a gap in the first period, but the data point would eventually show up. (Summary MOM timing issue.)
- Live query impact.
Customer had issues with Summary MOM 3.1 and APM 10.1.
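The timing issue above (1 data point processed across 2 harvest durations) can be illustrated with a toy model. This is only a sketch, not the Summary MOM implementation: the 15-second harvest interval, the transfer delay, and the function names are assumptions made for illustration.

```python
from typing import Tuple

HARVEST_SECONDS = 15  # assumed harvest interval, for illustration only


def harvest_period(timestamp_s: int) -> int:
    """Index of the harvest period that contains a timestamp."""
    return timestamp_s // HARVEST_SECONDS


def visible_periods(measured_at_s: int, transfer_delay_s: int) -> Tuple[int, int]:
    """Return (period the point belongs to, period in which the receiver sees it).

    Assumes the sender forwards the point at the end of its own harvest
    period, so the receiver can only process it in a later period.
    """
    belongs_to = harvest_period(measured_at_s)
    sent_at = (belongs_to + 1) * HARVEST_SECONDS  # sent at the harvest boundary
    arrives_at = sent_at + transfer_delay_s
    return belongs_to, harvest_period(arrives_at)


belongs_to, visible_in = visible_periods(measured_at_s=7, transfer_delay_s=2)
print(f"belongs to period {belongs_to}, visible in period {visible_in}")
```

In this model the point belongs to one period but only becomes visible in the next one, so a live graph shows a gap that is filled in later. That matches a timing issue and not a capacity issue.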
What was done/proposed:
- Optimize sender regexes across multiple sender collectors so that only needed metrics are sent, by adding a metric exclusion.
- Data was sent at harvest time, so 1 data point was processed across 2 harvest durations. This delays when the metric becomes visible, so it was a timing issue, not a capacity issue.
- Added more senders. Changed the architecture to have only 2-3 senders per Summary MOM (receiver).
- Updated Summary Engine code to be more efficient and provide more logging details helpful for debugging.
- Change EM JVM settings to:
  - Use G1GC (-XX:+UseG1GC).
  - Remove PermGen settings. (PermGen was removed in JVM 1.8.)
  - Remove the Concurrent Mark Sweep collector flag (-XX:+UseConcMarkSweepGC).
- Upgrade from Summary MOM Release 3.1 to 3.4. (Multiple updates were tested that eventually became Release 3.4.)
- Increase introscope.enterprisemanager.framework.receiver.queue.size
- Proposed changing introscope.enterprisemanager.loadbalancing.staywithhistoricalcollector to always. The 3 possible values are always, notoverloaded, and rarely; the default is notoverloaded. This change was not made.
- Added APM 10.1 HF 25 to address EM issues when the EM is not responsive
- Reduced load balancing metric threshold
- Proposed switching Workstation to a 64-bit JVM, but this was not done.
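The sender regex optimization in the list above can be sketched with a negative lookahead that excludes unwanted metrics from an otherwise broad pattern. This is a minimal sketch only: the metric paths and patterns are hypothetical, the actual sender configuration format is not shown, and Python's `re` module stands in for the regex engine used by the product.

```python
import re

# Hypothetical metric paths, for illustration only.
metrics = [
    "Frontends|Apps|Shop|URLs|Default:Average Response Time (ms)",
    "Frontends|Apps|Shop|URLs|Default:Responses Per Interval",
    "Frontends|Apps|Shop|URLs|Default:Stall Count",
    "GC Heap:Bytes In Use",
]

# Broad pattern: forwards every metric under Frontends|Apps.
broad = re.compile(r"Frontends\|Apps\|.*")

# Narrowed pattern: same subtree, but the negative lookahead rejects
# any path that ends in ":Stall Count".
narrow = re.compile(r"Frontends\|Apps\|(?!.*:Stall Count$).*")


def select(pattern, paths):
    """Return the paths that the pattern matches in full."""
    return [p for p in paths if pattern.fullmatch(p)]


print(len(select(broad, metrics)), "metrics sent with the broad pattern")
print(len(select(narrow, metrics)), "metrics sent with the exclusion added")
```

Trimming each sender's pattern this way reduces the number of metrics the receiver has to process per harvest, which is the intent of the optimization.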
Also see these KDs for assistance:
https://ca-broadcom.wolkenservicedesk.com/external/article?articleId=31358 -- Tips for loadbalancing configuration when upgrading an Enterprise Manager Cluster
https://ca-broadcom.wolkenservicedesk.com/external/article?articleId=45641#/ -- Unable to start Introscope Enterprise Manager and see the error message, "Could Not Create Java Virtual Machine."
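Because a JVM can fail to start when its garbage collector options conflict, it may help to check the option list after applying the EM JVM changes described earlier. The sketch below is one hedged way to express those changes. The function name and the sample heap sizes are hypothetical, and it only handles the flags named in this article; other CMS-related options would need to be reviewed by hand.

```python
def migrate_jvm_options(options):
    """Return a copy of options with PermGen and CMS flags dropped and G1GC enabled."""
    dropped_prefixes = ("-XX:PermSize=", "-XX:MaxPermSize=")  # PermGen sizing options
    dropped_flags = {"-XX:+UseConcMarkSweepGC"}  # CMS collector flag
    kept = [
        opt for opt in options
        if not opt.startswith(dropped_prefixes) and opt not in dropped_flags
    ]
    if "-XX:+UseG1GC" not in kept:
        kept.append("-XX:+UseG1GC")
    return kept


# Hypothetical starting options, for illustration only.
before = ["-Xms4g", "-Xmx4g", "-XX:MaxPermSize=256m", "-XX:+UseConcMarkSweepGC"]
after = migrate_jvm_options(before)
print(" ".join(after))
```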
It may be helpful to look at this Supportability metric if available: Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual)|Enterprise Manager|JSON Metric Receiver|Queue Depths:pendingMetrics