Performance and other issues with the Summary MOM solution


Article ID: 6177


Updated On:


CA Application Performance Management Agent (APM / Wily / Introscope) INTROSCOPE


A variety of issues were seen with Summary MOM:

  - Out of Memory (OOM) messages when adding a 4th sender.

  - Load balancing issues: differences in the number of agents and metrics across collectors.

  - Dashboards are sluggish and draw slowly from right to left.

  - General EM cluster performance issues.

  - Data gaps in dashboard graphs.

  - A search from the Investigator in live mode returns AgentNotFoundException.

  - Live graphs do not show data.

  - A collector hangs after restart.



Causes identified:

  - GC heap and load balancer settings (general cluster performance issues).

  - Queries were sent that returned more metrics than supported (a Summary MOM capacity issue).

  - One data point was sent across two harvest durations. This left a gap in the first period, although the data eventually showed up (a Summary MOM timing issue).

  - Live query impact.
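
The timing issue can be pictured with a small model. The sketch below is an illustration only, not Summary MOM code: it assumes a 15-second harvest interval and a hypothetical processing delay, and shows why the newest period of a live graph looks empty at first and is filled in one harvest later.

```python
import math

HARVEST_SECONDS = 15  # assumed harvest interval; not stated in this article


def periods_until_visible(processing_delay_s, harvest_s=HARVEST_SECONDS):
    """Harvest cycles that pass before a point sent at a harvest boundary is visible."""
    return math.ceil(processing_delay_s / harvest_s)


def graph_at(query_period, num_points, processing_delay_s):
    """Values a live graph would show for periods 0..num_points-1.

    The point for period p is sent at the end of period p and becomes
    visible periods_until_visible(...) harvests later. None marks a gap.
    """
    lag = periods_until_visible(processing_delay_s)
    return [p if p + lag <= query_period else None for p in range(num_points)]


# With a hypothetical 2-second delay, the newest period is a gap at first...
print(graph_at(query_period=3, num_points=4, processing_delay_s=2))
# ...and is filled in once the next harvest has completed.
print(graph_at(query_period=4, num_points=4, processing_delay_s=2))
```

In this model the gap never grows with load, which is consistent with treating it as a timing effect rather than a capacity limit.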


Customer had issues with Summary MOM 3.1 and APM 10.1.


What was done/proposed:

  - Optimized the sender regexes across multiple sender collectors so that only the needed metrics are sent, by adding an exclusion metric.

  - Data was sent at harvest time, so processing of one data point spanned two harvest durations and the metric appeared with a delay. This was a timing issue, not a capacity issue.

  - Added more senders and changed the architecture to have only 2-3 senders per Summary MOM (receiver).

  - Updated the Summary Engine code to be more efficient and to log more details that help with debugging.

  - Changed the EM JVM settings:

    - Use G1GC (-XX:+UseG1GC).

    - Removed the PermGen settings (PermGen was removed in Java 8).

    - Removed -XX:+UseConcMarkSweepGC.

  - Upgraded from Summary MOM Release 3.1 to 3.4. (Multiple tested updates eventually became Release 3.4.)

  - Increased introscope.enterprisemanager.framework.receiver.queue.size.

  - Proposed changing introscope.enterprisemanager.loadbalancing.staywithhistoricalcollector to always. The three possible values are always, notoverloaded, and rarely; the default is notoverloaded. This change was not made.

  - Added APM 10.1 HF 25 to deal with EM issues when the EM is not responsive.

  - Reduced the load balancing metric threshold.

  - Proposed switching the Workstation to a 64-bit JVM, but this was not done.
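
The idea of narrowing a sender regex with an exclusion can be sketched as follows. This is a minimal illustration, not the Summary MOM sender configuration, whose format is not shown in this article. The metric paths, the include pattern, and the excluded metric names are all hypothetical. It uses a negative lookahead, which Java regular expressions also support.

```python
import re

# Hypothetical broad sender regex: every metric under Frontends|Apps|<app>.
BROAD = re.compile(r".*\|Frontends\|Apps\|[^|]+:.*")

# Same regex with an exclusion (negative lookahead) for metrics that the
# summary dashboards are assumed not to need.
NARROWED = re.compile(
    r"(?!.*:(Stall Count|Concurrent Invocations)$)"
    r".*\|Frontends\|Apps\|[^|]+:.*"
)


def metrics_to_send(metrics, pattern):
    """Return the metric paths that a sender using `pattern` would forward."""
    return [m for m in metrics if pattern.fullmatch(m)]


# Hypothetical metric paths, for illustration only.
SAMPLE = [
    "host1|Tomcat|AgentA|Frontends|Apps|Shop:Average Response Time (ms)",
    "host1|Tomcat|AgentA|Frontends|Apps|Shop:Responses Per Interval",
    "host1|Tomcat|AgentA|Frontends|Apps|Shop:Stall Count",
    "host1|Tomcat|AgentA|GC Heap:Bytes In Use",
]

broad = metrics_to_send(SAMPLE, BROAD)
narrow = metrics_to_send(SAMPLE, NARROWED)
print(len(broad), len(narrow))
```

Comparing the two counts for a representative set of metric paths gives a rough idea of how much load an exclusion removes from the receiver before the regex is deployed.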


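As a rough sketch of where the JVM and load balancing settings above would typically be made, assuming a standard EM installation in which JVM arguments live in Introscope_Enterprise_Manager.lax and cluster settings in IntroscopeEnterpriseManager.properties on the MOM: the file locations and the heap placeholders are assumptions, not values taken from this case.

```properties
# Introscope_Enterprise_Manager.lax (assumed location of the EM JVM arguments).
# <heap> is a placeholder; size it for your own environment.
# PermGen flags (-XX:PermSize, -XX:MaxPermSize) and -XX:+UseConcMarkSweepGC
# are intentionally absent.
lax.nl.java.option.additional=-Xms<heap> -Xmx<heap> -XX:+UseG1GC

# IntroscopeEnterpriseManager.properties on the MOM (assumed location).
# Proposed in this case but not applied; the default is notoverloaded.
introscope.enterprisemanager.loadbalancing.staywithhistoricalcollector=always
```

The EM normally needs a restart for JVM argument changes to take effect.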

Additional Information

Also see these KDs for assistance:

  - Tips for loadbalancing configuration when upgrading an Enterprise Manager Cluster

  - Unable to start Introscope Enterprise Manager and see the error message, "Could Not Create Java Virtual Machine."

It may be helpful to look at this Supportability metric, if available:

Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual)|Enterprise Manager|JSON Metric Receiver|Queue Depths:pendingMetrics