The Perflog can reveal much about the health of the Collectors in a cluster. This article discusses the most easily diagnosed issues. For a complete analysis, CA Services should be engaged for a comprehensive health check.
Perform these four steps for an initial health check.
Step 1: transport.outgoingMessageQueueSize
Step 2: Max heap size
Step 3: Prepare to Analyze Perflog.txt
Step 4: Analyze Perflog.xlsx
Performance metrics are written to the Perflog at 15-second intervals. To see a summary of all the values in any column, click the Filter button for that column and scroll through the contents of the Filter window.
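If you prefer to summarize a column outside of Excel, a short script can do the same job. This is a minimal sketch that assumes the Perflog has been exported to CSV with the same column layout as the spreadsheet; the sample rows and column positions are illustrative, not real Perflog data.

```python
import csv
import io

# Illustrative sample only: timestamp, total JVM memory, free JVM memory.
SAMPLE = """\
timestamp,total_mem,free_mem
10:00:00,4096,900
10:00:15,4096,850
10:00:30,4096,40
"""

def column_summary(csv_text, col_index):
    """Return (min, max) of a numeric column, skipping the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    values = [float(r[col_index]) for r in rows[1:]]
    return min(values), max(values)

print(column_summary(SAMPLE, 2))  # summarize the free-memory column
```

The (min, max) pair gives the same quick range check as scrolling through the Filter window.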
Column B reports the total memory available to the JVM. If initial heap (-Xms) and max heap (-Xmx) are equal, this number should not change much over time since the maximum heap will be allocated immediately at startup instead of being acquired by the JVM as needed.
Column C reports the amount of free JVM memory available in each interval. Look for intervals where free memory on a Collector drops to a two-digit value or less. If you see this, increase the heap size available to the JVM, adding memory to the server if necessary. If sufficient JVM memory is already allocated, continue by investigating the remaining columns. It is unusual to see this problem on a MOM.
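The free-memory check above can be automated. This hedged sketch flags intervals where free JVM memory (Column C) drops below a threshold; the column position, the sample rows, and the units are assumptions about your exported Perflog and should be adjusted to match it.

```python
# Column C is assumed to be at 0-based index 2 in the exported rows.
def low_memory_intervals(rows, free_col=2, threshold=100):
    """Yield (row_number, value) for intervals where free memory < threshold."""
    for i, row in enumerate(rows, start=1):
        value = float(row[free_col])
        if value < threshold:
            yield i, value

rows = [
    ["10:00:00", "4096", "900"],
    ["10:00:15", "4096", "40"],   # two-digit free memory: would trigger the check
]
print(list(low_memory_intervals(rows)))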
Column F reports the Harvest Duration. This is the amount of time the Collector takes to aggregate 15-second interval metrics in preparation for writing them to the Smartstor database. If Harvest Duration frequently exceeds 3000ms (3 seconds), this is a sign that the Collector is struggling to aggregate the incoming interval metrics. The Collector is overloaded.
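Counting how often Harvest Duration breaches the 3-second threshold turns "frequently exceeds" into a concrete number. A sketch, assuming Column F sits at 0-based index 5 in the exported rows (an assumption about the layout, as is the sample data):

```python
HARVEST_COL = 5       # Column F, 0-based; adjust for your export
THRESHOLD_MS = 3000   # 3 seconds, per the guidance above

def slow_harvests(rows, col=HARVEST_COL, threshold=THRESHOLD_MS):
    """Return the number of intervals whose harvest duration exceeds threshold."""
    return sum(1 for row in rows if float(row[col]) > threshold)

rows = [
    ["t1", "x", "x", "x", "x", "1200"],  # healthy interval
    ["t2", "x", "x", "x", "x", "4500"],  # over the 3000ms threshold
]
print(slow_harvests(rows))
```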
Column G reports Smartstor Duration. This is the amount of time the Collector takes to write harvested data to disk. Values of 5000ms (5 seconds) or more should be addressed. CA recommends storing Smartstor data on a separate disk attached to a dedicated controller. Check the location of the Smartstor /data directory to ensure it is not on the same disk as the Enterprise Manager itself, and when Smartstor data is on a separate, dedicated disk, verify in IntroscopeEnterpriseManager.properties that:

introscope.enterprisemanager.smartstor.dedicatedcontroller=true
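Verifying the dedicated-controller flag can also be scripted. This is a hedged sketch using simple key=value parsing; real Java properties files additionally allow ':' separators and escape sequences, so treat this as a quick check rather than a full parser. The sample snippet is illustrative.

```python
def property_value(text, key):
    """Return the value for key in simple key=value properties text, else None."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        k, _, v = line.partition("=")
        if k.strip() == key:
            return v.strip()
    return None

sample = """
# EM settings (illustrative snippet)
introscope.enterprisemanager.smartstor.dedicatedcontroller=true
"""
print(property_value(sample,
      "introscope.enterprisemanager.smartstor.dedicatedcontroller"))
```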
Column I (Agent number of metrics) and Column L (Agent metric data rate) should always have equal or very close values. Scroll through the spreadsheet to compare these two columns. The Agent metric data rate reports how many metrics were processed in an interval. If the metric data rate (Column L) is consistently much lower than the number of metrics coming in (Column I), it is a clear indication that the Collector cannot keep up with the agent metrics it is receiving. The Collector is overloaded.
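The Column I versus Column L comparison can be sketched as follows. The 0-based column indices (8 and 11) and the 10% tolerance are illustrative assumptions; tune both to your exported layout and your own definition of "much lower".

```python
METRICS_COL, RATE_COL = 8, 11   # Columns I and L, 0-based (assumed layout)
TOLERANCE = 0.10                # flag intervals lagging by more than 10%

def lagging_intervals(rows):
    """Return row numbers where the data rate trails the metric count by >10%."""
    flagged = []
    for i, row in enumerate(rows, start=1):
        incoming = float(row[METRICS_COL])
        processed = float(row[RATE_COL])
        if incoming > 0 and (incoming - processed) / incoming > TOLERANCE:
            flagged.append(i)
    return flagged

rows = [
    ["t"] * 8 + ["100000", "x", "x", "99500"],  # healthy: rate tracks count
    ["t"] * 8 + ["100000", "x", "x", "70000"],  # Collector falling behind
]
print(lagging_intervals(rows))
```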
Column J reports the number of Agents connected to this Collector. The maximum number of Agents allowed per Collector is 400. Note this value in each Perflog for each Collector in the cluster. If the number of Agents is unbalanced across the cluster, such that some Agent Collectors (but not TIM Collectors) are supporting more Agents than others, then look for load balancing issues in [EM_HOME]/config/loadbalancing.xml.
By default, all agents should be configured to point to the MOM. The MOM will assign Agents to Collectors automatically and enforce load balancing across the cluster at 15 minute intervals.
A cluster consists of one MOM and a maximum of 10 Collectors of all types, including TIM Collectors. Adding more than 10 Collectors to a cluster can negatively impact the performance of the MOM.
Column X reports Performance Transactions Number of Traces. This is the number of traces arriving in any 15 second interval from all Agents reporting to this Collector. The maximum allowed for any one Collector is 500,000.
If the number of incoming traces exceeds this limit, consider disabling socket, file, and network I/O traces on all agents to reduce the load. To find out which tracer types report the most traces, disable each type one at a time, then re-examine the Perflog for improvement.
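A quick check of trace volume against the 500,000-per-interval ceiling might look like this. The 0-based index for Column X and the sample rows are assumptions about the exported layout.

```python
TRACES_COL = 23        # Column X, 0-based (assumed layout)
MAX_TRACES = 500_000   # per-Collector ceiling per 15-second interval

def over_trace_limit(rows, col=TRACES_COL, limit=MAX_TRACES):
    """Return True if any interval's trace count exceeds the limit."""
    return any(float(row[col]) > limit for row in rows)

row_ok = ["0"] * 23 + ["120000"]    # within the ceiling
row_hot = ["0"] * 23 + ["650000"]   # over the ceiling
print(over_trace_limit([row_ok, row_hot]))
```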
To disable traces, check to see which PBL file you are using in [AGENT_HOME]/wily/core/config/IntroscopeAgent.profile by checking the directives property:
introscope.autoprobe.directivesFile=websphere-typical.pbl,hotdeploy
Here, we are using websphere-typical.pbl.
Checking in websphere-typical.pbl, we see that toggles-typical.pbd is called.
Edit toggles-typical.pbd and comment out the TurnOn directives for socket, file, and network I/O traces as shown:
#######################
# Network Configuration
# ================
#TurnOn: SocketTracing
# NOTE: Only one of SocketTracing and ManagedSocketTracing should be 'on'. ManagedSocketTracing is provided to
# enable pre 9.0 socket tracing.
#TurnOn: ManagedSocketTracing
#TurnOn: UDPTracing
#######################
# File System Configuration
# ================
# TurnOn: FileSystemTracing
#######################
# NIO Socket Tracer Group
# ================
#TurnOn: NIOSocketTracing
#TurnOn: NIOSocketSummaryTracing
#TurnOn: NIOSelectorTracing
A restart of the monitored application will be required.