How to Monitor Which Processes Use Too Much CPU

Article ID: 178878

Products

Monitor Solution

Issue/Introduction

 

Resolution

 

Monitor Solution policies are useful for detecting server health problems, such as a rogue process consuming the CPU. The predefined metric rule CPU averaged > 85% over past 3 minutes is designed to detect CPU problems. However, the alert and the email for this metric rule give no indication of which process caused the problem, and the problem may resolve itself or the server may crash before the cause can be assessed. This article describes how to customize a monitor policy, metric rule, and metrics to provide more data and help identify the problem process. Note that the CPU usage numbers reported by Monitor Solution will not match those shown in Task Manager; they correspond instead to the CPU usage numbers returned by querying WMI with the following command: wmic path Win32_PerfFormattedData_PerfProc_Process get Name,PercentProcessorTime.
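To compare these WMI numbers with what a policy later reports, the output of the command above can be captured and sorted by CPU usage. The following is a minimal Python sketch, not part of Monitor Solution. It assumes the default wmic table output (a header row, then one row per process with the numeric column last). The helper name parse_wmic_table and the sample values are made up for illustration.

```python
def parse_wmic_table(text):
    """Parse 'Name  PercentProcessorTime' table text into (name, percent) pairs.

    Assumes the numeric column is the last whitespace-separated field on each row.
    Returns the pairs sorted from highest to lowest percentage.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:  # skip the header row
        parts = line.rsplit(None, 1)  # process names may contain spaces
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # ignore rows that do not end in a number
        rows.append((parts[0], int(parts[1])))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative sample only; real output will differ from server to server.
SAMPLE = """Name                PercentProcessorTime
Idle                93
System              1
_Total              100
ccSvcHost           6
"""

for name, pct in parse_wmic_table(SAMPLE):
    print(f"{name:<20}{pct:>4}")
```

On a real server, the text passed to the helper would be the captured output of the wmic command rather than the sample string.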

 

1. Create a Monitor Policy

  • In the console go to Manage>Policies then expand Monitoring and Alerting>Monitor>Monitor Policies.
  • Choose an existing folder or create a new folder then right-click it and select New>Monitor Policy (Agent Based). A page will open for the newly created policy.
  • Name the policy appropriately, for example, Processes using > 85% CPU (Agent Based).
  • Click Save Changes to avoid losing work.

2. Create a Metric Rule

  • In the new monitor policy click the blue + (Add) button under the Rules tab.
  • In the Select Rule window that appears click in the Search bar and type in “CPU” to find the CPU averaged > 85% over past 3 minutes metric rule. This predefined rule is very similar to what will be created.
  • Select the rule and then click the Clone button which looks like two pages next to each other.
  • Edit the newly cloned rule by selecting it and clicking the Edit button which looks like a yellow pencil.
  • Name the rule appropriately such as “Processes used > 85% CPU over past 3 minutes”.
  • In the Metrics section select the existing metric evaluation (Processor - % Processor Time (_Total)) and click the yellow pencil (Edit) button.
  • In the Edit Metric Evaluation window that appears click on the metric shown in blue (Processor - % Processor Time (_Total)). The Select Metric window will appear.

3. Create a Metric

  • In the Select Metric window type “Process -” in the Search bar to find the Process - % Processor Time (DNS) metric.
  • Select the metric then click the Clone button that looks like two pages.
  • Select the cloned metric and click the yellow pencil (Edit) button.
  • In the Edit Performance Counter Metric window that appears change the name and description of the metric to Process - % Processor Time (Any Instance).
  • Delete the text in the Instance field, then enable All instances.
  • Click OK to save and close the metric.
  • Click OK to close the Select Metric window.

4. Configure the Metric Evaluation

  • In the Edit Metric Evaluation window set Statistics to Average. Then set Time period to 180 seconds (3 minutes). The final configuration should look like figure 1.
  • Click OK to close the Edit Metric Evaluation window.
  • At this point the metric rule will trigger if any process uses more than 85% of the CPU. Unfortunately, this includes the Idle process, which accounts for all CPU time that is not otherwise in use and will therefore cause the rule to trigger constantly. This behavior can be useful for testing the policy on an inactive server to see what the generated alert and email will look like. To perform this test, click OK twice, skip step 5, and return to complete it later.

Figure 1. The configuration of the metric evaluation to trigger if any process uses over 85% of the CPU.
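The effect of this evaluation (Average, 180 seconds, all instances) can be illustrated with a short Python sketch. This is not how the Monitor agent is implemented. It only models the arithmetic, assuming samples arrive as (timestamp in seconds, instance name, percent) tuples. The function name and the sample data are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 180   # Time period from the metric evaluation
THRESHOLD = 85.0       # Percent CPU from the rule name


def instances_over_threshold(samples, now, window=WINDOW_SECONDS, threshold=THRESHOLD):
    """Return {instance: average} for instances whose windowed average exceeds threshold."""
    by_instance = defaultdict(list)
    for timestamp, name, percent in samples:
        if now - window < timestamp <= now:  # keep only samples inside the window
            by_instance[name].append(percent)
    averages = {name: sum(v) / len(v) for name, v in by_instance.items()}
    return {name: avg for name, avg in averages.items() if avg > threshold}


# Hypothetical samples from an inactive server: Idle dominates.
samples = [
    (0, "Idle", 96), (60, "Idle", 94), (120, "Idle", 95),
    (0, "dns", 2), (60, "dns", 3), (120, "dns", 1),
]
print(instances_over_threshold(samples, now=120))  # {'Idle': 95.0}
```

On an inactive server the only instance over the threshold is Idle, which is why the rule triggers constantly until step 5 is completed.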

5. Disregard the Idle Process

  • To disregard situations where the Idle process is consuming the CPU, create a new metric evaluation in the Edit Metric Evaluation window by clicking the yellow * (New) button.
  • In the New Metric Evaluation window that opens click Select metric.
  • Find the Process - % Processor Time (DNS) metric again and clone it as in step 3.
  • Select the cloned metric and click the yellow pencil (Edit) button.
  • In the Edit Performance Counter Metric window that appears change the name and description of the metric to "Process - % Processor Time (Idle)".
  • Delete the text in the Instance field and enter “idle”.
  • Click OK to save and close the metric.
  • Click OK to close the Select Metric window.
  • In the Edit Metric Evaluation window set Statistics to Average.
  • Set Time period to 180 seconds (3 minutes).
  • Set Condition to Is less than.
  • Set Value to 15. If the Idle process averages less than 15%, other processes must be using the CPU. The final configuration should look like figure 2.
  • Click OK to close the Edit Metric Evaluation window.
  • Click OK to close the Edit Metric Rule window.
  • Click OK to close the Select Rule window.
  • It may be necessary to repeat this step for other processes that consume a lot of CPU but are not actually a problem, such as ccSvcHost from Symantec Endpoint Protection.

Figure 2. The configuration of the metric evaluation so that the rule does not trigger when the Idle process is using 15% or more of the CPU, meaning that other processes are not using that share of the CPU.
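The combined logic of the two metric evaluations can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation. It assumes that both evaluations must be true for the rule to trigger, that the Idle instance is reported under the name "Idle", and that each list holds the samples from the 180-second window. The function name rule_triggers is hypothetical.

```python
from statistics import mean


def rule_triggers(window_samples, high=85.0, idle_ceiling=15.0, idle_name="Idle"):
    """Return True when some instance averages above `high` AND Idle averages below `idle_ceiling`.

    window_samples maps an instance name to its percent samples within the window.
    """
    averages = {name: mean(values) for name, values in window_samples.items() if values}
    idle_avg = averages.get(idle_name)
    if idle_avg is None:
        return False  # assumption: without Idle data the second evaluation cannot pass
    any_high = any(avg > high for avg in averages.values())
    return any_high and idle_avg < idle_ceiling


# Inactive server: Idle is above 85%, but the Idle condition blocks the rule.
print(rule_triggers({"Idle": [95, 96, 94], "dns": [2, 3, 1]}))        # False
# One process is consuming the CPU and Idle is low: the rule triggers.
print(rule_triggers({"Idle": [5, 4, 6], "rogue": [90, 92, 91]}))      # True
```

A server that is busy only because several moderate processes share the load does not trigger in this sketch, since no single instance averages above 85%.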

6. Add a Send Email Task

  • In the monitor policy note that the severity of the metric rule is Warning.
  • Go to the Actions tab and then go to the Warning tab.
  • Click the blue + (Add) button.
  • In the Select Task window that appears browse to System Jobs and Tasks>Monitoring and Alerting>Monitor>Tasks>Send Email or select any previously customized Send Email task used for monitor policies.
  • Click OK to close the Select Task window.
  • Click OK to close the Task Configuration window or click Edit task to make any changes to it such as a To: address.
  • Click Save Changes to save the policy.

7. Add a Target

  • In the monitor policy go to the Rules tab.
  • To target a single filter click on Apply to>Quick Apply and search for the name of the filter.
  • To target a list of computers or combination of filters click on Apply to>Computers.
  • In the Select computers window that appears, click Add rule, change exclude computers in to exclude computers not in, change Filter to Computer list if desired, and finally click the button to select the computers or filters.
  • Click Update Results to verify that the target contains the correct computers, then click OK to close the Select computers window.
  • Click Save Changes to save the policy.
  • The policy will become active when the client computer agents update their configuration.
