vmware Performance, Stability and Response

Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

What do you look at when troubleshooting the vmware probe's performance, stability, and response?

Environment

vmware probe
Configured using Infrastructure Manager (IM), not the Admin Console
Note: the Admin Console is the preferred and recommended interface, and it scales much better

Cause

There are various components and complexities involved in the vmware probe's performance, stability and response. A few are addressed here.

Resolution

Establish resources at the VCenter Level

The vmware probe operates most efficiently when resources are configured at the VCenter level or, at the very least, at the ESX host level. We typically do not recommend configuring more than one VCenter (with up to 10 ESX hosts) per vmware probe. For example, if you have 10 ESX hosts under the same VCenter, it is most efficient to set up the resource at the VCenter level rather than at each ESX host.

Automonitors vs. Templates vs. Explicitly Set Monitors

It is typically most efficient to set monitors up as automonitors. This lets the probe handle monitors with less memory and CPU overhead. On occasion, you may need to set up a monitor explicitly because a specific resource requires a highly customized monitor, but we recommend setting up all monitors as automonitors when possible. You can also drag templates onto automonitors as a way of setting up automonitors. However, applying templates directly to resources is just as inefficient for the vmware probe as setting monitors explicitly.

Workload Balancing

Start by looking at the size of the vmware configuration file. If it is on the order of megabytes, there is likely a workload balancing issue. You can usually resolve this by splitting either the monitored resources or the monitor volume among multiple instances of the vmware probe deployed on multiple robots.
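The size check described above can be scripted. The following is a minimal sketch, not part of the product: the 1 MiB cutoff is an assumed reading of "on the order of megabytes", and the function names are hypothetical. Pass it the path to the probe's configuration file on your robot.

```python
import os

# Assumption: 1 MiB is used as the cutoff for "on the order of megabytes".
SIZE_WARN_BYTES = 1024 * 1024


def config_size_status(size_bytes, warn_bytes=SIZE_WARN_BYTES):
    """Classify a configuration file size against the warning cutoff."""
    return "review workload" if size_bytes >= warn_bytes else "ok"


def check_config(path):
    """Return a one-line report for the configuration file at `path`."""
    size = os.path.getsize(path)
    return f"{path}: {size / 1024:.1f} KiB ({config_size_status(size)})"
```

Calling `print(check_config(path))` with the configuration file path reports the size and whether the workload is worth reviewing.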

Performance Log File

Under the vmware probe directory, you will find a log file named 'performance.log' (there may be more than one, in which case the additional files are suffixed with a number). In these logs, times should never exceed tens of thousands of milliseconds. When the times start to add up to minutes, investigate whether too many resources or monitors are configured for that probe, whether the check interval is set too low, or both.
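To spot timings that have reached the minute range, a short script can scan the log. This is only a sketch: the layout of 'performance.log' is not described here, so the pattern below assumes each timing appears as an integer followed by "ms". Adjust the hypothetical `DURATION_RE` pattern to match what your log actually contains.

```python
import re

# Assumption: each timing appears as an integer followed by "ms".
# The real performance.log layout may differ; adjust the pattern to match it.
DURATION_RE = re.compile(r"(\d+)\s*ms\b")

ONE_MINUTE_MS = 60_000


def slow_entries(lines, limit_ms=ONE_MINUTE_MS):
    """Return (line_number, duration_ms) for every timing at or above limit_ms."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in DURATION_RE.finditer(line):
            duration = int(match.group(1))
            if duration >= limit_ms:
                hits.append((number, duration))
    return hits
```

An open file object can be passed directly, for example `with open(path) as log: print(slow_entries(log))`, where `path` points to the log file in the probe directory.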

Check Interval

Consider increasing the resource check interval(s) as a first step. A 1 minute check interval is considered too tight for the ESX host or VCenter to respond. If certain monitors are critical enough to require a 1 minute check interval, consider setting up 2 resources for the same ESX host or VCenter. You can then set a shorter check interval for the more critical monitors on one resource and a longer check interval for the less critical monitors on the other resource.

Memory and CPU available to the probe

Review the resource utilization and availability on the host/VM where the vmware probe is deployed, and make sure there is ample memory and CPU available to the probes running on that robot. Typically, if less than 20 percent of memory or CPU is available, you may need to either move probes from that robot to a less loaded robot or increase the CPU and/or memory resources available on that robot.

If resource availability isn't the issue, you can go into the vmware probe's Raw Configure and set the starting and maximum Java heap size in the 'properties' -> 'java_options' section. The '-Xms' value is the initial heap size the probe claims when starting, and the '-Xmx' value is the maximum heap size the probe can claim if needed. Always remember to append the letter 'm' (megabytes) to the number; without a suffix, the JVM interprets the value as bytes.
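As an illustration, a 'java_options' value might look like `-Xms512m -Xmx2048m`; these sizes are placeholders, not recommendations. The sketch below is a hypothetical helper, not part of the probe, that checks such a string before you save it. It follows this article's convention of megabyte values, so it also flags other suffixes that the JVM itself would accept.

```python
import re

# Matches -Xms / -Xmx followed by digits and an optional unit suffix.
HEAP_RE = re.compile(r"-(Xms|Xmx)(\d+)([A-Za-z]?)")


def check_heap_options(java_options):
    """Return a list of problems found in a java_options string (empty if none)."""
    problems = []
    sizes = {}
    for flag, number, unit in HEAP_RE.findall(java_options):
        if unit.lower() != "m":
            # Article convention: values are given in megabytes with an 'm' suffix.
            problems.append(
                f"-{flag}{number}{unit}: use a megabyte value with an 'm' suffix"
            )
        else:
            sizes[flag] = int(number)
    for flag in ("Xms", "Xmx"):
        already_reported = any(p.startswith(f"-{flag}") for p in problems)
        if flag not in sizes and not already_reported:
            problems.append(f"-{flag} is not set")
    # The starting heap should not exceed the maximum heap.
    if "Xms" in sizes and "Xmx" in sizes and sizes["Xms"] > sizes["Xmx"]:
        problems.append("-Xms is larger than -Xmx")
    return problems
```

An empty list means the string passed these basic checks; it does not say whether the chosen sizes suit your environment.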