For troubleshooting purposes, it may be necessary to check if any processes are consuming a substantial amount of resources on the service console. Processes consuming a substantial amount of resources can prevent correct operation of the ESX system. This article provides you with the steps to check for starvation of resources on the ESX host service console.
If any process uses a substantial amount of CPU or memory on your ESX host service console, it can prevent correct operation of the system. ESX includes the top utility, which you can use to check resource utilization on the service console. It shows the current values for the statistics and helps you determine if there is starvation of resources on the ESX host service console.
To check the utilization of the processes on the service console:
This screen appears and shows the resource utilization and running processes on the server:
Load average is a measurement of the number of processes that are currently waiting in the run queue plus the number of processes that are being executed, averaged over 1-, 5-, and 15-minute intervals. A load average of 1.00 means that the ESX host machine's physical CPUs are fully utilized, and a load average of 0.5 indicates they are half utilized. A load average of 2.00 indicates that the system is busy. If the load average is over 4.00, the system is heavily utilized and performance is impacted.
A load average similar to this indicates that the ESX Service Console does not have a queue of tasks waiting to process:
load average: 0.14, 0.06, 0.01
A load average similar to this indicates that tasks are waiting in the run queue to be processed:
load average: 2.00, 2.00, 2.00
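The thresholds described above can be applied to a captured load average line with a short script. This is a minimal sketch, not part of ESX: the function names `parse_load_average` and `classify_load` and the label strings are hypothetical, and the script is assumed to run wherever the text copied from top is available.

```python
import re


def parse_load_average(line):
    """Return the 1-, 5-, and 15-minute load averages found in a line of text."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", line)
    if match is None:
        raise ValueError("no load average found in: %r" % line)
    return tuple(float(value) for value in match.groups())


def classify_load(value):
    """Map one load average value to the (hypothetical) labels used here.

    The thresholds come from the article text: 1.00 is fully utilized,
    2.00 is busy, and anything over 4.00 is heavily utilized.
    """
    if value > 4.0:
        return "heavily utilized"
    if value >= 2.0:
        return "busy"
    if value >= 1.0:
        return "fully utilized"
    return "not fully utilized"


for sample in ("load average: 0.14, 0.06, 0.01",
               "load average: 2.00, 2.00, 2.00"):
    one_minute, five_minute, fifteen_minute = parse_load_average(sample)
    print("%s -> %s" % (sample, classify_load(one_minute)))
```

Classifying only the 1-minute value is a simplification. Comparing it with the 5- and 15-minute values shows whether the load is rising or falling.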
The CPU state counters provide an overview of the CPU utilization in each state on the system. If your screen looks like this, your system has a high CPU idle percentage. A high CPU idle percentage means that the system is not busy:
CPU states: cpu user nice system irq softirq iowait idle
total 0.1% 0.0% 0.0% 0.0% 1.3% 12.1% 86.2%
If the CPU idle counter is low, investigate which state is consuming the CPU time. The different states mean:
When the CPU idle state is at 0%, it looks like this:
CPU states: cpu user nice system irq softirq iowait idle
total 1.1% 0.0% 0.1% 0.0% 0.0% 98.6% 0.0%
In this example, the CPU time is being consumed in the iowait state. When this occurs, check the disk subsystem to determine what is delaying responses from the storage.
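To find the state that is consuming the CPU time without reading the columns by eye, the two CPU states lines can be parsed into named values. This is a sketch that assumes the two-line layout shown in the examples in this article. The helper names `parse_cpu_states` and `busiest_state` are hypothetical.

```python
def parse_cpu_states(header_line, total_line):
    """Pair each state name in the header with its percentage in the total line."""
    # Header looks like: "CPU states: cpu user nice system irq softirq iowait idle".
    # Drop the leading "cpu" column label, which has no value of its own.
    names = header_line.split(":", 1)[1].split()[1:]
    # Total line looks like: "total 1.1% 0.0% ...". Drop the "total" row label.
    values = [float(field.rstrip("%")) for field in total_line.split()[1:]]
    if len(names) != len(values):
        raise ValueError("header and total line have different column counts")
    return dict(zip(names, values))


def busiest_state(states):
    """Return the non-idle state with the highest percentage."""
    busy = dict((name, value) for name, value in states.items() if name != "idle")
    return max(busy, key=busy.get)


header = "CPU states: cpu user nice system irq softirq iowait idle"
total = "total 1.1% 0.0% 0.1% 0.0% 0.0% 98.6% 0.0%"

states = parse_cpu_states(header, total)
print("idle: %.1f%%" % states["idle"])
print("busiest state: %s" % busiest_state(states))
```

For the sample line above, the busiest state is iowait, which matches the conclusion that the storage subsystem needs to be checked.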
Note: If the CPU time is being consumed in the user state, you can determine the process that is consuming the CPU from the list of tasks below the statistics. The list of tasks refreshes every few seconds to provide an updated view of the process list. In this example, vmware-hostd is consuming 0.9% of the available CPU:
Memory and swap are the statistics you need to review. These statistics provide an overall indication of how much memory is being used and whether heavy swapping is occurring on the system. This screen shows an example of the expected output:
The example above indicates that there is 268248KB (268MB) of RAM in the system and that 84864KB (85MB) is free. There is 554168KB (554MB) of swap available in the system and 503152KB (503MB) is free. In this case there is substantial RAM available for the service console to use and therefore very little swapping occurs.
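The figures in this example can be turned into utilization percentages, which are easier to compare between hosts. This is a minimal sketch of the arithmetic. The function name `used_fraction` is hypothetical, and the input values are the totals and free amounts quoted above.

```python
def used_fraction(total_kb, free_kb):
    """Return the fraction of a resource in use, given total and free amounts in KB."""
    if total_kb <= 0:
        raise ValueError("total must be greater than zero")
    return (total_kb - free_kb) / float(total_kb)


# Values quoted in the example: RAM total/free and swap total/free, in KB.
ram_used = used_fraction(268248, 84864)
swap_used = used_fraction(554168, 503152)

print("RAM used:  %.1f%%" % (ram_used * 100))   # prints "RAM used:  68.4%"
print("Swap used: %.1f%%" % (swap_used * 100))  # prints "Swap used: 9.2%"
```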
Note: This view shows only the amount of RAM that is assigned to the ESX host service console. It does not provide a view of the total RAM in the server.
To troubleshoot an ESX host that shows a low amount of RAM and high amount of swapping:
Note: You can also see the amount of memory and swap currently in use from the /proc/meminfo file.
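The contents of /proc/meminfo can also be read by a script. The sketch below assumes the file contains lines of the form `Name: <number> kB` (for example `MemTotal`, `MemFree`, `SwapTotal`, and `SwapFree`) and skips any line that does not match that form. The function name `parse_meminfo` is hypothetical, and the sample text is built from the figures quoted in this article for illustration. It is not captured output.

```python
def parse_meminfo(text):
    """Return a dict of {name: kilobytes} for 'Name: <number> kB' lines."""
    values = {}
    for line in text.splitlines():
        name, separator, rest = line.partition(":")
        fields = rest.split()
        # Keep only lines with exactly one number followed by the kB unit.
        # Summary or header lines in other layouts are skipped.
        if (separator and len(fields) == 2
                and fields[0].isdigit() and fields[1].lower() == "kb"):
            values[name.strip()] = int(fields[0])
    return values


# Illustrative sample only, using the numbers from the example in this article.
SAMPLE = """\
MemTotal:       268248 kB
MemFree:         84864 kB
SwapTotal:      554168 kB
SwapFree:       503152 kB
"""

info = parse_meminfo(SAMPLE)
print("RAM free:  %d kB of %d kB" % (info["MemFree"], info["MemTotal"]))
print("Swap free: %d kB of %d kB" % (info["SwapFree"], info["SwapTotal"]))
```

To use this against a live system, pass the text read from /proc/meminfo to `parse_meminfo` instead of the sample string.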
I/O starvation can be caused by many issues, but it commonly occurs when a LUN is removed and the ESX host is not rescanned. To properly remove LUNs from your ESX host, see Removing a LUN containing a datastore from VMware ESXi/ESX 4.x (1029786).
For more information, see VMware HA configuration fails with a VMWareClusterManager Rule not enabled error (1004495).
Increasing the amount of RAM assigned to the ESX Server service console