Troubleshooting Load Average Issues

Article ID: 294094

Products

VMware Tanzu GemFire

Issue/Introduction

Load average shows the average number of processes waiting in the run queue for a system resource (usually a processor; on Linux, processes in uninterruptible sleep, typically waiting on I/O, are also counted). The higher the load average, the more processes are waiting.
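On Linux, the same three values can be read directly from /proc/loadavg. The sample output below is illustrative:

$ cat /proc/loadavg
0.40 0.46 0.43 2/615 22523

The first three fields are the 1-, 5-, and 15-minute load averages. The fourth field is the number of currently runnable scheduling entities over the total number, and the fifth is the PID of the most recently created process.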

Environment

GemFire 6 and later

Cause

One way to determine whether a machine has a high load average is to use an operating system command such as uptime or top while the application is running.

Uptime

The uptime output below shows that the load average is 0.40, 0.46, and 0.43 over the last 1, 5, and 15 minutes, respectively:

$ uptime
 15:37:27 up 107 days, 2:24, 32 users, load average: 0.40, 0.46, 0.43
Another way to determine whether a machine has a high load average is to use either the gemfire stats command or VSD to display the load average values contained in a given GemFire statistics archive. The LinuxSystemStats category contains the load average statistics.

LinuxSystemStats

The gemfire stats command below shows the LinuxSystemStats loadAverage1 value in the stats.gfs archive:

$ gemfire stats :LinuxSystemStats.loadAverage1 -archive=stats.gfs
[info] Found 1 match for ":LinuxSystemStats.loadAverage1"
gfigridcachesw4p, 515600110, LinuxSystemStats: "2009/02/10 16:03:10.442 UTC" samples=2444
 loadAverage1 threads: samples=2444 min=0.37 max=147.59 average=28.92 stddev=31.51
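The same syntax should work for the 5- and 15-minute statistics; for example, the following command (an extrapolation from the syntax above, not output from a real archive) reads loadAverage5:

$ gemfire stats :LinuxSystemStats.loadAverage5 -archive=stats.gfs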

The VSD Tool

In VSD, the load averages can be found in the LinuxSystemStats loadAverage1, loadAverage5, and loadAverage15 statistics.
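In most GemFire installations, the vsd script is found in the product's bin directory and accepts archive files as arguments. Assuming the script is on the PATH, something like the following opens the archive used above:

$ vsd stats.gfs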

Resolution

Determining that there is high load is one thing; finding the source of the load, whether CPU or I/O, is another. One operating system command that can help determine the cause of high load is top.

The top output shows, among other things, the load average, CPU usage percentages, and the I/O wait (iowait) percentage, which is the percentage of time the CPU spends waiting for I/O operations to complete. The output below shows a fairly high 1-minute load average (10.40) relative to the number of CPUs. It also shows that the CPUs are mostly busy (idle=3.0%) and that the I/O wait percentage is low (0.4%).

12:49:24 up 113 days, 23:36, 35 users, load average: 10.40, 5.20, 2.30
615 processes: 587 sleeping, 27 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   61.7%    0.0%   31.4%   0.5%     2.5%    0.4%    3.0%

  PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM  TIME CPU COMMAND
22523 user1     15   0 1102M  1.1G 18068 R     3.6 14.1  0:24   1 java
22778 user1     15   0 1102M  1.1G 18068 R     2.1 14.1  0:02   1 java
22682 user1     15   0 1102M  1.1G 18068 R     1.4 14.1  0:07   1 java
22698 user1     15   0 1102M  1.1G 18068 R     1.4 14.1  0:10   0 java
19286 user1     15   0 1100M  1.1G 18080 R     0.5 14.1  0:25   0 java

In this case, the CPU is clearly causing the high load. 
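Keep in mind that whether a given load average is high depends on the number of processors: a 1-minute load of 10.40 oversubscribes a 4-CPU machine but would be comfortable on a 16-CPU one. On Linux, the CPU count can be checked with a standard command such as nproc (the output below is illustrative):

$ nproc
4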

If, instead, the I/O wait percentage were high, then the high load might be related to disk I/O.
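In that case, a disk-utilization command such as iostat (part of the sysstat package) can help identify the busy device. As a sketch, run it with extended statistics over a few intervals and look for devices showing consistently high utilization and long wait times:

$ iostat -x 5

A device whose %util column stays near 100% is saturated and a likely source of the I/O-related load.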

