Troubleshooting ESX/ESXi virtual machine performance issues

Article ID: 304594

Products

VMware vSphere ESXi

Issue/Introduction

This article provides information on isolating a performance issue on ESXi/ESX.

Symptoms:
  • Services running in guest virtual machines respond slowly.
  • Applications running in the guest virtual machines respond intermittently.
  • The guest virtual machine may seem slow or unresponsive.


Cause

Performance issues can originate in several areas, such as CPU constraints, memory overcommitment, storage latency, or network latency. If one or more of your virtual machines respond slowly, examine each of these areas to find the bottleneck.

Resolution

Each step below provides instructions and links to the appropriate documents. The steps are ordered to isolate the issue and identify the proper resolution in the most efficient sequence, and to minimize the risk of data loss.
 
Note: After completing each step, determine whether the performance issue still exists. Work through each troubleshooting step in order, and do not skip a step.

This article includes four main sections:

  1. CPU constraints
  2. Memory overcommitment
  3. Storage latency
  4. Network latency

CPU constraints

To determine whether the poor performance is due to a CPU constraint:

  1. Use the esxtop command to determine whether the ESXi/ESX server is overloaded. For more information about esxtop, see the Resource Management Guide for your version of ESXi/ESX.
    1. Examine the load average on the first line of the command output.

      A load average of 1.00 means that the ESXi/ESX Server machine’s physical CPUs are fully utilized, and a load average of 0.5 means that they are half utilized. A load average of 2.00 means that the system as a whole is overloaded.
       
    2. Examine the %RDY field for the percentage of time that the virtual machine was ready to run but could not be scheduled on a physical CPU.

      Under normal operating conditions, this value should remain under 5%. If ready time is high on the virtual machines that perform poorly, check whether a CPU limit is configured on them.
       
    If the load average is too high, and the ready time is not caused by CPU limiting, adjust the CPU load on the host. To adjust the CPU load on the host, either:
     
    • Increase the number of physical CPUs on the host

      OR
       
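    • Decrease the number of virtual CPUs running on the host. To decrease the number of virtual CPUs on the host, either:
       
      • Decrease the total number of virtual CPUs allocated to all of the virtual machines on the host

        OR
         
      • Reduce the total number of virtual machines on the host.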
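The two esxtop checks above can be combined into a small decision helper. This is a minimal Python sketch, not a VMware tool: the function name cpu_recommendation is hypothetical, and you supply the values read from esxtop by hand. It assumes that the %RDY shown on a VM's group line is summed across all of the VM's vCPUs, so it divides by the vCPU count before applying the 5% guideline. Treating any load average above 1.00 as too high is also an assumption; the article itself only states that 2.00 means the system is overloaded.

```python
def cpu_recommendation(load_average: float, group_ready_pct: float,
                       vcpus: int, cpu_limited: bool = False) -> str:
    """Suggest the next step for one VM, following the esxtop CPU checks.

    load_average:    value from the first line of esxtop (1.00 = CPUs fully used)
    group_ready_pct: %RDY on the VM's group line in esxtop
    vcpus:           number of vCPUs configured for the VM
    cpu_limited:     True if a CPU limit is configured on the VM
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    # Assumption: the group %RDY is summed over all vCPUs, so normalize it.
    ready_per_vcpu = group_ready_pct / vcpus
    if ready_per_vcpu < 5.0:
        return "ready time normal"
    if cpu_limited:
        return "review the CPU limit on this VM"
    if load_average > 1.0:
        return "reduce CPU load on the host"
    return "ready time high but host not saturated: investigate further"
```

For example, a 2-vCPU VM showing 20% group %RDY on a host with a load average of 2.00 works out to 10% per vCPU with no limit set, so the helper points to reducing the CPU load on the host.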
       

Memory overcommitment

To determine whether the poor performance is due to memory overcommitment:

  1. Use the esxtop command to determine whether the ESXi/ESX server's memory is overcommitted. For more information about esxtop, see the Resource Management Guide for your version of ESXi/ESX.
    1. Type m to switch to the memory screen, and examine the MEM overcommit avg value on its first line. This value reflects the ratio of the requested memory to the available memory, minus 1.

      Examples:
       
      • If the virtual machines require 4 GB of RAM, and the host has 4 GB of RAM, then there is a 1:1 ratio. After subtracting 1 (from 1/1), the MEM overcommit avg field reads 0. There is no overcommitment and no extra RAM is required.
      • If the virtual machines require 6 GB of RAM, and the host has 4 GB of RAM, then there is a 1.5:1 ratio. After subtracting 1 (from 1.5/1), the MEM overcommit avg field reads 0.5. The RAM is overcommitted by 50%, meaning that 50% more RAM than is available is required.
         
    If the memory is overcommitted, adjust the memory load on the host. To adjust the memory load, either:
     
    • Increase the amount of physical RAM on the host

      OR
       
    • Decrease the amount of RAM allocated to the virtual machines. To decrease the amount of allocated RAM, either:
       
      • Decrease the total amount of RAM allocated to all of the virtual machines on the host

        OR
         
      • Reduce the total number of virtual machines on the host.
         
  2. Determine whether the virtual machines are ballooning and/or swapping.

    To detect any ballooning or swapping:
     
    1. Run esxtop.
    2. Type m to switch to the memory screen.
    3. Type f to open the field selection.
    4. Select the letter J for Memory Ballooning Statistics (MCTL).
    5. Look at the MCTLSZ value.

      MCTLSZ (MB) displays the amount of guest physical memory reclaimed by the balloon driver.
       
    6. Type f to open the field selection again.
    7. Select the letter for Memory Swap Statistics (SWAP STATS).
    8. Look at the SWCUR value.

      SWCUR (MB) displays the current swap usage.
       
    If ballooning or swapping is occurring, check whether it is caused by an incorrectly set memory limit on the virtual machine. If the memory limit is set incorrectly, correct it.
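The MEM overcommit avg arithmetic and the ballooning and swapping checks above can be sketched as follows. This is an illustration only: the function names are hypothetical, mem_overcommit_avg simply mirrors the formula as this article states it (requested memory divided by available memory, minus 1), and the MCTLSZ and SWCUR values are assumed to be read manually from esxtop.

```python
def mem_overcommit_avg(requested_gb: float, available_gb: float) -> float:
    """MEM overcommit avg as described in this article: requested / available - 1."""
    if available_gb <= 0:
        raise ValueError("available_gb must be positive")
    return requested_gb / available_gb - 1


def memory_findings(overcommit_avg: float, mctlsz_mb: float,
                    swcur_mb: float) -> list[str]:
    """List memory pressure signals from values read in esxtop."""
    findings = []
    if overcommit_avg > 0:
        findings.append(f"memory overcommitted by {overcommit_avg:.0%}")
    if mctlsz_mb > 0:  # MCTLSZ: memory reclaimed by the balloon driver
        findings.append(f"ballooning: {mctlsz_mb:g} MB reclaimed")
    if swcur_mb > 0:  # SWCUR: current swap usage
        findings.append(f"swapping: {swcur_mb:g} MB swapped")
    return findings
```

With the article's own examples, 4 GB requested on a 4 GB host gives 0.0, and 6 GB requested on a 4 GB host gives 0.5, which memory_findings reports as a 50% overcommitment.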
     

Storage latency

To determine whether the poor performance is due to storage latency:

  1. Determine whether the problem is with the local storage by migrating the virtual machines to a different storage location.
  2. Reduce the number of virtual machines per LUN.
  3. Look for log entries in the Windows guests that look like this:

    The device, \Device\ScsiPort0, did not respond within the timeout period.
     
  4. Using esxtop, look for a high DAVG latency time. For more information, see Using esxtop to identify storage performance issues for ESXi (multiple versions).
  5. Determine the maximum I/O throughput you can achieve with the Iometer tool. For more information, see Testing virtual machine storage I/O performance for ESXi and Best practices for performing the storage performance tests within a virtualized environment.
  6. Compare the Iometer results for a VM to the results for a physical machine attached to the same storage.
  7. Check for SCSI reservation conflicts. For more information, see Analyzing SCSI Reservation conflicts on VMware Infrastructure 3.x, vSphere 4.x, vSphere 5.x and vSphere 6.0.
  8. If you are using iSCSI storage and jumbo frames, ensure that everything is properly configured.
  9. If you are using iSCSI storage and multipathing with the iSCSI software initiator, ensure that everything is properly configured. For more information, see the iSCSI SAN Configuration Guide.
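The esxtop latency check and the Iometer comparison above can be sketched in Python. The sketch assumes the usual esxtop relationship in which the latency seen by the guest (GAVG) is the sum of device latency (DAVG) and VMkernel latency (KAVG). The default limits of 25 ms for DAVG and 2 ms for KAVG are commonly cited rules of thumb, not values from this article, so adjust them for your array. Both function names are hypothetical.

```python
def storage_latency_findings(davg_ms: float, kavg_ms: float,
                             davg_limit_ms: float = 25.0,
                             kavg_limit_ms: float = 2.0) -> tuple[float, list[str]]:
    """Return (estimated GAVG, findings) for one device or VM disk.

    davg_ms: DAVG/cmd from esxtop, latency at the device (HBA, fabric, array)
    kavg_ms: KAVG/cmd from esxtop, time spent in the VMkernel, including queuing
    The limits are assumed rules of thumb, not official thresholds.
    """
    gavg_ms = davg_ms + kavg_ms  # latency as seen by the guest
    findings = []
    if davg_ms > davg_limit_ms:
        findings.append(f"high DAVG ({davg_ms:.1f} ms): check array, fabric, and HBA")
    if kavg_ms > kavg_limit_ms:
        findings.append(f"high KAVG ({kavg_ms:.1f} ms): check queuing on the host")
    return gavg_ms, findings


def throughput_shortfall_pct(vm_mbps: float, physical_mbps: float) -> float:
    """How far a VM's Iometer result falls below a physical machine on the same storage."""
    if physical_mbps <= 0:
        raise ValueError("physical_mbps must be positive")
    return max(0.0, (physical_mbps - vm_mbps) * 100 / physical_mbps)
```

A high DAVG with a low KAVG points away from the host and toward the storage path, which is where the follow-up steps below (HCL, firmware, and path policy) apply.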
     
If you identify a storage-related issue:
  1. Ensure that your storage array and your HBA cards are certified for ESX/ESXi. For more information, see the VMware Hardware Compatibility List.
  2. Ensure that the BIOS of your physical server is up to date. For more information, see Checking your firmware and BIOS levels to ensure compatibility with ESX/ESXi.
  3. Ensure that the firmware of your HBA is up to date. For more information, see Slow performance caused by out of date firmware on a RAID controller or HBA.
  4. Ensure that ESXi recognizes the correct mode and path policy for your storage array, that is, the correct Storage Array Type Plugin (SATP) and Path Selection Policy (PSP). For more information, see Viewing and Managing Storage Paths on ESXi Hosts.

Network latency

Network performance can be highly affected by CPU performance. Rule out a CPU performance issue before investigating network latency.

To determine whether the poor performance is due to network latency:

  1. Test the maximum bandwidth from the virtual machine with the Iperf tool. This tool is available from https://github.com/esnet/iperf

    Note: VMware does not endorse or recommend any particular third-party utility.
     
    1. While using Iperf, set the TCP window size to 64 KB, because performance also depends on this value. To set the TCP window size:
       
      1. On the server side, enter this command:

        iperf -s -p 5001
         
      2. On the client side, enter this command, where <server> is the host name or IP address of the machine running the server side:

        iperf.exe -c <server> -P 1 -i 1 -p 5001 -w 64K -f m -t 10

 

  2. Run Iperf with a machine outside the ESXi/ESX host. Compare the results with what you would expect from your physical environment.
  3. Run Iperf with another machine outside the ESXi/ESX host on the same VLAN and the same physical switch. If the performance is good, and the issue can only be reproduced with a machine at another geographical location, then the issue is related to your network environment.
  4. Run Iperf between two VMs on the same ESXi/ESX host, port group, and vSwitch. If the result is good, you can rule out a CPU, memory, or storage issue.
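Why the TCP window size matters can be shown with a worked calculation. A single TCP stream cannot have more than one window of unacknowledged data in flight, so its throughput is bounded by the window size divided by the round-trip time (the bandwidth-delay product). This sketch assumes that -w 64K means 65,536 bytes and ignores protocol overhead and TCP window scaling; the function names are hypothetical.

```python
import math


def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: window size / round-trip time, in Mbit/s."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_second = window_bytes * 8 * 1000 / rtt_ms
    return bits_per_second / 1_000_000


def required_window_bytes(target_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: window needed to sustain target_mbps at rtt_ms."""
    bytes_per_second = target_mbps * 1_000_000 / 8
    return math.ceil(bytes_per_second * rtt_ms / 1000)
```

For example, a 64 KB window over a 1 ms round trip caps one stream at about 524 Mbit/s, so a 1 Gbit/s link can look underused even when the network is healthy. Sustaining 1,000 Mbit/s at 1 ms needs a window of about 125,000 bytes, which is one reason to keep the window size identical across the comparisons above.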
If you identify a bottleneck on the network:
  1. Work through the steps in Solutions for Poor Network Performance.
  2. If you are using iSCSI storage and jumbo frames, ensure that everything is properly configured.
  3. If you are using Network I/O Control, ensure that the shares and limits are properly configured for your traffic. For more information, see vSphere Network I/O Control.
  4. Ensure that traffic shaping is correctly configured. For more information, see What is Traffic Shaping Policy.
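The Iperf comparisons above follow an elimination order: VM to VM on the same vSwitch, then to a machine on the same VLAN and physical switch, then to a machine at a remote site. A minimal sketch of that reasoning is shown below. The function name, the single expected throughput value, and the 80% tolerance are assumptions for illustration; in practice, VM-to-VM traffic on the same vSwitch can exceed the physical link speed, so you may want a separate expectation for each test.

```python
def localize_network_issue(expected_mbps: float, same_vswitch_mbps: float,
                           same_vlan_mbps: float, remote_mbps: float,
                           tolerance: float = 0.8) -> str:
    """Map three Iperf results to the most likely problem area (heuristic)."""

    def good(result_mbps: float) -> bool:
        # A result counts as good if it reaches the assumed fraction of expectations.
        return result_mbps >= tolerance * expected_mbps

    if not good(same_vswitch_mbps):
        return "inside the host: check CPU and VM network configuration"
    if not good(same_vlan_mbps):
        return "host uplinks or physical switch: check NICs, NIOC, and traffic shaping"
    if not good(remote_mbps):
        return "wider network environment"
    return "no network bottleneck measured"
```

The order matters: each test only adds one new segment of the path, so the first test that falls short identifies the segment to investigate.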



Additional Information

VMware Skyline Health Diagnostics for vSphere - FAQ

For any performance-related issue, check the real-time output of esxtop. You can also run esxtop in batch mode to record performance data over time for later analysis:

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-55016CA4-F6F6-4343-8B99-5209244FAC65.html
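Batch mode writes esxtop counters to a CSV file, for example with esxtop -b -d 5 -n 120 > esxtop.csv on the host (5-second samples, 120 iterations). The sketch below finds the peak %Ready value for each VM in such a file. It assumes the CSV header uses column names of the form \\host\Group Cpu(<id>:<vm name>)\% Ready; verify this against your own output, because the column naming is an assumption here and is not documented in this article. The function name peak_ready_per_vm is hypothetical.

```python
import csv
import io


def peak_ready_per_vm(csv_text: str) -> dict[str, float]:
    """Return the highest '% Ready' sample found for each VM in esxtop batch CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    marker = "\\Group Cpu("
    suffix = ")\\% Ready"
    ready_columns = {}  # column index -> VM name
    for index, name in enumerate(header):
        # Assumed column shape: \\host\Group Cpu(<id>:<vm name>)\% Ready
        if marker in name and name.endswith(suffix):
            group = name.split(marker, 1)[1][: -len(suffix)]
            ready_columns[index] = group.split(":", 1)[-1]
    peaks: dict[str, float] = {}
    for row in reader:
        for index, vm_name in ready_columns.items():
            try:
                value = float(row[index])
            except (IndexError, ValueError):
                continue  # skip missing or non-numeric cells
            peaks[vm_name] = max(value, peaks.get(vm_name, value))
    return peaks
```

Keep in mind that a VM's group %Ready value is generally summed across its vCPUs, so divide by the vCPU count before comparing a peak against a per-vCPU guideline.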

If all of the above checks have been completed, also check the events at the guest operating system level.

Run Task Manager (Windows) or the top command (Linux) to see which processes are using the most resources in the guest.

 

Performance issues on a VM running on an ESXi host can have many causes. Consider the following:

1. High IOPS generated at the guest OS layer.

2. Misconfiguration of the number of CPUs and cores per socket.

3. Overprovisioning on the ESXi hosts.

4. Check the values of RDY, CSTP, DAVG, KAVG, GAVG, CMDS/s, READS/s, WRITES/s, LAT/rd, LAT/wr, and the load average.

5. Application-level issues can also cause performance issues on the VMs.

6. Check whether the issue is intermittent or continuous, and look for a pattern in when it occurs to help determine its cause.

7. VM backups can also cause performance issues.

8. Check whether the VMs have been running on snapshots for a long time.

9. Check for any storage-related or network-related issues.
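For the pattern check in item 6, a series of samples (for example, DAVG or %RDY values exported from esxtop batch mode) can be labeled as continuous or intermittent. This is a hypothetical heuristic, not a VMware method: the function name, the threshold you pass in, and the 80% cut-off for "continuous" are all assumptions.

```python
def classify_pattern(samples: list[float], threshold: float,
                     continuous_fraction: float = 0.8) -> str:
    """Label a metric series as 'none', 'intermittent', or 'continuous'.

    A sample is "bad" when it exceeds threshold. If at least
    continuous_fraction of samples are bad, the issue is treated as continuous.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    bad = sum(1 for value in samples if value > threshold)
    fraction = bad / len(samples)
    if fraction == 0:
        return "none"
    if fraction >= continuous_fraction:
        return "continuous"
    return "intermittent"
```

An intermittent result is worth correlating with scheduled activity such as backups (item 7), while a continuous result points more toward sizing or configuration.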

For a reference of esxtop metrics and their values, see this article:

https://www.virten.net/vmware/esxtop/