Poor network performance or high network latency on Windows virtual machines

Article ID: 310350

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
A Windows virtual machine with one virtual CPU under high CPU load, or any Windows virtual machine with two or more virtual CPUs, may experience these symptoms:
  • Poor network performance and/or high ping response times:
     
    • When receiving network traffic (regardless of the amount of data and type)
    • While under high CPU load, or sharing CPU resources with highly-utilized virtual machines
       
  • Observed throughput may decrease to 512 kB/s on Gigabit Ethernet. Timeouts and connectivity disruptions may also be observed.
  • Ping replies may take up to 20 seconds.
  • Sensitive services like database servers may perform poorly or time out.
  • The number of virtual and physical network cards has no effect on this issue.
  • This issue occurs with different virtual network adapter types (E1000, VMXNET2 and VMXNET3).
  • Measured performance results (generated with tools like iperf) may worsen when adding more virtual CPUs to the virtual machine.


Environment

VMware vSphere ESXi 8.x
VMware vSphere ESXi 7.x



Cause

There are three possible causes for this issue:
  • Power plan

    On Windows 2008 and 2008 R2, the power plan is set to Balanced by default. Microsoft has observed and confirmed that changing the power plan from Balanced to High Performance may increase overall performance. Aggressive power saving plans can adversely affect performance, especially with latency-sensitive applications like web and database servers.
     
  • High CPU ready time (also referred to as "%RDY" or "%RDY time")

    Note: This information is simplified and its only purpose is to illustrate the cause of the described issue. It should not be referenced outside of the context of this document. Although the example we use here is sufficient to describe the cause of the stated issue, it does not claim to be technically correct in every detail due to the complexity of the CPU scheduling process for virtual machines.

    This example is based on these assumptions:
     
    • An ESXi host with a hyper-threading enabled quad-core CPU, resulting in 4 physical and 8 logical CPUs
    • A Windows 2008 R2 virtual machine with 4 virtual CPUs
    • Three Windows 2003 virtual machines with 2 virtual CPUs each

    In this configuration, the ESXi host exposes 8 logical CPUs as 10 virtual CPUs to the virtual machines. In other words, the ESXi host is overcommitted. As long as the utilization of the virtual machines is low, the ESXi host is able to provide all virtual machines with the requested CPU time, and the virtual machines perform as expected. However, if the load on multiple virtual machines increases, the ESXi host has to decide which virtual machine is served first with the currently available CPU time.

    It is important to note that the ESXi host serves a virtual machine with multiple virtual CPUs only when it is able to serve all of that virtual machine's virtual CPUs at once. Otherwise, a virtual machine with fewer virtual CPUs is served first. Although this is how the CPU scheduler is supposed to work, it can lead to situations where certain virtual machines wait an unreasonable amount of time for the requested CPU time. In such cases, you can observe degraded overall performance and increased response times.
     
  • Receive Side Scaling (RSS)

    RSS is a mechanism which allows the network driver to spread incoming TCP traffic across multiple CPUs, resulting in increased multi-core efficiency and processor cache utilization. If the driver or the operating system is not capable of using RSS, or if RSS is disabled, all incoming network traffic is handled by only one CPU. In this situation, a single CPU can be the bottleneck for the network while other CPUs might remain idle.

    Note: To make use of the RSS mechanism, the hardware version of the virtual machine must be 7 or higher, the virtual network card must be set to VMXNET3, and the guest operating system must be capable and configured properly. On some systems it has to be enabled manually.
    For further information on Microsoft RSS, see Introduction to Receive Side Scaling. For more information on Linux Receive Side Scaling, see RSS and multiqueue support in Linux driver for VMXNET3 (2020567): https://knowledge.broadcom.com/external/article?legacyId=2020567
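
The simplified scheduling example in the High CPU ready time cause can be sketched as a toy model. This is only an illustration of the simplification used in this article, not of the actual ESXi CPU scheduler. The VM names, the `VM` class, and the `schedule_round` function are hypothetical, and the sketch assumes that every virtual machine requests CPU time for all of its virtual CPUs at the same moment.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str   # hypothetical label, for illustration only
    vcpus: int  # number of virtual CPUs


def schedule_round(vms, logical_cpus):
    """Return (running, waiting) VM names for one round of the toy model."""
    free = logical_cpus
    running, waiting = [], []
    # Try VMs with fewer vCPUs first; sorted() is stable, so ties keep list order.
    for vm in sorted(vms, key=lambda v: v.vcpus):
        if vm.vcpus <= free:
            free -= vm.vcpus         # all vCPUs of this VM are placed at once
            running.append(vm.name)
        else:
            waiting.append(vm.name)  # not enough free logical CPUs for every vCPU
    return running, waiting


# Example configuration from this article: 8 logical CPUs, all VMs busy at once.
vms = [VM("win2008r2", 4), VM("win2003-a", 2), VM("win2003-b", 2), VM("win2003-c", 2)]
running, waiting = schedule_round(vms, logical_cpus=8)
print("running:", running)
print("waiting:", waiting)
```

In this toy round, the three 2-vCPU virtual machines occupy 6 of the 8 logical CPUs, so the 4-vCPU virtual machine has to wait. That waiting time is what shows up as CPU ready time.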

Resolution

Before proceeding with these steps, ensure that:

  • there are no problems in your external infrastructure like faulty hardware or possible misconfigurations (common configuration problems are IP conflicts, unintended traffic shaping, misconfigured trunk and EtherChannel ports)
  • the network is not congested
  • the network the ESXi host is on is stable and performs as expected
  • the virtual machines are configured with the VMXNET3 network adapter
  • the hardware drivers and firmware versions are recent
  • the BIOS is recent and configured appropriately
  • the virtual machine is running the latest version of VMware Tools (they contain the drivers for the virtual hardware)
  • any security software like intrusion detection/prevention systems or packet inspectors have enough resources available and are configured correctly (check the logs for incorrectly filtered traffic or dropped packets)

After you have confirmed that your infrastructure is healthy and all components are configured correctly, check the power saving configuration. For virtual machines with more than one virtual CPU, also check if high CPU %RDY times have a negative impact on these virtual machines.

The final step is to check the RSS settings. Changing the RSS settings should only be done by trained network administrators. VMware also recommends confirming that all relevant applications (including the operating system) support changes to the RSS configuration.


Power plan

To ensure that the system takes advantage of the available resources, it is important to disable all power saving features while investigating performance issues. If the power saving configuration appears to be related to the performance problems, a customized power plan based on the performance and power saving requirements should be created. If you are unsure about which power saving configuration is recommended for your system, engage your hardware vendor.

To adjust the power plan settings on a Windows Server:
 

  1. Click Start, type powercfg.cpl, and press Enter.
  2. Ensure that the High performance option is selected.

    Note: Steps 3 through 6 are optional.
     
  3. Click Change plan settings.
  4. Click Change advanced power settings.
  5. To enable access to all settings, click Change settings that are currently unavailable.
  6. Browse the available settings and adjust as necessary.
  7. Click OK to confirm and close all windows.

    Note: Some changes might require a reboot of the guest system.
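
The same check can be scripted when many guests have to be reviewed. The sketch below assumes that `powercfg /getactivescheme` is run inside the Windows guest and that its output contains the GUID of the active power scheme. It compares that GUID with the one of the built-in High performance scheme. The helper names are hypothetical, and a customized power plan has its own GUID, so it is reported as not matching.

```python
import re
import subprocess

# GUID of the built-in Windows "High performance" power scheme.
HIGH_PERFORMANCE_GUID = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

GUID_PATTERN = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def active_scheme_guid(powercfg_output):
    """Extract the first GUID from `powercfg /getactivescheme` output, or None."""
    match = GUID_PATTERN.search(powercfg_output)
    return match.group(0).lower() if match else None


def is_high_performance(powercfg_output):
    """True if the active scheme is the built-in High performance scheme."""
    return active_scheme_guid(powercfg_output) == HIGH_PERFORMANCE_GUID


def query_active_scheme():
    """Run powercfg inside the Windows guest and return its raw output."""
    result = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a Windows guest, `is_high_performance(query_active_scheme())` would report whether the built-in High performance plan is active. Matching on the GUID rather than on the plan name keeps the check independent of the display language.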
     

Checking CPU %RDY times

To determine if a virtual machine is impacted by high CPU %RDY times, use one of these methods:

  • Count all virtual CPUs on a particular host or cluster, and divide by the number of logical CPUs. A result of one or higher means that the host or cluster is overcommitted and should be investigated. Values of four or higher are considered overloaded and must be investigated immediately.

    Notes:
    • This method is intended as a quick check for overcommitment. A low ratio does not prove that the host has sufficient CPU resources. VMware recommends using esxtop to observe detailed host performance.
    • Although hyper-threading doubles the number of logical processors, it cannot provide the same performance as two physical processor cores. If it is likely that the host is overcommitted, calculate using the number of physical CPUs, rather than logical CPUs.
       
  • The esxtop command displays the values for the CPU %RDY time when run on the host with the affected virtual machines. 
  • The vm-support command provides the capability to create performance snapshots. For more information, see Collecting performance snapshots using vm-support (1967).
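
The counting method can be written as a short calculation. The sketch below applies the thresholds stated above (one or higher, four or higher) to the example configuration from the Cause section, once with logical CPUs and once with physical CPUs. The function names are hypothetical, and the result is only the quick check described here, not a replacement for esxtop.

```python
def overcommit_ratio(total_vcpus, cpus):
    """Ratio of configured virtual CPUs to host CPUs (logical or physical)."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return total_vcpus / cpus


def classify(ratio):
    """Apply this article's thresholds to a vCPU-to-CPU ratio."""
    if ratio >= 4:
        return "overloaded: investigate immediately"
    if ratio >= 1:
        return "overcommitted: investigate"
    return "not overcommitted by this quick check"


# Example from the Cause section: one 4-vCPU VM plus three 2-vCPU VMs.
vcpus = 4 + 3 * 2
for label, cpus in (("logical", 8), ("physical", 4)):
    ratio = overcommit_ratio(vcpus, cpus)
    print(f"{label}: {ratio:.2f} -> {classify(ratio)}")
```

For the example host, the ratio is 1.25 against the 8 logical CPUs and 2.50 against the 4 physical CPUs, so both calculations flag the host as overcommitted.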


To relieve an overcommitted host, use one of these methods:

  • Move the affected virtual machine to a host with more available resources
  • Move other virtual machines off the host
  • Decrease the number of virtual CPUs on the affected virtual machine

Note: Changing the CPU count might not be supported by the guest operating system. For more information, contact the operating system vendor.


Enabling and configuring Receive Side Scaling (RSS)

Before enabling RSS:

  • Ensure that the hardware version of the virtual machine is set to Version 7 or higher. For more information, see Virtual machine hardware versions (1003746).
  • Ensure that the virtual network adapter is set to VMXNET3 and that the operating system is supported by this adapter. For more information, see Choosing a network adapter for your virtual machine (1001805).
  • Ensure that RSS is enabled in the guest operating system. To verify this in a Windows guest operating system, open a command prompt and run the command:

    netsh int tcp show global

    The output indicates whether Receive-Side Scaling State is enabled or not.

    To see more details about the current RSS settings, use the PowerShell cmdlet Get-NetAdapterRss. This example gets the RSS properties of the network adapter named MyAdapter:

    Get-NetAdapterRss -Name "MyAdapter"
     
  • Ensure that the network adapter in the virtual machine is configured to use RSS. To verify this in a Windows guest operating system:
     
    1. Open the Device Manager, navigate to Network adapters, right-click the adapter you wish to enable RSS on, and click Properties.
    2. In the Properties window, click the Advanced tab, then click RSS in the list on the left side.
    3. Change the Value to Enabled and click OK to close the window. A reboot might be necessary for the changes to take effect.

      Note: Enabling or disabling the RSS feature interrupts the network connection on the adapter for several seconds. If you are accessing the system via a remote desktop session, ensure that you can access the system in another way in case the network connection does not return.
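
The guest-level check with netsh int tcp show global can also be automated. The sketch below assumes English-language netsh output and looks for the Receive-Side Scaling State line mentioned above. The function name and the sample text are illustrative only, and real output contains more parameters.

```python
def rss_state(netsh_output):
    """Return the value of 'Receive-Side Scaling State' (lowercased), or None."""
    for line in netsh_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "receive-side scaling state":
            return value.strip().lower()
    return None  # line not found, for example on a non-English system


# Illustrative excerpt of `netsh int tcp show global` output.
sample = """
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Receive Segment Coalescing State    : enabled
"""
print(rss_state(sample))  # enabled
```

A result other than `enabled` indicates that RSS should be reviewed in the guest operating system before the adapter settings are changed.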