High latency between VMs on the same host and same port group (VLAN)

Article ID: 414288

Products

VMware vSphere ESXi

Issue/Introduction

  • Two virtual machines hosted on the same ESXi host and connected to the same port group experience significant network latency.

Environment

VMware ESXi

Resolution

  • Validate the vNIC type: ensure both VMs use VMXNET3.
  • Check VMware Tools: it must be installed, running, and up to date.
  • Inspect CPU and memory usage on both VMs. High usage inside the guest can delay the guest network stack.
  • Migrate both VMs to another host and check the latency between them. If latency improves, the issue likely lies with the original host.
    • You can also run iperf between the two affected VMs to narrow down whether the host virtual network is at fault. For more information, refer to the following third-party article: How to use Iperf to measure Bandwidth in Windows
      • If the iperf results also show high latency, the issue is likely within the host’s virtual network.
      • If the iperf results show no latency problem, the issue may be at the application level.
  • Deploy two fresh, minimal VMs (for example, a fresh Windows VM or a lightweight Linux VM) on the same host and port group. For deploying test VMs, you can also refer to UPSA VM for troubleshooting.
    • If they show normal latency, the issue is specific to the production VMs.
    • If they also show high latency, the issue is in host-level virtual networking.
  • If the issue is observed only on this specific host, check the following:
  • Check the virtual NIC buffers of both VMs to see whether packets are being dropped when the buffers fill up.
    • You can check the buffer status with vsish. First, find the VM's vSwitchPortSetName and portNum using the net-stats -l command.
    • Sample Output for TinyLinux2 vmxnet3 statistics:

    • Run the following vsish command with the above details collected: vsish -e get /net/portsets/<vSwitchPortSetName>/ports/<portNum>/vmxnet3/rxSummary

    • If the "running out of buffer" counter is increasing, packets are very likely being dropped, which causes retransmissions and increased latency. For more information and for the fix, refer to KB 324556.
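
One way to tell whether a counter is increasing is to collect the rxSummary output twice, a few seconds apart, and compare the two samples. The following is a minimal sketch in Python, assuming the vsish output consists of `name:value` lines. The sample text and counter names in it are illustrative and were not captured from a real host, and the helper names `parse_counters` and `counter_deltas` are hypothetical.

```python
def parse_counters(text):
    """Parse 'name:value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        # Split on the last colon so names containing '#' or spaces stay intact.
        name, sep, value = line.strip().rpartition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def counter_deltas(before, after):
    """Return only the counters that grew between two samples."""
    first, second = parse_counters(before), parse_counters(after)
    return {
        name: second[name] - first[name]
        for name in second
        if name in first and second[name] > first[name]
    }


# Illustrative samples (assumed layout, not real host output).
SAMPLE_1 = """\
stats of a vmxnet3 vNIC rx queue {
   pkts rx ok:1000
   running out of buffers:12
   # of times the 1st ring is full:3
}"""

SAMPLE_2 = """\
stats of a vmxnet3 vNIC rx queue {
   pkts rx ok:1800
   running out of buffers:57
   # of times the 1st ring is full:3
}"""

grown = counter_deltas(SAMPLE_1, SAMPLE_2)
for name, delta in grown.items():
    print(f"{name}: +{delta}")
```

A buffer-exhaustion counter that grows between samples while traffic is flowing points to drops at the vNIC receive ring.
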
  • If no packets are being dropped, whether because the network adapter ran out of buffers or for any other reason, perform a packet capture on the switch ports of the source and destination VMs while running pings between them. For details on capturing packets, refer to: Packet capture on ESXi using the pktcap-uw tool
    1. Analyze the captured ICMP traffic to determine whether VM2 introduces significant delay before replying to VM1's requests.
    2. Monitor the ping results for patterns in latency spikes.
    3. Inspect the packet capture for any other significant traffic between VM1 and VM2 that might be consuming excessive host or network resources. It is recommended to isolate the issue with test VMs before checking production VMs.
    4. Calculate the time difference between VM1's ICMP request entering the vSwitch and VM2's corresponding reply exiting the vSwitch to see whether there is any vSwitch processing delay.
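
For step 2, latency spikes can be flagged by comparing each round-trip time against the median of the run. The following is a minimal sketch, assuming the ping RTTs have already been collected as a list of milliseconds. The function name `find_spikes`, the default factor of 5, and the sample values are illustrative assumptions, not recommended thresholds.

```python
from statistics import median


def find_spikes(rtts_ms, factor=5.0):
    """Return (index, rtt) pairs whose RTT exceeds factor x the median RTT."""
    if not rtts_ms:
        return []
    baseline = median(rtts_ms)
    return [(i, rtt) for i, rtt in enumerate(rtts_ms) if rtt > factor * baseline]


# Illustrative RTTs in milliseconds (made-up values).
rtts = [0.21, 0.19, 0.23, 4.80, 0.20, 0.22, 6.10, 0.18]
spikes = find_spikes(rtts)
for index, rtt in spikes:
    print(f"ping #{index}: {rtt} ms")
```

The median is used as the baseline because a few large spikes barely move it, unlike the mean.
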

Example (for step 4):

    • If VM1 sent the ICMP request at time T1 and VM2 sent the ICMP reply at time T2, then T2 - T1 gives the time the packet spent in the vSwitch I/O chain.
    • If there is a significant delay in the vSwitch while processing packets (as validated in step 4 above), perform a trace capture filtered on the source MAC address to pinpoint where in the I/O path the latency occurs.

Syntax: pktcap-uw --switchport <SwitchportID> --trace --mac <macAddressofVM>
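
The T2 - T1 calculation from step 4 can be sketched as follows. This assumes the request and reply timestamps have already been read out of the two captures (for example, in a packet analyzer) as (ICMP sequence number, time in seconds) pairs. The function name `reply_delays_us` and the sample timestamps are hypothetical.

```python
def reply_delays_us(requests, replies):
    """Match requests and replies by ICMP sequence number; return T2 - T1 in microseconds."""
    reply_time = dict(replies)
    delays = {}
    for seq, t1 in requests:
        t2 = reply_time.get(seq)
        if t2 is not None:  # skip requests that never got a reply
            delays[seq] = round((t2 - t1) * 1_000_000)
    return delays


# Illustrative timestamps in seconds (made-up values).
requests = [(1, 10.000100), (2, 11.000150), (3, 12.000120)]
replies = [(1, 10.000350), (2, 11.004650)]

delays = reply_delays_us(requests, replies)
for seq, delay in delays.items():
    print(f"icmp_seq={seq}: T2 - T1 = {delay} us")
```

In this made-up data, sequence 2 takes far longer than sequence 1, which is the kind of outlier a trace capture would then help localize.
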

  • Depending on what the trace and packet capture show, you may then need to verify the Power Management Policy setting on the ESXi host.
    • If it is not set to High Performance, update the setting accordingly.
      • Edit the Power Management Policy of the ESXi host and set it to "High performance".
      • The Power Management Policy parameter is under the Configure tab > Hardware > Overview.

Additional Information

Other references:

How to use Iperf to measure Bandwidth in Windows

High virtual network throughput performance tuning recommendation when using 100G network interface cards

Performance Best Practices for VMware vSphere 8.0