Increased Network Latency in VMware FT VM

Article ID: 406486

Products

VMware vCenter Server

VMware vSphere ESXi

Issue/Introduction

When running a continuous ping test from a Fault Tolerant (FT) virtual machine (VM) to the default gateway, elevated ping latency is observed. The latency ranges from less than 1 millisecond (<1ms) to approximately 100 milliseconds (~100ms) during normal operations. This behavior was confirmed in a low-usage environment with 3 VMs across both hosts.
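The latency range described above can be summarized from captured ping output. The following is a minimal sketch, assuming the output contains the "time=<value> ms" field printed by common ping implementations. The function name summarize_ping and the sample lines are hypothetical and for illustration only; they are not measurements from this article.

```python
import re
from statistics import mean

# Matches the "time=12.5 ms" or "time<1ms" field found in typical ping replies.
_TIME_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def summarize_ping(output: str) -> dict:
    """Return the reply count and min/avg/max round-trip time (ms)."""
    samples = [float(m.group(1)) for m in _TIME_RE.finditer(output)]
    if not samples:
        raise ValueError("no round-trip times found in ping output")
    return {
        "count": len(samples),
        "min": min(samples),
        "avg": mean(samples),
        "max": max(samples),
    }


# Hypothetical sample output (illustrative values, not real measurements).
sample = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.4 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=98.0 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=12.5 ms
"""

print(summarize_ping(sample))
```

Running the same summary before and after any configuration change gives a like-for-like comparison of minimum, average, and maximum latency.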

Environment

VMware vCenter Server

VMware vSphere ESXi

Cause

The observed latency is expected behavior due to the FT synchronization process. During memory checkpoint operations, the primary FT VM is temporarily "stunned" (paused) while memory and state changes are replicated to the secondary VM. The duration of the stun period depends on:

  • The volume of memory changes requiring synchronization.
  • The checkpoint interval, which determines how frequently checkpoints occur.

By default, VMware FT employs dynamic checkpoint intervals to balance compute performance and network latency. Longer intervals improve VM runtime and throughput but increase network latency, while shorter intervals reduce network latency at the expense of VM runtime performance.
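To make the relationship between interval and checkpoint frequency concrete, the sketch below converts an interval in microseconds into an approximate number of checkpoints per second. It assumes a fixed interval with one checkpoint starting per interval. The default dynamic behavior varies the interval, so these figures are only a rough guide. The function name checkpoints_per_second is hypothetical.

```python
def checkpoints_per_second(interval_us: int) -> float:
    """Approximate checkpoint rate for a fixed interval given in microseconds."""
    if interval_us <= 0:
        raise ValueError("interval must be a positive number of microseconds")
    return 1_000_000 / interval_us


# Compare the two fixed intervals discussed in this article.
for interval_ms in (50, 100):
    interval_us = interval_ms * 1000  # milliseconds to microseconds
    rate = checkpoints_per_second(interval_us)
    print(f"{interval_ms}ms ({interval_us} us): about {rate:.0f} checkpoints per second")
```

Under this assumption, halving the interval doubles how often the VM is checkpointed, which is the source of the trade-off between compute performance and network latency.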

Resolution

To manually adjust the FT checkpoint interval for specific performance requirements, follow these steps:

  1. Power off the FT VM.
  2. Edit the VMX configuration file for the FT VM and add the following parameter:

     ftcpt.cptIntervalUS = "50000"

     This example sets the interval to 50ms (50,000 microseconds).
  3. Power on the FT VM again. Changes will take effect upon startup.
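Step 2 can also be scripted. The following is a minimal sketch, assuming the VMX file is a plain-text list of key = "value" lines and that the edit is made on a copy of the file while the VM is powered off. The function name set_checkpoint_interval and the sample VMX content are hypothetical; only the ftcpt.cptIntervalUS parameter comes from this article.

```python
import re

PARAM = "ftcpt.cptIntervalUS"


def set_checkpoint_interval(vmx_text: str, interval_ms: int) -> str:
    """Return VMX text with ftcpt.cptIntervalUS set to interval_ms milliseconds."""
    if interval_ms <= 0:
        raise ValueError("interval must be a positive number of milliseconds")
    # The parameter is expressed in microseconds.
    new_line = f'{PARAM} = "{interval_ms * 1000}"'
    existing = re.compile(rf"^\s*{re.escape(PARAM)}\s*=.*$", re.IGNORECASE)
    # Drop any earlier setting so the parameter appears exactly once.
    kept = [ln for ln in vmx_text.splitlines() if not existing.match(ln)]
    kept.append(new_line)
    return "\n".join(kept) + "\n"


# Hypothetical VMX content, for illustration only.
sample_vmx = 'displayName = "ft-vm-01"\nmemSize = "4096"\n'

updated = set_checkpoint_interval(sample_vmx, 50)
print(updated)
```

The function replaces an existing entry instead of appending a second one, so it can be rerun with a different interval without leaving duplicate parameters in the file.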

Example Performance Observations

  • At 50ms interval: Ping latency improved to 6ms/33ms/15ms (min/max/avg).
  • At 100ms interval: Latency increased to 8ms/40ms/21ms (min/max/avg), but compute performance improved.

Important Considerations

  • Compute vs. Network Trade-off:
    • Shorter intervals (e.g., 50ms) reduce network latency but may lower compute throughput.
    • Longer intervals (e.g., 100ms) improve compute performance but increase latency.
     
  • The default dynamic interval (recommended) adapts to workload conditions for optimal balance.
  • Evaluate application requirements (e.g., latency-sensitive vs. compute-heavy) before adjusting this setting.

Additional Information

  • For further details on FT network latency impacts, refer to the VMware Documentation.
  • Adjustments to ftcpt.cptIntervalUS may need to be redone across VM configuration upgrades.