Performance degradation is observed on a Fault Tolerance (FT) enabled virtual machine during application startup.
The workload is memory-sensitive, and performance improves once the application is fully loaded.
VMware vSphere
vSphere Fault Tolerance (FT)
This behavior is caused by the operational limitations of the vSphere FT checkpointing mechanism when handling high-frequency memory writes (dirty pages). Analysis of memory metrics confirms a high "dirty page" rate during the startup phase.
FT requires the Primary and Secondary VMs to remain identical at all times. A high memory modification rate triggers the "Record/Replay" or "Checkpointing" flow control, which proactively throttles the Primary VM. This throttling ensures that the Secondary VM does not fall behind while the Primary waits for it to acknowledge memory state synchronization.
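The relationship between dirty page rate and throttling can be illustrated with a rough back-of-the-envelope model. The sketch below is not the vSphere FT algorithm; it is a simplified illustration that assumes every dirtied page must be shipped to the Secondary VM and that the Primary VM is slowed in proportion to any shortfall in available bandwidth. The 4 KiB page size, the 10 Gbps link, the dirty page rates, and the function names are all hypothetical values chosen for the example.

```python
# Toy model only: NOT the actual vSphere FT flow-control implementation.
# Assumption: each dirtied page is sent once to the Secondary VM.

PAGE_SIZE_BYTES = 4096  # assumed guest page size (4 KiB)


def required_bandwidth_gbps(dirty_pages_per_sec: float) -> float:
    """Throughput (Gbit/s) needed to ship every dirtied page to the Secondary."""
    return dirty_pages_per_sec * PAGE_SIZE_BYTES * 8 / 1e9


def throttle_factor(dirty_pages_per_sec: float, link_gbps: float) -> float:
    """Fraction of full speed the Primary can sustain in this toy model.

    1.0 means no throttling; lower values mean the Primary is slowed so the
    Secondary can keep up.
    """
    needed = required_bandwidth_gbps(dirty_pages_per_sec)
    if needed <= link_gbps:
        return 1.0
    return link_gbps / needed


if __name__ == "__main__":
    link = 10.0  # hypothetical bandwidth available for FT traffic, in Gbit/s
    # Hypothetical dirty page rates for the two phases described above.
    for phase, rate in [("startup", 500_000), ("steady state", 50_000)]:
        print(
            f"{phase}: needs {required_bandwidth_gbps(rate):.2f} Gbit/s, "
            f"throttle factor {throttle_factor(rate, link):.2f}"
        )
```

With these assumed figures, the startup phase needs more bandwidth than the link provides, so the model slows the Primary VM, while the steady-state phase fits within the link and runs unthrottled. This matches the observed pattern of poor performance during startup that improves once the application is fully loaded.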
vSphere Fault Tolerance is working as designed. For memory-intensive applications that undergo heavy memory churn during startup or in steady state, an alternative availability solution must be used:
Implement application-level high availability (HA) mechanisms native to the workload.
Use standard vSphere HA instead of vSphere FT for these memory-intensive workloads.
No additional configuration changes within vSphere FT will bypass the flow-control constraints tied to high memory churn.
After Fault Tolerance (FT) is enabled on a virtual machine, performance degradation or hangs are reported.