VM snapshot consolidation takes significantly longer when LUN replication is enabled
Article ID: 430368
Updated On:
Products
VMware vSphere ESXi
Issue/Introduction
Replication to another site (e.g. within a metrocluster configuration) is enabled on the LUNs backing VMFS datastores.
Snapshot consolidation/removal takes significantly longer on VMs running with replication enabled.
Environment
VMware vSphere ESXi (all versions)
Cause
Longer snapshot consolidation times arise if enabling replication increases I/O latency on the array.
This is expected with synchronous replication, where I/O completion time includes the completion of replication across the network.
With asynchronous replication there may also be indirect sources of additional latency.
Verification:
To confirm increased latency, compare the device DAVG/cmd write latency in esxtop with and without replication during snapshot consolidation. For example, run esxtop, type 'u' for the device view, then type 'f' and toggle on the DAVG/cmd write metric. DAVG/cmd is the latency measured from the point the I/O leaves the driver.
Note: What matters is the percentage increase in latency, not the absolute increase, because IOPS is inversely proportional to DAVG/cmd latency. For example, everything else remaining the same, if DAVG/cmd write latency increases from 1 ms to 1.5 ms, then write IOPS would drop to about 67% of its previous value. As such, a relatively small increase in latency in absolute terms can have a significant impact on performance.
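The arithmetic in the note can be sketched in a few lines of Python. This is only an illustration: it assumes that IOPS is strictly inversely proportional to latency with everything else unchanged, and the function names relative_iops and latency_increase_pct are hypothetical helpers, not part of any VMware tool. The 10 ms case is an added illustrative value to show why the same absolute increase matters less at a higher baseline.

```python
def latency_increase_pct(before_ms: float, after_ms: float) -> float:
    """Percentage increase in latency going from before_ms to after_ms."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("latencies must be positive")
    return (after_ms - before_ms) / before_ms * 100.0


def relative_iops(before_ms: float, after_ms: float) -> float:
    """New IOPS as a fraction of the old IOPS.

    Assumption: IOPS is inversely proportional to latency and nothing
    else (queue depth, I/O size, workload) changes.
    """
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("latencies must be positive")
    return before_ms / after_ms


if __name__ == "__main__":
    # Same absolute increase (0.5 ms) from two different baselines.
    for before, after in [(1.0, 1.5), (10.0, 10.5)]:
        print(
            f"{before} ms -> {after} ms: "
            f"latency +{latency_increase_pct(before, after):.0f}%, "
            f"write IOPS at {relative_iops(before, after):.0%} of previous"
        )
```

Under these assumptions, a 0.5 ms increase on a 1 ms baseline cuts write IOPS to roughly two thirds, while the same 0.5 ms on a 10 ms baseline leaves it at about 95%.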
Resolution
Engage your storage vendor to investigate the source of the additional latency.