VM snapshot consolidation takes significantly longer when LUN replication is enabled



Article ID: 430368


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • Replication to another site (e.g. within a metrocluster configuration) is enabled on the LUNs backing VMFS datastores.

  • Snapshot consolidation/removal takes significantly longer on VMs running with replication enabled.

Environment

VMware vSphere ESXi (all versions)

Cause

  • Longer snapshot consolidation times arise if enabling replication increases I/O latency on the array.

  • This is expected with synchronous replication, where I/O completion time includes the completion of replication across the network.

  • With asynchronous replication there may also be indirect sources of additional latency.

Verification:

  • To confirm increased latency, use esxtop to compare the device DAVG/cmd write latency with and without replication during snapshot consolidation. For example, run esxtop, type 'u' for the device view, then type 'f' and toggle on the DAVG/cmd write metric. (DAVG/cmd is the latency measured from the point the I/O leaves the driver.)
  • Note: The key figure is the percentage increase in latency, not the absolute increase, because IOPS is inversely proportional to DAVG/cmd latency. For example, with everything else unchanged, if DAVG/cmd write latency increases from 1 ms to 1.5 ms, write IOPS drops to about 67% of its previous value. A relatively small increase in latency in absolute terms can therefore have a significant impact on performance.
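
The arithmetic in the note above can be checked with a short calculation. This is a minimal sketch, assuming the simplified model stated in the note (IOPS inversely proportional to DAVG/cmd latency, everything else unchanged). The function names are hypothetical and are not part of esxtop or any VMware tool; the latency values are the example figures from the note, not measurements.

```python
def latency_increase_pct(baseline_ms: float, replicated_ms: float) -> float:
    """Percentage increase in latency relative to the baseline."""
    return (replicated_ms - baseline_ms) / baseline_ms * 100.0


def relative_iops(baseline_ms: float, replicated_ms: float) -> float:
    """Fraction of baseline IOPS remaining, assuming IOPS is proportional to 1 / latency."""
    return baseline_ms / replicated_ms


if __name__ == "__main__":
    # Example DAVG/cmd write latencies in milliseconds, taken from the note.
    baseline, replicated = 1.0, 1.5
    print(f"latency increase: {latency_increase_pct(baseline, replicated):.0f}%")
    print(f"relative write IOPS: {relative_iops(baseline, replicated):.0%}")
```

With the example values, an absolute increase of only 0.5 ms corresponds to a 50% rise in latency and leaves roughly two thirds of the original write IOPS. Substitute the DAVG/cmd write values observed in esxtop with and without replication to estimate the effect in a given environment.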




Resolution

Engage your storage vendor to investigate the source of the additional latency.

Additional Information

See: Using esxtop to identify storage performance issues for ESXi