High IOPS observed on destination storage LUNs during multiple active replications in VMware Cloud Director Availability

Article ID: 417836

Updated On:

Products

VMware Cloud Director

Issue/Introduction

 

In environments running VMware Cloud Director Availability (VCDA), administrators may notice IOPS spikes on destination storage LUNs while multiple replications are active concurrently. These spikes can cause temporary performance bottlenecks or increased latency for workloads residing on the same datastores.

Monitoring tools or storage arrays may report higher-than-expected write operations during replication synchronization or recovery activities.

 

Environment

VMware Cloud Director Availability 4.7.3

Cause

This issue occurs when multiple replication write operations are initiated simultaneously in VMware Cloud Director Availability, causing increased I/O activity on the destination LUNs.

The situation is often intensified when a bandwidth-throttling limit (for example, 800 Mbit/s) extends replication durations, leading to overlapping replication windows. Additionally, the presence of large or I/O-intensive virtual machines replicating to the same storage target further amplifies the IOPS load, as each active replication stream contributes to the overall write operations on the destination datastore.
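
To illustrate why a throttling limit lengthens replication windows, the following minimal sketch estimates transfer time for a given amount of data at a given rate. The 50 GB data size and the even split of the limit across concurrent streams are assumptions made for illustration only; VCDA does not necessarily divide bandwidth evenly between replications.

```python
def transfer_seconds(data_gb: float, rate_mbit_s: float) -> float:
    """Return the seconds needed to move data_gb (decimal GB) at rate_mbit_s."""
    if rate_mbit_s <= 0:
        raise ValueError("rate_mbit_s must be positive")
    # 1 GB (decimal) = 8000 Mbit
    return data_gb * 8000 / rate_mbit_s


# Hypothetical inputs: 50 GB of changed data per replication and the
# 800 Mbit/s example limit, assumed to be shared evenly between streams.
LIMIT_MBIT_S = 800.0
DATA_GB = 50.0

for streams in (1, 5, 10):
    per_stream_rate = LIMIT_MBIT_S / streams
    minutes = transfer_seconds(DATA_GB, per_stream_rate) / 60
    print(f"{streams:>2} concurrent stream(s): {minutes:.1f} min per replication")
```

Under these assumptions, each added concurrent stream lengthens every replication's transfer window, which makes it more likely that the windows overlap.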

Resolution

1. Monitor Bandwidth Utilization in VCDA

To determine the current bandwidth usage per replication:

  1. Log in to the VCDA Cloud Replication Manager UI.

  2. Navigate to:
    Replications → select a replication → Details → Traffic

  3. Note the Current Transfer Rate (Mbit/s) for each active replication.

  4. Calculate total utilization by summing the transfer rates of all active replications.

Example:
If 10 replications each show ~75 Mbit/s, total utilization is ≈ 750 Mbit/s out of the 800 Mbit/s limit (about 94%).

Note: VCDA does not display a consolidated bandwidth usage counter; per-replication metrics must be used to estimate total bandwidth.
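
The summation in the steps above can be sketched in a few lines of Python. The rates are values noted by hand from the Traffic view of each replication; the function name and the list of rates are hypothetical and do not correspond to any VCDA API.

```python
def total_utilization(rates_mbit_s, limit_mbit_s):
    """Return (total rate in Mbit/s, percentage of the throttling limit used)."""
    if limit_mbit_s <= 0:
        raise ValueError("limit_mbit_s must be positive")
    total = sum(rates_mbit_s)
    return total, total / limit_mbit_s * 100


# Rates recorded manually from the UI (hypothetical values matching the example).
observed_rates = [75.0] * 10
total, percent = total_utilization(observed_rates, 800.0)
print(f"Total: {total:.0f} Mbit/s, {percent:.2f}% of the limit")
```

A result close to 100% suggests that the throttling limit, rather than the source workload, is what is stretching replication durations.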

2. Optimize Bandwidth and IOPS Usage

  • Increase the bandwidth-throttling limit (if bandwidth capacity allows) to complete replications faster and reduce overlap.

  • Pause or stagger non-critical replications during peak hours to minimize load on destination LUNs.

  • Identify large or high-I/O VMs contributing to the issue and migrate them to separate storage policies or datastores.
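
As a rough aid for the last point, the sketch below groups manually recorded per-replication transfer rates by destination datastore, so that the most heavily loaded targets stand out. It assumes that transfer rate is a usable proxy for write load on the destination, which may not hold for every workload. The VM names, datastore names, and rates are hypothetical.

```python
from collections import defaultdict


def load_by_datastore(replications):
    """Sum transfer rates per datastore and return them sorted, highest first.

    replications: iterable of (vm_name, datastore, rate_mbit_s) tuples.
    """
    totals = defaultdict(float)
    for _vm_name, datastore, rate_mbit_s in replications:
        totals[datastore] += rate_mbit_s
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values recorded by hand from the replication details.
recorded = [
    ("vm-a", "ds-01", 120.0),
    ("vm-b", "ds-01", 90.0),
    ("vm-c", "ds-02", 40.0),
]

for datastore, rate in load_by_datastore(recorded):
    print(f"{datastore}: {rate:.0f} Mbit/s incoming")
```

Datastores at the top of the list are candidates for having some replications moved elsewhere or staggered.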