VCDA Replication Stuck in "Synchronizing" state with low throughput after adding a disk

Article ID: 428868

Products

VMware Cloud Director
VMware Cloud Director Availability - Disaster Recovery 4.x
VMware Cloud Director Availability - Migration 4.x

Issue/Introduction

In VMware Cloud Director Availability (VCDA) 4.7.x, after a virtual disk (VMDK) is added to a protected virtual machine, the replication status remains "Synchronizing" for an extended period. The UI reports very low network traffic (e.g., 1 kbps), even for large VMs (e.g., >1 TB).

Environment

VMware Cloud Director Availability (VCDA) 4.7.x

Cause

Adding a disk triggers a Checksum-Based Initial Synchronization. The source ESXi host must read every block of the VM's disks to calculate a hash. This metadata is then compared with the destination to determine which blocks need to be sent. During this read-intensive phase, almost no network traffic is generated, creating the appearance of a stuck task.
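The idea behind a checksum-based comparison can be illustrated with a small sketch. This is not how the replication service is implemented; it is a simplified stand-in that uses ordinary files and the standard dd and cksum utilities, and the function name diff_blocks is hypothetical. It reads every block of both files but reports only the blocks that differ, which is why a scan of this kind produces heavy read I/O and very little transfer.

```shell
# Illustrative only: compare two files block by block and print the
# zero-based index of every block whose checksum differs.
# Usage: diff_blocks SOURCE_FILE DEST_FILE BLOCK_SIZE_BYTES
diff_blocks() {
    src=$1
    dst=$2
    bs=$3
    size=$(wc -c < "$src")
    # Round up so a trailing partial block is also compared.
    blocks=$(( (size + bs - 1) / bs ))
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # Every block is read on both sides, even when nothing changed.
        a=$(dd if="$src" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        b=$(dd if="$dst" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        # Only mismatching blocks would need to be sent to the destination.
        [ "$a" = "$b" ] || echo "$i"
        i=$((i + 1))
    done
}
```

In this sketch, two files that are identical produce no output at all, although both were read in full.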

Resolution

Do not unconfigure or restart the replication, as this forces the checksum process to start over from the beginning. Instead, verify that the source host is still making progress by following these steps:

  1. Identify the VMID: Log in to the Source ESXi host via SSH and run:

    vim-cmd vmsvc/getallvms | grep <VM_NAME>
    

    Note the first column (VMID).

  2. Verify Checksum Progress: Run the following command periodically (e.g., every 10 minutes) to check for progress:

    vim-cmd hbrsvc/vmreplica.getState <VMID>
    
  3. Interpret the Output: Look for the Group State and DiskID sections:

    • State: Should say full sync.

    • checksumTotal: The total size of the disks.

    • checksumDone: The amount of data already scanned.

    Note: If checksumDone is increasing between checks, the replication is healthy. Little or no network traffic is expected until the scan reaches blocks that differ from the destination.

  4. Check for Host Side Errors: If checksumDone is not increasing, check the VMkernel logs for HBR errors:

    grep -i "hbr" /var/log/vmkernel.log | tail -n 20
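To judge how far the scan has progressed, the two values from step 3 can be turned into a percentage and a rough time estimate. The sketch below assumes you have read checksumDone and checksumTotal from the command output by hand and expressed them in the same unit (for example, bytes). The function names pct_done and eta_seconds are hypothetical helpers, not part of any VMware tooling, and the estimate assumes the scan rate stays constant.

```shell
# Usage: pct_done CHECKSUM_DONE CHECKSUM_TOTAL
# Prints the completed share of the scan as a whole-number percentage.
pct_done() {
    done_bytes=$1
    total_bytes=$2
    # Guard against division by zero when no total is reported yet.
    [ "$total_bytes" -gt 0 ] || { echo 0; return; }
    echo $(( done_bytes * 100 / total_bytes ))
}

# Usage: eta_seconds PREVIOUS_DONE CURRENT_DONE CHECKSUM_TOTAL INTERVAL_SECONDS
# Prints the estimated seconds remaining, based on two samples taken
# INTERVAL_SECONDS apart, or "stalled" if checksumDone did not increase.
eta_seconds() {
    prev=$1
    now=$2
    total=$3
    interval=$4
    delta=$(( now - prev ))
    if [ "$delta" -le 0 ]; then
        echo "stalled"
        return
    fi
    echo $(( (total - now) * interval / delta ))
}
```

For example, with two samples taken 600 seconds apart, `eta_seconds 100 200 1000 600` prints 4800. A result of "stalled" corresponds to the case in step 4, where the VMkernel log should be checked.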

Additional Information

The checksum scan is CPU- and storage-intensive on the source host, and network traffic stays low until the scan completes.