vSphere Replication Reconfiguration Fails with 'Failed storing configuration state' After Datastore Exhaustion


Article ID: 437431



Products

VMware Live Recovery

Issue/Introduction

Symptoms

  • Multiple virtual machine replications enter an Error (RPO Violation) state.
  • Attempts to Reconfigure the replication to a different datastore fail with the following error:

    Operation Failed. Cannot reconfigure replication group 'VM_NAME' (managed object ID: 'GID-'). Details: 'Failed storing configuration state and error'.

  • The target datastore reached 100% capacity (0B free) prior to the error appearing.
  • Restarting the hms and hbrsrv services on the vSphere Replication appliance alone does not resolve the issue.

Environment

  • vSphere Replication 8.x
  • vSphere Replication 9.0.x

Cause

This issue is caused by metadata desynchronization between the vSphere Replication Management Service (HMS) and the Host-Based Replication (HBR) agents on the target ESXi host.

When a destination datastore becomes completely full, the HBR processes encounter an I/O hang while attempting to update Persistent State Files (.psf) and delta disks. Even after space is reclaimed, the hbr-agent and hbrsrv processes on the target ESXi host may retain stale memory states or exclusive file locks on the replication metadata. These "zombie" locks prevent the HMS from committing new configuration changes or acknowledging incoming delta blocks.

Resolution

To resolve this state, the host-level replication services must be restarted on the affected target ESXi host(s) to clear stale metadata locks.

  1. Reclaim Storage: Ensure the destination datastore has sufficient free space. A minimum of 20% free space is recommended to accommodate replication overhead, redo logs, and persistent state files.

  2. Restart Host-Side Replication Services: Log in to the target ESXi host (where the replica files reside) via SSH as root and run the following commands. Note: These commands only affect replication traffic and do not impact the running state of virtual machines.

    # Restart the HBR Agent service
    /etc/init.d/hbr-agent restart
    # Restart the HBR Server service
    /etc/init.d/hbrsrv restart

  3. Restart Appliance Services (Optional): If the issue persists after restarting the host services, restart the management services on the vSphere Replication Appliance (VRA) at both the source and destination sites:

    # Run in an SSH session on the appliance (the services can also be restarted from the VAMI page)
    systemctl restart hms
    systemctl restart hbrsrv

  4. Validation

    1. Log in to the vSphere Client.
    2. Navigate to Site Recovery > Open Site Recovery.
    3. Select the affected VM and click Sync Now or attempt the Reconfigure task again.
    4. Verify that the replication status transitions to OK.
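
The 20% free-space guideline from step 1 can be checked with simple arithmetic before retrying the reconfigure. The following is a minimal sketch, not a VMware-provided tool: `free_pct` and `has_headroom` are hypothetical helper names, and the byte values are made-up figures that would normally be read from the datastore summary in the vSphere Client or from `esxcli storage filesystem list` on the host.

```shell
#!/bin/sh
# Sketch only: free_pct and has_headroom are hypothetical helper names,
# not part of ESXi or vSphere Replication.

# Print the whole-number percentage of free space.
# $1 = datastore capacity in bytes, $2 = free bytes
free_pct() {
    if [ "$1" -le 0 ]; then
        echo "capacity must be positive" >&2
        return 1
    fi
    echo $(( $2 * 100 / $1 ))
}

# Succeed when free space meets the threshold (default 20, per step 1).
# $1 = capacity in bytes, $2 = free bytes, $3 = optional minimum percent
has_headroom() {
    pct=$(free_pct "$1" "$2") || return 1
    [ "$pct" -ge "${3:-20}" ]
}

# Made-up figures: a 2 TiB datastore with 300 GiB free.
capacity=2199023255552
free=322122547200
if has_headroom "$capacity" "$free"; then
    echo "OK: $(free_pct "$capacity" "$free")% free"
else
    echo "LOW: $(free_pct "$capacity" "$free")% free, reclaim more space"
fi
```

With the sample figures the datastore is roughly 14% free, so the script reports it as low. Integer division rounds down, which errs on the side of reclaiming more space.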

Impact/Risks

Restarting the hbr-agent and hbrsrv services momentarily pauses active replication syncs for all VMs residing on that specific ESXi host. Normal replication resumes automatically once the services are back online. This does not cause any downtime for production virtual machines.
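
When several target hosts are affected, the two host-side restarts from step 2 can be wrapped in a small function that stops at the first failure. This is a sketch under stated assumptions rather than a supported procedure: `restart_hbr_services` is a hypothetical name, and the `INIT_DIR` override exists only so the function can be dry-run against stub scripts. On an ESXi host it defaults to `/etc/init.d`, the path used in the commands above.

```shell
#!/bin/sh
# Sketch only: restart_hbr_services and INIT_DIR are hypothetical names.
# The function is defined here but not invoked; call it explicitly on the
# target ESXi host.

# Restart hbr-agent, then hbrsrv, in the same order as step 2.
# Returns non-zero as soon as a script is missing or a restart fails.
restart_hbr_services() {
    init_dir=${INIT_DIR:-/etc/init.d}
    for svc in hbr-agent hbrsrv; do
        if [ ! -x "$init_dir/$svc" ]; then
            echo "missing or not executable: $init_dir/$svc" >&2
            return 1
        fi
        echo "restarting $svc"
        if ! "$init_dir/$svc" restart; then
            echo "restart failed: $svc" >&2
            return 1
        fi
    done
}
```

After sourcing the file, running `restart_hbr_services` as root on the target host performs both restarts. A non-zero exit status indicates that the remaining steps should not be attempted until the failing service is investigated.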