Replications fail to synchronize causing the number of RPO violations to increase in Cloud Director Availability 4.x

Article ID: 315081

Updated On:

Products

VMware Cloud Director

Issue/Introduction

Symptoms:
  • In the destination site's Cloud Director Availability Portal, you see an increasing number of RPO violations for incoming replications.
  • Pausing and then resuming the impacted replications does not start their synchronization.


Environment

VMware Cloud Director Availability 4.x

Cause

This issue occurs because the Host-Based Replication (HBR) service threads on the destination Replicator appliance become stuck in a lock condition while accessing the datastores where the replica files are located. As a result, the affected replications cannot synchronize.

Resolution

This is a known issue affecting Cloud Director Availability 4.x.
Currently, there is no resolution.

Workaround:
To confirm you are experiencing this issue, perform the following steps:

  1. SSH to a destination site Replicator appliance and log in as root.
  2. Run the following netstat command and take note of the results:
netstat -ntap | egrep -h "Recv-Q|31031"
  3. Wait for 1 minute and run the netstat command again.
  4. Compare the results from steps 2 and 3.

If the Recv-Q value for one or more connections is the same in both outputs, the threads handling those connections are stuck and you are experiencing this issue.
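
The manual comparison above could also be scripted. The following is a minimal sketch, not an official VMware tool: the function names compare_recvq and check_stuck are hypothetical, and it assumes the standard netstat column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address). It also assumes that only nonzero Recv-Q values are of interest, because an idle connection with an empty receive queue reports the same value (0) in both snapshots.

```shell
#!/bin/sh
# Sketch only: automates the two-snapshot comparison described above.
# Function names are illustrative and not part of Cloud Director Availability.

# Print connections whose Recv-Q value is identical in both snapshot files.
# Assumption: only nonzero values are reported, because an idle connection
# with an empty receive queue also shows the same value (0) twice.
compare_recvq() {
    awk '
        $1 !~ /^tcp/ { next }                  # skip the header line
        { key = $4 " " $5 }                    # local address + foreign address
        NR == FNR { first[key] = $2; next }    # first file: remember Recv-Q
        (key in first) && first[key] == $2 && $2 > 0 {
            print key, "Recv-Q=" $2
        }
    ' "$1" "$2"
}

# Take two snapshots one minute apart using the command from step 2.
check_stuck() {
    snap1=$(mktemp) && snap2=$(mktemp) || return 1
    netstat -ntap | egrep -h "Recv-Q|31031" > "$snap1"
    sleep 60
    netstat -ntap | egrep -h "Recv-Q|31031" > "$snap2"
    compare_recvq "$snap1" "$snap2"
    rm -f "$snap1" "$snap2"
}
```

After loading these functions in a root shell on the destination Replicator appliance, running check_stuck would print one line per connection whose nonzero Recv-Q value did not change between the two snapshots. Treat any output as a prompt to review the raw netstat results, not as a definitive diagnosis.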

To work around this issue, contact VMware Support and note this Article ID (91034) in the problem description. For more information, see Creating and managing Broadcom support cases.