RPO violation error - A replication error occurred at the vSphere Replication Server for replication 'xx-xxxx-x'


Article ID: 388929


Products

VMware Live Recovery

Issue/Introduction

Introduction:

An "RPO violation error" in vSphere Replication indicates that replication is not completing within the configured Recovery Point Objective (RPO). The replicated data is not being updated frequently enough to meet the desired recovery point, which could result in data loss in the event of a disaster.

Symptoms:

The VM reports an RPO violation error with the following message: "A replication error occurred at the vSphere Replication Server for replication 'xx-xxxx-x'. Details: 'Error for (hostIP: "x.xx.xx.x"), (flags: retriable): Fault: (vmodl.fault.SystemError) { faultCause = (vmodl.MethodFault) null, faultMessage = <unset>, reason = "Bad file descriptor" msg = "A general system error occurred: Bad file descriptor" }; Set error flag: retriable"

 

Environment

vSphere Replication 8.x

vSphere Replication 9.x

Cause

An RPO violation error can occur because of network issues, IP address changes, or limited bandwidth between the ESXi host and the vSphere Replication server.
In this case, the cause is a network connectivity problem between the source ESXi hosts and the vSphere Replication servers at the target site, which interrupts replication.

Log Message:

In /var/log/vmware/hbrsrv.log, look for timeout, broken pipe, and connection reset errors similar to the following:

WARNING:Hbr:5093: Failed to establish connection to [xx.xx.xx.xx]:31031(groupID=GID-xxxxx-xxxx-xxxx-xxxx-xxxxxxx)Timeout
WARNING:Hbr:5093: Failed to establish connection to [xx.xx.xx.xx]:31031(groupID=GID-xxxxx-xxxx-xxxx-xxxx-xxxxxxx)Timeout

WARNING:Hbr:893: Failed to receive from 10.xxx.xxx.219 (groupID=GID-xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx):Broken pipe
WARNING:Hbr:893: Failed to receive from 10.xxx.xxx.219 (groupID=GID-xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx):Broken pipe

error hbrsrv[01333] [Originator@6876 sub=Main] [2] Dropping error encountered from network
error hbrsrv[01333] [Originator@6876 sub=Main] [0] ClientConnection (client=[192.xxx.xxx.184]:62421) request callback failed: Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network
13T05:46:02.182Z error hbrsrv[01334] [Originator@6876 sub=Delta] Exception Vmacore::SystemException: Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network
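
To gauge how often these network errors occur, the log can be scanned for the three signatures shown in the sample lines. The following is a minimal sketch, assuming Python 3 is available wherever a copy of hbrsrv.log is examined. The function names and the pattern labels are hypothetical and are not part of vSphere Replication.

```python
import re
from collections import Counter

# Patterns derived from the sample hbrsrv.log lines in this article.
# The dictionary keys are illustrative labels, not product terminology.
NETWORK_ERROR_PATTERNS = {
    "connect_timeout": re.compile(r"Failed to establish connection to .*Timeout"),
    "broken_pipe": re.compile(r"Failed to receive from .*Broken pipe"),
    "connection_reset": re.compile(r"Connection reset by peer"),
}


def count_network_errors(lines):
    """Count the log lines that match each network-error pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in NETWORK_ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_log(path="/var/log/vmware/hbrsrv.log"):
    """Read a log file and return the per-pattern match counts."""
    with open(path, errors="replace") as handle:
        return count_network_errors(handle)
```

Calling scan_log() with the path to a copy of the log returns a Counter keyed by the labels above. A steadily growing count suggests an ongoing connectivity problem rather than a one-time interruption.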

Resolution

To resolve the connectivity issues and clear the RPO violation, follow the steps below.

  • Verify connectivity between the ESXi host and the vSphere Replication server by using the vmkping command:
    •       vmkping -I vmkX <IP address>
      where vmkX is the replication VMkernel adapter and <IP address> is the IP address of the vSphere Replication server.

  • Check that port 31031 is open between the ESXi host and the vSphere Replication server. For the full list of ports, see "Port numbers that must be open for vSphere Replication 8.x".
  • Validate the static route configuration on the appliance in /etc/systemd/network/10-eth1.network.
    •       If a static route is missing, add one on each appliance to reach the opposite site over the replication network:
    •       # route add -net ###.###.###.#/24 gw ###.###.###.#
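
The port check in the steps above can be approximated with a plain TCP connection attempt. The following is a minimal sketch, assuming Python 3 is available on the machine where it runs. The function name and the default timeout are hypothetical. Unlike vmkping -I, the check follows the system routing table and does not pin a specific VMkernel adapter, so a successful result only applies to the network path of the machine that runs it.

```python
import socket

# Port taken from the hbrsrv.log warnings in this article.
VR_REPLICATION_PORT = 31031


def is_port_open(host, port=VR_REPLICATION_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, is_port_open("<IP address>") returns False when the connection is refused or times out, which matches the "Timeout" warnings seen in hbrsrv.log.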

Note: Also, check whether the available bandwidth for replication is sufficient to support the data transfer.
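
As a rough way to judge whether bandwidth is sufficient, the average throughput needed to transfer the data changed during one RPO window can be estimated. The following is a simplified sketch and not a vSphere Replication formula. The function name and the example figures are hypothetical, and the estimate ignores protocol overhead, compression, and competing traffic, so treat the result as a lower bound.

```python
def required_mbps(changed_bytes_per_rpo, rpo_minutes):
    """Estimate the average megabits per second needed to send the data
    changed during one RPO window before that window ends."""
    if rpo_minutes <= 0:
        raise ValueError("rpo_minutes must be positive")
    changed_bits = changed_bytes_per_rpo * 8
    window_seconds = rpo_minutes * 60
    return changed_bits / window_seconds / 1_000_000


# Hypothetical example: 9 GB of changed data with a 60-minute RPO.
print(required_mbps(9_000_000_000, 60))  # prints 20.0
```

If the link between the sites cannot sustain at least this rate for replication traffic, RPO violations are likely even when connectivity is otherwise healthy.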

If the above steps do not fix the issue, contact Broadcom Support.