Introduction:
An "RPO violation error" in vSphere Replication indicates that replication is taking longer than the configured Recovery Point Objective (RPO) allows. The replicated data is therefore not being updated frequently enough to meet the desired recovery point window, which could result in data loss in the event of a disaster.
The VM reports an RPO violation error with the following message: "A replication error occurred at the vSphere Replication Server for replication 'xx-xxxx-x'. Details: 'Error for (hostIP: "x.xx.xx.x"), (flags: retriable): Fault: (vmodl.fault.SystemError) { faultCause = (vmodl.MethodFault) null, faultMessage = <unset>, reason = "Bad file descriptor" msg = "A general system error occurred: Bad file descriptor" }; Set error flag: retriable"
Environment:
vSphere Replication 8.x
vSphere Replication 9.x
Cause:
An RPO violation error can occur because of network issues, IP address changes, or limited bandwidth between the ESXi host and the replication server.
In this case, the cause is a network connectivity problem between the source ESXi hosts and the vSphere Replication servers at the target site, which interrupts replication.
Log Message:
In /var/log/vmware/hbrsrv.log, "timeout" and "broken pipe" errors similar to the following are logged:
WARNING:Hbr:5093: Failed to establish connection to [xx.xx.xx.xx]:31031(groupID=GID-xxxxx-xxxx-xxxx-xxxx-xxxxxxx)Timeout
WARNING:Hbr:5093: Failed to establish connection to [xx.xx.xx.xx]:31031(groupID=GID-xxxxx-xxxx-xxxx-xxxx-xxxxxxx)Timeout
WARNING:Hbr:893: Failed to receive from 10.xxx.xxx.219 (groupID=GID-xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx):Broken pipe
WARNING:Hbr:893: Failed to receive from 10.xxx.xxx.219 (groupID=GID-xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx):Broken pipe
error hbrsrv[01333] [Originator@6876 sub=Main] [2] Dropping error encountered from network
error hbrsrv[01333] [Originator@6876 sub=Main] [0] ClientConnection (client=[192.xxx.xxx.184]:62421) request callback failed: Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network
13T05:46:02.182Z error hbrsrv[01334] [Originator@6876 sub=Delta] Exception Vmacore : SystemException: Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network
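To look for these signatures without reading the whole log, a small filter can help. The following is only a sketch: `scan_hbr_log` is a hypothetical helper name, not part of vSphere Replication, and the pattern list is taken from the sample messages above, so it may need extending for other environments.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with vSphere Replication): print the log
# lines that match the network-error patterns quoted in this article.
scan_hbr_log() {
    # $1 = path to the log file; defaults to the path named in this article
    log_file="${1:-/var/log/vmware/hbrsrv.log}"
    grep -E 'Failed to establish connection|Failed to receive from|Broken pipe|Connection reset by peer' "$log_file"
}
```

For example, `scan_hbr_log | tail -n 20` shows the 20 most recent matching lines, which makes it easier to see whether the errors are still occurring.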
Resolution:
To resolve the connectivity issues and clear the RPO violation, follow the steps below.
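1. From the source ESXi host, test connectivity to the vSphere Replication server through the replication VMkernel adapter:
vmkping -I vmkX <IP address>
where vmkX is the replication VMkernel adapter and <IP address> is the IP address of the vSphere Replication server.
2. Verify that port 31031 on the vSphere Replication server is reachable:
nc -zv <IP address> 31031
Refer to the vSphere Replication port requirements documentation for the essential ports needed for all versions of vSphere Replication.
3. If traffic to the remote replication network requires a static route, add one, for example:
# route add -net ###.###.###.#/24 gw ###.###.###.#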
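When several vSphere Replication servers have to be checked, the port test above can be wrapped in a small function. This is a minimal sketch under stated assumptions: `check_replication_port` is a hypothetical name, the 5-second `-w` timeout is an arbitrary choice, and the IP addresses in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: test one TCP port with nc, using the same -z flag as
# the command in this article. The -w 5 timeout is an assumption, not a
# documented recommendation.
check_replication_port() {
    host="$1"
    port="${2:-31031}"   # 31031 is the port shown in the log messages
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        echo "$host:$port reachable"
    else
        echo "$host:$port NOT reachable"
        return 1
    fi
}
```

A loop such as `for ip in 192.0.2.10 192.0.2.11; do check_replication_port "$ip"; done` then reports each server in turn. A "NOT reachable" result only shows that the TCP connection failed; it does not say whether a firewall, a routing problem, or a stopped service is responsible.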
Note: Also check whether the available bandwidth for replication is sufficient to support the data transfer.
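As a rough way to judge whether bandwidth is sufficient, the data that changes during one RPO window has to be transferred within that window. The sketch below does only this arithmetic. `required_mbps` is a hypothetical helper, the change volume must be measured in your own environment, and the result ignores protocol overhead, compression, and competing traffic, so treat it as a lower bound.

```shell
#!/bin/sh
# Hypothetical helper: minimum sustained throughput (Mbit/s) needed to send
# the data changed during one RPO window within that same window.
#   $1 = changed data per RPO window, in MB (an assumed, measured value)
#   $2 = RPO, in minutes
required_mbps() {
    # MB * 8 = megabits; minutes * 60 = seconds
    awk -v mb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", (mb * 8) / (min * 60) }'
}

required_mbps 4500 15   # prints 40.0
```

In this example, 4500 MB of changes against a 15-minute RPO needs about 40 Mbit/s sustained. If the link between the sites cannot provide that, RPO violations are likely even when connectivity is otherwise healthy.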
If the above steps do not fix the issue, please reach out to Broadcom Support.