vSphere Replication (VR) instances may experience intermittent disconnections that lead to replication failures. These symptoms indicate a problem with the replication data path between the source and target environments.
vSphere Replication (9.x)
The underlying cause is a communication failure between the source ESXi hosts and the target ESXi hosts on port 32032, even though the port is confirmed to be open at the network level. This points to a service-level problem rather than a network blockage. Specifically, the hbrsrv service on the target ESXi hosts is in a state that prevents replication communication from working properly.
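The network-level check described above can be sketched as a plain TCP connection test against port 32032. This is a minimal illustration, not a VMware tool: the function name `tcp_port_reachable` and the host name `target-esxi.example.com` are hypothetical, and the script is assumed to run from a machine that shares the network path of the source hosts. A `True` result only shows that something accepts connections on the port. As in the case described here, the hbrsrv service can still be unhealthy while the port is open.

```python
import socket


def tcp_port_reachable(host: str, port: int = 32032, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    This confirms network-level reachability only; it says nothing about
    whether the service behind the port is handling replication traffic.
    """
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (refused, unreachable, timed out) if it cannot complete.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical target host name, used for illustration only.
    target = "target-esxi.example.com"
    state = "open" if tcp_port_reachable(target) else "not reachable"
    print(f"{target}:32032 is {state}")
```

If the port reports as open but replications keep failing, that matches the service-level cause described in this article rather than a firewall or routing problem.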
To resolve this issue, restart the hbrsrv service on each affected target ESXi host.
After restarting the hbrsrv service on each target ESXi host, monitor the replication status in the vSphere Replication interface. If a replication still shows "Error" or "Not Syncing", manually trigger a "Sync Now" operation for the affected VMs. Restarting the service typically restores the communication channels that vSphere Replication needs to function correctly.