vSphere Replication cannot establish a TCP connection to server at 127.0.0.1:8123

Products

VMware Live Recovery

Issue/Introduction

1. VMs stop syncing under the Replications tab in the SRM UI and report a status of Not Active with an RPO Violation error.

2. The VRMS or VR Add-on server status shows as disconnected under Replication Servers in the SRM UI.

3. The following message is reported: Synchronization monitoring has stopped. Please verify replication traffic connectivity between the source host and the target vSphere Replication Server. Synchronization monitoring will resume when connectivity issues are resolved.

ERROR
Operation Failed
Cannot establish a TCP connection to server at '10.#.#.#:8123'. Details: 'https://10.#.#.#:8123/ invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to 10.#.#.#:8123 [/10.#.#.#] failed: Connection refused (Connection refused)"'.
7/16/23, 12:32:32 AM +0530
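
To confirm the "Connection refused" symptom independently of the UI, you can test whether anything is listening on the reported port. The following is a minimal sketch, not a VMware tool: the function name can_connect and the default timeout are assumptions made for illustration, and the host and port should be taken from the error message.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, can_connect("127.0.0.1", 8123) run on the appliance returns False while the port is refusing connections.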

/opt/vmware/hms/logs/hms.log:

<YYYY-MM-DD><time> ERROR hms.net.hbr.ping.svr.52708a4f-e992-0321-91ee-1cbaf1d5ff8f [Ping Thread for server 127.0.0.1:8123] (..net.impl.PersistentConnection) | Ping for server 127.0.0.1:8123 failed: com.vmware.vim.vmomi.client.exception.ConnectionException: org.apache.http.conn.HttpHostConnectException: Connection to https://192.#.#.#:8123 refused : org.apache.http.conn.HttpHostConnectException: Connection to https://192.#.#.#:8123 refused.

<YYYY-MM-DD><time> [7FF0C3378700 info 'HostCreds' opID=hs-init-4f74cc43] Ignoring link-local address for host-17: "fe80::#:#:#:#"
<YYYY-MM-DD><time> [7FF0C3378700 info 'HostCreds' opID=hs-init-4f74cc43] Ignoring link-local address for host-826: "169.#.#.#"
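
The "Ignoring link-local address" lines above identify the affected hosts. As a rough sketch, they can be pulled out of a log file with a regular expression. The pattern is modeled only on the two sample lines shown here, so other builds may log a different format, and the function name find_link_local_hosts is hypothetical.

```python
import re

# Pattern modeled on the sample log lines above (an assumption, not a spec).
LINK_LOCAL_RE = re.compile(
    r'Ignoring link-local address for (?P<host>host-\d+): "(?P<addr>[^"]+)"'
)


def find_link_local_hosts(lines):
    """Return (host_id, address) pairs for every matching log line."""
    hits = []
    for line in lines:
        match = LINK_LOCAL_RE.search(line)
        if match:
            hits.append((match.group("host"), match.group("addr")))
    return hits
```

Passing an open log file, which iterates line by line, returns the managed object IDs (for example host-17) to look up in the vCenter inventory.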

VMkernel.log:

2025-03-27T15:37:51.159Z cpu104:4125924)WARNING: Hbr: 893: Failed to receive from 10.#.#.# (groupID=GID-e4cc30f3-####-####-####-36b977d5def4): Broken pipe
2025-03-27T15:57:48.354Z cpu65:4125924)WARNING: Hbr: 893:  Failed to receive from 10.#.#.# (groupID=GID-e4cc30f3-####-####-####-36b977d5def4): Broken pipe
 
2025-03-27T16:34:15.184Z cpu76:4125924)Hbr: 3410: Command: INIT_SESSION: error result=Failed gen=-1: Group GID-e4cc30f3-####-####-####-36b977d5def4 not registered
2025-03-27T16:34:15.184Z cpu76:4125924)WARNING: Hbr: 3438: Command INIT_SESSION failed (result=Failed) (isFatal=FALSE) (Id=0) (GroupID=GID-e4cc30f3-####-####-####-36b977d5def4)
2025-03-27T16:34:15.184Z cpu76:4125924)WARNING: Hbr: 5093: Failed to establish connection to [10.#.#.#]:31031 (groupID=GID-e4cc30f3-88e4-37bc-a08f-36b977d5def4): Failure
2025-03-27T16:35:01.689Z cpu0:4288766)Hbr: 3410: Command: REPLICA_SNAPSHOT: error result=Instance was aborted. gen=-1: Instance was aborted; Stale snapshot request (last instance aborted) for group GID-e4cc30f3-####-####-####-36b977d5def4, disk RDID-68c$
2025-03-27T16:35:01.689Z cpu0:4288766)WARNING: Hbr: 3438: Command REPLICA_SNAPSHOT failed (result=Instance was aborted.) (isFatal=TRUE) (Id=2133226562) (GroupID=GID-e4cc30f3-####-####-####-36b977d5def4)
2025-03-27T16:35:01.691Z cpu0:4288766)Hbr: 3410: Command: REPLICA_SNAPSHOT: error result=Instance was aborted. gen=-1: Instance was aborted; Stale snapshot request (last instance aborted) for group GID-e4cc30f3-####-####-####-36b977d5def4, disk RDID-be0$

2025-03-27T16:34:58.754Z cpu76:4125924)WARNING: Hbr: 762: LWD delta transfer terminated (aborted) (diskID=RDID-68c437be-####-####-####-4108cbe2b64b) (imageID=replica-524e4c91-####-####-####-a3ea1e4e9ab2)
2025-03-27T16:35:00.227Z cpu0:4288765)WARNING: Hbr: 762: LWD delta transfer terminated (aborted) (diskID=RDID-68c437be-####-####-####-4108cbe2b64b) (imageID=replica-524e4c91-####-####-####-a3ea1e4e9ab2)

Environment

VMware vSphere Replication 8.x
VMware vSphere Replication 9.x

Cause

1. This issue occurs when one or more ESXi hosts registered in the vSphere Replication database have an IPv4 or IPv6 link-local IP address.

2. IPv4 link-local addresses fall in the range 169.254.0.0/16, and IPv6 link-local addresses use the fe80::/10 prefix.

3. A host in the vCenter inventory is in a not responding state (it must be either reconnected or removed from the inventory).
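
To check whether a given host address falls in the link-local ranges listed above, Python's standard ipaddress module can be used. This is an illustrative sketch; the helper name is_link_local is an assumption, and the zone-suffix handling (for example "%vmk0") is included only as a convenience.

```python
import ipaddress


def is_link_local(address: str) -> bool:
    """Return True for 169.254.0.0/16 (IPv4) or fe80::/10 (IPv6) addresses."""
    # Drop an IPv6 zone suffix such as "%vmk0" before parsing.
    bare = address.strip().split("%", 1)[0]
    try:
        return ipaddress.ip_address(bare).is_link_local
    except ValueError:
        # Not a valid IP address at all.
        return False
```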

Resolution

WORKAROUND

Try these common steps first.

1. Reconnect any hosts in a not responding state, or remove them from the vCenter inventory.
2. Power off and then power on the VR appliance (power off, do not shut down).
3. Reboot vCenter.

If the issue persists, remove the stale host entries from the hbrsrv database:

1. Take a snapshot of the VR appliance on which you intend to apply this fix.

2. Stop the hbrsrv service: systemctl stop hbrsrv

3. Back up the latest hbrsrv.***.db (where *** is the highest DB number), then open it with sqlite3.

root@vrmspr [ ~ ]# cd /etc/vmware

root@vrmspr [ /etc/vmware ]# mkdir backup
root@vrmspr [ /etc/vmware ]# cp hbrsrv.***.db backup/
root@vrmspr [ /etc/vmware ]# sqlite3 hbrsrv.***.db (Log in to the latest hbrsrv DB)

root@vrmspr [ /etc/vmware ]# sqlite3 hbrsrv.100.db
SQLite version 3.22.0 2024-02-02 18:45:57
Enter ".help" for usage hints.
sqlite> select * from hostinfo; (Displays the contents of the hostinfo table)
sqlite> delete from hostinfo; (Deletes all rows from the hostinfo table)
sqlite> .quit (Exits the sqlite3 shell)

Use the commands below to remove specific host IP addresses from the table instead of clearing the entire table.

sqlite3 hbrsrv.100.db "DELETE FROM HostInfo WHERE addresses IN ('<ip>');"
sqlite3 hbrsrv.100.db "DELETE FROM HostInfo WHERE addresses IN ('<ip1>', '<ip2>');"
sqlite3 hbrsrv.100.db "DELETE FROM HostInfo WHERE addresses IN ('192.X.X.X','192.X.X.X','192.X.X.X','192.X.X.X');"

4. Run the command: systemctl start hbrsrv

5. Reboot the appliance
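
The selective delete in step 3 can also be scripted. The sketch below lists HostInfo rows whose addresses value contains a link-local IP and deletes them only when asked. It relies on several assumptions: only the HostInfo table and its addresses column come from the commands above, the delimiters accepted inside addresses (comma, semicolon, whitespace) are a guess, and the function names are hypothetical. Run it only with hbrsrv stopped and a backup in place, as in steps 2 and 3.

```python
import ipaddress
import re
import sqlite3


def has_link_local(addresses) -> bool:
    """True if any token in the addresses string is a link-local IP."""
    # Delimiters are an assumption; the real column format may differ.
    for token in re.split(r"[,;\s]+", addresses or ""):
        try:
            if ipaddress.ip_address(token.split("%", 1)[0]).is_link_local:
                return True
        except ValueError:
            continue  # empty or non-IP token
    return False


def delete_link_local_rows(conn: sqlite3.Connection, dry_run: bool = True):
    """List (and, if dry_run is False, delete) rows with link-local addresses."""
    rows = conn.execute("SELECT addresses FROM HostInfo").fetchall()
    targets = [row[0] for row in rows if has_link_local(row[0])]
    if not dry_run:
        conn.executemany(
            "DELETE FROM HostInfo WHERE addresses = ?",
            [(value,) for value in targets],
        )
        conn.commit()
    return targets
```

Calling delete_link_local_rows(sqlite3.connect("hbrsrv.100.db")) with the default dry_run=True only reports what would be removed, so the list can be reviewed before rerunning with dry_run=False.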

TIP: In vSphere Replication 8.7, the issue can be avoided by adding the com.vmware.vr.disallowed Host tag to the ESXi host.