Receiving error in vSphere Replication - Unable to retrieve pairs from extension server

Article ID: 396603

Products

VMware Live Recovery

Issue/Introduction

When accessing vSphere Replication, you encounter the error: "Unable to retrieve pairs from extension server"

One side of the pairing shows a "Not connected" or "Unknown" state, and attempting to reconnect the pairing returns the error: "Cannot complete login due to an incorrect user name or password."

Environment

vSphere Replication 9.0.2 with replication traffic isolation configured

Cause

The vSphere Replication HMS service, which manages the connection to vCenter and provides pairing information, becomes overloaded while repeatedly attempting to connect to the HBRSRV service.

The hms.log file shows messages similar to the following when the service starts:

2025-04-14 15:36:20.777 ERROR com.vmware.hms.HmsService [hms-main-thread-3] (..vmware.hms.HmsService) [] | stage 2 starting...FAILED
HMS Server failed to start successfully:
com.vmware.vim.vmomi.client.exception.ConnectionException: https://[vSphere Replication FQDN]:8123/ invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to [vSphere Replication FQDN]:8123 [[vSphere Replication FQDN]/[eth 0 IP address], [vSphere Replication FQDN]/[eth 1 IP address], [vSphere Replication FQDN]/0:0:0:0:0:0:0:1] failed: Connection refused"

Eventually, the log repeats the same few tasks over and over, similar to the following:

2025-04-14 15:36:27.144 DEBUG com.vmware.hms.i18n.class com.vmware.hms.response.filter.I18nActivationResponseFilter [tcweb-14] (..hms.fault.ExceptionToFaultConverter) [operationID=xxxxxxxx-HMS-xxxxxx,sessionID=xxxxxxx] | Converted exception java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@427a79c[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@76b22a53[Wrapped task = com.vmware.jvsl.sessions.net.impl.TlsPreservingWrapper$2@76e4c67b]] rejected from java.util.concurrent.ThreadPoolExecutor@4a55a6e8[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 60]

And:

Retry connecting to 'HMS@91818625' failed:
java.util.concurrent.RejectedExecutionException: Task com.vmware.jvsl.sessions.net.impl.TlsPreservingWrapper$2@30f658d6 rejected from java.util.concurrent.ThreadPoolExecutor@4a55a6e8[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 83]
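To gauge how often these signatures occur, the log can be searched for the two strings quoted above. The following is a minimal sketch, not an official tool: the function name `hms_symptoms` is hypothetical, and the log path in the usage comment is an assumption about where hms.log lives on the appliance.

```shell
#!/usr/bin/env bash
# Count the two failure signatures quoted above in an HMS log file.
# Usage (log path is an assumption, adjust for your appliance):
#   hms_symptoms /opt/vmware/hms/logs/hms.log
hms_symptoms() {
  local log="$1"
  local start_failures rejected
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  start_failures=$(grep -c 'HMS Server failed to start successfully' "$log" || true)
  rejected=$(grep -c 'RejectedExecutionException' "$log" || true)
  echo "start_failures=${start_failures} rejected_tasks=${rejected}"
}
```

A steadily growing `rejected_tasks` count after a service start is consistent with the retry loop described above.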

The hbrsrv.log file shows many connection failures to the ESXi hosts:

2025-04-14T15:36:20.268-05:00 info hbrsrv[03152] [Originator@6876 sub=Host opID=hs-init-30b5b57a] Heartbeat handler detected dead connection for agent: host-#######/hostd
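To see which hosts are affected, the "dead connection" entries can be summarized per agent. This is a rough sketch: the function name `dead_agents` is hypothetical, the log path in the usage comment is an assumption, and the pattern relies only on the message format shown above.

```shell
#!/usr/bin/env bash
# Summarize "dead connection" entries per agent, most frequent first.
# Usage (log path is an assumption, adjust for your appliance):
#   dead_agents /var/log/vmware/hbrsrv.log
dead_agents() {
  grep -o 'dead connection for agent: [^ ]*' "$1" \
    | awk '{ print $NF }' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'   # normalize to "<count> <agent>"
}
```

Each output line has the form `<count> <agent>`. The agent identifiers still need to be mapped to ESXi host addresses before connectivity is tested.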

Resolution

Verify connectivity between the ESXi hosts and vSphere Replication.
From an SSH session on the vSphere Replication appliance, run curl against any of the ESXi hosts that hbrsrv.log lists with failed connections:

curl -v telnet://##.##.##.##:443
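When several hosts are affected, the same check can be scripted. The sketch below swaps curl for bash's built-in `/dev/tcp` redirection so that each attempt returns promptly. The function name `check_port`, the 5-second timeout, and the example addresses are assumptions for illustration; port 443 is the port used in the command above.

```shell
#!/usr/bin/env bash
# Test whether a TCP port on a host accepts connections.
# Usage: check_port <host> [port]   (port defaults to 443)
check_port() {
  local host="$1" port="${2:-443}"
  # /dev/tcp is a bash feature; timeout bounds the attempt to 5 seconds.
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} NOT reachable"
    return 1
  fi
}

# Example with placeholder addresses; substitute the affected ESXi hosts:
# for h in 192.0.2.10 192.0.2.11; do check_port "$h" 443; done
```

A "NOT reachable" result covers both refused and timed-out connections, so it indicates only that the path needs further investigation.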

Follow this KB to verify traffic from the ESXi hosts to vSphere Replication:
Verify connectivity to and from ESXi hosts from vSphere Replication and ensure all port checks work

If traffic fails from vSphere Replication to the ESXi hosts but works from the ESXi hosts to vSphere Replication, verify the static routes:
Verify that static routes exist on eth1 network file
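As a quick way to inspect the configured routes, the `[Route]` sections of a systemd-networkd file can be printed. This is a sketch only: the function name `list_static_routes` is hypothetical, and the file path in the usage comment is an assumption about where the appliance stores the eth1 configuration, so follow the linked KB for the authoritative location.

```shell
#!/usr/bin/env bash
# Print Destination= and Gateway= entries found inside [Route] sections
# of a systemd-networkd .network file.
# Usage (file path is an assumption):
#   list_static_routes /etc/systemd/network/10-eth1.network
list_static_routes() {
  awk '
    /^\[/ { in_route = ($0 == "[Route]") }          # track the current section
    in_route && /^(Destination|Gateway)=/ { print }  # only keys inside [Route]
  ' "$1"
}
```

Empty output means the file defines no `[Route]` section. Running `ip route show dev eth1` shows what the kernel is actually using, which is useful for comparison.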

Once networking is working, if the error persists, verify that the certificate used for vSphere Replication is correct:
Verify Thumbprints if certificate on vSphere replication was changed

If the issue persists after the static routes and thumbprints are corrected, remove the ESXi hosts and let them repopulate by following this KB:
If no issues with connectivity, remove hosts from hbrsrv database and restart services