SRM Appliance fails to reconfigure after vCenter Server license update with error: java.net.SocketTimeoutException

Article ID: 388404

Products

VMware Live Recovery

Issue/Introduction

Symptoms

  • After the vCenter Server license was updated, the SRM appliance was restarted.

  • When accessing the Site Recovery plug-in in vCenter Server, the details for the restarted SRM appliance are not populated.

  • Attempts to reconfigure the SRM appliance fail with the following error:

http://1##.0.0.#:9286/sdk invocation failed with "java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-73 [ACTIVE]"

Validation steps:

  • The srm-server service is shown as stopped in the Services section of the SRM appliance VAMI.

  • Restarting the srm-server service fails with the same timeout error.

  • Rebooting the SRM appliance does not resolve the issue.
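The service state shown in the VAMI can also be checked from the SRM appliance shell. The following is a minimal bash sketch, assuming root shell access on the appliance; the function name `srm_server_state` and the `SYSTEMCTL` override variable are hypothetical, the override existing only so the check can be exercised without systemd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the systemd state of the srm-server unit and
# return success only when it is "active".
# SYSTEMCTL is a hypothetical override (defaults to the real systemctl).
srm_server_state() {
  local state
  state=$("${SYSTEMCTL:-systemctl}" is-active srm-server 2>/dev/null)
  echo "srm-server: ${state:-unknown}"
  [ "$state" = "active" ]
}

# If the unit is not active, recent journal entries may show why, e.g.:
#   journalctl -u srm-server --no-pager -n 50
```

In the failure described here, the function would be expected to report a state other than `active`.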

Environment

VMware Live Site Recovery 8.x 

VMware Live Site Recovery 9.x 

Cause

After the vCenter Server license was updated, the SRM appliance was rebooted and lost its registration with the Lookup Service. Subsequent attempts to re-register the SRM appliance with vCenter Server failed because the DR services were not found in the Lookup Service.

Cause validation

  • /var/log/vmware/drconfigui/dr-config.log on the SRM appliance shows that the attempts to reconfigure the SRM appliance with vCenter Server timed out:

2025-02-12T09:21:46,994 [srm-reactive-thread-23] WARN  com.vmware.dr.configservice.services.RestartServiceHandler 0d63b7ff-818c-4cf6-9e81-a1267e8719d5 restartService - DrRequestHandlerError:
com.vmware.vim.vmomi.client.exception.ConnectionException: http://127.0.0.1:9286/sdk invocation failed with "java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-14 [ACTIVE]"
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setError(ResponseImpl.java:265)
        at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.setResponseError(HttpExchangeBase.java:362)
        at com.vmware.vim.vmomi.client.http.impl.HttpAsyncExchange$1$1.invokeWithinScope(HttpAsyncExchange.java:129)
        at com.vmware.vim.vmomi.core.tracing.NoopTracer$NoopSpan.runWithinSpanContext(NoopTracer.java:120)
        at com.vmware.vim.vmomi.client.http.impl.TracingScopedRunnable.run(TracingScopedRunnable.java:17)
        at com.vmware.dr.ui.tools.utilities.ThreadContext.lambda$wrap$1(ThreadContext.java:55)
        at com.vmware.dr.ui.tools.utilities.ThreadContext.execute(ThreadContext.java:209)
        at com.vmware.dr.ui.tools.utilities.ThreadContext.execute(ThreadContext.java:185)
        at com.vmware.dr.ui.tools.utilities.ThreadContext.setupContext(ThreadContext.java:76)
        at com.vmware.dr.ui.tools.utilities.ThreadContext.setupContext(ThreadContext.java:105)
        at com.vmware.dr.ui.tools.utilities.ExecutorUtils.lambda$wrap$1(ExecutorUtils.java:36)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-14 [ACTIVE]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:98)
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:40)
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:506)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
        ... 1 more

  • /var/log/vmware/srm/vmware-dr.log on the SRM appliance further shows that the DR services were not found in the Lookup Service:

2025-02-12T09:21:19.463Z error vmware-dr[01376] [SRM@6876 sub=LocalLkpServer] Failed to find service
--> (dr.connection.ServiceSpec) {
-->     View ID: DR
-->     Endpoint attributes:
-->     []
-->     Service attributes:
-->     [1 => "e8ff1a19-d264-4e93-9f0b-726b4bb57344", 2 => "com.vmware.vcDr"]
--> }
-->
--> in
--> [NULL]
2025-02-12T09:21:19.466Z error vmware-dr[01376] [SRM@6876 sub=LocalSite] Failed to initialize connections to local services (0)
--> (dr.fault.ServiceNotFound) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    service = "com.vmware.dr.vcDr"
-->    msg = ""
--> }
--> [context]zKq7AVECAAQAANjOcAESdm13YXJlLWRyAAAsGRxsaWJ2bWFjb3JlLnNvAAGC3Q5saWJjb25uZWN0aW9uLXBzYy5zbwABC1EZAQ+jGgH2zRkBhaoaAVqrGgF8lRoBDZIaAYhxGwJPLg9saWJjb25uZWN0aW9uLWJhc2Uuc28AAo5wDwKg+wsAzik0ANJCNADgfUkDsI4AbGlicHRocmVhZC5zby4wAATf+g9saWJjLnNvLjYA[/context]
2025-02-12T09:22:19.541Z warning vmware-dr[01371] [SRM@6876 sub=LocalSite] DR service info not found in LS

  • Because the service registrations are missing from the Lookup Service, the srm-server service fails to start. The following events are logged in /var/log/vmware/dr/drconfig.log:

2025-02-12T11:48:17.178Z error drconfig[01010] [SRM@6876 sub=ServiceControl opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start:97e6] Command "/usr/bin/systemctl start srm-server" exit code: 1
--> stderr:
--> Job for srm-server.service failed because a timeout was exceeded.
--> See "systemctl status srm-server.service" and "journalctl -xe" for details.
-->
2025-02-12T11:48:17.178Z error drconfig[01010] [SRM@6876 sub=DrConfigServiceManager opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start:97e6] N7Vmacore24InvalidArgumentExceptionE service
--> [context]zKq7AVECAAQAANjOcAEJZHJjb25maWcAAPlkBGxpYmRyLWNvbmZpZy11dGlscy5zbwAB2PkZZHItY29uZmlndXJhdG9yAAIc5Q1saWJkcmNvbmZpZy10eXBlcy5zbwADDd8KbGliZHItdm1vbWkuc28ABM4pNGxpYnZtYWNvcmUuc28ABNJCNATgfUkFsI4AbGlicHRocmVhZC5zby4wAAbf+g9saWJjLnNvLjYA[/context]
2025-02-12T11:48:17.179Z warning drconfig[01010] [SRM@6876 sub=IO.Connection opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start] Failed to write buffer to stream; <io_obj p:0x00007f890800fc08, h:13, <UNIX '/run/vmware/dr/drconfig-socket'>, <UNIX ''>> e: 32(Broken pipe [system:32]), async: false, duration: 0msec
2025-02-12T11:48:17.179Z error drconfig[01010] [SRM@6876 sub=Listener.HTTPService opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start] Failed to write to response stream; <<io_obj p:0x00007f890800fc08, h:13, <UNIX '/run/vmware/dr/drconfig-socket'>, <UNIX ''>>, 529bfeef-9ec2-a46f-7754-7d25e8290707>, N7Vmacore15SystemExceptionE(Broken pipe: The communication pipe/socket is explicitly closed by the remote service.)
--> [context]zKq7AVECAAQAANjOcAEPZHJjb25maWcAACwZHGxpYnZtYWNvcmUuc28AAMcKNAA8EzoA6G4qAAMvJwDqLycAmDAnAeeyI2xpYnZtb21pLnNvAAIE4gpsaWJkci12bW9taS5zbwACrxkDAM4pNADSQjQA4H1JA7COAGxpYnB0aHJlYWQuc28uMAAE3/oPbGliYy5zby42AA==[/context]
2025-02-12T11:48:17.180Z error drconfig[01010] [SRM@6876 sub=Default opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start] Failed to send error to the client
--> (vmodl.fault.SystemError) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    reason = "service"
-->    msg = ""
--> }
--> N7Vmacore11IOExceptionE System exception while transmitting HTTP Response:
--> error id = 32
-->
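The three log checks above can be run in one pass on the SRM appliance. The following is a minimal bash sketch, not a supported tool: the function name `check_srm_lookup_symptoms` is hypothetical, the log paths default to the ones quoted in this article, and the search strings are copied from the log excerpts above, so they may need adjusting for other SRM builds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look for the three log signatures described above.
# Arguments (optional) override the default log paths, e.g. for a copied
# support bundle. Returns success only if all three signatures are found.
check_srm_lookup_symptoms() {
  local cfg_log="${1:-/var/log/vmware/drconfigui/dr-config.log}"
  local dr_log="${2:-/var/log/vmware/srm/vmware-dr.log}"
  local drconfig_log="${3:-/var/log/vmware/dr/drconfig.log}"
  local hits=0

  # 1. Reconfigure calls to the local :9286/sdk endpoint timing out
  if grep -q '9286/sdk invocation failed with "java.net.SocketTimeoutException' "$cfg_log" 2>/dev/null; then
    echo "timeout: reconfigure calls to :9286/sdk timed out ($cfg_log)"
    hits=$((hits + 1))
  fi

  # 2. DR services missing from the Lookup Service
  if grep -qE 'DR service info not found in LS|dr\.fault\.ServiceNotFound' "$dr_log" 2>/dev/null; then
    echo "lookup: DR services not found in the Lookup Service ($dr_log)"
    hits=$((hits + 1))
  fi

  # 3. srm-server failing to start via systemctl
  if grep -q 'systemctl start srm-server" exit code: 1' "$drconfig_log" 2>/dev/null; then
    echo "start: srm-server failed to start ($drconfig_log)"
    hits=$((hits + 1))
  fi

  echo "matched $hits of 3 signatures"
  [ "$hits" -eq 3 ]
}
```

A match on all three signatures is consistent with the cause described in this article, but it is only a heuristic; the log excerpts themselves remain the reference.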

Resolution

To resolve the issue, clean up the stale SRM registrations on vCenter Server and then re-register the SRM appliance with vCenter Server.

Before cleaning up the stale service registrations and solution users, take offline (powered-off) snapshots of the vCenter Server Appliance. If the vCenter Server is in Enhanced Linked Mode, take offline snapshots of all linked appliances.

1. Open an SSH session to the vCenter Server.

2. Run the following commands to list all service registrations and search the output for SRM-related entries:

# /usr/lib/vmware-lookupsvc/tools/lstool.py list --url http://localhost:7090/lookupservice/sdk > /tmp/psc_services.txt

# grep -iC4 vcdr /tmp/psc_services.txt

# grep -iC4 SRM /tmp/psc_services.txt

# grep -iC4 h5-dr /tmp/psc_services.txt

3. Note the service IDs, then remove only the service registrations associated with the impacted SRM appliance using the following command:

# /usr/lib/vmware-lookupsvc/tools/lstool.py unregister --url http://localhost:7090/lookupservice/sdk --user '<SSO administrator user>' --password '<SSO administrator password>' --id '<Service ID>' --no-check-cert

4. Run the following command to list the solution users:

# /usr/lib/vmware-vmafd/bin/dir-cli service list

5. From this list, identify the solution users related to SRM (srm-xxxx, h5drxxxx) and remove them using the following command:

# /usr/lib/vmware-vmafd/bin/dir-cli service delete --name <name of solution user> --login <SSO administrator user>

6. Log in to https://<vCenter_Server_address>/lookupservice/mob/?moid=ServiceRegistration&method=List&vmodl=1 with vCenter Server administrator credentials.

7. Search for the VMware Live Site Recovery registrations: replace the content of the Value field with the following text and click Invoke Method.

<filterCriteria>
   <serviceType>
      <product>com.vmware.dr</product>
      <type>vcDr</type>
   </serviceType>
</filterCriteria>

8. Locate the stale VMware Live Site Recovery registration, copy its serviceId value, and navigate to https://<vCenter_Server_address>/lookupservice/mob/?moid=ServiceRegistration&method=Delete.

9. To delete the service registration, enter the serviceId value and click Invoke Method.
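Steps 2 through 5 rely on reading command output by hand. The bash sketch below shows one way to narrow that output, under stated assumptions: the helper names (`srm_service_ids`, `unregister_ids`, `srm_solution_users`) and the `SSO_USER`, `SSO_PASSWORD` and `DRY_RUN` variables are hypothetical; the parsing assumes that each registration in the `lstool.py list` output starts with a `Name:` line and contains a `Service ID:` line, and that `dir-cli service list` prints one solution user per line, possibly prefixed with an index such as `1. `. Verify these formats against your own output first. The helpers match every SRM-related entry, not only the impacted appliance, so review the results before removing anything.

```shell
#!/usr/bin/env bash
LS_URL="http://localhost:7090/lookupservice/sdk"
LSTOOL="/usr/lib/vmware-lookupsvc/tools/lstool.py"

# Hypothetical: print the Service ID of each registration whose text mentions
# vcdr, srm or h5-dr (the terms grepped in step 2). ASSUMES each registration
# starts with a "Name:" line and carries a "Service ID:" line.
srm_service_ids() {
  awk '
    # A new registration begins: emit the previous one if it matched.
    /^[[:space:]]*Name:/ { if (hit && id != "") print id; hit = 0; id = "" }
    /^[[:space:]]*Service ID:/ { id = $NF }
    tolower($0) ~ /vcdr|srm|h5-dr/ { hit = 1 }
    END { if (hit && id != "") print id }
  ' "$1"
}

# Hypothetical: unregister each Service ID read from stdin, one per line.
# Dry run by default (prints the command with the password masked);
# set DRY_RUN=0 to execute. SSO_USER/SSO_PASSWORD are hypothetical variables.
unregister_ids() {
  local id
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$LSTOOL unregister --url $LS_URL --user \"$SSO_USER\" --password '****' --id '$id' --no-check-cert"
    else
      "$LSTOOL" unregister --url "$LS_URL" --user "$SSO_USER" \
        --password "$SSO_PASSWORD" --id "$id" --no-check-cert
    fi
  done
}

# Hypothetical: keep only SRM-related solution users (srm-*, h5dr*) from
# `dir-cli service list` output read on stdin, stripping any "N. " prefix.
srm_solution_users() {
  sed -E 's/^[[:space:]]*[0-9]+\.[[:space:]]*//' | grep -iE '^(srm-|h5dr)'
}
```

One possible flow: save the `lstool.py list` output to /tmp/psc_services.txt as in step 2, run `srm_service_ids /tmp/psc_services.txt > /tmp/srm_ids.txt`, delete from /tmp/srm_ids.txt any IDs that belong to other, healthy SRM appliances, then run `unregister_ids < /tmp/srm_ids.txt` in its default dry-run mode before repeating it with `DRY_RUN=0`. The dry-run default is deliberate, because only the impacted appliance's registrations should be removed.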

Once the vCenter Server cleanup is complete, reboot the SRM appliance.

Finally, log in to the SRM appliance VAMI and reconfigure the appliance to re-register it with vCenter Server.

The Site Recovery plug-in details should now be populated.

Ref: For additional details, refer to the KB article "Cleaning up decommissioned SRM registration from vCenter Server".