After updating the vCenter Server license, the SRM appliance was restarted.
However, when accessing the Site Recovery plug-in in vCenter, the details for the restarted SRM appliance are not populated.
http://1##.0.0.#:9286/sdk invocation failed with "java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-73 [ACTIVE]"
The srm-server service appears in a stopped state in the Services section of the SRM VAMI.
Restarting the srm-server service fails with the same timeout error.
VMware Live Site Recovery 8.x
VMware Live Site Recovery 9.x
After updating the vCenter Server license, the SRM appliance was rebooted, causing it to lose its registration with the lookup service. Subsequent attempts to re-register the SRM appliance with vCenter failed because the DR services were not found in the Lookup Service.
2025-02-12T09:21:46,994 [srm-reactive-thread-23] WARN com.vmware.dr.configservice.services.RestartServiceHandler 0d63b7ff-818c-4cf6-9e81-a1267e8719d5 restartService - DrRequestHandlerError:
com.vmware.vim.vmomi.client.exception.ConnectionException: http://127.0.0.1:9286/sdk invocation failed with "java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-14 [ACTIVE]"
at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setError(ResponseImpl.java:265)
at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.setResponseError(HttpExchangeBase.java:362)
at com.vmware.vim.vmomi.client.http.impl.HttpAsyncExchange$1$1.invokeWithinScope(HttpAsyncExchange.java:129)
at com.vmware.vim.vmomi.core.tracing.NoopTracer$NoopSpan.runWithinSpanContext(NoopTracer.java:120)
at com.vmware.vim.vmomi.client.http.impl.TracingScopedRunnable.run(TracingScopedRunnable.java:17)
at com.vmware.dr.ui.tools.utilities.ThreadContext.lambda$wrap$1(ThreadContext.java:55)
at com.vmware.dr.ui.tools.utilities.ThreadContext.execute(ThreadContext.java:209)
at com.vmware.dr.ui.tools.utilities.ThreadContext.execute(ThreadContext.java:185)
at com.vmware.dr.ui.tools.utilities.ThreadContext.setupContext(ThreadContext.java:76)
at com.vmware.dr.ui.tools.utilities.ThreadContext.setupContext(ThreadContext.java:105)
at com.vmware.dr.ui.tools.utilities.ExecutorUtils.lambda$wrap$1(ExecutorUtils.java:36)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-14 [ACTIVE]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:98)
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:40)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:506)
at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
... 1 more
2025-02-12T09:21:19.463Z error vmware-dr[01376] [SRM@6876 sub=LocalLkpServer] Failed to find service
--> (dr.connection.ServiceSpec) {
--> View ID: DR
--> Endpoint attributes:
--> []
--> Service attributes:
--> [1 => "e8ff1a19-d264-4e93-9f0b-726b4bb57344", 2 => "com.vmware.vcDr"]
--> }
-->
--> in
--> [NULL]
2025-02-12T09:21:19.466Z error vmware-dr[01376] [SRM@6876 sub=LocalSite] Failed to initialize connections to local services (0)
--> (dr.fault.ServiceNotFound) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> service = "com.vmware.dr.vcDr"
--> msg = ""
--> }
--> [context]zKq7AVECAAQAANjOcAESdm13YXJlLWRyAAAsGRxsaWJ2bWFjb3JlLnNvAAGC3Q5saWJjb25uZWN0aW9uLXBzYy5zbwABC1EZAQ+jGgH2zRkBhaoaAVqrGgF8lRoBDZIaAYhxGwJPLg9saWJjb25uZWN0aW9uLWJhc2Uuc28AAo5wDwKg+wsAzik0ANJCNADgfUkDsI4AbGlicHRocmVhZC5zby4wAATf+g9saWJjLnNvLjYA[/context]
2025-02-12T09:22:19.541Z warning vmware-dr[01371] [SRM@6876 sub=LocalSite] DR service info not found in LS
2025-02-12T11:48:17.178Z error drconfig[01010] [SRM@6876 sub=ServiceControl opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start:97e6] Command "/usr/bin/systemctl start srm-server" exit code: 1
--> stderr:
--> Job for srm-server.service failed because a timeout was exceeded.
--> See "systemctl status srm-server.service" and "journalctl -xe" for details.
-->
2025-02-12T11:48:17.178Z error drconfig[01010] [SRM@6876 sub=DrConfigServiceManager opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start:97e6] N7Vmacore24InvalidArgumentExceptionE service
--> [context]zKq7AVECAAQAANjOcAEJZHJjb25maWcAAPlkBGxpYmRyLWNvbmZpZy11dGlscy5zbwAB2PkZZHItY29uZmlndXJhdG9yAAIc5Q1saWJkcmNvbmZpZy10eXBlcy5zbwADDd8KbGliZHItdm1vbWkuc28ABM4pNGxpYnZtYWNvcmUuc28ABNJCNATgfUkFsI4AbGlicHRocmVhZC5zby4wAAbf+g9saWJjLnNvLjYA[/context]
2025-02-12T11:48:17.179Z warning drconfig[01010] [SRM@6876 sub=IO.Connection opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start] Failed to write buffer to stream; <io_obj p:0x00007f890800fc08, h:13, <UNIX '/run/vmware/dr/drconfig-socket'>, <UNIX ''>> e: 32(Broken pipe [system:32]), async: false, duration: 0msec
2025-02-12T11:48:17.179Z error drconfig[01010] [SRM@6876 sub=Listener.HTTPService opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start] Failed to write to response stream; <<io_obj p:0x00007f890800fc08, h:13, <UNIX '/run/vmware/dr/drconfig-socket'>, <UNIX ''>>, 529bfeef-9ec2-a46f-7754-7d25e8290707>, N7Vmacore15SystemExceptionE(Broken pipe: The communication pipe/socket is explicitly closed by the remote service.)
--> [context]zKq7AVECAAQAANjOcAEPZHJjb25maWcAACwZHGxpYnZtYWNvcmUuc28AAMcKNAA8EzoA6G4qAAMvJwDqLycAmDAnAeeyI2xpYnZtb21pLnNvAAIE4gpsaWJkci12bW9taS5zbwACrxkDAM4pNADSQjQA4H1JA7COAGxpYnB0aHJlYWQuc28uMAAE3/oPbGliYy5zby42AA==[/context]
2025-02-12T11:48:17.180Z error drconfig[01010] [SRM@6876 sub=Default opID=aa2a5054-03fa-4388-acb8-ae972a571fba-Start] Failed to send error to the client
--> (vmodl.fault.SystemError) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> reason = "service"
--> msg = ""
--> }
--> N7Vmacore11IOExceptionE System exception while transmitting HTTP Response:
--> error id = 32
-->
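The log signatures shown above can be checked for quickly on the SRM appliance. The following is a minimal sketch, not an official tool: the function name srm_log_signatures is hypothetical, and the log location /var/log/vmware/srm/vmware-dr.log is an assumption that should be verified on the appliance. It only counts lines containing the three messages quoted in the excerpts.

```shell
#!/bin/sh
# Hypothetical helper: count the failure signatures quoted in this article
# in a given SRM log file. Read-only; it changes nothing on the appliance.
# Usage (log path is an assumption, verify it on your appliance):
#   srm_log_signatures /var/log/vmware/srm/vmware-dr.log
srm_log_signatures() {
    log_file="$1"
    for pattern in \
        "Failed to find service" \
        "Failed to initialize connections to local services" \
        "DR service info not found in LS"
    do
        # grep -c prints the number of matching lines; -F treats the
        # pattern as a fixed string. "|| true" keeps a zero-match result
        # from aborting scripts that run with "set -e".
        count=$(grep -cF -- "$pattern" "$log_file" || true)
        printf '%s: %s\n' "$pattern" "$count"
    done
}
```

Non-zero counts for these messages, together with the srm-server start timeout, are consistent with the missing Lookup Service registration described in this article; they are not proof of it on their own.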
To resolve the issue, clean up the stale SRM registrations in vCenter Server and then re-register the SRM appliance with the vCenter Server.
Before proceeding with the cleanup of stale service registrations and solution users, take offline snapshots of the vCenter Server appliance. If the vCenter Server is in Enhanced Linked Mode, take offline snapshots of all the linked appliances.
1. Open an SSH session to the vCenter Server appliance.
2. Run the commands below to save the list of all service registrations and search it for the SRM service registrations.
# /usr/lib/vmware-lookupsvc/tools/lstool.py list --url http://localhost:7090/lookupservice/sdk > /tmp/psc_services.txt
# less /tmp/psc_services.txt | grep -iC4 vcdr
# less /tmp/psc_services.txt | grep -iC4 SRM
# less /tmp/psc_services.txt | grep -iC4 h5-dr
3. Make a note of the service IDs and remove only the service registrations associated with the impacted SRM appliance using the below command.
# /usr/lib/vmware-lookupsvc/tools/lstool.py unregister --url http://localhost:7090/lookupservice/sdk --user "<SSO administrator user>" --password 'Part1tion$' --id 'Service ID' --no-check-cert
4. Run the command below and review the list of solution users.
# /usr/lib/vmware-vmafd/bin/dir-cli service list
5. From the list above, identify the solution users related to SRM (srm-xxxx, h5drxxxx) and remove each of them using the command below.
# /usr/lib/vmware-vmafd/bin/dir-cli service delete --name <name of solution user> --login
6. Log in with vCenter Server credentials to https://<vCenter_Server_address>/lookupservice/mob/?moid=ServiceRegistration&method=List&vmodl=1.
7. Search for the VMware Live Site Recovery registrations by replacing the value in the Value field with the following text and click Invoke Method.
<filterCriteria>
<serviceType>
<product>com.vmware.dr</product>
<type>vcDr</type>
</serviceType>
</filterCriteria>
8. Look for the old VMware Live Site Recovery registration, copy its serviceId value, and navigate to https://<vCenter_Server_address>/lookupservice/mob/?moid=ServiceRegistration&method=Delete.
9. To delete the service registration, enter the serviceId value and click Invoke Method.
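Steps 2 through 5 involve reading long listings by eye, so a small filter can help shortlist candidates before anything is removed. The sketch below is illustrative only. The function names srm_service_ids and srm_solution_users are hypothetical. It assumes that the lstool.py listing contains lines labelled "Name:", "Service Type:", "Service ID:" and "Owner ID:", and that dir-cli prints one solution user per line, optionally prefixed with a number such as "1. ". Verify both against your own output. Nothing here unregisters or deletes anything; review the results and remove only the entries that belong to the impacted SRM appliance.

```shell
#!/bin/sh
# Hypothetical helper: print "Service ID<TAB>Service Type<TAB>Owner ID"
# for registrations whose type or owner mentions vcdr, srm or h5-dr
# (the same case-insensitive terms used with grep in step 2).
# Usage: srm_service_ids /tmp/psc_services.txt
srm_service_ids() {
    awk '
        # Print the record collected so far if it looks SRM-related,
        # then reset the fields for the next record.
        function emit() {
            if (sid != "" && tolower(stype " " owner) ~ /vcdr|srm|h5-dr/)
                printf "%s\t%s\t%s\n", sid, stype, owner
            sid = ""; stype = ""; owner = ""
        }
        # A "Name:" line or a line of dashes is assumed to start a new record.
        /^[[:space:]]*Name:/            { emit(); next }
        /^[[:space:]]*-+[[:space:]]*$/  { emit(); next }
        /^[[:space:]]*Service Type:/ { sub(/^[^:]*:[[:space:]]*/, ""); stype = $0; next }
        /^[[:space:]]*Service ID:/   { sub(/^[^:]*:[[:space:]]*/, ""); sid = $0; next }
        /^[[:space:]]*Owner ID:/     { sub(/^[^:]*:[[:space:]]*/, ""); owner = $0; next }
        END { emit() }
    ' "$1"
}

# Hypothetical helper: read "dir-cli service list" output on stdin and print
# only the names that start with srm or h5dr/h5-dr (case-insensitive).
# Usage: /usr/lib/vmware-vmafd/bin/dir-cli service list | srm_solution_users
srm_solution_users() {
    # Strip an optional leading "N. " counter, then filter by name prefix.
    sed 's/^[[:space:]]*[0-9][0-9]*\.[[:space:]]*//' | grep -iE '^(srm|h5-?dr)' || true
}
```

Printing the Owner ID next to each Service ID is a deliberate choice: in environments with more than one SRM appliance, it helps tell the impacted appliance's registrations apart from healthy ones.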
Once the vCenter Server cleanup is complete, reboot the SRM appliance.
Finally, log in to the SRM appliance VAMI and reconfigure the appliance to re-register it with the vCenter Server.
The Site Recovery plug-in details should now be populated.
For additional details, refer to the KB article "Cleaning up decommissioned SRM registration from vCenter Server".