The Protection Groups (PGs) are marked as "Not Configured" in the Site Recovery UI on both sites (Source and Target).
The VMs associated with these Protection Groups are also in an invalid state with the error "Virtual Machine is no longer protected. The session is not authenticated".
The Site Pairing between the two sites shows as healthy on one site, but the second site reports as disconnected.
The /opt/vmware/support/logs/srm/vmware-dr.log file shows that SRM and VR could not communicate due to ping failures and subsequent connection resets. Below is a breakdown of the logs:
Logs Indicating Connectivity Failures:
2025-05-10T10:42:43.921+05:30 warning vmware-dr[01489] [SRM@6876 sub=LocalHms connID=hms-d48f] Ping failed: "14745869678902165130"
2025-05-10T10:47:06.040+05:30 warning vmware-dr[01319] [SRM@6876 sub=LocalHms connID=hms-d48f] Ping failed: "924065267929586046"
2025-05-10T10:47:06.105+05:30 warning vmware-dr[01444] [SRM@6876 sub=vmomi.soapStub[44] connID=hms-d48f] Terminating invocation; <SSL(<io_obj p:0x00007f1f6c1ff460, h:96, <TCP '###.##.##.## : 52794'>, <TCP '###.##.##.## : 8043'>>), />, moref: hms.ReplicationManager:replication-manager, method: findReplicationGroup
2025-05-10T10:49:14.057+05:30 verbose vmware-dr[01495] [SRM@6876 sub=LocalHms connID=hms-d48f] Connect succeeded, new connection context "15506752107471633429"
These ping failures indicate that the SRM appliance was unable to reach the VR appliance for roughly six and a half minutes (from the first ping failure at 10:42:43 to the successful reconnect at 10:49:14), during which time the connection was unreliable.
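The outage window can be read straight from the timestamps. The following is a minimal Python sketch, not a supported tool, that scans vmware-dr.log lines for the first "Ping failed" entry and the next "Connect succeeded" entry. The function name outage_window and the shortened sample lines are illustrative assumptions; only the timestamps and message text come from the excerpt above.

```python
import re
from datetime import datetime

# Matches the ISO-8601 timestamp at the start of a vmware-dr.log line.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2})")


def outage_window(lines):
    """Return (first_ping_failure, next_reconnect), or None if no pair is found."""
    first_fail = None
    for line in lines:
        match = TS.match(line)
        if not match:
            continue  # skip continuation lines without a timestamp
        ts = datetime.fromisoformat(match.group(1))
        if "Ping failed" in line and first_fail is None:
            first_fail = ts
        elif "Connect succeeded" in line and first_fail is not None:
            return first_fail, ts
    return None


# Sample lines shortened from the log excerpt; "..." marks trimmed fields.
sample = [
    '2025-05-10T10:42:43.921+05:30 warning ... Ping failed: "14745869678902165130"',
    '2025-05-10T10:47:06.040+05:30 warning ... Ping failed: "924065267929586046"',
    '2025-05-10T10:49:14.057+05:30 verbose ... Connect succeeded, new connection context',
]

window = outage_window(sample)
if window is not None:
    start, end = window
    print("Connection unreliable for", end - start)
```

To run it against a real log, pass an open file handle for vmware-dr.log instead of the sample list.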
2025-05-10T10:44:57.209+05:30 verbose vmware-dr[01319] [SRM@6876 sub=vmomi.soapStub[27] connID=hms-d48f] Resetting stub adapter; <[N7Vmacore4Http3Ext15DrUserAgentImplE:0x00007f1f08058378], />, N2Dr5Fault22HmsConnectionDownFault9ExceptionE(Fault cause: dr.fault.HmsConnectionDownFault
This log entry signifies that the SRM server is unable to connect to the vSphere Replication Management Server (HMS) due to an underlying network connectivity issue.
2025-05-10T10:42:43.907+05:30 warning vmware-dr[01317] [SRM@6876 sub=IO.Connection opID=9d72322f] Address resolution took too long; <resolver p:0x00007f1f7c0f5bf0, 'dr_vcenter.in:443', next:(null)>, async: true, duration: 133464msec
2025-05-10T11:04:52.181+05:30 error vmware-dr[01479] [SRM@6876 sub=Listener.HTTPService opID=58322f8e-a84f-4d4c-a02b-8085b2fa9b14-loginByToken] [52614] Failed to write to response stream; <<io_obj p:0x00007f1f4c02a768, h:25, <UNIX '/run/vmware/srm/srm-socket'>, <UNIX ''>>, 52614b16-c67f-ba6c-51cf-ab7c339c03db>, N7Vmacore15SystemExceptionE(Broken pipe: The communication pipe/socket is explicitly closed by the remote service.)
May 10 10:44:53 DR_vcenter vpxd[6422]: Event [173530208] [1-1] [2025-05-10T05:14:53.2282Z] [vim.event.ExtendedEvent] [warning] [VSHPERE.LOCAL\SRM-36c744bd-22a4-486e-####-a0d2353aee0a] [Datacenter] [173530208] ['VM_Name' in group 'PG_Group': Cannot resolve the file locations of the production VM for replication. (###.##.##.##)]
The /opt/vmware/support/logs/dr-client/dr.log file from the vSphere Replication appliance indicates a loss of connectivity with the SRM appliance. The error message below suggests that the vSphere Replication appliance was unable to establish communication with the SRM server at the specified address:
2025-05-06 10:35:22,505 [srm-reactive-thread-106] WARN com.vmware.dr.ui.tools.reactive.impl.PromiseImpl 3110140686583695429 d3798ce5-d385-4043-87c2-f595f145a9e5 getPairSrmSummaryIssues - Function 'com.vmware.srm.client.infrastructure.pc.utils.PCUtil$$Lambda/0x00007f190cf89d40@730528ef' failed.
java.lang.RuntimeException: No connection to server at: https://###.##.##.##:443/drserver/vcdr/vmomi/sdk
To resolve the issue and restore proper functionality to SRM and vSphere Replication (VR), the following actions are recommended:
Network Connectivity Review:
Work closely with the network team to identify and resolve any connectivity issues between the SRM appliance and the vSphere Replication appliances. Common issues could involve:
Firewall/ACL rules blocking required ports.
Network congestion or latency issues affecting communication between the appliances.
Routing issues between the two sites, leading to disconnects.
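A quick way to check the first item in the list above is to test whether a TCP connection can be opened from one appliance to the other. The sketch below assumes Python 3 is available where it is run. The ports 443 and 8043 are taken from the log excerpts in this article, the host name vr-appliance.example.com is a hypothetical placeholder, and tcp_reachable is an illustrative helper rather than part of any VMware tooling.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical placeholder; replace with the peer appliance address.
    peer = "vr-appliance.example.com"
    for port in (443, 8043):  # ports seen in the log excerpts
        state = "open" if tcp_reachable(peer, port) else "unreachable"
        print(f"{peer}:{port} is {state}")
```

A successful connect only shows that the port is reachable at that moment. Intermittent drops like the ones in the logs may need repeated checks or a packet capture to catch.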
If the cause is not obvious, collect a packet capture on the affected appliance, for example:
# tcpdump -i eth0 -w /tmp/pkt_name.pcap
Once collected, review the capture file (pkt_name.pcap) using tools like Wireshark to pinpoint issues.
DNS Resolution Optimization:
Address the DNS resolution delays by ensuring that SRM and vSphere Replication components can resolve fully qualified domain names (FQDNs) within an acceptable timeframe.
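To put a number on the DNS resolution delay, the lookup can be timed from the appliance. This is a rough sketch using only the Python standard library. The host name dr_vcenter.in is the one in the "Address resolution took too long" log entry, while the helper time_resolution and the one-second warning threshold are assumptions chosen for illustration, not documented SRM limits.

```python
import socket
import time


def time_resolution(hostname, port=443):
    """Return (elapsed_seconds, addresses); addresses is empty if lookup fails."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        # Each entry's sockaddr starts with the resolved IP address.
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = []
    return time.monotonic() - start, addresses


if __name__ == "__main__":
    name = "dr_vcenter.in"  # name taken from the log entry
    elapsed, addresses = time_resolution(name)
    print(f"{name}: {addresses or 'unresolved'} in {elapsed:.3f}s")
    if elapsed > 1.0:  # assumed threshold, for illustration only
        print("Resolution is slow; check the configured DNS servers.")
```

For comparison, the logged lookup took 133464 msec, which is more than two minutes for a single name.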
Workaround (Temporary Solution):
If network issues persist and an immediate resolution is needed, restarting the srm-server.service can temporarily restore communication between the SRM and VR appliances. This action will allow the protection groups to return to a healthy state. However, this is a temporary measure, and the underlying network issue should be addressed for a permanent fix.