In SRM, the enhanced replication mapping connection test gets stuck and fails after a couple of hours. The following entries appear in the log:
/opt/vmware/hms/logs/hms.log:
2024-09-30 18:28:06.072 DEBUG com.vmware.hms.hbrsrvuw.healthmonitor.HealthChecksWorkflow [hms-main-thread-5809] (..hbrsrvuw.healthmonitor.HealthChecksWorkflow) [] | Executing ping test for peer site '28341f6c-####-4774-####-1e4####abba6' and replication mapping '(hms.ReplicationMapping) {
2024-09-30 20:32:00.937 DEBUG com.vmware.hms.hbrsrvuw.healthmonitor.HealthChecksWorkflow [hms-main-thread-5809] (..hbrsrvuw.healthmonitor.HealthChecksWorkflow) [] | Ping test for peer site '28341f6c-####-4774-####-1e4####abba6' and replication mapping '(hms.ReplicationMapping) {
2024-09-30 18:27:53.477 DEBUG com.vmware.hms.net.HbrAgentHealthMonitorService [hms-main-thread-5803] (..hms.net.HbrAgentHealthMonitorService) [] | Ping test result received: {"group":"PING-GID-dbd5b0e8-####-46f1-####-67d70c8aa3dd","endpoints":{"broker":{"address":"172.##.##.##","port":32032,"connectivity":{"tcp":true,"ssl":true},"latency":{"tcp":{"value":22073,"units":"us"}}},"targets":[{"address":"172.##.##.#3","port":32032,"connectivity":{"tcp":true,"ssl":true,"login":true},"latency":{"tcp":{"value":12609,"units":"us"}}}]}}
2024-09-30 20:30:43.649 DEBUG com.vmware.hms.net.HbrAgentHealthMonitorService [hms-main-thread-5800] (..hms.net.HbrAgentHealthMonitorService) [] | Ping test result received: {"group":"PING-GID-65519a13-####-4a3d-####-cf348baa77b7","endpoints":{"broker":{"address":"172.##.##.##","port":32032,"connectivity":{"tcp":true,"ssl":true},"latency":{"tcp":{"value":20653,"units":"us"}}},"targets":[{"address":"172.##.##.#2","port":32032,"connectivity":{"tcp":true,"ssl":false},"latency":{"tcp":{"value":24284,"units":"us"}},"failReason":"Connect: Connection reset by peer"}]}}
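The ping test results logged by HbrAgentHealthMonitorService are JSON, so the failing host pair can be picked out programmatically. The following is a minimal Python sketch, assuming the JSON payload has been copied out of hms.log (the text after "Ping test result received:"). The function name failing_targets is hypothetical, and only the fields visible in the log samples above are used.

```python
import json


def failing_targets(payload):
    """Return the targets whose connectivity checks did not all pass."""
    result = json.loads(payload)
    failures = []
    for target in result["endpoints"].get("targets", []):
        connectivity = target.get("connectivity", {})
        # Collect the names of checks reported as false, e.g. "ssl".
        failed_checks = [name for name, ok in connectivity.items() if not ok]
        if failed_checks or "failReason" in target:
            failures.append({
                "address": target.get("address"),
                "port": target.get("port"),
                "failed_checks": failed_checks,
                "reason": target.get("failReason"),
            })
    return failures


# Payload from the failing log entry above (addresses are masked in the source).
SAMPLE = (
    '{"group":"PING-GID-65519a13-####-4a3d-####-cf348baa77b7",'
    '"endpoints":{"broker":{"address":"172.##.##.##","port":32032,'
    '"connectivity":{"tcp":true,"ssl":true},'
    '"latency":{"tcp":{"value":20653,"units":"us"}}},'
    '"targets":[{"address":"172.##.##.#2","port":32032,'
    '"connectivity":{"tcp":true,"ssl":false},'
    '"latency":{"tcp":{"value":24284,"units":"us"}},'
    '"failReason":"Connect: Connection reset by peer"}]}}'
)

for failure in failing_targets(SAMPLE):
    print(failure)
```

For the sample above, the only target returned is the one whose "ssl" check is false, which identifies the host pair to test in the steps that follow.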
vSphere Replication 9.x
VMware Live Site Recovery 9.x
1. Ping tests fail between the hosts at the source and target sites.
2. A TLS connection cannot be established between the hosts.
3. A firewall blocks the traffic between the hosts.
Enhanced replication runs a series of tests that check SSL connectivity, ping, and latency. If any of these tests fails, the overall test result is reported as failed.
vSphere Replication Enhanced Replication Mappings
1. Perform a PING test between the source and target hosts
2. Check whether a TLS connection can be established between the failing host pairs.
Run this command from the source host to the target host, and from the target host to the source host:
openssl s_client -connect <target-host IP/FQDN>:32032
3. Check the port connectivity from Source host > Target host
nc -zv [Target-host IP/FQDN] 32032
openssl s_client -connect <IP/FQDN>:<port>
4. Check the port connectivity from Source host > Target Replication appliance
nc -zv [Target-VRMS IP/FQDN] 32032
openssl s_client -connect <IP>:<port>
5. Use traceroute to check connectivity from the source hosts to the target hosts or the target VRMS
traceroute [target-host IP/FQDN]
traceroute [target-VRMS IP/FQDN]
6. Check that all required ports are open on the firewall. See "Services, Ports, and External Interfaces That the vSphere Replication Virtual Appliance Uses".
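Where several host pairs need to be checked, the nc and openssl tests in steps 2 to 4 can be approximated with a short script. The following is a minimal Python sketch, not a VMware-provided tool. It assumes a Python 3 interpreter is available on the machine the test is run from, the function name check_endpoint is hypothetical, and certificate validation is deliberately disabled because the test only asks whether a handshake completes.

```python
import socket
import ssl
import sys

PORT = 32032  # port used in the nc and openssl commands above


def check_endpoint(host, port=PORT, timeout=5.0):
    """Try a TCP connect, then a TLS handshake, and report each step."""
    result = {"host": host, "port": port, "tcp": False, "tls": False, "error": None}
    # Like the openssl s_client test, this checks only that the handshake
    # completes; it does not validate the peer certificate.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            result["tcp"] = True
            with context.wrap_socket(sock, server_hostname=host) as tls:
                result["tls"] = True
                result["protocol"] = tls.version()
    except OSError as exc:  # covers timeouts, resets, and ssl.SSLError
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    # Usage: python3 check_endpoint.py <target-host IP/FQDN> [more hosts...]
    for target in sys.argv[1:]:
        print(check_endpoint(target))
```

A result with "tcp" true and "tls" false corresponds to the failing ping test result shown earlier, where the TCP connection succeeds but the SSL check does not. Run it in both directions, as with the openssl command.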
If the problem persists after following the steps in this KB, open a ticket with support and include the results of these tests.
Example showing a test passing:
root [ /home/admin ]# openssl s_client -connect 10.#.#.#:32032
CONNECTED(00000003)
Can't use SSL_get_servername
depth=0 C = US, ST = California, L = Palo Alto, O = VMware, OU = VMware Engineering, CN = vrms.vmware.org, emailAddress = [email protected]
verify error:num=20:unable to get local issuer certificate
verify return:1
verify error:num=21:unable to verify the first certificate
---
Certificate chain
0 s:C = US, ST = California, L = Palo Alto, O = VMware, OU = VMware Engineering, CN = vrms.vmware.org, emailAddress = [email protected]
i:CN = CA, DC = vsphere, DC = local, C = US, ST = California, O = vCenter.vmware.org, OU = VMware Engineering
a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
v:NotBefore: May 17 20:39:07 2023 GMT; NotAfter: May 16 20:39:07 2028 GMT
Server certificate
-----BEGIN CERTIFICATE-----
MIIE ... lDQ=
-----END CERTIFICATE-----
subject=C = US, ST = California, L = Palo Alto, O = VMware, OU = VMware Engineering, CN = vrms.vmware.org, emailAddress = [email protected]
issuer=CN = CA, DC = vsphere, DC = local, C = US, ST = California, O = vCenter.vmware.org, OU = VMware Engineering
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: ECDH, prime256v1, 256 bits
SSL handshake has read 1785 bytes and written 423 bytes
Verification error: unable to verify the first certificate
New, TLSv1.2, Cipher is ECDHE-###-######-###-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : ECDHE-###-######-###-SHA256
Session-ID: 0F7362AF4B8EFD62D758073B4257EBB4F3254E918BDEB1AF65DC693456F71292
Session-ID-ctx:
Master-Key: …
PSK identity: None
PSK identity hint: None
SRP username: None
TLS session ticket lifetime hint: 7200 (seconds)
TLS session ticket:
0000 - 99 95 71 e9 76 af ee eb-ef db 1f 9f 38 43 d6 5c ..q.v.......8C.\
00b0 - 3a f9 2c 65 df 35 cf 3c-8f a5 e6 ce 85 4a 5f 56 :.,e.5.<.....J_V
Start Time: 1729044770
Timeout : 7200 (sec)
Verify return code: 21 (unable to verify the first certificate)
Extended master secret: yes
Example showing a test failing:
[root@Host:~] openssl s_client -connect host.vmware.org:32032
80FB0882DB000000:error:8000006E:system library:BIO_connect:Connection timed out:crypto/bio/bio_sock2.c:114:calling connect()
80FB0882DB000000:error:10000067:BIO routines:BIO_connect:connect error:crypto/bio/bio_sock2.c:116:
connect:errno=110
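When the openssl output from many hosts has been collected, the two outcomes shown above can be told apart with a small script. The following is a minimal Python sketch. The function name summarize_s_client is hypothetical, and treating the "New, ..., Cipher is ..." line as the sign of a completed handshake is a heuristic assumption based on the passing example, not a documented interface. As in the passing example, a certificate verification error does not count as a failure here.

```python
import re


def summarize_s_client(output):
    """Summarize captured `openssl s_client` output."""
    connected = "CONNECTED(" in output
    # A completed handshake reports a negotiated cipher; "(NONE)" means it failed.
    cipher_line = re.search(
        r"^New, \S+, Cipher is (?!\(NONE\))\S+", output, re.MULTILINE
    )
    errno_match = re.search(r"connect:errno=(\d+)", output)
    return {
        "connected": connected,
        "handshake": connected and cipher_line is not None,
        "errno": int(errno_match.group(1)) if errno_match else None,
    }


# Excerpts from the passing and failing examples above.
PASSING = """CONNECTED(00000003)
Verification error: unable to verify the first certificate
New, TLSv1.2, Cipher is ECDHE-###-######-###-SHA256
Verify return code: 21 (unable to verify the first certificate)
"""

FAILING = """80FB0882DB000000:error:8000006E:system library:BIO_connect:Connection timed out:crypto/bio/bio_sock2.c:114:calling connect()
connect:errno=110
"""

print(summarize_s_client(PASSING))
print(summarize_s_client(FAILING))
```

In the failing example the TCP connection itself timed out (errno 110), which points to routing or a firewall rather than to TLS.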