Enhanced Replication Mappings: Fault occurred while performing health check

Article ID: 381649

Products

VMware vSphere ESXi, VMware Live Recovery

Issue/Introduction


When you test the connection of an Enhanced Replication Mapping in Site Recovery Manager (SRM), the test hangs and, after a couple of hours, fails with the following error:

Fault occurred while performing health check. Details: 'Connect: Connection reset by peer'




The following entries appear in /opt/vmware/hms/logs/hms.log:

Ping test initiated:

2024-09-30 18:28:06.072 DEBUG com.vmware.hms.hbrsrvuw.healthmonitor.HealthChecksWorkflow [hms-main-thread-5809] (..hbrsrvuw.healthmonitor.HealthChecksWorkflow) [] | Executing ping test for peer site '28341f6c-####-4774-####-1e4####abba6' and replication mapping '(hms.ReplicationMapping) {
hms.log:2024-09-30 20:32:00.937 DEBUG com.vmware.hms.hbrsrvuw.healthmonitor.HealthChecksWorkflow [hms-main-thread-5809] (..hbrsrvuw.healthmonitor.HealthChecksWorkflow) [] | Ping test for peer site '28341f6c-####-4774-####-1e4####abba6' and replication mapping '(hms.ReplicationMapping) {

Successful test result:

2024-09-30 18:27:53.477 DEBUG com.vmware.hms.net.HbrAgentHealthMonitorService [hms-main-thread-5803] (..hms.net.HbrAgentHealthMonitorService) [] | Ping test result received: {"group":"PING-GID-dbd5b0e8-####-46f1-####-67d70c8aa3dd","endpoints":{"broker":{"address":"172.##.##.##","port":32032,"connectivity":{"tcp":true,"ssl":true},"latency":{"tcp":{"value":22073,"units":"us"}}},"targets":[{"address":"172.##.##.#3","port":32032,"connectivity":{"tcp":true,"ssl":true,"login":true},"latency":{"tcp":{"value":12609,"units":"us"}}}]}}

Failed test result (the target reports "tcp":true but "ssl":false, together with a failReason):

2024-09-30 20:30:43.649 DEBUG com.vmware.hms.net.HbrAgentHealthMonitorService [hms-main-thread-5800] (..hms.net.HbrAgentHealthMonitorService) [] | Ping test result received: {"group":"PING-GID-65519a13-####-4a3d-####-cf348baa77b7","endpoints":{"broker":{"address":"172.##.##.##","port":32032,"connectivity":{"tcp":true,"ssl":true},"latency":{"tcp":{"value":20653,"units":"us"}}},"targets":[{"address":"172.##.##.#2","port":32032,"connectivity":{"tcp":true,"ssl":false},"latency":{"tcp":{"value":24284,"units":"us"}},"failReason":"Connect: Connection reset by peer"}]}}
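To find the failing target hosts in a large hms.log, you can extract the JSON payload from each "Ping test result received" line. The following is a minimal Python sketch, assuming that every such line ends with a single JSON object, as in the samples above. The script name and function names are hypothetical.

```python
import json
import re
import sys
from typing import Iterable, Iterator, Tuple

# Text that precedes the JSON payload in the HbrAgentHealthMonitorService lines.
MARKER = "Ping test result received: "
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")


def failed_ping_targets(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (timestamp, group, target address, failReason) for each failed target."""
    for line in lines:
        pos = line.find(MARKER)
        if pos == -1:
            continue  # not a ping test result line
        try:
            result = json.loads(line[pos + len(MARKER):])
        except json.JSONDecodeError:
            continue  # truncated or wrapped line; skip it
        stamp = TIMESTAMP.search(line)
        for target in result.get("endpoints", {}).get("targets", []):
            flags = target.get("connectivity", {})
            # A target fails if it carries a failReason or any flag is false.
            if "failReason" in target or not all(flags.values()):
                yield (
                    stamp.group(0) if stamp else "?",
                    result.get("group", "?"),
                    target.get("address", "?"),
                    target.get("failReason", "(no failReason)"),
                )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 failed_ping_targets.py /opt/vmware/hms/logs/hms.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for row in failed_ping_targets(log):
            print(" | ".join(row))
```

Run against the two samples above, only the 20:30:43 result is reported, for target 172.##.##.#2 with the failReason "Connect: Connection reset by peer".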


Environment

vSphere Replication 9.x

VMware Live Site Recovery 9.x

Cause


1. Ping tests fail between the hosts at the source and target sites.

2. A TLS connection cannot be established between the hosts.

3. A firewall blocks the required ports (such as 32032, seen in the log entries above) between the sites.

Enhanced replication runs a series of tests that check SSL connectivity, ping, and latency. If any of these tests fails, the overall test result is a failure.
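The "any failed check fails the whole test" rule can be illustrated with a small Python sketch that evaluates one decoded ping test result (the JSON payload from the hms.log lines shown earlier). It assumes the overall result is the logical AND of the boolean flags under "connectivity" plus the absence of a "failReason"; it does not apply a latency threshold, because this article does not state one. The function names are hypothetical.

```python
from typing import Any, Dict, List


def failed_checks(result: Dict[str, Any]) -> List[str]:
    """List every failed check in one ping test result as 'address: check'."""
    endpoints = result.get("endpoints", {})
    # In the hms.log samples, "broker" is a single object and "targets" is a list.
    nodes = ([endpoints["broker"]] if "broker" in endpoints else []) + endpoints.get("targets", [])
    failures = []
    for node in nodes:
        address = node.get("address", "?")
        for check, passed in node.get("connectivity", {}).items():
            if not passed:  # e.g. "ssl": false
                failures.append(f"{address}: {check}")
        if "failReason" in node:
            failures.append(f"{address}: {node['failReason']}")
    return failures


def overall_passed(result: Dict[str, Any]) -> bool:
    """A single failed check makes the whole health check fail."""
    return not failed_checks(result)
```

Applied to the failed result above, the broker passes (tcp and ssl are true), but the target fails on "ssl" and reports "Connect: Connection reset by peer", so the overall result is a failure.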

Resolution

vSphere Replication Enhanced Replication Mappings

1. Perform a ping test between the source and target hosts.

2. Check whether a TLS connection can be established between the failing host pairs.

Run this command from the source host to the target host, and from the target host to the source host:

openssl s_client -connect <target-host IP/FQDN>:32032

3. Check port connectivity from the source host to the target host:

nc -zv [Target-host IP/FQDN] 32032

openssl s_client -connect <IP/FQDN>:<port>  
     
4. Check port connectivity from the source host to the target vSphere Replication appliance (VRMS):

nc -zv [Target-VRMS IP/FQDN] 32032

openssl s_client -connect <Target-VRMS IP/FQDN>:32032
     
5. Use traceroute to check the network path from the source hosts to the target hosts and to the target VRMS:

traceroute [target-host IP/FQDN]

traceroute [target-VRMS IP/FQDN]


6. Check that all required ports are open on the firewall. See "Services, Ports, and External Interfaces That the vSphere Replication Virtual Appliance Uses" in the vSphere Replication documentation.
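Steps 2 to 4 can also be scripted. The following is a minimal Python sketch, assuming Python 3 is available on a machine that shares the network path of the host being tested; it only approximates nc -zv and openssl s_client and does not replace running those commands on the hosts themselves. The function names are hypothetical, and port 32032 is taken from the hms.log entries above.

```python
import socket
import ssl
from typing import Tuple

ENHANCED_REPLICATION_PORT = 32032  # port seen in the hms.log ping test results


def tcp_port_open(host: str, port: int = ENHANCED_REPLICATION_PORT,
                  timeout: float = 5.0) -> Tuple[bool, str]:
    """Rough equivalent of `nc -zv host port`: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "open"
    except OSError as exc:  # refused, unreachable, timed out, ...
        return False, f"{type(exc).__name__}: {exc}"


def tls_handshake(host: str, port: int = ENHANCED_REPLICATION_PORT,
                  timeout: float = 10.0) -> Tuple[bool, str]:
    """Rough equivalent of `openssl s_client -connect host:port`: does a TLS handshake complete?"""
    ctx = ssl.create_default_context()
    # Like s_client without a CA bundle, accept an untrusted certificate;
    # s_client also prints verify errors yet completes the handshake.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version() or "unknown protocol"
    except OSError as exc:  # ssl.SSLError and socket timeouts are OSError subclasses
        return False, f"{type(exc).__name__}: {exc}"


def check_peer(host: str, port: int = ENHANCED_REPLICATION_PORT) -> str:
    """Run the TCP check first, then the TLS check, and summarize both."""
    tcp_ok, tcp_detail = tcp_port_open(host, port)
    if not tcp_ok:
        return f"{host}:{port} TCP FAILED ({tcp_detail})"
    tls_ok, tls_detail = tls_handshake(host, port)
    status = "TLS OK" if tls_ok else "TLS FAILED"
    return f"{host}:{port} TCP open, {status} ({tls_detail})"
```

For example, run print(check_peer("<target-host IP/FQDN>")) from the source side, then repeat from the target side. Running the TCP check before the TLS check separates a firewall or routing problem (TCP fails) from a TLS problem (TCP open, handshake fails), which is the pattern in the failed hms.log result, where tcp is true but ssl is false.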

If you still cannot resolve the problem with the help of this KB, open a ticket with support and include these results.

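Example of a passing test. The certificate verification errors (num=20 and num=21) are expected when openssl has no copy of the issuing CA; the handshake still completes and a TLSv1.2 session is established: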

root [ /home/admin ]# openssl s_client -connect 10.#.#.#:32032
CONNECTED(00000003)
Can't use SSL_get_servername
depth=0 C = US, ST = California, L = Palo Alto, O = VMware, OU = VMware Engineering, CN = vrms.vmware.org, emailAddress = [email protected]
verify error:num=20:unable to get local issuer certificate
verify return:1
verify error:num=21:unable to verify the first certificate
---
Certificate chain
0 s:C = US, ST = California, L = Palo Alto, O = VMware, OU = VMware Engineering, CN = vrms.vmware.org, emailAddress = [email protected]
   i:CN = CA, DC = vsphere, DC = local, C = US, ST = California, O = vCenter.vmware.org, OU = VMware Engineering
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: May 17 20:39:07 2023 GMT; NotAfter: May 16 20:39:07 2028 GMT
Server certificate
-----BEGIN CERTIFICATE-----
MIIE ... lDQ=
-----END CERTIFICATE-----
subject=C = US, ST = California, L = Palo Alto, O = VMware, OU = VMware Engineering, CN = vrms.vmware.org, emailAddress = [email protected]
issuer=CN = CA, DC = vsphere, DC = local, C = US, ST = California, O = vCenter.vmware.org, OU = VMware Engineering
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: ECDH, prime256v1, 256 bits
SSL handshake has read 1785 bytes and written 423 bytes
Verification error: unable to verify the first certificate
New, TLSv1.2, Cipher is ECDHE-###-######-###-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-###-######-###-SHA256
    Session-ID: 0F7362AF4B8EFD62D758073B4257EBB4F3254E918BDEB1AF65DC693456F71292
    Session-ID-ctx:
    Master-Key: …
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 7200 (seconds)
    TLS session ticket:
    0000 - 99 95 71 e9 76 af ee eb-ef db 1f 9f 38 43 d6 5c   ..q.v.......8C.\
    00b0 - 3a f9 2c 65 df 35 cf 3c-8f a5 e6 ce 85 4a 5f 56   :.,e.5.<.....J_V
    Start Time: 1729044770
    Timeout   : 7200 (sec)
    Verify return code: 21 (unable to verify the first certificate)
    Extended master secret: yes


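Example of a failing test (errno=110 means the TCP connection attempt timed out, which typically points to a network or firewall problem rather than a TLS problem):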

[root@Host:~] openssl s_client -connect host.vmware.org:32032
80FB0882DB000000:error:8000006E:system library:BIO_connect:Connection timed out:crypto/bio/bio_sock2.c:114:calling connect()
80FB0882DB000000:error:10000067:BIO routines:BIO_connect:connect error:crypto/bio/bio_sock2.c:116:
connect:errno=110
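When many host pairs are tested, the captured openssl s_client output can be triaged automatically. The following Python sketch is an assumption-based heuristic, not an official parser: it treats a "New, TLSv…, Cipher is …" line as a completed handshake (as in the passing example) and otherwise reports the errno. The errno table assumes Linux-style numbering, which matches the failing example (errno=110 with "Connection timed out"); the function and constant names are hypothetical.

```python
import re

# Linux-style errno values (assumption; consistent with errno=110 above).
KNOWN_ERRNOS = {
    104: "ECONNRESET: connection reset by peer",
    110: "ETIMEDOUT: connection timed out",
    111: "ECONNREFUSED: connection refused",
    113: "EHOSTUNREACH: no route to host",
}

# A completed handshake prints e.g. "New, TLSv1.2, Cipher is ...";
# a failed one prints "New, (NONE), Cipher is (NONE)", which does not match.
NEW_SESSION = re.compile(r"^New, (TLSv[\d.]+), Cipher is (\S+)", re.MULTILINE)
ERRNO = re.compile(r"errno=(\d+)")


def classify_s_client(output: str) -> str:
    """Summarize `openssl s_client -connect host:port` output as PASS or FAIL."""
    session = NEW_SESSION.search(output)
    if "CONNECTED(" in output and session:
        # "verify error" lines alone do not mean the handshake failed.
        return f"PASS: TLS handshake completed ({session.group(1)})"
    errno = ERRNO.search(output)
    if errno:
        code = int(errno.group(1))
        meaning = KNOWN_ERRNOS.get(code, "see the host's errno list")
        return f"FAIL: errno={code} ({meaning})"
    if "CONNECTED(" in output:
        return "FAIL: TCP connected, but no TLS session was established"
    return "FAIL: unrecognized output"
```

To use it, save the output on the host, for example with openssl s_client -connect <target-host IP/FQDN>:32032 </dev/null > s_client.txt 2>&1, and pass the file contents to classify_s_client. The passing example above yields a PASS for TLSv1.2, and the failing example yields errno=110 (ETIMEDOUT).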

Additional Information