Reprotect recovery shows error with Enhanced Replication mappings : "Fault occurred while performing health check. Details: 'Connect: Input/output error"
search cancel

Reprotect recovery shows error with Enhanced Replication mappings : "Fault occurred while performing health check. Details: 'Connect: Input/output error"

book

Article ID: 428951

calendar_today

Updated On:

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  1. The VMs in Replication appears with below error,

    Error A Replication error occurred at the vSphere replication Server for replication 'VM_Name'. details : 'No connection to VR server for virtual machine VM_Name on host Host_name in Cluster Cluster_Name in Data_center: Unknown'.
    Warning - Not Active (RPO Violation).

  2. The Mappings under Enhanced replication appears with below error,

  3. The ESXi host was recently added to the environment. Immediately following this addition, the enhanced replication mappings began displaying the error below

 

Environment

VMware Live Recovery 9.0.2

Cause

  • The enhanced replication and mappings are failing because the hbr-agent service on the newly added ESXi hosts cannot authenticate with the target ESXi hosts and the vSphere Replication appliance over port 32032. This authentication failure is caused by an invalid broker certificate, which prevents the hbr-agent from logging in successfully. As a result, VMs remain in a 'Not Active' state or fail to replicate entirely.

 

  • The logs From the /var/run/log/hbr-agent log, shows

####-##-##T02:29:10.350Z In(166) hbr-agent-bin[2107714]: [0x0000007014d47700] error: [Proxy [Group: GID-########-####-####-####-############] -> [##.###.###.##:32032]] SSL handshake failed: Connection reset by peer
####-##-##T02:29:10.350Z In(166) hbr-agent-bin[2107714]: [0x0000007014d47700] error: [Proxy [Group: GID-########-####-####-####-############] -> [##.###.###.##:32032]] Failed to connect to server on ##.###.###.##:32032: Connection reset by peer
####-##-##T02:29:10.350Z In(166) hbr-agent-bin[2107714]: [0x0000007014d47700] error: [Proxy [Group: GID-########-####-####-####-############] -> [##.###.###.##:32032]] Failed to reconnect to broker as server: Connection reset by peer
####-##-##T02:29:13.547Z In(166) hbr-agent-bin[2107714]: [0x0000007014eca700] error: [Proxy [Group: GID-########-####-####-####-############] -> [##.###.###.##:32032]] Failed to read from server: Connection timed out
####-##-##T02:29:13.547Z In(166) hbr-agent-bin[2107714]: [0x0000007014e49700] error: [Proxy [Group: GID-########-####-####-####-############] -> [##.###.###.##:32032]] Failed to read from client: Operation canceled
####-##-##T02:29:13.549Z In(166) hbr-agent-bin[2107714]: [0x0000007014eca700] info: [ConfigManager] No user configuration for key=hbrsvc_target_info in ConfigStore.

####-##-##T02:48:01.849Z In(166) hbr-agent-bin[2107714]: [0x0000007014d47700] error: [Proxy [Group: ] -> [##.###.###.##:32032]] Failed to connect to broker on ##.###.###.##:32032: Input/output error
####-##-##T02:48:01.849Z In(166) hbr-agent-bin[2107714]: [0x0000007014d47700] error: [Proxy [Group: ] -> [##.###.###.##:32032]] Failed to connect to broker: Input/output error
####-##-##T02:48:18.273Z In(166) hbr-agent-bin[2107714]: [0x0000007014e49700] error: [Proxy [Group: ] -> [##.###.###.##:32032]] SSL handshake failed: Connection reset by peer
####-##-##T02:48:18.273Z In(166) hbr-agent-bin[2107714]: [0x0000007014e49700] error: [Proxy [Group: ] -> [##.###.###.##:32032]] Failed to connect to broker on ##.###.###.##:32032: Connection reset by peer
####-##-##T02:48:18.273Z In(166) hbr-agent-bin[2107714]: [0x0000007014e49700] error: [Proxy [Group: ] -> [##.###.###.##:32032]] Failed to connect to broker: Connection reset by peer

  • These errors indicate that the hbr-agent cannot establish a connection due to authentication failures from the invalid certificate used by the brokered server.

  • To further investigate, the /var/run/log/hbrsrv.log on the ESXi host indicates network issue which shows "Dropping error encountered from network" messages.

####-##-##T02:43:15.521Z Er(163) hbrsrv[2107256]: [Originator@6876 sub=Main] HbrError stack:
####-##-##T02:43:15.521Z Er(163) hbrsrv[2107256]: [Originator@6876 sub=Main]    [0] ClientConnection (client=[##.###.###.##]:51206) request callback failed: Operation was canceled
####-##-##T02:43:15.521Z Er(163) hbrsrv[2107256]: [Originator@6876 sub=Main]    [1] Dropping error encountered from network
####-##-##T02:43:15.521Z In(166) hbrsrv[3568692]: [Originator@6876 sub=Delta] HbrSrv cleaning out ClientConnection ([##.###.###.##]:51206)
####-##-##T02:43:15.521Z In(166) hbrsrv[2107256]: [Originator@6876 sub=StatsLog] HbrEvent: {"clientAddress"[##.###.###.##]:51206","eventID":"lwdConnectionReset","groupID":"GID-########-####-####-####-############","serverID":"########-####-####-####-############","vimHostName":"#######-New.#######.####.###","hbrEvent":1}
####-##-##T02:43:15.521Z In(166) hbrsrv[2107256]: [Originator@6876 sub=Delta] Destroying client connection (ClientCnx '[##.###.###.##]:51206' id=248 <shut> <clsd>)
####-##-##T02:43:15.521Z In(166) hbrsrv[2196302]: [Originator@6876 sub=Misc] Client session 3739 disconnected from group GID-########-####-####-####-############.

Resolution

To remediate the issue, please follow below steps :

  1. Restart hbr-agent and hbrsrv service on target host (Newly added host).

  2. Run the Enhanced Replication Mapping test and confirm that the mappings are in a healthy state.

  3. Once the mapping test is successful, validate the replication status of the VM, VM replication will resume.