Error "VR synchronization failed for VRM group <VM>. Synchronization monitoring has stopped" on running Reprotect


Article ID: 419136


Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  • Reprotect from the DR site to the Production site fails for VMs with the following error:

    “VR synchronization failed for VRM group <VM>. Synchronization monitoring has stopped. Please verify replication traffic connectivity between the source host and the target vSphere Replication Server. Synchronization monitoring will resume when connectivity issues are resolved.”


  • Reprotect fails at step 5, 'Synchronization Storage'.

  • The source VR server (DR site) reports the following error in /opt/vmware/hms/logs/hms.log:

YYYY-MM-DD HH:MM:SS.### INFO  com.vmware.hms.i18n.class com.vmware.hms.response.filter.I18nActivationResponseFilter [tcweb-86] (..response.filter.I18nActivationResponseFilter) [operationID=########-####-####-####-#############-HMS-########,sessionID=#########] | The localized message is: A replication error occurred at the vSphere Replication Server for replication '<VM-Name>'. Details: 'No connection to VR Server for virtual machine <VM-Name> on host <Source-ESXi> in cluster <Source-Cluster> in <Source-vCenter>: Unknown'. 

Note: <Source-ESXi>, <Source-Cluster>, and <Source-vCenter> in the log entry above refer to the DR site, where Reprotect is run.

Environment

VMware Live Site Recovery 9.0.2

Cause

  • The source ESXi hosts (DR site) cannot connect to the target ESXi hosts (Production site) over port 32032. This can happen when a firewall blocks the port or when a network problem breaks the connection.

  • The hbr-agent.log on the DR site ESXi hosts shows the following errors, which indicate a connection problem over port 32032:

YYYY-MM-DD HH:MM:SS.### In(166) hbr-agent-bin[#######]: [0x################] error: [Proxy [Group: GID-########-####-####-####-#############] -> [#.#.#.#:32032] Failed to connect to #.#.#.#:32032. Using nic 'vmkX'. Error: Connection timed out

YYYY-MM-DD HH:MM:SS.### In(166) hbr-agent-bin[#######]: [0x################] error: [Proxy [Group: GID-########-####-####-####-#############] -> [#.#.#.#:32032] Failed to connect to #.#.#.#:32032. Using nic 'vmkX'. Error: Connection timed out

YYYY-MM-DD HH:MM:SS.### In(166) hbr-agent-bin[#######]: [0x################] error: [Proxy [Group: GID-########-####-####-####-#############] -> [#.#.#.#:32032] Failed to connect to #.#.#.#:32032. Using nic 'vmkX'. Error: Connection timed out


Note (for the log entries above):

        1. GID is the replication group ID of the VM that failed to complete Reprotect.
        2. #.#.#.# is the IP address of the VMkernel adapter on the target ESXi host on which vSphere Replication and vSphere Replication NFC traffic is enabled.
        3. vmkX is the VMkernel adapter on which vSphere Replication and vSphere Replication NFC traffic is enabled.
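
When many replications are affected, it can help to summarize which target endpoints appear in these errors. The following is a minimal sketch, assuming the log lines have the "Failed to connect to <IP>:<port>." form shown above. The function name failed_endpoints is hypothetical, and the log path in the usage comment is an assumption that should be adjusted to where hbr-agent.log resides on your host.

```shell
# Hypothetical helper: list each IP:port that appears in
# "Failed to connect to" entries, followed by the number of occurrences.
failed_endpoints() {
    # $1: path to a copy of hbr-agent.log
    sed -n 's/.*Failed to connect to \([0-9.]*:[0-9]*\)\..*/\1/p' "$1" \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'   # print as "IP:port count"
}

# Usage (path is an assumption):
#   failed_endpoints /var/run/log/hbr-agent.log
```

Each output line contains one endpoint and the number of failed connection attempts logged for it, which gives the list of target addresses to test from the source host.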

Resolution

  • Run the following command on the source ESXi host at the DR site (hosting the protected VMs) against each target ESXi host at the Production site to verify connectivity over port 32032:

# nc -z -s <DR-ESXi-vmk-IP> <Prod-ESXi-vmk-IP> 32032

  • Engage your networking team to allow traffic between these hosts over port 32032.
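
If there are several Production site hosts to test, the nc check from the first step can be wrapped in a loop. This is a sketch only: the function name check_vr_port is hypothetical, it uses the same nc flags as the command above, and it assumes a POSIX-compatible shell. Because no timeout flag is set, a blocked connection may take a while to report as failed.

```shell
# Hypothetical helper: test port 32032 from one DR-site vmk IP
# to one or more Production-site vmk IPs.
check_vr_port() {
    src_ip="$1"    # <DR-ESXi-vmk-IP>
    shift          # remaining arguments: <Prod-ESXi-vmk-IP> values
    failed=0
    for dst_ip in "$@"; do
        if nc -z -s "$src_ip" "$dst_ip" 32032; then
            echo "OK $src_ip -> $dst_ip:32032"
        else
            echo "FAIL $src_ip -> $dst_ip:32032"
            failed=1
        fi
    done
    # Non-zero exit status if any target could not be reached
    return "$failed"
}

# Usage:
#   check_vr_port <DR-ESXi-vmk-IP> <Prod-ESXi-vmk-IP-1> <Prod-ESXi-vmk-IP-2>
```

Any line marked FAIL identifies a source and target pair to hand to the networking team.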