vSphere Replication fails for multiple VMs with status Not Active (RPO Violation)


Article ID: 401159


Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  • vSphere Replication fails for multiple VMs with status Not Active (RPO Violation). Detailed error: "A replication error occurred at the vSphere Replication Server for replication. Details: 'No connection to VR Server for virtual machine on host.'"

  • An attempt to configure replication (mapping test) fails with an error.

  • Running the test in VMware Live Site Recovery 9.0.3 from the Site Recovery page > Configure > vSphere Replication > Replication Mappings fails with the following error:

    "The source host (id: 'host-###', name: '###.###.#.###') successfully connected to both the target broker '###.###.#.###' and the target host (id: 'host-##', name: '###.###.#.###'), but the login to the target host '###.###.#.###' failed. The server mappings might not have been updated for the source host '###.###.#.###' on the target broker '###.###.#.###', or the target host's '###.###.#.###' certificate might have expired. The target host '###.###.#.###' might not have yet received the certificate from the target broker '###.###.#.###' or the broker's certificate on the target host '###.###.#.###' might have expired."

Environment

VMware vSphere Replication 9.x
VMware vSphere Replication 8.x

Cause

  • The clocks on the VR, SRM, vCenter and ESXi hosts are not synchronized.
  • The following log entries confirm that the host clocks may be out of sync:

    vmkernel.log (Source ESXi Host):
    YYYY-MM-DDT09:42:23.322Z Wa(180) vmkwarning: cpu22:29907991)WARNING: Hbr: 788: Failed to receive from ###.#.#.# (groupID=GID-#####): Broken pipe
    YYYY-MM-DDT09:42:23.322Z Wa(180) vmkwarning: cpu22:29907991)WARNING: Hbr: 2542: Failed to receive extended handshake response
    YYYY-MM-DDT09:42:23.322Z Wa(180) vmkwarning: cpu22:29907991)WARNING: Hbr: 5362: Failed to establish connection to [###.#.#.#]:32032 (groupID=GID-#####): Broken pipe

    hbr-agent.log (Source Site):
    YYYY-MM-DDT09:54:23.341Z In(166) hbr-agent-bin[2100560]: [0x0000003ace2c5700] error: [Proxy [Group: GID-#####] -> [#####.63:32032]] Failed to login to brokered server additional error info: Broker or host clocks may be out of sync.
    YYYY-MM-DDT09:54:23.341Z In(166) hbr-agent-bin[2100560]: [0x0000003ace2c5700] error: [Proxy [Group: GID-#####] -> [#####.63:32032]] Exhausted all server endpoints reported by broker.


    hbrsrv.log (Destination Host):
    YYYY-MM-DDT09:54:22.435Z Er(163) hbrsrv[2100365]: [Originator@6876 sub=Main opID=hsl-0] HbrError stack:
    YYYY-MM-DDT09:54:22.435Z Er(163) hbrsrv[2100365]: [Originator@6876 sub=Main opID=hsl-0]    [0] Broker or host clocks may be out of sync.
    YYYY-MM-DDT09:54:22.435Z Er(163) hbrsrv[2100365]: [Originator@6876 sub=Main opID=hsl-0]    [1] Unverified token with ID '#####'is not valid: TOO_NEW (8)
    YYYY-MM-DDT09:54:22.435Z Er(163) hbrsrv[2100365]: [Originator@6876 sub=Main opID=hsl-0]    [2] ClientConnection (client=[#####.11]:50248) failed login attempt
    YYYY-MM-DDT09:54:22.435Z Er(163) hbrsrv[2100365]: [Originator@6876 sub=Main opID=hsl-0]    [3] Hiding full error from unauthenticated client.
    YYYY-MM-DDT09:54:22.435Z Er(163) hbrsrv[2100365]: [Originator@6876 sub=Main opID=hsl-0] HbrError stack:
    YYYY-MM-DDT09:54:22.435Z Er(163) hbrsrv[2100365]: [Originator@6876 sub=Main opID=hsl-0]    [0] Broker or host clocks may be out of sync.
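
To check quickly whether a collected log contains the signatures quoted above, a small helper along the following lines may be useful. It is only a sketch: the function name `count_clock_skew_lines` is hypothetical, and the log file path is supplied by the caller because log locations differ between ESXi hosts and appliances.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that carry the
# clock-skew signatures shown in the excerpts above.
# Usage: count_clock_skew_lines /path/to/logfile
count_clock_skew_lines() {
    # -c prints the number of matching lines; -E enables the | alternation.
    grep -cE 'clocks may be out of sync|TOO_NEW' "$1"
}
```

A non-zero count in hbr-agent.log or hbrsrv.log suggests that the time difference described in this article is worth checking before other causes such as expired certificates.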

Resolution

For vSphere Replication to function correctly, time must be synchronized across all ESXi hosts and the VMware Live Site Recovery, vSphere Replication and vCenter appliances.
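
One way to judge whether a component is out of sync is to compare its clock, in epoch seconds, with the vCenter clock. The sketch below assumes the two values have already been collected, for example by running `date -u +%s` on each system. The function name `check_skew` and the tolerance argument are hypothetical; this article does not state how much drift vSphere Replication tolerates.

```shell
#!/bin/sh
# Hypothetical helper: compare a host clock with a reference clock.
# Arguments: reference epoch seconds, host epoch seconds, tolerance in seconds.
# The tolerance is an assumed value chosen by the caller.
check_skew() {
    ref=$1
    host=$2
    tol=$3
    diff=$((host - ref))
    # Use the absolute value so fast and slow clocks are treated alike.
    if [ "$diff" -lt 0 ]; then
        diff=$((-diff))
    fi
    if [ "$diff" -le "$tol" ]; then
        echo "OK skew=${diff}s"
    else
        echo "OUT_OF_SYNC skew=${diff}s"
    fi
}
```

For example, `check_skew 1000 1003 5` reports a 3-second skew as within tolerance, while `check_skew 1000 900 5` reports the host as out of sync.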

 

  • Manually set the time on the VR, ESXi and SRM appliances to match the vCenter Server time.
  • To change the date and time on an SRM or VR appliance manually, use the following commands:
    1. Capture the current date and time on the vCenter Server appliance:

      $ watch date

      The output shows the current time, refreshed every two seconds.

      Note: Run the same command on the other appliances that have the issue to compare their time.

    2. Set the time manually:

      $ date --set="YYYY-MM-DD HH:MM:SS"

    3. Restart the NTP service and check its status:

      $ service ntpd stop

      $ service ntpd start

      $ service ntpd status
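
The manual steps above require typing the vCenter time by hand. As a rough sketch, assuming GNU `date` is available on the system where the helper runs, the reference time can be captured as epoch seconds and converted into the string that `date --set` expects. The function name `format_for_date_set` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert epoch seconds into a UTC timestamp string
# suitable for `date -u --set="..."`. Assumes GNU date (the -d @EPOCH form).
format_for_date_set() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Example (not executed here): capture the reference time on vCenter with
#   date -u +%s
# then, on the appliance that needs correcting, run
#   date -u --set="$(format_for_date_set <epoch-from-vcenter>)"
```

Because a few seconds pass between capturing and setting the time, this only brings the clocks close together. NTP remains the way to keep them aligned afterwards.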