Time-related mismatch errors in Site Recovery nodes can cause multiple operational failures.

Article ID: 377917

Products

VMware Live Recovery

Issue/Introduction

Site Recovery documentation recommends that time be synchronized across all nodes in the site pair, i.e. the vCenter, PSC, vSphere Replication, and Site Recovery appliances.
You can either use the time settings of the ESXi host on which the appliance is running, or configure time synchronization with an NTP server.


Configure the Time Zone and Time Synchronization Settings for the Site Recovery Manager Appliance

Configure the Time Zone and Time Synchronization Settings for the vSphere Replication Appliance

A time discrepancy between nodes of as little as 5 to 30 seconds can lead to communication, configuration, or basic operational failures.

Timestamp skew or mismatches can lead to:

  • Failure to configure the SRM/VR appliance in the VAMI wizard.
        "Failed to register VRMS  ...."
        "Failed to connect to Lookup Service.........."

  • SRM/VR service startup failures.
        "Failed trying to retrieve token: ns0:RequestFailed:"

  • Operational recovery task failures.
        "Unable to retrieve pairs from extension server..........."
        "Cannot complete customization..........."

  • Failure to collect logs from SRM/VR appliances.
        "Invalid message timestamp, created time is in the future"

Environment

Site Recovery Manager 8.x
Live Site Recovery Manager 9.x
vSphere Replication 8.x, 9.x

Cause

Communication between the nodes in a Site Recovery system requires a certificate/token exchange. Verification of the certificate/token relies on a 'time to live' value associated with its creation timestamp. If the clocks of the sending and receiving nodes are mismatched or skewed, the task receiving the certificate/token can fail and report the certificate/token as expired or invalid because of the time difference.
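
To illustrate why a small clock difference can invalidate a token, the following is a minimal sketch of a validity-window check. The function name token_is_valid and the numeric values are hypothetical and do not reflect the actual single sign-on implementation. The sketch only models the "creation timestamp plus time to live" comparison described above.

```shell
#!/bin/sh
# Illustrative model only; token_is_valid is a hypothetical helper,
# not part of any VMware product.
# Usage: token_is_valid <created_epoch> <ttl_seconds> <receiver_now_epoch>
token_is_valid() {
    created=$1
    ttl=$2
    now=$3
    expires=$((created + ttl))

    # Receiver clock is behind the sender: the token appears to be
    # created in the future.
    if [ "$now" -lt "$created" ]; then
        echo "rejected: created time is in the future"
        return 1
    fi

    # Receiver clock is ahead of the sender (or the token is simply old):
    # the token appears to have outlived its time to live.
    if [ "$now" -gt "$expires" ]; then
        echo "rejected: token expired"
        return 1
    fi

    echo "accepted"
}
```

For example, a token created at time 1000 with a 300-second lifetime is accepted by a receiver whose clock reads 1100, but rejected by a receiver whose clock is 10 seconds behind the sender and reads 990.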

Resolution

Ensure the clocks on all nodes are within 5 seconds of each other. Verify that the NTP configuration is correct and that the time matches on all VCs, PSCs, SRMs, and VRs.

To verify timestamps:

  • Open a PuTTY session to ALL VC, SRM, and VR nodes simultaneously.
  • Run the following command on each node:
        # watch -d date -u
  • If any node's timestamp is out of sync, investigate where that node gets its time from (for example, the ESXi host running the VM, or the NTP server address) and fix it accordingly:
    Restart the NTP server daemon on the node.
    Change the NTP server parameter.
    Examine the host server timestamp configuration, etc.
  • Continue to monitor the watch command output simultaneously on all nodes.
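
As a complement to watching the clocks by eye, the comparison can be sketched as a small script. This is a rough sketch under stated assumptions: the function names max_skew and check_skew are hypothetical, the default 5-second limit comes from the threshold in this article, and the host names in the commented usage line are placeholders. It expects one "node-name epoch-seconds" pair per line on standard input.

```shell
#!/bin/sh
# Sketch only; max_skew and check_skew are hypothetical helpers,
# not VMware tooling.

# Print the difference in seconds between the largest and smallest
# timestamp read from stdin ("<node-name> <epoch-seconds>" per line).
max_skew() {
    awk '
        NR == 1  { min = $2; max = $2 }
        $2 < min { min = $2 }
        $2 > max { max = $2 }
        END      { if (NR > 0) print max - min; else print 0 }
    '
}

# Compare the spread against a limit in seconds (default 5).
# Returns non-zero when the limit is exceeded.
check_skew() {
    limit=${1:-5}
    skew=$(max_skew)
    if [ "$skew" -gt "$limit" ]; then
        echo "SKEW ${skew}s exceeds ${limit}s"
        return 1
    fi
    echo "OK ${skew}s"
}

# Example collection (assumed host names, requires SSH access):
#   for n in vc01 srm01 vr01; do
#       echo "$n $(ssh root@"$n" date -u +%s)"
#   done | check_skew 5
```

Because the nodes are queried one after another, the SSH round-trip time is included in the measured difference. Treat a result near the limit as a prompt to re-check with the watch command rather than as proof of skew.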

Additional Information

Time skew between vCenter and SRM can cause the plugin to throw the error "Unable to retrieve Site Recovery Manager summary data." - https://knowledge.broadcom.com/external/article?articleNumber=376950

Configuring vCenter Server to use a Network Time Protocol (NTP) server  - https://knowledge.broadcom.com/external/article?articleNumber=313945

SRM login to remote site fails with error: Failed trying to retrieve token: ns0:RequestFailed - https://knowledge.broadcom.com/external/article?articleNumber=319353

Site Recovery Manager or vSphere Replication cannot complete a site pair operation. The received single sign-on token is valid from XX to YY - https://knowledge.broadcom.com/external/article?articleNumber=312750 

How to address error: Clock skew too great  - https://knowledge.broadcom.com/external/article?articleNumber=160526