Troubleshooting VMware Cloud Director Availability NTP Issues

Products

VMware Cloud Director

Issue/Introduction

When configuring a new outgoing replication, you see the following error in the Replications Tasks view of the Cloud Director Availability portal:
Assuming task '########-####-####-####-###########' failed, because it's status did not update in a timely fashion.
In /opt/vmware/h4/cloud/log/cloud.log on the Cloud Director Replication Management Appliance on the recovery site, you see similar messages:

DEBUG - [UI_/plugins/Vk13Y#Jl/h4/outgoing-replications/Provider_Site/vapp_######-####-####-####-###########_##_##] [job-3] com.vmware.h4.jobengine.JobEngine : Suspending execution for task ########-####-####-####-###########
WARN - [########-####-####-####-###########] [c4-scheduler-2] com.vmware.task.rest.client.TaskMonitor : Task #######-###-####-#####-########## has timed out (it hasn't been updated in 60000 msec)
ERROR - [UI_/plugins/Vk13Y#Jl/h4/outgoing-replications/Provider_Site/vapp_######-####-####-####-###########_##_##] [c4-scheduler-2] com.vmware.h4.jobengine.JobExecution : Task ########-####-####-####-######### (WorkflowInfo{type='start', resourceType='vmReplication', resourceId='C4-#######-####-####-###-############, isPrivate=false, resourceName='null'}) has failed
com.vmware.vdr.error.exceptions.TaskMonitoringTimeOutException: Assuming task '#######-####-####-####-#######' failed, because it's status did not update in a timely fashion.
at sun.reflect.GeneratedConstructorAccessor146.newInstance(Unknown Source)

In the VCDA appliance settings, the NTP service is shown as offline.
In /opt/vmware/h4/cloud/log/cloud.log, you see "clock skew" errors. Clock skew is the difference between the system clocks of two servers. A server rejects an authentication request when the skew exceeds the clock tolerance it allows.

com.sun.xml.ws.fault.ServerSOAPFaultException: Client received SOAP Fault from server: The time now Tue Sep 10 14:26:12 GMT 2024 does not fall in the request lifetime interval extended with clock tolerance of 600000 ms: [ Tue Sep 10 14:27:39 GMT 2024; Tue Sep 10 14:57:39 GMT 2024). This might be due to a clock skew problem. Please see the server log to find more detail regarding exact cause of the failure.

INFO - [UI-########-####-#####-####-#####-####-##-##-I1-W] [https-jsse-nio-8043-exec-2] okenServiceImpl$RequestResponseProcessor : Request message has expired. Server message: ns0:MessageExpired: The time now Tue Sep 10 14:26:12 GMT 2024 does not fall in the request lifetime interval extended with clock tolerance of 600000 ms: [ Tue Sep 10 14:27:39 GMT 2024; Tue Sep 10 14:57:39 GMT 2024). This might be due to a clock skew problem.

INFO - [UI-########-####-#####-####-#####-####-##-##-I1-WP] [https-jsse-nio-8043-exec-2] okenServiceImpl$RequestResponseProcessor : Server returned 'request expired' less than 0 seconds after request was issued, but it shouldn't have expired for at least 600 seconds.
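
The distance between the two clocks can be estimated by subtracting the reported "time now" from the start of the request lifetime interval. The following is a minimal sketch, assuming GNU date (for its -d option) is available on the machine where you analyze the log. The function name skew_seconds is hypothetical and is not part of any VMware tooling.

```shell
# Sketch: difference in seconds between two timestamps copied from cloud.log.
# Assumes GNU date, which can parse strings such as "Tue Sep 10 14:26:12 GMT 2024".
# skew_seconds is a hypothetical helper name.
skew_seconds() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( end - start ))
}

# Usage with the timestamps from the example SOAP fault:
skew_seconds "Tue Sep 10 14:26:12 GMT 2024" "Tue Sep 10 14:27:39 GMT 2024"
```

For the example timestamps the result is 87, meaning the two timestamps are 87 seconds apart.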

The vCenter Server Lookup Service errors out in the VCDA Replicator Service management interface with the following error:

"Operation canceled due to an unexpected error"

Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values may vary depending on your environment.
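
To check whether your own cloud.log contains messages of this kind, a case-insensitive search for the key phrases is usually enough. This is a sketch only: the function name find_time_errors and the choice of patterns are illustrative assumptions drawn from the excerpts above, not an exhaustive list.

```shell
# Sketch: list log lines that point to a time-synchronization problem.
# find_time_errors is a hypothetical helper; the patterns are taken from
# the example log excerpts and may need adjusting for your environment.
find_time_errors() {
    grep -Ei 'TaskMonitoringTimeOut|clock skew|has timed out' "$1"
}

# Usage on a VCDA appliance:
# find_time_errors /opt/vmware/h4/cloud/log/cloud.log
```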

Environment

VMware Cloud Director Availability 4.x

Cause

This issue can occur when there is a time drift between the Cloud Director Availability components on the protected and recovery sites.

Resolution

To resolve this issue, follow the steps below to verify and correct the time settings.

SSH into all VCDA appliances and run the following command to check the time:

    # watch -n 0.1 date

Verify the time in the following components across all sites:

Tunnel Appliance
Cloud Director Replication Management Appliance
Replicator Appliance(s)
Cloud Director cells
vCenter Server(s)
Platform Services Controller
ESXi Hosts
Additionally, if there is an on-premises appliance, run the same command in the SSH console of the on-premises appliance.
Ensure that the Cloud Director Availability on-premises appliance, vCenter Server, Platform Services Controller, and ESXi hosts have their time synced to the same NTP source.
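
Comparing clocks by eye across many SSH sessions is error-prone, so the check can also be scripted from a single machine. The following is a rough sketch, assuming key-based SSH access as root to each component. The host names are hypothetical placeholders, and the function names abs_diff and check_hosts are illustrative. Reading the time as UTC epoch seconds avoids confusion from differing time zones, although the SSH round trip itself adds a small error.

```shell
# Sketch: compare each component's clock with the local clock over SSH.
# The host names below are hypothetical placeholders; replace them with your own.
HOSTS="tunnel.example.com manager.example.com replicator.example.com"

# Print the absolute difference, in seconds, between two epoch timestamps.
abs_diff() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# Query every host and print its offset from the local clock.
check_hosts() {
    for h in $HOSTS; do
        if remote=$(ssh "root@$h" date -u +%s); then
            echo "$h: $(abs_diff "$remote" "$(date -u +%s)") s from local clock"
        else
            echo "$h: unreachable"
        fi
    done
}

# Usage:
# check_hosts
```

Any host that reports more than a few seconds of offset is a candidate for the NTP checks that follow.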

To check the status of the NTP service on a VCDA appliance, run the following command:

    # systemctl status systemd-timesyncd

To restart the NTP service, run the following command:

    # systemctl restart systemd-timesyncd
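
After the restart, it can help to confirm that the clock actually reports as synchronized. The following is a sketch, assuming timedatectl (part of systemd) is present on the appliance. The label it prints differs between systemd versions, so the pattern accepts both "System clock synchronized" and "NTP synchronized". The function name is_synced is hypothetical.

```shell
# Sketch: succeed if the given timedatectl output reports a synchronized clock.
# is_synced is a hypothetical helper; it only inspects the text passed to it.
is_synced() {
    printf '%s\n' "$1" | grep -Eq '(System clock|NTP) synchronized: yes'
}

# Usage on the appliance:
# if is_synced "$(timedatectl)"; then
#     echo "clock synchronized"
# else
#     echo "clock NOT synchronized"
# fi
```

Synchronization may take a short while after the service restarts, so a negative result immediately after the restart is not necessarily a failure.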

Additional Information

To modify time synchronization on vCenter Server, see Configure the System Time Zone and Time Synchronization Settings.

To modify time synchronization on VCDA appliances, edit the NTP section under Appliance Settings. See Configure the network settings of the appliance.