After performing a disaster recovery operation or a cluster-wide reboot of Aria Operations, you see these symptoms:
Aria Operations Administrator interface (https://<Aria_Operations_FQDN/IP_Address>/admin) may report the status message:
Waiting for Analytics
Aria Operations Administrator interface (https://<Aria_Operations_FQDN/IP_Address>/admin) shows that nodes do not come Online (or remain in Offline status).
In the Aria Operations Primary or Primary Replica node's ntp logs (located at: /var/log/), you may observe:
ntpd[9764]: no reply; clock not set
ntpd[9798]: ntpd exiting on signal 15
In the Aria Operations Primary or Primary Replica node's analytics-wrapper.log (located at: /storage/log/vcops/logs/), you may observe:
INFO | jvm 1 | YYYY/MM/DD <time> | >>> AnalyticsMain.run failed with error: IllegalStateException: time difference between servers is 37110 ms. It is greater than 30000 ms. Unable to operate, terminating...
Note: The time difference reported in this message varies with the actual time drift between the Aria Operations nodes.
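To see how large the reported drift is, the value can be pulled out of the log message shown above. This is a minimal sketch, not a supported tool: the function name extract_drift_ms is hypothetical, and the file name analytics-wrapper.log in the directory given above is assumed from the symptom description.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print the drift value
# (in milliseconds) from each "time difference between servers" message.
extract_drift_ms() {
    sed -n 's/.*time difference between servers is \([0-9][0-9]*\) ms.*/\1/p'
}

# Example usage on a Primary or Primary Replica node (path assumed from the
# symptom description; adjust if your log location differs):
# extract_drift_ms < /storage/log/vcops/logs/analytics-wrapper.log
```

Lines that do not contain the message produce no output, so an empty result suggests this particular error has not been logged in that file.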
Environment
Aria Operations 8.x
Cause
This issue occurs due to NTP time drift between the Aria Operations 8.x nodes.
Resolution
Ensure all NTP servers configured for use with the Aria Operations nodes (Analytics and Cloud Proxies) are accessible.
Update the ntp.conf file (located in /etc/) with new NTP server(s) in each Aria Operations node if the original NTP servers are no longer available.
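One way to check which NTP servers a node is configured to use, and whether they answer, is sketched below. It assumes the servers appear as standard "server" or "pool" directives in /etc/ntp.conf; the function names list_ntp_servers and check_ntp_servers are hypothetical. A successful ping only shows basic network reachability and does not prove that the NTP service itself is responding.

```shell
#!/bin/sh
# Hypothetical helper: print the host from every "server" or "pool"
# directive in an ntp.conf-style file (defaults to /etc/ntp.conf).
# Commented-out lines are skipped because their first field is not
# exactly "server" or "pool".
list_ntp_servers() {
    awk '$1 == "server" || $1 == "pool" { print $2 }' "${1:-/etc/ntp.conf}"
}

# Hypothetical helper: ping each configured server once and report the result.
check_ntp_servers() {
    for host in $(list_ntp_servers "$1"); do
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
        fi
    done
}

# Example usage on a node:
# check_ntp_servers /etc/ntp.conf
```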
Process Steps:
NOTE: You do not have to restart any nodes before or after completing these steps.
Log in as root to each Aria Operations cluster node and cloud proxy.
Verify the NTP server(s) configuration is correct by reviewing /etc/ntp.conf
Verify that the NTP server IP address or FQDN is listed in the section that begins with "## CaSA Section Start # Added by CaSA".
Ping the NTP server(s) via the configured IP or FQDN to ensure successful communication from each node and cloud proxy.
On each node and cloud proxy, complete the following:
Stop the NTP daemon service:
systemctl stop ntpd
Sync the time with the time server:
ntpdate -u <NTP_Server-IPorFQDN>
Start the NTP daemon service:
systemctl start ntpd
Verify the time is current via the date command:
date
Monitor the cluster status from the admin UI and validate that all nodes come Online as expected.
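The stop, sync, start, and verify commands above can be wrapped in a small function so the same sequence runs on every node and cloud proxy. This is a sketch only: the names run_cmd and resync_ntp and the DRY_RUN variable are hypothetical, and the commands themselves are exactly those listed in the steps.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the documented sequence.
# Argument: the NTP server IP address or FQDN to sync against.
resync_ntp() {
    ntp_server=$1
    if [ -z "$ntp_server" ]; then
        echo "usage: resync_ntp <NTP_Server-IPorFQDN>" >&2
        return 1
    fi
    run_cmd systemctl stop ntpd || return 1
    # Remember the sync result but start ntpd again either way,
    # so the node is not left without the daemon running.
    run_cmd ntpdate -u "$ntp_server"
    sync_status=$?
    run_cmd systemctl start ntpd || return 1
    run_cmd date
    return "$sync_status"
}

# Example usage as root on each node and cloud proxy:
# DRY_RUN=1 resync_ntp <NTP_Server-IPorFQDN>   # preview the commands
# resync_ntp <NTP_Server-IPorFQDN>             # run them
```

Previewing with DRY_RUN=1 first makes it easy to confirm the server name before the daemon is stopped.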