Time sync/drift issues between Aria Operations for Logs nodes cause multiple issues
Article ID: 387888
Products
VMware Aria Suite
Issue/Introduction
If any of the following symptoms are seen, use the steps in the Resolution section to verify NTP sync status on all nodes in the Aria Operations for Logs cluster.
Entries like the following appear in the /storage/core/loginsight/var/cassandra.log file:
[HintsDispatcher:1] 2025-01-21T00:00:00,000 HintsDispatchExecutor.java:294 - Finished hinted handoff of file abcdef01-2345-6789-abcd-ef0123456789-1737417600000-1.hints to endpoint /10.1.2.3:7000: abcdef01-2345-6789-abcd-ef0123456789
Logging in to the UI as the local admin user results in "Error authenticating user"
API calls return HTTP 500 Internal Server Error
Environment
VMware Aria Operations for Logs 8.x
Cause
The internal Cassandra database is sensitive to time drift between nodes in the cluster. Any drift over 1 minute should be resolved to allow for database operations to succeed.
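As a rough illustration of the 1-minute threshold, the sketch below computes the spread between clock readings taken from each node. The helper names max_drift and drift_ok and the DRIFT_LIMIT variable are hypothetical, not part of the product; the readings are assumed to be epoch seconds (for example, from date +%s run on each node at about the same moment).

```shell
# Hypothetical helper: print the spread (max - min) of epoch-second readings.
# Runs in a subshell so its variables do not leak into the caller.
max_drift() (
    min=$1
    max=$1
    for t in "$@"; do
        [ "$t" -lt "$min" ] && min=$t
        [ "$t" -gt "$max" ] && max=$t
    done
    echo $(( max - min ))
)

# Hypothetical helper: succeed only if the spread is within the limit.
# The default of 60 seconds mirrors the 1-minute threshold described above.
drift_ok() {
    [ "$(max_drift "$@")" -le "${DRIFT_LIMIT:-60}" ]
}
```

For example, max_drift 1737417600 1737417605 1737417690 prints 90, so drift_ok fails for those three readings. Collecting the readings one node at a time adds a second or two of skew, so treat results close to the limit with caution.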
Resolution
Verify that the appliance VMs can communicate and sync with the configured NTP servers
Log in to the Aria Operations for Logs appliance as root via SSH or vSphere Console
Query the time sync status
ntpq -p
Verify that the reach value is 377
Note: reach is an 8-bit register, displayed in octal, that records the outcome of the last 8 attempts to contact the configured NTP server.
In each bit, 0 indicates a failed contact and 1 indicates a successful contact.
377 in octal = 11111111 in binary (meaning all of the last 8 contacts were successful)
The reach count will restart each time the VM or the ntpd service is restarted. Verify that sufficient time has passed since the last restart for 8 contact attempts when checking.
If reach is not 377, use network troubleshooting steps such as ping and telnet to verify general network connectivity with the configured NTP servers. Review firewalls (external to the appliance VM) to verify that NTP traffic (UDP port 123) is allowed between all appliance VMs and the configured NTP servers.
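To make the octal-to-binary reading of reach concrete, here is a minimal sketch that expands a reach value into its 8 bits. The function name reach_bits is hypothetical; it only performs the arithmetic described in the note and does not call ntpq.

```shell
# Hypothetical helper: expand an octal reach value (as printed by ntpq -p)
# into 8 binary digits. The rightmost digit is the most recent attempt.
reach_bits() (
    value=$(( 0$1 ))    # a leading 0 makes the shell read the digits as octal
    bits=""
    i=7
    while [ "$i" -ge 0 ]; do
        bits="$bits$(( (value >> i) & 1 ))"
        i=$(( i - 1 ))
    done
    echo "$bits"
)
```

For example, reach_bits 377 prints 11111111, while reach_bits 17 prints 00001111, which indicates that only the last 4 attempts succeeded (as is typical shortly after a restart).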
Verify that the offset value is between -60000 and 60000
Note: offset is the difference, in milliseconds, between the time on the appliance and the time on the NTP server.
If the offset exceeds 60,000 ms (60 s) in either direction, manually sync the time with the NTP server:
Stop the ntpd service
systemctl stop ntpd
Sync the time with the preferred NTP server
ntpdate ntp_server_ip_or_fqdn
Note: Replace ntp_server_ip_or_fqdn with the IP or FQDN of the preferred NTP server. Use the same NTP server for all nodes.
Start the ntpd service
systemctl start ntpd
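The reach and offset checks can be combined into one pass over the ntpq -p output. The sketch below is an illustration rather than a supported tool: the function name check_ntpq is hypothetical, and it assumes the standard ntpq -p layout of two header lines followed by one line per peer, with reach in the 7th column and offset in the 9th.

```shell
# Hypothetical helper: read "ntpq -p" output on stdin and print one line per
# peer as "peer reach offset status". Exits non-zero if any peer fails.
# Assumes the usual column order: remote refid st t when poll reach delay offset jitter.
check_ntpq() {
    awk -v limit=60000 '
        BEGIN { bad = 0 }
        NR <= 2 { next }                 # skip the two header lines
        {
            status = "OK"
            if ($7 != "377")
                status = "REACH"         # not all of the last 8 contacts succeeded
            else if ($9 + 0 > limit || $9 + 0 < -limit)
                status = "OFFSET"        # more than 60,000 ms from the server
            if (status != "OK") bad = 1
            print $1, $7, $9, status
        }
        END { exit bad }
    '
}
```

A typical invocation on the appliance would be ntpq -p | check_ntpq. A peer reported as REACH points to the connectivity checks, and a peer reported as OFFSET points to the manual sync steps.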
In environments where NTP is unavailable or unreliable, use the ESXi hosts as the source of time for the appliance VMs. Verify the ESXi hosts all utilize the same time providers and are in sync with each other.
In environments where there is drift between multiple configured NTP servers, use the same single NTP provider for all appliance VMs in the cluster.