This article provides troubleshooting steps for identifying and isolating problems with NTP time synchronization on ESXi hosts.
VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x
Use these methods to troubleshoot NTP on ESXi hosts:
CHECKLIST/INDEX
1) Validate connectivity/networking setup
Validate network connectivity between the ESXi host and the NTP server/upstream time source using
a. the ping command (for example, ping <IP address>).
b. the traceroute command, to trace the route that NTP packets take.
If the server is unreachable, make sure that an IP route to the NTP server(s) is available and that the network in your host environment is properly set up.
For more information, refer to Testing network connectivity with the ping command (1003486).
2) Query ntpd service using ntpq
Use the NTP Query utility program ntpq to remotely query the ESXi/ESX host's ntpd service.
The ntpq utility is commonly installed on Linux clients and is also available in the ESX service console and the vSphere Management Assistant. For more information on the installation and use of the ntpq utility program on a given Linux distribution, see your Linux distribution's documentation.
For an ESXi 5.x and later host, the ntpq utility is included by default and does not need to be installed. It can be run locally from the ESXi 5.x and later host.
The ntpq utility is not available on ESXi 3.x/4.x. To query an ESXi host's NTP service ntpd, install ntpq on a remote Linux client and query the ESXi host's ntpd service from the Linux client.
(A) To use the NTP Query utility ntpq to query the ESX host's NTP service (ntpd) and determine whether it is successfully synchronizing with the upstream NTP server, run:
ntpq -p <peer-server-address>
If no peer is specified, your local ESXi host's ntpd service is queried.
The -n flag can also be specified to display host addresses in dotted numerical format rather than canonical names (ntpq -pn).
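The fields returned by ntpq have these meanings:
remote | Hostname or IP address of the configured upstream NTP server.
refid | Identification of the time source to which the remote NTP server is synchronized.
st | Stratum of the remote server, that is, its distance from a reference clock. A value of 16 means the server is unsynchronized.
t | Type of association (for example, u for unicast, b for broadcast, l for local reference clock).
when | Time elapsed since the last packet was received from the server, in seconds unless a unit is shown.
poll | Polling interval, in seconds.
reach | Reachability shift register, in octal, recording the outcome of the last eight polls.
delay | Round-trip delay to the server, in milliseconds.
offset | The offset (in milliseconds) between the time on the configured NTP server and the ESXi host. A value closer to 0 is ideal.
jitter | The observed timing jitter, that is, the variation in time measurements with the configured NTP server. A value closer to 0 is ideal.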
refid -> the upstream time source to which the peer/remote host is synchronized, for instance another NTP server or a GPS/radio clock. This column may also display NTP Kiss-o'-Death codes. For instance, the sample output below shows .INIT., which signifies that the association has not yet synchronized for the first time. This can have several causes.
reach -> an octal number representing the reachability shift register. For instance, 377 (octal) translates to 11111111 in binary, where each 1 represents a successful poll of the NTP server. Use it to check whether the connection to the remote host is consistent or whether there are intermittent losses that could cause the system clock on your host to become unsynchronized. If 8 or more polls have been made to the NTP server and reach is not 377, the server may not be consistently reachable. If network connectivity is good, the register shows the following progression over the first eight polls: 1, 3, 7, 17, 37, 77, 177, 377.
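The reach value can also be decoded programmatically. The following is a minimal Python sketch, not part of ntpq; the function names decode_reach and summarize_reach are illustrative assumptions. It converts the octal value from the reach column into the outcome of the last eight polls and counts the missed ones.

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, oldest first (True = reply received)."""
    value = int(reach_octal, 8) & 0xFF
    # Bit 7 holds the oldest poll, bit 0 the most recent one.
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


def summarize_reach(reach_octal):
    """Build a one-line summary of the reach register."""
    polls = decode_reach(reach_octal)
    pattern = "".join("1" if ok else "0" for ok in polls)
    missed = polls.count(False)
    return "reach={} bits={} missed={}/8".format(reach_octal, pattern, missed)


if __name__ == "__main__":
    for sample in ("377", "17", "357", "0"):
        print(summarize_reach(sample))
```

Note that zero bits are expected while fewer than eight polls have been made, so the missed count is only meaningful once the register has had time to fill.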
The tally code characters displayed in the very first column can be interpreted as follows:
remote refid st t when poll reach delay offset jitter
================================================================================
*static.###.### ###.###.###.### 2 u 18 64 17 147.797 +4.085 0.745
+time.######## 10.202.9.14 3 u 22 64 17 7.609 +1.834 0.272
2a01:###:####:1 .INIT. 16 u - 64 0 0.000 +0.000 0.000
* indicates the source currently synchronized to (syspeer)
+ indicates a candidate peer (a good time source)
<blank> indicates a discarded source
- indicates a source rejected as an outlier
x indicates a falseticker, that is, a source that distributes bad time
# indicates a source that was selected but is not among the first six peers sorted by synchronization distance
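As an illustration of how the tally codes and columns fit together, here is a hedged Python sketch that parses captured ntpq -p output. It assumes the ten-column layout shown above and handles only the tally codes listed in this article. The SAMPLE text uses documentation addresses, and the names parse_peers and sync_source are hypothetical.

```python
TALLY_MEANINGS = {
    "*": "current synchronization source (sys.peer)",
    "+": "candidate peer",
    " ": "discarded source",
    "-": "rejected as an outlier",
    "x": "falseticker",
    "#": "selected, but not among the first six peers",
}

# Illustrative capture only; real output comes from `ntpq -pn`.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      198.51.100.1     2 u   18   64   17  147.797   +4.085   0.745
+192.0.2.11      10.202.9.14      3 u   22   64   17    7.609   +1.834   0.272
 2001:db8::1     .INIT.          16 u    -   64    0    0.000   +0.000   0.000
"""


def parse_peers(ntpq_output):
    """Parse peer rows into dictionaries; header and separator rows are skipped."""
    peers = []
    for line in ntpq_output.splitlines():
        if not line.strip():
            continue
        # The tally code occupies the first character of each peer row.
        if line[0] in TALLY_MEANINGS:
            tally, body = line[0], line[1:]
        else:
            tally, body = " ", line
        fields = body.split()
        if len(fields) != 10 or fields[0] == "remote":
            continue
        peers.append({
            "tally": tally,
            "meaning": TALLY_MEANINGS[tally],
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "reach": fields[6],
            "offset_ms": float(fields[8]),
        })
    return peers


def sync_source(peers):
    """Return the remote marked with '*', or None if nothing is selected."""
    for peer in peers:
        if peer["tally"] == "*":
            return peer["remote"]
    return None


if __name__ == "__main__":
    for peer in parse_peers(SAMPLE):
        print(peer["remote"], "->", peer["meaning"])
```

If sync_source returns None, no peer carries the * tally code, which means the host is not currently synchronized to any of the listed servers.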
For more information, see the NTP.org Troubleshooting documentation and the NTP Query Program documentation.
(B) Peer association data can also point to potential issues. List the server associations and association identifiers for the 3 configured servers as follows:
$ ntpq -c associations
ind assid status conf reach auth condition last_event cnt
===========================================================
1 37250 962a yes yes none sys.peer sys_peer 2
2 37251 942a yes yes none candidate sys_peer 2
3 37252 8011 yes no none reject mobilize 1
Note: The condition column reflects the tally codes seen earlier.
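The status column packs the same information into a hexadecimal word. The sketch below decodes it under the assumption that the layout follows the peer status word described in the NTP documentation (flag bits and a 3-bit selection code in the high byte, then a 4-bit event count and a 4-bit event code). The function name is hypothetical, and only the two event codes that appear in the table above are named.

```python
# Selection codes of the peer status word (assumed from the NTP documentation).
SELECT_CODES = {
    0: "reject", 1: "falsetick", 2: "excess", 3: "outlier",
    4: "candidate", 5: "backup", 6: "sys.peer", 7: "pps.peer",
}

# Only the event codes seen in the sample table are named here.
KNOWN_EVENTS = {0x1: "mobilize", 0xA: "sys_peer"}


def decode_peer_status(status_hex):
    """Split a peer status word such as '962a' into its fields."""
    word = int(status_hex, 16)
    high = (word >> 8) & 0xFF
    event_code = word & 0x0F
    return {
        "conf": bool(high & 0x80),          # association is configured
        "reach": bool(high & 0x10),         # server is reachable
        "condition": SELECT_CODES[high & 0x07],
        "cnt": (word >> 4) & 0x0F,          # number of events
        "last_event": KNOWN_EVENTS.get(event_code, hex(event_code)),
    }


if __name__ == "__main__":
    for status in ("962a", "942a", "8011"):
        print(status, decode_peer_status(status))
```

Decoding the three status values from the table reproduces its conf, reach, condition, last_event and cnt columns, which is a quick consistency check on the assumed layout.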
To view the system variables for a given association, pass its association ID to the rv command. The most useful association to inspect is usually the time source currently synchronized to (in this case the first association, which shows sys.peer in the condition column). Use its association identifier from the table above, as in the following command:
$ ntpq -c "rv 37250"
associd=37250 status=962a conf, reach, sel_sys.peer, 2 events, sys_peer, srcadr=static.###.###.###.###.clients.your-server.de, srcport=123, dstadr=10.205.69.87, dstport=123, leap=00, stratum=2, precision=-25, rootdelay=6.317, rootdisp=10448.441, refid=###.###.###.###, reftime=e741b79f.81065605 Mon, Dec 12 2022 14:22:23.504, rec=e741b7da.8466a7e8 Mon, XXX 12 #### ##:##:22.517, reach=377, unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=9, flash=400, keyid=0, offset=+0.349, delay=146.270, dispersion=1.904, jitter=1.931, xleave=0.240,
filtdelay= 148.30 146.27 146.37 147.59 148.30 147.77 147.80 148.33,
filtoffset= -0.15 +0.35 +0.00 -0.65 -0.25 +1.03 +4.08 +3.50,
filtdisp= 0.00 0.98 1.98 2.97 3.93 4.89 5.87 6.87
Note: The rootdisp (root dispersion, in milliseconds) value is an estimate of the error between the correct time and the time of the time server. A high rootdisp value indicates poor timekeeping. "An ESXi/ESX host, by default, does not accept any NTP reply with a root dispersion greater than 1.5 seconds (1500 ms)." (https://kb.vmware.com/s/article/1035833). In the output above, rootdisp is 10448.441 ms, which exceeds that limit. Hence, the customer would have to add the "tos maxdist" configuration as a workaround if they want to continue using the same configured NTP servers. A flash value of 400 can also indicate that the maximum distance threshold has been exceeded and that the tos maxdist configuration needs to be applied.
To add the configuration on builds 7.0U3 onwards:
1. Update the NTP configuration (in file and configstore):
cp /etc/ntp.conf /etc/ntp.conf.bak && echo "tos maxdist 15" >> /etc/ntp.conf.bak && esxcli system ntp set -f /etc/ntp.conf.bak
2. Restart NTP: esxcli system ntp set -e 0 && esxcli system ntp set -e 1
Refer to https://kb.vmware.com/s/article/1035833 for adding this configuration to earlier builds.
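The root dispersion check described in the note above can be sketched in Python as follows. This is a simplified illustration that compares rootdisp alone against the 1500 ms default quoted in this article; it does not reproduce ntpd's own distance calculation. The function name is hypothetical, the flash value is assumed to be hexadecimal, and SAMPLE_RV is an abbreviated, illustrative rv output.

```python
import re

DEFAULT_MAX_ROOTDISP_MS = 1500.0  # default limit quoted in this article
FLASH_DISTANCE_BIT = 0x400        # flash value the article ties to the distance threshold

# Abbreviated, illustrative output of: ntpq -c "rv <assid>"
SAMPLE_RV = (
    "rootdelay=6.317, rootdisp=10448.441, refid=192.0.2.1, "
    "reach=377, unreach=0, flash=400, keyid=0, offset=+0.349"
)


def check_root_dispersion(rv_output, limit_ms=DEFAULT_MAX_ROOTDISP_MS):
    """Extract rootdisp and flash from rv output and compare against the limit."""
    rootdisp = re.search(r"\brootdisp=([0-9.]+)", rv_output)
    flash = re.search(r"\bflash=([0-9a-fA-F]+)", rv_output)
    rootdisp_ms = float(rootdisp.group(1)) if rootdisp else None
    flash_bits = int(flash.group(1), 16) if flash else 0
    return {
        "rootdisp_ms": rootdisp_ms,
        "exceeds_limit": rootdisp_ms is not None and rootdisp_ms > limit_ms,
        "distance_flag_set": bool(flash_bits & FLASH_DISTANCE_BIT),
    }


if __name__ == "__main__":
    print(check_root_dispersion(SAMPLE_RV))
```

If either exceeds_limit or distance_flag_set is true for the servers in use, the tos maxdist workaround described above is a likely candidate.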
3) Capture network traffic
Capture network traffic flowing between the ESXi host and the NTP server to determine whether packets are being sent and received.
For ESXi:
1. Open a console to the ESXi host.
2. Obtain a list of available VMkernel network interfaces using this command:
esxcfg-vmknic -l
3. Capture NTP network traffic on port 123 flowing to and from the NTP server using this command:
tcpdump-uw -c 5 -n -i network_interface host ntp_server_ip_address and port 123
Example: When using a VMkernel interface vmk0 and an NTP server at 10.11.12.13:
tcpdump-uw -c 5 -n -i vmk0 host 10.11.12.13 and port 123
4. Monitor the output for 30 seconds. Messages similar to this indicate NTP synchronization:
21:04:45.446566 172.16.24.16.ntp > 192.168.38.127.ntp: v4 client strat 2 poll 10 prec -16 (DF) [tos 0x10]
5. Press Ctrl+C on your keyboard to stop tcpdump-uw.
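To judge quickly whether traffic flows in both directions, the captured lines can be counted per direction. The following Python sketch assumes IPv4 addresses and the line layout of the sample message in step 4 (the port shown as .ntp or .123); tcpdump-uw output can differ between versions. The function name is hypothetical, and the second line of SAMPLE_CAPTURE is an invented reply added for illustration.

```python
import re

# Matches "<src>.ntp > <dst>.ntp:" or the numeric form "<src>.123 > <dst>.123:".
NTP_LINE = re.compile(
    r"(\d+(?:\.\d+){3})\.(?:ntp|123) > (\d+(?:\.\d+){3})\.(?:ntp|123):"
)

SAMPLE_CAPTURE = [
    "21:04:45.446566 172.16.24.16.ntp > 192.168.38.127.ntp: "
    "v4 client strat 2 poll 10 prec -16 (DF) [tos 0x10]",
    # Hypothetical reply line, not taken from a real capture.
    "21:04:45.452101 192.168.38.127.ntp > 172.16.24.16.ntp: "
    "v4 server strat 1 poll 10 prec -20",
]


def count_ntp_packets(capture_lines, host_ip):
    """Return (sent, received) NTP packet counts as seen from host_ip."""
    sent = received = 0
    for line in capture_lines:
        match = NTP_LINE.search(line)
        if not match:
            continue
        source, destination = match.groups()
        if source == host_ip:
            sent += 1
        elif destination == host_ip:
            received += 1
    return sent, received


if __name__ == "__main__":
    print(count_ntp_packets(SAMPLE_CAPTURE, "172.16.24.16"))
```

Requests without replies point to a problem between the host and the NTP server, while no requests at all point to the host itself, for example its firewall or a stopped ntpd service.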
4) Check ntpClient Firewall Ruleset
If no outgoing NTP packets are seen, check the ntpClient firewall ruleset:
$ esxcli network firewall ruleset list | grep -i ntpclient
ntpClient true
and the allowedip property of the ruleset:
$ esxcli network firewall ruleset allowedip list | grep -i ntpclient
ntpClient All
If allowedip is not set to "All", try one of the following:
a) Either set the allowed-all property to true:
$ esxcli network firewall ruleset set -a true -r ntpClient
[Note: -a|--allowed-all=<bool>
Set to true to allow all IP addresses; set to false to use the allowed IP list.]
b) Or, if the customer's NTP server is expected to be permanent and the customer wants to allow NTP synchronization only with that IP address, add it to the allowed IP list:
$ esxcli network firewall ruleset allowedip add -i <ip-address> -r ntpClient
Note that with method b, if the IP address is decommissioned or the customer wishes to change the configured NTP server, the new address has to be added to the allowed IP list each time.
5) Review NTP log entries
Review the ntpd log entries in the /var/log directory to determine whether the NTP daemon has synchronized with the remote NTP server.
• Open a console to the ESXi host. For more information, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807) or Using Tech Support Mode in ESXi 4.1 and ESXi 5.0 (1017910).
• To verify when the NTP service is being started/stopped/restarted:
grep -rin "ntpd" syslog.log
Look for output like the following in the syslog.log file:
3356:2023-01-04T10:38:32.337Z In(30) ntpd-init[1000106396]: Stopping ntpd
3357:2023-01-04T10:38:32.472Z In(30) watchdog-ntpd[1000106404]: Terminating watchdog process with PID 1000104976
. . .
3361:2023-01-04T10:38:32.710Z In(30) ntpd-init[1000106414]: Starting ntpd. . .
Note: Messages like the following in syslog.log are normal and expected when the ntpd service is started:
3386:2023-01-04T10:38:33.798Z In(30) ntpd[1000106440]: kernel reports TIME_ERROR: 0x6041: Clock Unsynchronized
3387:2023-01-04T10:38:33.798Z In(30) ntpd[1000106440]: kernel reports TIME_ERROR: 0x6041: Clock Unsynchronized
• In the vmkernel.log files, messages similar to this indicate that ntpd is successfully connecting to the remote NTP server:
8434:2023-01-03T10:26:50.251Z Wa(180) vmkwarning: cpu2:1000082272)WARNING: NTPClock: Adj:1771: system clock synchronized to upstream time servers
• Messages similar to this indicate that the system clock has lost synchronization to the upstream time source.
vmkernel.1:82507:2022-10-28T08:11:56.000Z cpu0:2102208)WARNING: NTPClock: 680: system clock apparently no longer synchronized to upstream time servers
• Messages similar to this indicate that the NTP service is stepping (adjusting) the system clock to bring it to the correct time:
vmkernel.1:31132:2022-10-18T13:53:52.961Z cpu15:2991753)WARNING: NTPClock: 1457: system clock stepped to 1666101233.838414000, no longer synchronized to upstream time servers
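The three NTPClock message types above can be picked out of a log automatically. The Python sketch below classifies lines by the phrases quoted in this article. The labels and function names are illustrative assumptions, and the sample lines are the ones shown above.

```python
import re

# Checked in this order, because the "stepped" message also contains
# the words "no longer synchronized".
NTPCLOCK_EVENTS = [
    ("system clock stepped to", "stepped"),
    ("system clock apparently no longer synchronized", "lost_sync"),
    ("system clock synchronized to upstream", "synchronized"),
]
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z")

SAMPLE_LINES = [
    "8434:2023-01-03T10:26:50.251Z Wa(180) vmkwarning: cpu2:1000082272)WARNING: "
    "NTPClock: Adj:1771: system clock synchronized to upstream time servers",
    "vmkernel.1:82507:2022-10-28T08:11:56.000Z cpu0:2102208)WARNING: NTPClock: 680: "
    "system clock apparently no longer synchronized to upstream time servers",
    "vmkernel.1:31132:2022-10-18T13:53:52.961Z cpu15:2991753)WARNING: NTPClock: 1457: "
    "system clock stepped to 1666101233.838414000, no longer synchronized to upstream time servers",
]


def classify_ntpclock_line(line):
    """Return (timestamp, label) for an NTPClock message, else None."""
    if "NTPClock" not in line:
        return None
    for phrase, label in NTPCLOCK_EVENTS:
        if phrase in line:
            stamp = TIMESTAMP.search(line)
            return (stamp.group(0) if stamp else None, label)
    return None


def last_ntp_state(lines):
    """Return the last NTPClock event in file order, or None if there is none."""
    events = [event for event in map(classify_ntpclock_line, lines) if event]
    return events[-1] if events else None


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(classify_ntpclock_line(line))
```

last_ntp_state relies on the order of the lines it is given rather than on their timestamps, so feed it a single log file read from top to bottom.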
Review logs and network traffic
When successful network communication is established between the ESXi host and the NTP server, review the logs and network traffic to ensure that NTP synchronization is occurring and that the discrepancy is being reduced.
Note: It may take from 1 to 15 minutes to get the time synchronized after the packets are sent and received correctly by the ESXi host.
Symptom: No NTP packets are exchanged (reach is 0), and hence the time on the ESXi host does not synchronize, after a hostname is used to configure a dual-stack (IPv4/IPv6) NTP time source. To verify this scenario, check the output of the 'ntpq -pn' command: the reach column for this server is 0, and the address that the NTP daemon resolved for the hostname is its IPv6 address. For instance:
$ ntpq -pn
remote refid st t when poll reach delay offset jitter
============================================================================
2a02:###:####:0 .INIT. 16 u - 64 0 0.000 +0.000 0.000
You can also check whether the NTP server's IPv6 address responds to ping:
command:
$ ping6 <IPv6 address>
output:
PING <IPv6 address> (<IPv6 address>): 56 data bytes
sendto() failed (No route to host)
Cause: This can happen when no proper IPv6 route is configured from the ESXi host to the NTP server. NTP may nevertheless resolve the configured hostname to its IPv6 address (instead of IPv4), even though the IPv6 address is unreachable from the customer's ESXi host. This may indicate that the IPv6 configuration on the ESXi host is inadequate; for instance, a host with only link-local addresses configured cannot reach an NTP server that has a global-scope address.
Workaround 1: Set up an adequate route from the ESXi host to the NTP server so that the server's IPv6 address is reachable from the host. This could entail, but may not be limited to, configuring a global-scope or static IPv6 address on the ESXi host.
Workaround 2: Use the IPv4 address of the NTP server directly to configure NTP, instead of the FQDN/hostname.
$ esxcli system ntp set -s <IPv4 address> -e 1