This article provides troubleshooting steps for identifying and isolating problems with NTP time synchronization on ESXi hosts.
VMware vSphere ESXi 6.x | VMware vSphere ESXi 7.x | VMware vSphere ESXi 8.x
Use these methods to troubleshoot NTP on ESXi hosts:
CHECKLIST/INDEX
1) Validate connectivity/networking setup
Validate network connectivity between the ESXi host and the NTP server/upstream time source using
a. the ping command (for example, ping <IP address>).
b. the traceroute command to trace the route NTP packets take.
If the server is found unreachable, make sure that an IP route to the NTP server(s) is available and that the network in your host environment is properly set up.
For more information, refer to Testing network connectivity with the ping command (315423).
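For example, from the ESXi shell (the server address is a placeholder):
ping <IP address>
traceroute <IP address>
To confirm that a route to the NTP server's network exists, list the host's configured routes:
esxcli network ip route ipv4 list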
2) Query ntpd service using ntpq
Use the NTP Query utility program ntpq to remotely query the ESXi/ESX host's ntpd service.
The ntpq utility is commonly installed on Linux clients and is also available in the ESX service console and the vSphere Management Assistant. For more information on the installation and use of the ntpq utility program on a given Linux distribution, see your Linux distribution's documentation.
The ntpq utility is not available on ESXi 3.x/4.x. To query an ESXi host's NTP service ntpd, install ntpq on a remote Linux client and query the ESXi host's ntpd service from the Linux client.
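For example, from a Linux client (the hostname is a placeholder; the host's NTP configuration must also permit remote queries):
$ ntpq -pn <ESXi host address>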
(A) To use the NTP Query utility ntpq to remotely query the ESX host's NTP service (ntpd) and determine whether it is successfully synchronizing with the upstream NTP server, the command is as follows:
ntpq -p <host-address>; if no host is specified, your local ESXi host's ntpd service is queried.
The "-n" flag can also be specified to display host addresses in dotted numerical format rather than canonical names (ntpq -pn).
The fields returned by ntpq have these meanings:
remote | Hostname or IP address of the configured upstream NTP server.
refid | The upstream time source to which the remote server is itself synchronized (see the notes below).
st | Stratum of the remote server.
t | Type of the association (u = unicast).
when | Number of seconds since the last response was received from the server.
poll | Polling interval, in seconds.
reach | Octal shift register recording the results of the last eight polls (see the notes below).
delay | Round-trip delay (in milliseconds) to the configured NTP server.
offset | The offset (in milliseconds) between the time on the configured NTP server and the ESXi host. A value closer to 0 is ideal.
jitter | The observed timing jitter or variation between clock pulses of time with the configured NTP server. A value closer to 0 is ideal.
refid -> the peer/remote host's upstream time source to which it is synced, for instance another NTP server or a GPS/radio clock. This column may also display the NTP kiss-o'-death codes. In the sample output below it displays .INIT., which signifies that the association has not yet synchronized for the first time. This can happen for several reasons, including the connectivity and firewall issues covered elsewhere in this article.
reach -> an octal number representing the reach shift register. For instance, 377 (octal) translates to 11111111 in binary, where each 1 represents a successful poll/connection to the NTP server. This can be used to check whether the connection to the remote host is consistent or whether there are intermittent losses that could cause the system clock on your host to become unsynchronized. If 8 or more polls have been made to the NTP server and reach is not 377, the server is not consistently reachable; for example, a reach of 357 (11101111 in binary) means one of the last eight polls went unanswered. Please note that if the network connectivity is good, this register should show the following progression as successive polls succeed: 1, 3, 7, 17, 37, 77, 177, 377 (octal).
Sample output:
remote refid st t when poll reach delay offset jitter
================================================================================
*static.###.### ###.###.###.### 2 u 18 64 17 147.797 +4.085 0.745
+time.######## <IP address> 3 u 22 64 17 7.609 +1.834 0.272
2a01:###:####:1 .INIT. 16 u - 64 0 0.000 +0.000 0.000
The tally code characters displayed in the very first column can be interpreted as follows:
* indicates the source currently synchronized to (syspeer)
+ indicates a candidate peer (a good time source)
<blank> indicates a discarded source
- indicates source rejected as an outlier
x indicates source as a falseticker that distributes bad time
# indicates a source that was selected but is not among the first six peers sorted by synchronization distance.
For more information, see the NTP.org Troubleshooting documentation and the NTP Query Program documentation.
(B) Peer association data can also provide useful information pointing to potential issues. List the server associations and association identifiers for the three configured servers as follows:
$ ntpq -c associations
ind assid status conf reach auth condition last_event cnt
===========================================================
1 37250 962a yes yes none sys.peer sys_peer 2
2 37251 942a yes yes none candidate sys_peer 2
3 37252 8011 yes no none reject mobilize 1
Note: The condition column reflects the tally codes seen earlier.
To view the system variables associated with an association ID, use the association identifier from the table above in the following command. You would typically inspect the server/time source currently synchronized to (in this case the first association, corresponding to sys.peer in the condition column):
$ ntpq -c "rv 37250"
associd=37250 status=962a conf, reach, sel_sys.peer, 2 events, sys_peer, srcadr=static.###.###.###.###.clients.your-server.de, srcport=123, dstadr=<IP address>, dstport=123, leap=00, stratum=2, precision=-25, rootdelay=6.317, rootdisp=10448.441, refid=###.###.###.###, reftime=e741b79f.81065605 [YYYY-MM-DDTHH:MM:SS], rec=e741b7da.8466a7e8, XXX 12 #### ##:##:22.517, reach=377, unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=9, flash=400, keyid=0, offset=+0.349, delay=146.270, dispersion=1.904, jitter=1.931, xleave=0.240,
filtdelay= 148.30 146.27 146.37 147.59 148.30 147.77 147.80 148.33,
filtoffset= -0.15 +0.35 +0.00 -0.65 -0.25 +1.03 +4.08 +3.50,
filtdisp= 0.00 0.98 1.98 2.97 3.93 4.89 5.87 6.87
Note: The rootdisp (root dispersion, in milliseconds) value is an estimate of the error/variance between the correct time and the time reported by the server. A high rootdisp value is an indicator of poor timekeeping.
Please note that "An ESXi/ESX host, by default, does not accept any NTP reply with a root dispersion greater than 1.5 seconds (1500 ms)." See Synchronizing ESXi/ESX time with a Microsoft Domain Controller.
Hence, to continue using the same configured NTP servers, add the "tos maxdist" configuration as a workaround. A flash value of 400 can also indicate that the maximum distance threshold has been exceeded and that the tos maxdist configuration needs to be applied.
To add the configuration on builds 7.0U3 onwards:
1. Update the NTP configuration (in the file and in the config store):
cp /etc/ntp.conf /etc/ntp.conf.bak && echo "tos maxdist 15" >> /etc/ntp.conf.bak && esxcli system ntp set -f /etc/ntp.conf.bak
2. Restart NTP: esxcli system ntp set -e 0 && esxcli system ntp set -e 1
Refer to Synchronizing ESXi/ESX time with a Microsoft Domain Controller for adding this configuration to earlier builds.
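To confirm that the directive was applied, re-check the active configuration (a quick sanity check; the exact output format varies by build):
esxcli system ntp get
grep maxdist /etc/ntp.conf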
3) Capture network traffic
Capture network traffic flowing between the ESXi host and the NTP server to determine whether packets are being sent and received.
For ESXi:
1. Open a console to the ESXi host.
2. Obtain a list of available VMkernel network interfaces using this command:
esxcfg-vmknic -l
3. Capture NTP network traffic on port 123 flowing to and from the NTP server using this command:
tcpdump-uw -c 5 -n -i network_interface host ntp_server_ip_address and port 123
Example: When using a VMkernel interface vmk0 and an NTP server at 10.11.12.13:
tcpdump-uw -c 5 -n -i vmk0 host 10.11.12.13 and port 123
4. Monitor the output for 30 seconds. Messages similar to this indicate NTP synchronization:
[YYYY-MM-DDTHH:MM:SS] <IP address>.ntp > <IP address>.ntp: v4 client strat 2 poll 10 prec -16 (DF) [tos 0x10]
5. Press Ctrl+C on your keyboard to stop tcpdump-uw.
4) Check ntpClient Firewall Ruleset
If no outgoing NTP packets are seen, check the ntpClient firewall ruleset:
$ esxcli network firewall ruleset list | grep -i ntpclient
ntpClient true
and the allowedip property of the ruleset:
$ esxcli network firewall ruleset allowedip list | grep -i ntpclient
ntpClient All
If allowedip is not set to "All", try one of the following:
a) Either set the allowed-all property to true:
$ esxcli network firewall ruleset set -a true -r ntpClient
[Note that -a|--allowed-all=<bool>
Set to true to allow all IPs; set to false to use the allowed IP list.]
b) Or, if the configured NTP server is expected to be permanent and you want to allow NTP synchronization only with that IP address, add it to the allowed IP list:
$ esxcli network firewall ruleset allowedip add -i <ip-address> -r ntpClient
Note that with method (b), if the IP is decommissioned or the configured NTP server changes, the new address must be added to the allowed IP list each time.
5) Review NTP log entries
Review the ntpd log entries in the /var/log directory to determine whether the NTP daemon has synchronized with the remote NTP server.
• Open a console/SSH session to the ESXi host.
• To verify when the NTP service is being started/stopped/restarted:
grep -rin "ntpd" syslog.log
Look for output like the following in the syslog.log file:
[YYYY-MM-DDTHH:MM:SS] In(30) ntpd-init[1000106396]: Stopping ntpd
[YYYY-MM-DDTHH:MM:SS] In(30) watchdog-ntpd[1000106404]: Terminating watchdog process with PID 1000104976
. . .
[YYYY-MM-DDTHH:MM:SS] In(30) ntpd-init[1000106414]: Starting ntpd. . .
Note: Messages like the following in syslog.log are normal and expected when the ntpd service is started:
[YYYY-MM-DDTHH:MM:SS] In(30) ntpd[1000106440]: kernel reports TIME_ERROR: 0x6041: Clock Unsynchronized
[YYYY-MM-DDTHH:MM:SS] In(30) ntpd[1000106440]: kernel reports TIME_ERROR: 0x6041: Clock Unsynchronized
• In the vmkernel.log files, messages similar to this indicate that ntpd is successfully connecting to the remote NTP server:
[YYYY-MM-DDTHH:MM:SS] Wa(180) vmkwarning: cpu2:1000082272)WARNING: NTPClock: Adj:1771: system clock synchronized to upstream time servers
• Messages similar to this indicate that the system clock has lost synchronization to the upstream time source.
vmkernel.1:82507:[YYYY-MM-DDTHH:MM:SS] cpu0:2102208)WARNING: NTPClock: 680: system clock apparently no longer synchronized to upstream time servers
• Messages similar to this indicate that the NTP service is stepping (adjusting) the system clock to bring it to the correct time:
vmkernel.1:[YYYY-MM-DDTHH:MM:SS] cpu15:2991753)WARNING: NTPClock: 1457: system clock stepped to 1666101233.838414000, no longer synchronized to upstream time servers
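To locate these NTPClock messages quickly, you can search the vmkernel log (a minimal example; rotated logs such as vmkernel.1 can be searched the same way):
grep -i NTPClock /var/log/vmkernel.log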
Review logs and network traffic
When successful network communication is established between the ESXi host and the NTP server, review the logs and network traffic to ensure that NTP synchronization is occurring and that the discrepancy is being reduced.
Note: It may take from 1 to 15 minutes to get the time synchronized after the packets are sent and received correctly by the ESXi host.
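To watch the synchronization converge, you can re-run ntpq periodically and confirm that reach increments and offset trends toward 0 (a simple sketch; run it wherever ntpq is available):
while true; do ntpq -pn; sleep 60; done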
Symptom: No exchange of NTP packets (reach 0), and hence no time synchronization on the ESXi host, after using a hostname to configure a dual-stack (IPv4/IPv6) NTP time source. This scenario can be verified when the 'reach' column for this server in the output of the 'ntpq -pn' command is 0 and the address the NTP daemon resolved for the hostname is its IPv6 address. For instance:
$ ntpq -pn
remote refid st t when poll reach delay offset jitter
============================================================================
2a02:###:####:0 .INIT. 16 u - 64 0 0.000 +0.000 0.000
You can also check whether the NTP server's IPv6 address is pingable:
command:
$ ping6 <IPv6 address>
output:
PING <IPv6 address> (<IPv6 address>): 56 data bytes
sendto() failed (No route to host)
Cause: This can happen when there is no proper IPv6 route configured from the ESXi host to the NTP server. NTP may nevertheless resolve the configured hostname to its IPv6 address (instead of IPv4), even though the IPv6 address is unreachable from the ESXi host. It may indicate that the IPv6 configuration on the ESXi host is inadequate; for instance, a host with only link-local addresses configured cannot reach an NTP server that has a global-scope address.
Workaround 1: Set up an adequate route from the ESXi host to the NTP server so that the server's IPv6 address is reachable from the host. This could entail, but may not be limited to, configuring a global-scope or static IPv6 address on the ESXi host.
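For example, assuming a reachable IPv6 gateway (the gateway address is a placeholder), a default IPv6 route can be added and verified with esxcli:
$ esxcli network ip route ipv6 add --gateway <IPv6 gateway address> --network ::/0
$ esxcli network ip route ipv6 list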
Workaround 2: Directly use the IPv4 address of the NTP server to configure NTP instead of the FQDN/hostname:
$ esxcli system ntp set -s <IPv4 address> -e 1