Troubleshooting NTP on ESX and ESXi 6.x / 7.x / 8.x

Article ID: 312204

Products

VMware vSphere ESXi

Issue/Introduction

This article provides troubleshooting steps for identifying and isolating problems with NTP time synchronization on ESXi hosts.


Environment

VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

 

Resolution

Use these methods to troubleshoot NTP on ESXi hosts:

CHECKLIST/INDEX

  1. Validate connectivity/networking setup
  2. Query ntpd service using ntpq
  3. Capture network traffic
  4. Check ntpClient Firewall Ruleset
  5. Review NTP log entries

1) Validate connectivity/networking setup

Validate network connectivity between the ESXi host and the NTP server/upstream time source using:
a.    the ping command (for example, ping <IP address>).
b.    the traceroute command, to trace the route that NTP packets take.

If the server is unreachable, make sure that an IP route to the NTP server(s) is available and that networking on the host is set up correctly.
For more information, refer to Testing network connectivity with the ping command (1003486).

2) Query ntpd service using ntpq

Use the NTP Query utility program ntpq to query the ESXi/ESX host's ntpd service.

The ntpq utility is commonly installed on Linux clients and is also available in the ESX service console and the vSphere Management Assistant. For more information on the installation and use of the ntpq utility program on a given Linux distribution, see your Linux distribution's documentation.

For an ESXi 5.x and later host, the ntpq utility is included by default and does not need to be installed. It can be run locally from the ESXi 5.x and later host.

The ntpq utility is not available on ESXi 3.x/4.x. To query an ESXi host's NTP service ntpd, install ntpq on a remote Linux client and query the ESXi host's ntpd service from the Linux client.

(A) To use ntpq to query the ESX/ESXi host's NTP service (ntpd) and determine whether it is successfully synchronizing with the upstream NTP server, run:
ntpq -p <peer-server-address>
If no peer is specified, the local ESXi host's ntpd service is queried.

The -n flag displays host addresses in dotted numerical format rather than as canonical names (ntpq -pn).


(i)    Run this command:
watch ntpq -pn 
(ii)    Monitor the output for 30 seconds and press Ctrl+C on your keyboard to stop the watch command.
From the watch command, you see output similar to:
Every 2 seconds: ntpq -pn 
remote         refid    st  t  when   poll  reach  delay  offset  jitter
============================================================================
*10.11.12.130  1.0.0.0   1  u   46     64    377   43.76  5.58    40000
Note:
•    If you receive the message ntpq: read: Connection refused, ensure that your NTP servers are configured and the NTP service is running (check with the "esxcli system ntp get" command).
•    If you receive the message No association ID's returned, the ESXi host cannot reach the configured NTP server.
•    If you receive the message ***Request timed out, the ntpq command did not receive a response from the ESXi host's NTP daemon. Skip to the Capture network traffic section below.


The fields returned by ntpq have these meanings:

remote

Hostname or IP address of the configured upstream NTP server.

refid

Identification of the time source to which the NTP server is synchronized. A refid of ".INIT." means that the ESXi host has not received a response from the configured NTP server.

st

Stratum is a value representing the hierarchy of the upstream NTP servers. Higher values indicate NTP servers further away from the root time source. Values are relative, and can be set manually by an NTP server.

t

Type of packet exchange used for NTP communication. Usually "u" for unicast UDP.

when

Number of seconds that have elapsed since the last attempted poll of the configured upstream NTP server.

poll

Interval, in seconds, at which the ESXi host polls the configured NTP server.

reach

An 8-bit shift register in octal (base 8), with each bit representing success (1) or failure (0) in contacting the configured NTP server. A value of 377 is 11111111 (base 2), which indicates that every query was successful during the last 8 poll intervals.

delay

Round trip delay (in milliseconds) for communication between the configured NTP server and the ESXi host.

offset

The offset (in milliseconds) between the time on the configured NTP server and the ESXi host. A value closer to 0 is ideal.

jitter

The observed timing jitter, or variation between successive time readings from the configured NTP server. A value closer to 0 is ideal.


refid -> the upstream time source to which the peer/remote host is synchronized, for instance another NTP server or a GPS/radio clock. This column may also display NTP Kiss-o'-Death codes such as .INIT., which signifies that the association has not yet synchronized for the first time. This can happen for several reasons, including:

1.    The NTP server is unreachable (see section 1)
2.    The NTP server is down or misbehaving (try a different NTP server)
3.    Your firewall ruleset is blocking outgoing NTP packets to the NTP server (see section 4)


reach -> an octal number representing the reach shift register. For instance, 377 (octal) translates to 11111111 in binary, where each 1 represents a successful poll of the NTP server. Use it to check whether the connection to the remote host is consistent or suffers intermittent losses that could cause the host's system clock to become unsynchronized. If 8 or more polls have been made to the NTP server and reach is not 377, the server may not be consistently reachable. With good network connectivity, the register shows the following progression:

•    1 (00000001)
•    3 (00000011)
•    7 (00000111)
•    17 (00001111)
•    37 (00011111)
•    77 (00111111)
•    177 (01111111)
•    377 (11111111)
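
The progression above can be reproduced with a few lines of Python. This is a minimal sketch, not part of any ESXi tooling: the function names decode_reach and describe_reach are hypothetical, and it assumes the reach value is passed exactly as printed by ntpq (an octal string).

```python
def decode_reach(reach):
    """Decode the octal reach value into 8 poll results.

    Index 0 is the most recent poll, index 7 the oldest, because the
    register shifts left on every poll and records the result in bit 0.
    """
    value = int(reach, 8) & 0xFF
    return [bool((value >> bit) & 1) for bit in range(8)]


def describe_reach(reach):
    """Return a one-line summary such as '17 = 00001111: 4/8 ...'."""
    polls = decode_reach(reach)
    bits = format(int(reach, 8) & 0xFF, "08b")
    return "{} = {}: {}/8 recent polls answered".format(reach, bits, sum(polls))


if __name__ == "__main__":
    for sample in ("1", "17", "377", "376"):
        print(describe_reach(sample))
```

A value such as 376 (11111110) decodes to a failed most recent poll after seven successes, which is the kind of intermittent loss the reach column helps to spot.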


The tally code characters displayed in the very first column can be interpreted as follows:
     remote           refid       st t when poll reach   delay   offset  jitter
===============================================================================
*static.###.###  ###.###.###.###   2 u   18   64    17 147.797   +4.085   0.745
+time.########   10.202.9.14       3 u   22   64    17   7.609   +1.834   0.272
 2a01:###:####:1 .INIT.           16 u    -   64     0   0.000   +0.000   0.000


* indicates the source currently synchronized to (syspeer)
+ indicates a candidate peer (a good time source)
<blank> indicates a discarded source
- indicates source rejected as an outlier
x indicates source as a falseticker that distributes bad time
# indicates a source that is selected, but not among the first six peers sorted by synchronization distance

For more information, see the NTP.org Troubleshooting documentation and the NTP Query Program documentation.
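
When many hosts need checking, the tally codes and columns described above can be read programmatically. The following is a rough sketch under stated assumptions: the helper names (parse_peers, synced_source) and the sample addresses are invented for illustration, the input is assumed to be unmodified "ntpq -pn" output (so the first character of each row is the tally code), and only the six tally codes listed above are labeled.

```python
TALLY_MEANING = {
    "*": "sys.peer (source currently synchronized to)",
    "+": "candidate",
    " ": "discarded",
    "-": "outlier",
    "x": "falseticker",
    "#": "selected, not among the first six peers",
}

# Illustrative sample only; addresses are documentation ranges.
SAMPLE = """\
     remote           refid       st t when poll reach   delay   offset  jitter
===============================================================================
*192.0.2.10      198.51.100.1      2 u   18   64    17 147.797   +4.085   0.745
+192.0.2.11      10.202.9.14       3 u   22   64    17   7.609   +1.834   0.272
 2001:db8::1     .INIT.           16 u    -   64     0   0.000   +0.000   0.000
"""


def parse_peers(output):
    """Parse 'ntpq -pn' text into a list of dictionaries, one per peer."""
    peers = []
    for line in output.splitlines():
        words = line.split()
        if not words or words[0] == "remote" or line.startswith("="):
            continue  # skip blank lines, the header and the separator
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # not a peer row in the expected 10-column layout
        peers.append({
            "tally": tally,
            "meaning": TALLY_MEANING.get(tally, "unknown"),
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "reach": fields[6],  # kept as the octal string ntpq prints
            "delay_ms": float(fields[7]),
            "offset_ms": float(fields[8]),
            "jitter_ms": float(fields[9]),
        })
    return peers


def synced_source(peers):
    """Return the peer marked '*', or None if the host has no sys.peer."""
    for peer in peers:
        if peer["tally"] == "*":
            return peer
    return None


if __name__ == "__main__":
    for peer in parse_peers(SAMPLE):
        print(peer["remote"], "->", peer["meaning"], "reach", peer["reach"])
```

If synced_source returns None, no row carries the * tally code, so the host is not currently synchronized to any of the listed servers.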

(B) Peer association data can also point to potential issues. List the server associations and association identifiers for the configured servers (3 in this example) as follows:

$ ntpq -c associations

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 37250  962a   yes   yes  none  sys.peer    sys_peer  2
  2 37251  942a   yes   yes  none candidate    sys_peer  2
  3 37252  8011   yes    no  none    reject    mobilize  1


Note: The condition column reflects the tally codes seen earlier.
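
The status column is a 16-bit hexadecimal word that packs the same information as the conf, reach, condition, cnt and last_event columns. The sketch below decodes it following the peer status word layout in the NTP reference documentation; treat the layout and the select-code names as an assumption to verify against your ntpd version. The function name decode_peer_status is hypothetical, and only the two event codes visible in the table above are named.

```python
# Select codes (bits 8-10 of the word) as named in the NTP documentation.
SELECT_CODES = {
    0: "reject", 1: "falsetick", 2: "excess", 3: "outlier",
    4: "candidate", 5: "backup", 6: "sys.peer", 7: "pps.peer",
}

# Only the event codes seen in the sample table; others are left numeric.
KNOWN_EVENTS = {0x1: "mobilize", 0xA: "sys_peer"}


def decode_peer_status(status_hex):
    """Split a peer status word such as '962a' into its fields."""
    word = int(status_hex, 16)
    event = word & 0xF
    return {
        "configured": bool(word & 0x8000),
        "reachable": bool(word & 0x1000),
        "condition": SELECT_CODES[(word >> 8) & 0x7],
        "event_count": (word >> 4) & 0xF,
        "last_event": KNOWN_EVENTS.get(event, hex(event)),
    }


if __name__ == "__main__":
    for status in ("962a", "942a", "8011"):
        print(status, decode_peer_status(status))
```

Decoding the three sample values gives sys.peer, candidate and reject, which matches the condition column in the table.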

To view the variables associated with an association ID, pass the identifier from the table above to the rv command. The association of most interest is usually the time source the host is currently synchronized to (in this case, the first association, which shows sys.peer in the condition column):

$ ntpq -c "rv 37250"

associd=37250 status=962a conf, reach, sel_sys.peer, 2 events, sys_peer, srcadr=static.###.###.###.###.clients.your-server.de, srcport=123, dstadr=10.205.69.87, dstport=123, leap=00, stratum=2, precision=-25, rootdelay=6.317, rootdisp=10448.441, refid=###.###.###.###, reftime=e741b79f.81065605  Mon, Dec 12 2022 14:22:23.504, rec=e741b7da.8466a7e8  Mon, XXX 12 #### ##:##:22.517, reach=377, unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=9, flash=400, keyid=0, offset=+0.349, delay=146.270, dispersion=1.904, jitter=1.931, xleave=0.240, 
filtdelay=   148.30  146.27  146.37  147.59  148.30  147.77  147.80  148.33, 
filtoffset=   -0.15   +0.35   +0.00   -0.65   -0.25   +1.03   +4.08   +3.50, 
filtdisp=      0.00    0.98    1.98    2.97    3.93    4.89    5.87    6.87


Note: The rootdisp (root dispersion, in milliseconds) value is an estimate of the error/variance between the correct time and the time reported by the time server. A high rootdisp value indicates poor timekeeping. "An ESXi/ESX host, by default, does not accept any NTP reply with a root dispersion greater than 1.5 seconds (1500 ms)." (https://kb.vmware.com/s/article/1035833). To keep using the same configured NTP servers in that case, add the "tos maxdist" configuration as a workaround. A flash value of 400 can also indicate that the maximum distance threshold has been exceeded and that the tos maxdist configuration needs to be applied.

To add the configuration on builds 7.0U3 onwards:

1. Update the NTP configuration (in the file and in configstore):
cp /etc/ntp.conf /etc/ntp.conf.bak && echo "tos maxdist 15" >> /etc/ntp.conf.bak && esxcli system ntp set -f /etc/ntp.conf.bak

2. Restart NTP: esxcli system ntp set -e 0 && esxcli system ntp set -e 1
Refer to https://kb.vmware.com/s/article/1035833 for adding this configuration to earlier builds.
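
To decide whether the tos maxdist workaround is relevant, the rootdisp and flash values from the rv output can be checked in a script. This is a sketch under assumptions, not a supported tool: the names parse_rv and distance_findings are hypothetical, the 1500 ms threshold is the default quoted above, flash is assumed to be printed in hexadecimal, and the parser only captures single-token values (multi-value fields such as filtdelay are skipped).

```python
import re

MAX_ROOTDISP_MS = 1500.0        # default limit quoted in the note above
FLASH_DISTANCE_EXCEEDED = 0x400  # flash=400 as described in the note above

# Shortened from the rv output shown earlier in this section.
SAMPLE_RV = (
    "associd=37250 status=962a conf, reach, sel_sys.peer, 2 events, sys_peer, "
    "srcport=123, stratum=2, rootdelay=6.317, rootdisp=10448.441, "
    "reach=377, flash=400, offset=+0.349, delay=146.270"
)


def parse_rv(output):
    """Collect name=value pairs whose value is a single token."""
    return dict(re.findall(r"(\w+)=([^\s,]+)", output))


def distance_findings(variables):
    """Return human-readable findings related to the distance threshold."""
    findings = []
    rootdisp = float(variables.get("rootdisp", "0"))
    if rootdisp > MAX_ROOTDISP_MS:
        findings.append(
            "rootdisp {:.3f} ms exceeds {:.0f} ms".format(rootdisp, MAX_ROOTDISP_MS)
        )
    flash = int(variables.get("flash", "0"), 16)
    if flash & FLASH_DISTANCE_EXCEEDED:
        findings.append("flash has the distance-threshold bit (400) set")
    return findings


if __name__ == "__main__":
    for finding in distance_findings(parse_rv(SAMPLE_RV)):
        print(finding)
```

An empty result means neither indicator points at the distance threshold, so the cause of a sync failure is more likely elsewhere.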

3) Capture network traffic

Capture network traffic flowing between the ESXi host and the NTP server to determine whether packets are being sent and received.

For ESXi:

1.    Open a console to the ESXi host.
 
2.    Obtain a list of available VMkernel network interfaces using this command:
       esxcfg-vmknic -l
 
3.    Capture NTP network traffic on port 123 flowing to and from the NTP server using this command: 

tcpdump-uw -c 5 -n -i network_interface host ntp_server_ip_address and port 123

Example: When using a VMkernel interface vmk0 and an NTP server at 10.11.12.13:

tcpdump-uw -c 5 -n -i vmk0 host 10.11.12.13 and port 123


4.    Monitor the output for 30 seconds. Messages similar to this indicate NTP synchronization:

21:04:45.446566 172.16.24.16.ntp > 192.168.38.127.ntp: v4 client strat 2 poll 10 prec -16 (DF) [tos 0x10]

5.    Press Ctrl+C on your keyboard to stop tcpdump-uw.
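
The summary that tcpdump-uw prints ("v4 client strat 2 poll 10 prec -16") comes from the first four bytes of the NTP packet. As an illustration of what those fields are, the sketch below decodes them from raw bytes using the header layout in RFC 5905. The function name decode_ntp_header is hypothetical, and the sample packet is constructed by hand to mirror the example line above rather than captured from a host.

```python
import struct

NTP_MODES = {
    1: "symmetric active",
    2: "symmetric passive",
    3: "client",
    4: "server",
    5: "broadcast",
}


def decode_ntp_header(packet):
    """Decode leap, version, mode, stratum, poll and precision."""
    if len(packet) < 4:
        raise ValueError("an NTP packet has at least a 4-byte header prefix")
    # One unsigned byte of flags, unsigned stratum, signed poll and precision.
    flags, stratum, poll, precision = struct.unpack("!BBbb", packet[:4])
    return {
        "leap": flags >> 6,
        "version": (flags >> 3) & 0x7,
        "mode": NTP_MODES.get(flags & 0x7, "other"),
        "stratum": stratum,
        "poll": poll,            # log2 of the poll interval in seconds
        "precision": precision,  # log2 of the clock precision in seconds
    }


if __name__ == "__main__":
    # Hand-built 48-byte packet: version 4, client mode, stratum 2,
    # poll 10, precision -16 (0xF0 as a signed byte).
    sample = bytes([0x23, 2, 10, 0xF0]) + bytes(44)
    print(decode_ntp_header(sample))
```

Seeing only "client" packets leave the host, with no "server" packets coming back, points to the request or the reply being dropped somewhere on the path.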

4) Check ntpClient Firewall Ruleset

If no outgoing NTP packets are seen, check the ntpClient firewall ruleset: 

$ esxcli network firewall ruleset list | grep -i ntpclient
ntpClient                       true

and the allowedip property of the ruleset:
$ esxcli network firewall ruleset allowedip list | grep -i ntpclient
ntpClient                    All

If allowedip is not set to "All", try one of the following:

a)    Either set the allowed-all property to true:

$ esxcli network firewall ruleset set -a true -r ntpClient

[Note: -a|--allowed-all=<bool>: set to true to allow all IP addresses, set to false to use the allowed IP list.]

b)    Or, if the NTP server is expected to be permanent and NTP synchronization should only be allowed with that IP address, add the address to the allowed IP list:

$ esxcli network firewall ruleset allowedip add -i <ip-address> -r ntpClient

Note that with method b, if the IP address is decommissioned or the configured NTP server changes, the new address must be added to the allowed IP list each time.
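
The two firewall checks above can be combined into a single yes/no answer for a given NTP server. The sketch below works on the grep-filtered output lines shown in this section. The function names and sample addresses are hypothetical, and it assumes that multiple allowed addresses appear on the ntpClient line separated by commas or spaces, which should be confirmed against the output on your build.

```python
import ipaddress


def ruleset_enabled(ruleset_line):
    """Return True/False from a line like 'ntpClient   true', else None."""
    fields = ruleset_line.split()
    if len(fields) >= 2 and fields[0].lower() == "ntpclient":
        return fields[1].lower() == "true"
    return None


def allowed_entries(allowedip_line):
    """Return the allowed IP entries from a line like 'ntpClient   All'."""
    fields = allowedip_line.replace(",", " ").split()
    if fields and fields[0].lower() == "ntpclient":
        return fields[1:]
    return []


def server_allowed(entries, server_ip):
    """Check whether server_ip is covered by 'All', an address or a subnet."""
    address = ipaddress.ip_address(server_ip)
    for entry in entries:
        if entry.lower() == "all":
            return True
        try:
            if address in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # ignore entries that are not addresses or networks
    return False


if __name__ == "__main__":
    entries = allowed_entries("ntpClient                    192.0.2.10, 198.51.100.0/24")
    print(ruleset_enabled("ntpClient                       true"))
    print(server_allowed(entries, "198.51.100.7"))
```

A False result for the configured NTP server, together with an absence of outgoing packets in the capture, suggests that the allowed IP list is what is blocking synchronization.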

5) Review NTP log entries

Review the ntpd log entries in the /var/log directory to determine whether the NTP daemon has synchronized with the remote NTP server.

•    Open a console to the ESXi host. For more information, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807) or Using Tech Support Mode in ESXi 4.1 and ESXi 5.0 (1017910).

•    To verify when the NTP service is being started/stopped/restarted:

grep -rin "ntpd" syslog.log

Look for output like the following in syslog.log file:

3356:2023-01-04T10:38:32.337Z In(30) ntpd-init[1000106396]: Stopping ntpd
3357:2023-01-04T10:38:32.472Z In(30) watchdog-ntpd[1000106404]: Terminating watchdog process with PID 1000104976
. . .
3361:2023-01-04T10:38:32.710Z In(30) ntpd-init[1000106414]: Starting ntpd
. . .

Note: Messages like the following in syslog.log are normal and expected when the ntpd service starts:

3386:2023-01-04T10:38:33.798Z In(30) ntpd[1000106440]: kernel reports TIME_ERROR: 0x6041: Clock Unsynchronized

3387:2023-01-04T10:38:33.798Z In(30) ntpd[1000106440]: kernel reports TIME_ERROR: 0x6041: Clock Unsynchronized

•    In the vmkernel.log files, messages similar to this indicate that ntpd is successfully connecting to the remote NTP server:

8434:2023-01-03T10:26:50.251Z Wa(180) vmkwarning: cpu2:1000082272)WARNING: NTPClock: Adj:1771: system clock synchronized to upstream time servers

•    Messages similar to this indicate that the system clock has lost synchronization to the upstream time source.

vmkernel.1:82507:2022-10-28T08:11:56.000Z cpu0:2102208)WARNING: NTPClock: 680: system clock apparently no longer synchronized to upstream time servers

•    Messages similar to this indicate that the NTP service is stepping (adjusting) the system clock to bring it to the correct time

vmkernel.1:31132:2022-10-18T13:53:52.961Z cpu15:2991753)WARNING: NTPClock: 1457: system clock stepped to 1666101233.838414000, no longer synchronized to upstream time servers
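
For long vmkernel logs, the three NTPClock message types above can be tallied to see how often the host gained, lost, or stepped its clock. This is a minimal sketch that matches on the message text quoted in this article. The function names are hypothetical, and the exact wording of the messages may differ between ESXi builds.

```python
from collections import Counter

# Checked in order: the "stepped" message also contains "no longer
# synchronized", so it has to be tested first.
PATTERNS = [
    ("stepped", "system clock stepped to"),
    ("lost_sync", "no longer synchronized to upstream time servers"),
    ("synchronized", "system clock synchronized to upstream time servers"),
]

SAMPLE_LINES = [
    "2023-01-03T10:26:50.251Z Wa(180) vmkwarning: cpu2:1000082272)WARNING: "
    "NTPClock: Adj:1771: system clock synchronized to upstream time servers",
    "2022-10-28T08:11:56.000Z cpu0:2102208)WARNING: NTPClock: 680: system "
    "clock apparently no longer synchronized to upstream time servers",
    "2022-10-18T13:53:52.961Z cpu15:2991753)WARNING: NTPClock: 1457: system "
    "clock stepped to 1666101233.838414000, no longer synchronized to "
    "upstream time servers",
]


def classify_ntpclock(line):
    """Return a label for an NTPClock log line, or None for other lines."""
    if "NTPClock" not in line:
        return None
    for label, needle in PATTERNS:
        if needle in line:
            return label
    return "other"


def summarize(lines):
    """Count NTPClock events by label, ignoring unrelated log lines."""
    labels = (classify_ntpclock(line) for line in lines)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    print(dict(summarize(SAMPLE_LINES)))
```

Many lost_sync or stepped events relative to synchronized events suggest an unstable time source or intermittent connectivity rather than a one-time startup adjustment.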

Review logs and network traffic
When successful network communication is established between the ESXi host and the NTP server, review the logs and network traffic to ensure that NTP synchronization is occurring and that the discrepancy is being reduced.

Note: It may take from 1 to 15 minutes to get the time synchronized after the packets are sent and received correctly by the ESXi host.

Unable to sync to dual-stack (IPv4/IPv6) NTP server when configured using FQDN/hostname

Symptom: No NTP packets are exchanged (reach of 0), and hence the ESXi host does not synchronize time, after a dual-stack (IPv4/IPv6) NTP time source is configured by hostname. To verify this scenario, check the output of the 'ntpq -pn' command: the reach column for this server is 0 and the address that the NTP daemon resolved for the hostname is its IPv6 address. For instance:

$ ntpq -pn

 remote             refid   st t when poll reach delay  offset  jitter
============================================================================
 2a02:###:####:0    .INIT.  16 u  -    64    0    0.000 +0.000 0.000

It can also be checked whether the NTP server's IPv6 address is pingable:

command:
$ ping6 <IPv6 address>

output:
PING <IPv6 address> (<IPv6 address>): 56 data bytes

sendto() failed (No route to host)


Cause: This can happen when no proper IPv6 route is configured from the ESXi host to the NTP server. NTP may nevertheless resolve the configured hostname to its IPv6 address (instead of IPv4), even though that address is unreachable from the ESXi host. It may indicate that the IPv6 configuration on the ESXi host is inadequate; for instance, a host with only link-local addresses configured cannot reach an NTP server that has a global-scope address.

Workaround 1: Set up an adequate route from the ESXi host to the NTP server, so that the server's IPv6 address is reachable from the host. This could entail, but may not be limited to, configuring a global-scope or static IPv6 address on the ESXi host.

Workaround 2: Configure NTP with the IPv4 address of the NTP server instead of the FQDN/hostname:

$ esxcli system ntp set -s <IPv4 address> -e 1
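
The link-local condition described in the Cause above can be checked quickly from a list of the host's configured addresses. The sketch below uses Python's standard ipaddress module. The function name has_non_link_local_ipv6 is hypothetical, the addresses are illustrative, and the check is only a necessary condition: a usable IPv6 route to the NTP server is still required.

```python
import ipaddress


def has_non_link_local_ipv6(host_addresses):
    """Return True if any address is IPv6 and not link-local or loopback.

    Accepts plain addresses as well as forms with a zone ('fe80::1%vmk0')
    or a prefix length ('2001:db8::5/64').
    """
    for text in host_addresses:
        bare = text.split("%")[0].split("/")[0]
        address = ipaddress.ip_address(bare)
        if address.version != 6:
            continue
        if address.is_link_local or address.is_loopback or address.is_unspecified:
            continue
        return True
    return False


if __name__ == "__main__":
    # A host with only a link-local IPv6 address cannot reach a
    # global-scope NTP server over IPv6.
    print(has_non_link_local_ipv6(["192.0.2.5", "fe80::250:56ff:fe9a:1%vmk0"]))
```

When the check returns False and the NTP hostname resolves to an IPv6 address, either workaround above applies.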

 

Additional Information

By default, ESXi/ESX uses NTPv4 but some NTP sources use NTPv3. The version mismatch leads to a synchronization failure. To resolve this, you must update the /etc/ntp.conf file to include the version you wish to use.

To update the /etc/ntp.conf file:
1.    Back up the /etc/ntp.conf file. Do not skip this step.
2.    Open the /etc/ntp.conf file in a text editor. For more information, refer to Editing configuration files in VMware ESXi and ESX (1017022) .
 
3.    Add a line for the NTPv3 server:
server x.x.x.x version 3
For example, after making the modification, the contents of the ntp.conf file are similar to:
restrict 127.0.0.1
restrict default kod nomodify notrap
driftfile /etc/ntp.drift
server 192.168.0.10 version 3
 
4.    Save and close the file.
 
5.    Restart the NTP services for the change to take effect.
For ESXi:
# /etc/init.d/ntpd restart

For ESX:
# service ntpd restart

Note: To review the NTP offset and delay statistics at the end of the day, create a folder named /var/log/ntp with the command:
mkdir /var/log/ntp
Append these 4 lines to the ntp.conf file:
statistics loopstats
statsdir /var/log/ntp/
filegen peerstats file peers type day link enable
filegen loopstats file loops type day link enable

The logs are now created in the new ntp directory.

Note: VMware recommends that you only configure one time service (netlogond or ntp). However, if you require NTP in conjunction with Active Directory (AD), configure the AD server to use a reliable time source and configure the NTP server for the ESXi host to use the AD server or the same NTP server that AD is using.
https://kb.vmware.com/s/article/1035833
For more information on NTP and NetLogond, see the VMware vSphere documentation at https://docs.vmware.com/en/VMware-vSphere/index.html