Description:
In some cases, the AppLogic BFC, controller, appliances, and physical nodes (dom0) may have incorrect time. This document describes a basic troubleshooting
procedure for this kind of problem.
Solution:
Background knowledge
In AppLogic 2.x and 3.0, the controller is the sole NTP server in the grid, and time is synchronized as shown below. ntpd on the controller and the physical
node handles the synchronization, and the hypervisor is in charge of passing time from the physical node to the appliance VM.
controller <= physical node <= appliance VM
From 3.1, the BFC takes over this role from the controller and becomes the NTP server for all grids it manages, so the time sync flow changes as follows. ntpd
on the BFC and the physical node handles the synchronization, and the hypervisor is in charge of passing time from the physical node to the controller and appliance VM.
BFC <= physical node <= controller and appliance VM
If an external NTP server is configured in the BFC GUI, the time sync flow looks like the following
external ntp server <= BFC <= physical node <= controller and appliance VM
Another major change from 3.1 is that all physical node clocks (both the system clock and the hwclock) use UTC+0 time as opposed to local time. Time
synchronization between the BFC and the physical node is also based on UTC time.
If you would like to know more about NTP, please refer to the following link
http://en.wikipedia.org/wiki/Network_Time_Protocol
Troubleshooting procedure
A time synchronization issue at any link of the chain may cause the BFC, controller, appliance VM, or physical node at the following links to have incorrect time.
Please follow the checklists below to locate the part that has the time synchronization issue
Checklist for AppLogic 2.x and 3.0
- Whether the controller time is correctly synced
- Whether the physical node system time is correctly synced
- Whether the appliance VM time zone is correctly configured
Checklist for AppLogic 3.1 and newer releases
- If an external NTP server is configured in the BFC GUI, whether the BFC time is correct
- Whether the physical node system time is correctly synced
- Whether the physical node hwclock is correctly configured
- Whether the affected appliance VM time zone is correctly configured
- Whether the affected appliance is a Windows or Linux box, and whether it runs in HVM mode or PV mode
Note:
There is a known time sync issue in 3.1 caused by a Xen time drift bug, in which the physical node (dom0) has trouble passing time drift to the hypervisor. The end result
is that the controller and appliance VM have incorrect time. The solution is to set an independent wall clock in the appliance VM and, additionally, to install and configure ntp to
sync time from either the BFC or an external NTP server.
How to verify that time sync with the external NTP server works properly
This section applies to the controller in AppLogic 3.0 and prior releases, as well as the BFC in 3.1 and newer releases if an external NTP server is configured.
Note:
When configuring the NTP server in the BFC GUI in 3.1 and newer releases, you may enter any valid and reachable external NTP server, but not the BFC's own name or IP
address.
- In /etc/ntp.conf, the entry with the keyword "server" is the external NTP server name/IP.
server <external ntp server name/ip>
- Run "ntpq -p" to show the NTP peers of the local server. In the following sample, the external NTP servers are ntpsrv1 and ntpsrv2
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntpsrv1         141.202.0.2      4 u  995 1024  377    0.411    0.019   0.031
 ntpsrv2         141.202.0.25     5 u  708 1024  377   46.798   -0.100   0.051
If there are multiple entries in the output of "ntpq -p", the entry starting with * (asterisk) is the current (preferred) NTP source.
Note:
Please refer to the following document for more details on how to use ntpq to diagnose connection issues with the NTP source
https://support.ca.com/irj/portal/anonymous/redirArticles?reqPage=search&searchID=TEC573076
- Run "ntpq -c readvar" to display the external NTP server name/IP and the sync status. Here is a sample of the output
assID=0 status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
version="ntpd [email protected] Fri Nov 18 13:21:16 UTC 2011 (1)",
processor="i686", system="Linux/2.6.18-238.el5PAE", leap=00, stratum=5,
precision=-20, rootdelay=17.898, rootdispersion=75.212, peer=26686,
refid=141.202.0.25,
reftime=d3965450.304f487c Wed, Jun 27 2012 23:56:00.188, poll=10,
clock=d3965990.187e30f8 Thu, Jun 28 2012 0:18:24.095, state=4,
offset=0.019, frequency=115.304, jitter=0.108, noise=0.634,
stability=0.003, tai=0
- If the configuration is correct, the next step is to verify that time sync from the external server works properly. The recommended procedure includes the following steps
  a. service ntpd stop
  b. ntpdate -d <ntp server name/ip> --> <ntp server name/ip> can be found in the "ntpq -p" output
  c. service ntpd start
  d. date
If step b or c reports an error, please check whether the external NTP server name/IP is valid and reachable. If you would like to use a public NTP server, please
refer to the following link
http://support.ntp.org/bin/view/Servers/WebHome
Note:
After the ntpd service is started, it may take a while, usually less than 5 minutes, for ntpd to connect to the primary external NTP server and mark it with
* (asterisk) in the output of "ntpq -p"
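The "ntpq -c readvar" check in this section can also be scripted when several servers need to be verified. The following is a minimal Python sketch, not part of AppLogic, that parses captured readvar output and reports whether ntpd considers itself synchronized to a network source. The function names parse_readvar and is_synchronized are illustrative, and the sample text is abridged from the output shown in this section.

```python
import re

# Abridged from the sample "ntpq -c readvar" output in this section.
SAMPLE = """\
assID=0 status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
processor="i686", system="Linux/2.6.18-238.el5PAE", leap=00, stratum=5,
precision=-20, rootdelay=17.898, rootdispersion=75.212, peer=26686,
refid=141.202.0.25,
reftime=d3965450.304f487c Wed, Jun 27 2012 23:56:00.188, poll=10,
offset=0.019, frequency=115.304, jitter=0.108, noise=0.634,
stability=0.003, tai=0
"""


def parse_readvar(text):
    """Return the key=value pairs found in readvar output as a dict of strings."""
    # A value is either a quoted string or runs up to the next comma or space,
    # so timestamps such as reftime keep only their leading hex stamp.
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text)
    return {key: value.strip('"') for key, value in pairs}


def is_synchronized(text):
    """True when the status words include sync_ntp and the stratum is below 16."""
    fields = parse_readvar(text)
    # Stratum 16 is the NTP convention for an unsynchronized clock.
    return "sync_ntp" in text and int(fields.get("stratum", "16")) < 16


if __name__ == "__main__":
    fields = parse_readvar(SAMPLE)
    print("source:", fields["refid"], "stratum:", fields["stratum"])
    print("synchronized:", is_synchronized(SAMPLE))
```

In practice the text would come from running "ntpq -c readvar" on the controller or BFC and capturing its output. The refid field gives the address of the source that ntpd is currently following.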
How to verify that time sync to the physical node works properly
If the physical node time differs from that of the sole NTP server in the grid (the controller in 3.0 and prior releases, the BFC in 3.1 and newer releases),
the same approaches can be used for verification and troubleshooting.
- Check /etc/ntp.conf. In 3.0 and prior releases, the NTP server should point to the controller; in 3.1 and newer releases, it should point to the BFC. For instance, the
entry below in ntp.conf means that the NTP server is the controller (private IP)
server <controller private ip>
- Run "ntpq -p" and "ntpq -c readvar" to verify the NTP configuration. Below is a sample of "ntpq -p" output, in which 192.168.6.254 is the controller IP. If there
are multiple entries, the entry starting with * (asterisk) is the current (preferred) NTP source. Please make sure it is the controller private IP in 3.0 and older
releases, or the BFC private IP in 3.1 and newer releases
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.6.254   LOCAL(0)        11 u   91 1024  377    0.204  467.992   0.764
 LOCAL(0)        .LOCL.          10 l   29   64  377    0.000    0.000   0.001
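- Sync the node system clock as below. Note that the -d option runs ntpdate in debug mode, which goes through the exchange with the server without adjusting the
local clock; run ntpdate without -d to actually set the clock.
  - service ntpd stop
  - ntpdate -d <ntp server name/ip>
  - service ntpd start
  - date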
- Sync the node hardware clock by running "hwclock --systohc". Running "hwclock" without parameters displays the current hardware clock time.
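The "ntpq -p" check in this list can be automated for grids with many nodes. The following is a minimal Python sketch, not part of AppLogic, that parses captured "ntpq -p" output and confirms that the source marked with * is the expected controller or BFC address. The function names and the returned dictionary keys are illustrative. The offset column of "ntpq -p" is reported in milliseconds.

```python
# Sample "ntpq -p" output from a physical node, as shown in this section.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.6.254   LOCAL(0)        11 u   91 1024  377    0.204  467.992   0.764
 LOCAL(0)        .LOCL.          10 l   29   64  377    0.000    0.000   0.001
"""


def parse_peers(text):
    """Parse "ntpq -p" output into a list of dicts, one per peer line."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the column header, and the ==== separator.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        remote = fields[0]
        # A leading *, +, - or # is the tally code; * marks the selected source.
        tally = remote[0] if remote[0] in "*+-#" else ""
        peers.append({
            "tally": tally,
            "remote": remote[len(tally):],
            "stratum": int(fields[2]),
            "offset_ms": float(fields[8]),
        })
    return peers


def selected_source(text):
    """Return the peer marked with * (the current sync source), or None."""
    for peer in parse_peers(text):
        if peer["tally"] == "*":
            return peer
    return None


def check_node_source(text, expected_ip):
    """True when the selected source is the expected controller or BFC IP."""
    peer = selected_source(text)
    return peer is not None and peer["remote"] == expected_ip


if __name__ == "__main__":
    source = selected_source(SAMPLE)
    print("selected:", source["remote"], "offset (ms):", source["offset_ms"])
    print("points at controller:", check_node_source(SAMPLE, "192.168.6.254"))
```

The sketch only checks which source is selected. What offset counts as too large is a site decision and is deliberately left out.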
Note:
From 3.1, both the system time and the hardware clock time of the physical node should be UTC+0 time, and there should be no significant gap between them. The BFC
still uses local time. For instance, if the current time on the BFC is 20:00 (UTC+10 time zone) and the node time is 10:00 (UTC+0), the two times are consistent.
[[email protected] srv1 ~]# date
Thu Jun 28 06:19:37 UTC 2012 -> OS system time
[[email protected] srv1 ~]# hwclock
Thu 28 Jun 2012 06:19:38 AM UTC -0.549140 seconds -> hardware clock
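The consistency rule in the note above is plain time zone arithmetic, and the following minimal Python sketch works through it. It is an illustration only: the function name clock_gap_seconds is made up for this example, and the timestamps are the ones used in the note and in the sample output.

```python
from datetime import datetime, timedelta, timezone


def clock_gap_seconds(first, second):
    """Absolute gap in seconds between two timezone-aware timestamps."""
    return abs((first - second).total_seconds())


# BFC shows 20:00 local time in a UTC+10 zone; the node shows 10:00 UTC.
bfc_time = datetime(2012, 6, 28, 20, 0, tzinfo=timezone(timedelta(hours=10)))
node_time = datetime(2012, 6, 28, 10, 0, tzinfo=timezone.utc)

# "date" and "hwclock" readings from the sample output, both in UTC.
system_clock = datetime(2012, 6, 28, 6, 19, 37, tzinfo=timezone.utc)
hardware_clock = datetime(2012, 6, 28, 6, 19, 38, tzinfo=timezone.utc)

if __name__ == "__main__":
    # The BFC and node readings name the same instant, so the gap is zero.
    print(clock_gap_seconds(bfc_time, node_time))
    # The system clock and hwclock readings are one second apart.
    print(clock_gap_seconds(system_clock, hardware_clock))
```

Comparing timezone-aware values avoids the common mistake of reading a ten-hour difference between the BFC and a node as a sync failure.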
The time zone of the physical node system is stored in /etc/localtime. It should either contain UTC zone data like below or be a link to /usr/share/zoneinfo/UTC
[[email protected] srv1 ~]# cat /etc/localtime
TZif2UTCTZif2UTC
UTC0
The time zone of hardware clock is stored in /etc/sysconfig/clock as below.
[[email protected] srv1 ~]# cat /etc/sysconfig/clock
ZONE="UTC"
UTC=true
ARC=false
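When many nodes have to be checked, the contents of /etc/sysconfig/clock can be validated with a short script. The following is a minimal Python sketch, not part of AppLogic, that parses the file contents shown above. The function names are illustrative, and treating both "true" and "yes" as enabling UTC is an assumption made for this example.

```python
# Contents of /etc/sysconfig/clock as shown in the sample above.
SAMPLE = """\
ZONE="UTC"
UTC=true
ARC=false
"""


def parse_sysconfig_clock(text):
    """Parse KEY=value lines into a dict, dropping surrounding quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def hwclock_is_utc(text):
    """True when the file declares the UTC zone and a UTC hardware clock."""
    settings = parse_sysconfig_clock(text)
    # Assumption: "true" and "yes" are both accepted as enabling UTC.
    utc_enabled = settings.get("UTC", "").lower() in ("true", "yes")
    return settings.get("ZONE") == "UTC" and utc_enabled


if __name__ == "__main__":
    print(parse_sysconfig_clock(SAMPLE))
    print("hardware clock in UTC:", hwclock_is_utc(SAMPLE))
```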
How to verify whether the appliance VM time is correct
Generally, if the physical node time is correct, the appliance VM time should be correct as long as its time zone is set to the correct local time zone. If
the appliance VM time is incorrect, the following information may help to address the problem
- PV appliance VM has incorrect time in 3.1 (and ONLY in 3.1)
An appliance VM in AppLogic 3.1 may not obtain the correct time because of the Xen time drift bug, even though the BFC and physical node have the correct time. This bug only
affects PV appliances, not HVM appliances (Windows appliances/VDS always run in HVM mode).
If the system is affected by this bug, the recommended workaround is to set an independent wall clock in the PV appliance VM, as described in the document below.
http://docs.vmd.citrix.com/XenServer/4.0.1/guest/ch04s06.html
In addition, it is strongly recommended in this scenario to install the ntp package in the PV appliance VM and configure either the BFC or an external NTP server as
the time source.
Note:
Running "hwclock --systohc" to correct the physical node hwclock and then rebooting the physical node passes the correct time to the appliance only temporarily, for a matter of hours. It is not
a final solution.
- Windows appliance has incorrect time in 3.1 and newer release
From 3.1, the physical node system time and hwclock are set to UTC+0 time during installation. By default, Microsoft Windows expects the real-time clock to be
set to local time rather than UTC. As a result, the time/date is not calculated correctly inside the Windows appliance for the various
time zones.
To fix this, a registry edit is needed to tell the system that the real-time clock is set to UTC, as follows. After a reboot, the date/time is adjusted correctly
for the configured time zone.
Navigate to HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\ and create or set the RealTimeIsUniversal value to a DWORD value of 1
- Windows appliance in a domain has incorrect time
If a Windows box has joined a domain, by default it syncs time from the domain controller. It is necessary to make sure that the domain controller time is correct.
You may refer to the following documents for details on how time is synchronized in a domain.
http://blogs.msdn.com/b/w32time/archive/2007/07/07/welcome.aspx
http://blogs.msdn.com/b/w32time/archive/2007/09/04/keeping-the-domain-on-time.aspx
"w32tm /query [/peers | /status | /configuration]" displays the time sync source and status on a Windows box. In the following sample, the
time sync source is AUSYDC02.ca.com.
C:\>w32tm /query /peers
#Peers: 1
Peer: AUSYDC02.ca.com
State: Active
Time Remaining: 282.9147723s
Mode: 3 (Client)
Stratum: 6 (secondary reference - syncd by (S)NTP)
PeerPoll Interval: 13 (8192s)
HostPoll Interval: 13 (8192s)
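If the peer list has to be checked on several Windows appliances, the "w32tm /query /peers" output can be captured and parsed. The following is a minimal Python sketch under that assumption. It is not a Microsoft or AppLogic tool, the function names are illustrative, and it relies only on the "Key: value" layout visible in the sample above.

```python
# Sample "w32tm /query /peers" output, as shown above.
SAMPLE = """\
#Peers: 1
Peer: AUSYDC02.ca.com
State: Active
Time Remaining: 282.9147723s
Mode: 3 (Client)
Stratum: 6 (secondary reference - syncd by (S)NTP)
PeerPoll Interval: 13 (8192s)
HostPoll Interval: 13 (8192s)
"""


def parse_w32tm_peers(text):
    """Group the "Key: value" lines into one dict per "Peer:" entry."""
    peers = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key == "Peer":
            peers.append({"Peer": value})  # a new peer block starts here
        elif peers:
            peers[-1][key] = value  # attribute of the most recent peer
    return peers


def active_peers(text):
    """Names of the peers whose State is Active."""
    return [p["Peer"] for p in parse_w32tm_peers(text) if p.get("State") == "Active"]


if __name__ == "__main__":
    print("active time sources:", active_peers(SAMPLE))
```

An empty result from active_peers suggests that the box currently has no working time source, in which case the domain controller should be checked first.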