Discussion:
[chrony-users] chrony losing sync with GPS reference
Brendan Simon (eTRIX)
2018-12-05 03:14:49 UTC
I'm running Debian Linux (Jessie based with a few backport packages) on
some embedded systems that use a GPS receiver to set the system clock.

The problem I have seen is that chrony loses sync with the GPS
reference after some time, and the system clock starts to drift and
never recovers.  This is disastrous as I need to have the clocks
synchronised.

gpsd was restarted and then chrony started synchronising the system
clock again.

gpsd is 3.16 (jessie-backports) and chrony is 1.30 (jessie)

Is there anything in chrony 1.30 that could be known to cause this loss
of sync with no recovery ever?

Is there anything in gpsd 3.16 that could be known to cause this loss of
sync with no recovery ever?

I'm looking at moving to Debian Buster (still in Testing but to be
released early 2019).  It has gpsd 3.17 (but hopefully might move to
3.18 before release) and chrony 3.4.

Does anyone know if this latter combination of gpsd/chrony is likely to
solve the issues/symptoms described above?

gpsd changelog isn't too informative with fixes.  I see the following:

3.18: Fix several buffer issues.
      Too many other bug fixes and improvements to mention.

3.17: Fix a SiRF driver bug that occasionally confused NTP.

Thanks,
Brendan.
--
------------------------------------------------------------------------
*eTRIX Services*
PO Box 497, Inverloch, VIC 3996, AUSTRALIA.
(m) 0417-380-984
------------------------------------------------------------------------
Bill Unruh
2018-12-05 03:23:17 UTC
I think more information is needed. Make sure that the "refclocks" logs are
enabled, and look at them when the clock loses sync to see what is happening.
Are you using PPS or just the UTC time stamps from the GPS?
Miroslav Lichvar
2018-12-05 11:06:09 UTC
It would also help to see the configuration file and the debug output
from chronyd -d -d (it needs to be compiled with the --enable-debug
option to get the debug messages).

When PPS doesn't work, it's usually a problem with the "locked"
reference. Its offset may be too large or too variable.
--
Miroslav Lichvar
Bill Unruh
2018-12-05 14:30:39 UTC
As Miroslav says, it would help to see the chrony.conf file as well. It is
bizarre that it sits there with a 1 second offset without trying to reduce it.
What is the 3 hr gap in the data? Did you select output or was this a
contiguous part of the data? The offset sits there reliably at 1 second with no
hint that the system is trying to fix it (mind you, this is only a 10 sec piece
of data). Then there is suddenly a 10 year jump in the offset. It looks to me
like your GPS unit is toasted, since there is no indication in the actual
system time that anything strange has happened. I.e., either gpsd or the GPS
unit itself is misbehaving badly.


William G. Unruh __| Canadian Institute for|____ Tel: +1(604)822-3273
Physics&Astronomy _|___ Advanced Research _|____ Fax: +1(604)822-5324
UBC, Vancouver,BC _|_ Program in Cosmology |____ ***@physics.ubc.ca
Canada V6T 1Z1 ____|____ and Gravity ______|_ www.theory.physics.ubc.ca/
Brendan Simon (eTRIX)
2018-12-06 00:56:34 UTC
I selected the last part of the refclocks.log.1 file.  The preceding
lines looked uninteresting (to me).

The 3 hour gap is either loss of GPS signal, or GPS data (e.g. gpsd).  I
did see chrony come good when gpsd was restarted, but I can't recall if
it was for this unit/logfile.

The 1 second offset could be due to the "offset 0.9999" in the conf
file (see below).  I'm not sure why it is set to this.  Probably from
defaults or taken from other sources on the internet.  From memory I've
also seen config settings of "offset 0.5".

Is the offset setting meant to be tuned for each deployed location, or
for each device type (using a Quectel L76)?  All our systems will have
the "0.9999" setting.

Initially NTP servers were being used, but we found that chrony could
stop tracking them (and never ever try them again).  We figured moving
to GPS would be more reliable (especially if our cellular internet
connection was flaky or down temporarily).

Here is the `chrony.conf` file (with the comments removed)

makestep 1 -1

refclock SHM 0 refid GPS precision 1e-1 offset 0.9999 delay 0.2

keyfile /etc/chrony/chrony.keys

commandkey 1

driftfile /var/lib/chrony/chrony.drift

log measurements statistics tracking rtc refclocks tempcomp

logdir /var/log/chrony

maxupdateskew 100.0

dumponexit

dumpdir /var/lib/chrony

local stratum 10

logchange 0.5

rtconutc


Thanks,
Brendan.

Bill Unruh
2018-12-06 02:08:01 UTC
Post by Brendan Simon (eTRIX)
I selected the last part of the refclocks.log.1 file.  The preceding lines looked uninteresting (to me).
The 3 hour gap is either loss of GPS signal, or GPS data (e.g. gpsd).  I did see chrony come good when gpsd was
restarted, but I can't recall if it was for this unit/logfile.

Very strange.
chrony will give up if the offset is too large (and 10 years is definitely
too large). I cannot remember what the value is.

It is really hard to figure out why in the world you would suddenly have a 10
year offset. Something is seriously wrong either with the GPS or with gpsd.

Post by Brendan Simon (eTRIX)
The 1 second offset could be due to the "offset 0.9999" in the conf file (see below).  I'm not sure why it is
set to this.  Probably from defaults or taken from other sources on the internet.  From memory I've also seen config
settings of "offset 0.5".

That should be adjusted to the time that it takes the GPS unit to send out the
signal. It is certainly less than 1 sec, and is probably more like 0.3 sec or
so. You need something else (like a remote NTP server) to tell you what a
reasonable value for the offset is.
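
A minimal sketch of that calibration step, assuming you have already collected (by some means not shown here) a list of how late the NMEA time stamps arrive, in seconds, relative to a trusted reference such as an NTP-disciplined clock. The function name and the sample values are hypothetical and made up for illustration; they are not part of chrony or gpsd, and the sign convention of the "offset" option should be checked against the chrony documentation for your version.

```python
from statistics import median


def estimate_refclock_offset(lateness_s):
    """Return a candidate value for the refclock 'offset' option.

    lateness_s: measured lateness of the NMEA time stamps, in seconds,
    relative to a trusted reference (hypothetical input).
    """
    if not lateness_s:
        raise ValueError("need at least one measurement")
    # The median is robust against occasional outliers, e.g. a sentence
    # that was delayed on the serial port.
    return round(median(lateness_s), 3)


# Made-up sample data for illustration only.
samples = [0.31, 0.29, 0.33, 0.30, 0.95, 0.32]
suggested = estimate_refclock_offset(samples)
print(f"refclock SHM 0 refid GPS offset {suggested}")
```

The median rather than the mean is used so that the single 0.95 outlier in the sample data does not pull the estimate upwards.
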
Post by Brendan Simon (eTRIX)
Is the offset setting meant to be tuned for each deployed location, or for each device type (using a Quectel L76)?
All our systems will have the "0.9999" setting.

Usually devices will have some average time delay, but it also depends
crucially on how many NMEA sentences the GPS is asked to send out. The fewer
the better.
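
A rough back-of-the-envelope sketch of why fewer sentences helps, assuming a serial link at 9600 baud with 8N1 framing (10 bits on the wire per character) and sentences of about 70 characters each. None of these figures come from this thread; they are illustrative assumptions, and the function name is hypothetical.

```python
def nmea_burst_time_s(sentence_lengths, baud=9600, bits_per_char=10):
    """Time in seconds to transmit a burst of NMEA sentences.

    sentence_lengths: number of characters in each sentence of the burst.
    baud, bits_per_char: assumed serial settings (8N1 gives 10 bits/char).
    """
    total_chars = sum(sentence_lengths)
    return total_chars * bits_per_char / baud


# Assumed burst sizes: one sentence per second versus six per second.
one_sentence = nmea_burst_time_s([70])
six_sentences = nmea_burst_time_s([70] * 6)
print(f"1 sentence:  {one_sentence:.3f} s")
print(f"6 sentences: {six_sentences:.3f} s")
```

Under these assumptions the transmission time alone grows linearly with the number of sentences, so the delay (and its variation) seen by gpsd depends on which sentences are enabled and where the time-bearing one falls in the burst.
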
Post by Brendan Simon (eTRIX)
Initially NTP servers were being used, but we found that chrony could stop tracking them (and never ever try them
again).  We figured moving to GPS would be more reliable (especially if our cellular internet connection was flaky or
down temporarily).

You could use them to determine your system's average offset.

Post by Brendan Simon (eTRIX)
Here is the `chrony.conf` file (with the comments removed)
makestep 1 -1
refclock SHM 0 refid GPS precision 1e-1 offset 0.9999 delay 0.2
keyfile /etc/chrony/chrony.keys
commandkey 1
driftfile /var/lib/chrony/chrony.drift
log measurements statistics tracking rtc refclocks tempcomp

The statistics and tracking logs might give more clues. Not sure why tempcomp is
there, as it is really only useful if you have a PPS source.

Post by Brendan Simon (eTRIX)
logdir /var/log/chrony
maxupdateskew 100.0

Now that could be problematic. An NMEA source could really be worse than that.
The default is 1000 and I would advise that for an NMEA source.

Post by Brendan Simon (eTRIX)
dumponexit
dumpdir /var/lib/chrony
local stratum 10

Why do you have this there?
Brendan Simon (eTRIX)
2018-12-06 03:14:40 UTC
Post by Brendan Simon (eTRIX)
The 3 hour gap is either loss of GPS signal, or GPS data (e.g.
gpsd).  I did see chrony come good when gpsd was
restarted, but I can't recall if it was for this unit/logfile.
Post by Bill Unruh
Very strange. chrony will give up if the offset is too large (and 10
years is definitely too large). I cannot remember what the value is.
It is really hard to figure out why in the world you would suddenly have a 10
year offset. Something is seriously wrong either with the GPS or with gpsd.

I have no idea either.  What would happen if there was some big delay in
the Linux system before chrony had a chance to process the NMEA data?
Would offsets get screwed up?

Post by Brendan Simon (eTRIX)
The 1 second offset could be due to the "offset 0.9999" in the
conf file (see below).  I'm not sure why it is
set to this.  Probably from defaults or taken from other sources on
the internet.  From memory I've also seen config
settings of "offset 0.5".
Post by Bill Unruh
That should be adjusted to the time that it takes the GPS unit to send out the
signal. It is certainly less than 1 sec, and is probably more like 0.3 sec or
so. You need something else (like a remote NTP server) to tell you what a
reasonable value for the offset is.
Post by Brendan Simon (eTRIX)
Is the offset setting meant to be tuned for each deployed location,
or for each device type (using a Quectel L76)?
All our systems will have the "0.9999" setting.
Post by Bill Unruh
Usually devices will have some average time delay, but it also depends
crucially on how many NMEA sentences the GPS is asked to send out. The fewer
the better.

Ok.  Assuming I tune the offset (with or without changing NMEA
sentences), would that affect the time between two units (with 1 second
resolution) if one unit was tuned (e.g. say 0.3 or 0.5) and the other
left at 0.9999?

It's important that two units sample at the same time (triggered by PPS
signal), and log the same timestamp (with 1 second resolution).  i.e. if
the logs are out by 1 (or 2 seconds) on different units, then meaningful
system wide calculations are not possible.
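
To illustrate the concern with made-up numbers: if the logged timestamp is the system clock truncated to whole seconds, a sample taken on the PPS edge sits right on a second boundary, so even a modest negative clock error moves it into the previous second. A minimal sketch, assuming truncation is how the 1 second resolution is obtained (the function name and the error values are hypothetical):

```python
import math


def logged_second(true_time_s, clock_error_s):
    """Whole-second timestamp a unit would log for an event.

    true_time_s: true time of the PPS-triggered sample, in seconds.
    clock_error_s: how far that unit's system clock is ahead (+) or
    behind (-) true time. Both are illustrative values only.
    """
    return math.floor(true_time_s + clock_error_s)


# Both units sample on the same PPS edge at true time 1000.0 s.
pps_edge = 1000.0
unit_a = logged_second(pps_edge, 0.05)   # clock assumed 50 ms ahead
unit_b = logged_second(pps_edge, -0.65)  # clock assumed 650 ms behind
print(unit_a, unit_b)
```

With these assumed errors the two units log different seconds for the same physical sample, which is the failure mode described above.
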
Post by Brendan Simon (eTRIX)
      maxupdateskew 100.0
Post by Bill Unruh
Now that could be problematic. An NMEA source could really be worse than that.
The default is 1000 and I would advise that for an NMEA source.

The default configuration file generated for Debian has this value set
to 100, so we just kept it.  Will change it to 1000.

Post by Brendan Simon (eTRIX)
      local stratum 10
Post by Bill Unruh
Why do you have this there?

No reason.  Just an oversight.  Will remove it or set to off.

Thanks,
Brendan.
