Rob Janssen
2017-10-23 08:54:52 UTC
I noticed a small problem with PPS synchronization...
We have a couple of sites where there is locally distributed PPS from a GPSDO without access to its monitoring.
(another group is monitoring the GPSDO)
So, we use a chrony config with a PPS source and a couple of network time sources to get absolute time.
Config is like this:
refclock PPS /dev/pps0 refid PPS
server xx.xxx.72.10 iburst
server xx.xxx.72.130 iburst
server xx.xxx.72.131 iburst
(ldattach 18 /dev/ttyS0 is used to provide /dev/pps0)
This works OK. After startup chrony initially synchronizes to the network time and after a minute or so it locks
on to the PPS pulses. The sources output is like this:
210 Number of sources = 4
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS 0 4 377 24 +218ns[ +278ns] +/- 124ns
^- xxxxxx.xxxx.xxx 1 10 377 877 -147us[ -122us] +/- 11ms
^- xxxxxx.xxxx.xxx 1 10 377 14 +1480us[+1480us] +/- 10ms
^- xxx.xxxxxx.xxxx.xxx 1 10 377 345 +1446us[+1447us] +/- 10ms
However, recently at one site the PPS signal was lost, but chrony stayed "locked" to it:
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS 0 4 0 13h -279ns[ -401ns] +/- 79ns
^- xxxxxx.xxxx.xxx 1 10 377 250 +3462us[+3462us] +/- 10ms
As can be seen, the signal has been lost for 13 hours (LastRx), yet PPS still has the * sign as its state marker.
We are remotely monitoring these systems using chronyc tracking, and it still indicated stratum 1 referenced to PPS.
I would have expected it to drop back to using the network time servers after some time of not getting pulses
(i.e. once "Reach" is 0) and the stratum to increase to 2. Had it operated that way, we would have
received an alert.
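
As a stopgap on the monitoring side, the condition described above (a source still marked * while its Reach is 0) can be detected from the text of chronyc sources. The following is only a rough sketch: the function name is made up for illustration, and it assumes the column layout shown in the listings in this message (state marker, name, stratum, poll, Reach in octal).

```python
def stale_selected_sources(sources_output):
    """Return names of sources marked '*' whose Reach register is 0.

    Hypothetical helper: it assumes the `chronyc sources` text layout
    quoted in this message (MS, name, stratum, poll, reach, ...).
    """
    stale = []
    for line in sources_output.splitlines():
        fields = line.split()
        # Data rows start with a two-character mode/state marker
        # such as "#*" or "^-"; headers and separators are skipped.
        if len(fields) < 5 or len(fields[0]) != 2 or fields[0][0] not in "#^=":
            continue
        state = fields[0][1]
        name = fields[1]
        try:
            reach = int(fields[4], 8)  # Reach is printed in octal
        except ValueError:
            continue
        if state == "*" and reach == 0:
            stale.append(name)
    return stale
```

Fed the second listing above, this would return the PPS entry, which the monitoring could then treat as an alert even though chronyc tracking still reports stratum 1.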
Furthermore, the clock had drifted by 3.5ms by the time the above status was noticed, whereas when synchronized
to network time it usually stays within 1 to 1.5ms. So it really is not considering those network time sources anymore.
The above situation occurred with chrony 2.1.
However, I have reproduced it with an installation updated to version 3.2, although with an "outage" time of only 15 minutes.
It had Reach 0 but was still indicating lock to PPS after 869 seconds.
Is this to be considered a bug, or is it a design feature?
How could we work around that in this case?
Rob
--
To unsubscribe email chrony-users-***@chrony.tuxfamily.org
with "unsubscribe" in the subject.
For help email chrony-users-***@chrony.tuxfamily.org
with "help" in the subject.
Trouble? Email ***@chrony.tuxfamily.org.