[chrony-users] Re: chrony losing sync with timeserver and never recovers

Discussion:

Brendan Simon (eTRIX)

2017-10-19 12:19:57 UTC

Hi chrony-users,

Does anyone know why chrony would stop polling time servers? maxpoll is
supposed to cap the polling interval at 1024 seconds (about 17 minutes),
but my system polls initially and then seems to stop polling the servers
completely. The example below shows 463 days with no response from 2 servers.
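For reference, the Poll column of `chronyc sources` and the `minpoll`/`maxpoll` options are base-2 exponents of the interval in seconds, which is where the 1024 s figure comes from. A minimal Python sketch of that arithmetic follows; the function name is only illustrative and is not part of chrony.

```python
def poll_interval_seconds(poll_exponent: int) -> float:
    """Convert a base-2 poll exponent into an interval in seconds."""
    # Negative exponents (sub-second polling) also work with a float base.
    return 2.0 ** poll_exponent


# Exponents that appear in this thread: 4 (refclocks), 7 and 10 (NTP servers).
for exponent in (4, 7, 10):
    seconds = poll_interval_seconds(exponent)
    print(f"poll {exponent:2d} -> {seconds:6.0f} s ({seconds / 60:.1f} min)")
```

A Poll value of 10 therefore means one request roughly every 17 minutes, so a LastRx of 463 days cannot be explained by a long polling interval alone.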

Chrony 1.30 on Debian 8 (Jessie)

Thanks, Brendan.

------------------------------------------------------------------------

I have a number of embedded systems located in remote areas that
need to be up 24/7 for logging of data via a 3G internet connection.
The systems are ARM based and running Debian 8 (Jessie) with chrony
installed as the NTP client.

The systems sync with 2 NTP servers (`tic.ntp.telstra.net` and
`toc.ntp.telstra.net`) on boot. I know this because (a) there is no
RTC on the system, and (b) the application does not start until the
system date is > 2015 (i.e. not the startup default of 1970).

For some reason chrony loses sync with the servers and never
recovers. I have system times that are out by minutes!!

# chronyc sources
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? GPS                           0   4     0   10y     +0ns[   +0ns] +/-    0ns
#? PPS                           0   4     0   10y     +0ns[   +0ns] +/-    0ns
^? tic.ntp.telstra.net           2  10     0  463d    -14ms[  -15ms] +/-   44ms
^? toc.ntp.telstra.net           2  10     0  463d    -23ms[  -24ms] +/-   79ms

As can be seen, the servers have a state of '?' and haven't received
data in 463 days!! Yet I can ping them OK, and if I restart chrony
all is good again.

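When reading this output, note that the Reach column is an 8-bit register printed in octal that records which of the last eight polls got a valid reply, so 0 means none did and 377 means all did. The following Python sketch flags sources whose Reach is 0. The regular expression and function names are assumptions made for illustration and only cover the leading columns of rows laid out like the ones in this thread.

```python
import re

# Leading columns of a `chronyc sources` data row: mode flag, state flag,
# name, stratum, poll exponent, reach (octal) and LastRx.
ROW = re.compile(
    r"^(?P<mode>[\^=#])(?P<state>[*+\-?x~])\s+(?P<name>\S+)\s+"
    r"(?P<stratum>\d+)\s+(?P<poll>-?\d+)\s+(?P<reach>[0-7]+)\s+(?P<last_rx>\S+)"
)


def parse_source_row(line):
    """Return a dict for a data row, or None for headers and other lines."""
    match = ROW.match(line)
    if match is None:
        return None
    reach = int(match.group("reach"), 8)  # Reach is displayed in octal
    return {
        "name": match.group("name"),
        "state": match.group("state"),
        "poll": int(match.group("poll")),
        "reach": reach,
        "replies_in_last_8": bin(reach).count("1"),
        "last_rx": match.group("last_rx"),
    }


def unreachable_sources(output):
    """Names of sources with no valid reply in their last eight polls."""
    rows = (parse_source_row(line) for line in output.splitlines())
    return [row["name"] for row in rows if row and row["reach"] == 0]


# Rows copied from the output quoted in this thread.
SAMPLE = """\
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? GPS                           0   4     0   10y     +0ns[   +0ns] +/-    0ns
^? tic.ntp.telstra.net           2  10     0  463d    -14ms[  -15ms] +/-   44ms
^? toc.ntp.telstra.net           2  10     0  463d    -23ms[  -24ms] +/-   79ms
"""

print(unreachable_sources(SAMPLE))
```

A monitoring job could run something like this periodically and raise an alarm when every NTP source has a Reach of 0, rather than waiting for the clock error to be noticed.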
The 3G modems can have problems and are reset (powered down and up)
whenever internet connectivity is lost (detected by pings not
responding). And 3G connectivity is not the most reliable.

server tic.ntp.telstra.net iburst
server toc.ntp.telstra.net iburst
makestep 1000 -1
initstepslew 30 0.au.pool.ntp.org 1.au.pool.ntp.org 2.au.pool.ntp.org 3.au.pool.ntp.org

*What causes chrony to not retry servers?*
Is there a config setting I need to always try these servers?

I notice the `online` and `offline` settings. Do I need to explicitly
tag servers as `online`? I presume that's the default.

Do I need to explicitly tag the servers as `offline` before powering
the modem up and down? I thought leaving them online would be ok.
The only downside is it may take a little longer to get the time back
in sync, right?
But they can only get back in sync if chrony is talking to the servers.

Thanks,
Brendan.

Miroslav Lichvar

2017-10-19 14:31:34 UTC

What does "chronyc activity" say when this happens?

The default is "online" for servers specified in chrony.conf without
the "offline" option. Maybe you have a ppp script that calls "chronyc
offline" when the link goes down, but there is no script that would
call "chronyc online" when it goes up?
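One way to check this is to read the online and offline counts reported by `chronyc activity`. The sketch below is a hedged illustration in Python: the sample text reflects the report format as I understand it and should be verified against the installed chrony version, and the helper names are hypothetical.

```python
import re
import subprocess


def parse_activity(text):
    """Extract the 'N sources online' / 'N sources offline' counts."""
    counts = {}
    pattern = r"^(\d+) sources (online|offline)\s*$"
    for number, status in re.findall(pattern, text, re.MULTILINE):
        counts[status] = int(number)
    return counts


def query_activity():
    """Run `chronyc activity`; assumes chronyc is on PATH and chronyd is up."""
    result = subprocess.run(
        ["chronyc", "activity"], capture_output=True, text=True, check=True
    )
    return parse_activity(result.stdout)


# Illustrative report for a system whose two servers were switched offline.
SAMPLE = """\
200 OK
0 sources online
2 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address
"""

print(parse_activity(SAMPLE))
```

If the offline count matches the number of configured servers while the 3G link is up, that would be consistent with a link-down hook having issued `chronyc offline` without a matching `chronyc online` afterwards.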

--
Miroslav Lichvar
--
To unsubscribe email chrony-users-***@chrony.tuxfamily.org
with "unsubscribe" in the subject.
For help email chrony-users-***@chrony.tuxfamily.org
with "help" in the subject.
Trouble? Email ***@chrony.tuxfamily.org.

Bill Unruh

2017-10-19 15:22:45 UTC

I am certainly confused by the log. The GPS and the
PPS have never delivered a valid time to chrony. What happened 463 days ago?
When was this system started?
Or have you edited these responses, not showing us the complete picture?
I.e., are there other servers which are being polled? It may be that if there
is no response from a server for N poll periods, chrony gives up and stops
polling. I no longer know chrony well enough to say whether that is what
happens.

It would actually be more helpful if you gave us all the information. You do
not know what is happening, so how can you be sure that these are the
only "interesting" commands in your chrony.conf?

Post by Brendan Simon (eTRIX)
server tic.ntp.telstra.net iburst
server toc.ntp.telstra.net iburst
makestep 1000 -1
initstepslew 30 0.au.pool.ntp.org 1.au.pool.ntp.org 2.au.pool.ntp.org 3.au.pool.ntp.org
What causes chrony to not retry servers?
Is there a config setting I need to always try these servers?
I notice the `online` and `offline` settings. Do I need to
explicitly tag servers as `online`? I presume that's the
default.
Do I need to explicitly tag the servers as `offline` before
powering the modem up and down? I thought leaving them online
would be ok. The only downside is it may take a little longer
to get the time back in sync, right?
But they can only get back in sync if chrony is talking to the servers.

It might help to tag them as online when the 3G link comes back up.

Parker, Michael D.

2017-10-19 17:04:14 UTC

I had a similar type of problem using a Chrony 2.x release under RHEL 6 with only 2 time sources.
All looked good at the start, but eventually the problem showed up.
If I recall my research correctly, it had something to do with some type of time-source window situation.
If the sources' +/- windows did not overlap, chrony quit trying to go back into sync.
I was polling much more frequently than you were.

***** ***** *****
Michael D. Parker
General Atomics – ElectroMagnetics Systems Division (EMS)
***@ga.com <<<<< NOTE: Remember to include my middle initial >>>>>

Bryan Seitz

2017-10-19 19:52:17 UTC

I've seen the same issue recently with:

chrony-3.1-4.fc26.x86_64

[***@storage ~]$ chronyc sources
210 Number of sources = 7
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* ntp-gps1.fiber.house          1   7   377   122   +799ns[-2226ns] +/- 1252us
^- some.server                   1  10   377   995    +32us[-3108ns] +/-   35ms
^- another.server                1  10   377   927  -3831us[-3869us] +/-   15ms

Here ntp-gps1.fiber.house was dead/down/not powered up and LastRx had reached 10h, but chrony remained 'synced' to it :(
It seems like correct operation would be to mark this server as offline, but continue to poll it and mark it online again
once it comes back up? I know this is how 'ntpd' works.

--
Bryan G. Seitz

Miroslav Lichvar

2017-10-20 07:59:16 UTC

Post by Bryan Seitz
210 Number of sources = 7
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* ntp-gps1.fiber.house          1   7   377   122   +799ns[-2226ns] +/- 1252us
^- some.server                   1  10   377   995    +32us[-3108ns] +/-   35ms
^- another.server                1  10   377   927  -3831us[-3869us] +/-   15ms
Where ntp-gps1.fiber.house was dead/down/not powered up, lastRx was 10h but it remained 'synced' to it :(
Seems like correct operation would mark this server as offline, but continue to poll and mark it online again
once it comes back up? I know this is how 'ntpd' works.

chrony doesn't work like that. The ntp-gps1.fiber.house source is
about 10 times better than the next best source (comparing the +/-
value). When it stops responding, chronyd will not switch to another
source until the estimate of the maximum local error becomes worse
than the estimate of error with another source, or all samples of the
source marked with * are older than all samples of the other sources.
You could increase the maxclockerror option to speed up the former, or
decrease the maxsamples option to speed up the latter.

With a GPS refclock the difference in accuracy may be 3 or more orders
of magnitude. If the GPS signal is lost for only a few minutes, I
think it would be a bad idea to switch to an NTP source.

Post by Parker, Michael D.
If the sources' +/- windows did not overlap, chrony quit trying to go back into sync.

That works as expected. If there are only two sources and they don't
agree with each other (they are marked with the x symbol), chronyd
knows that at least one of them is wrong, but it doesn't know which
one that is, so it must give up on synchronization until they agree
again. That's the reason why 3 is the minimum recommended number of
sources.
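The reasoning about two versus three sources can be illustrated with a small interval check. The Python sketch below is a deliberately simplified model and not chronyd's actual selection algorithm: each source is treated as an interval offset +/- error, and a group is trusted only if a strict majority of intervals share a common point. The function names and the made-up sources `a`, `b` and `c` are assumptions for illustration; the `tic`/`toc` values come from the output earlier in the thread.

```python
def agreeing_majority(sources):
    """Return the largest group of sources whose intervals share a point.

    `sources` maps a name to (offset_ms, error_ms). The group is returned
    only if it is a strict majority of all sources; otherwise the result
    is an empty list, meaning no source can be trusted.
    """
    best = []
    for offset, error in sources.values():
        # The region covered by the most intervals always starts at some
        # interval's lower bound, so testing lower bounds is sufficient.
        point = offset - error
        group = [
            name
            for name, (o, e) in sources.items()
            if o - e <= point <= o + e
        ]
        if len(group) > len(best):
            best = group
    return best if len(best) > len(sources) / 2 else []


# The two servers from the original post: their intervals overlap.
print(agreeing_majority({"tic": (-14, 44), "toc": (-23, 79)}))

# Two sources that disagree: no way to tell which one is wrong.
print(agreeing_majority({"a": (0, 5), "b": (100, 5)}))

# A third source breaks the tie and exposes "b" as the outlier.
print(agreeing_majority({"a": (0, 5), "b": (100, 5), "c": (1, 5)}))
```

With two disagreeing sources the result is empty, which matches the behaviour described above; adding a third source lets the two that agree outvote the one that does not.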