[chrony-users] Many servers became unreachable
Roel Schroeven
2015-06-01 15:41:42 UTC
Hi,

I have a strange situation with chronyd on our company network.

We have a small network with a Debian server, which runs chrony (version
1.24 from Debian squeeze) amongst other things. Chrony is configured to use
3 upstream time servers.

That setup has worked nicely for quite some time, but today it suddenly
failed: chrony can't connect to its upstream servers anymore, and I have no
idea why. I added four servers from the Debian pool; three of those are also
unreachable, but one fortunately still works. Output from "chronyc sources":

210 Number of sources = 7
MS Name/IP address           Stratum Poll LastRx Last sample
============================================================================
^? ntp1.belbone.be                 0    7    10y   +0ns[   +0ns] +/-   0ns
^? ntp1.telenet-ops.be             0    7    10y   +0ns[   +0ns] +/-   0ns
^? daedalus.belnet.be              0    7    10y   +0ns[   +0ns] +/-   0ns
^? 79.132.231.103.static.edp       0    7    10y   +0ns[   +0ns] +/-   0ns
^? ssh2.ulyssis.student.kule       0    7    10y   +0ns[   +0ns] +/-   0ns
^* rumst.verbert.be                2    6     49  -28ms[-2397us] +/-  33ms
^? 79.132.231.104.static.edp       0    7    10y   +0ns[   +0ns] +/-   0ns

My first thought was an error in our firewall, or perhaps in a firewall at our
ISP. But in that case 185.77.199.1 wouldn't work either. Also, I tried
"ntpdate -q ntp1.belbone.be" and that *does* work. And it seems the other
servers do reply, but only at first, as can be seen from this tcpdump trace:

$ sudo tcpdump -i eth0 tcp port 123 or udp port 123
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:35:59.534560 IP rack02.tresco.41820 > ntp1.telenet-ops.be.ntp: NTPv3, Client, length 48
17:35:59.559284 IP ntp1.telenet-ops.be.ntp > rack02.tresco.41820: NTPv3, Server, length 48
17:35:59.559392 IP rack02.tresco.41820 > ntp1.telenet-ops.be.ntp: NTPv3, Client, length 48
17:35:59.583403 IP ntp1.telenet-ops.be.ntp > rack02.tresco.41820: NTPv3, Server, length 48
17:35:59.583462 IP rack02.tresco.41820 > ntp1.telenet-ops.be.ntp: NTPv3, Client, length 48
17:35:59.607389 IP ntp1.telenet-ops.be.ntp > rack02.tresco.41820: NTPv3, Server, length 48
17:35:59.607479 IP rack02.tresco.41820 > ntp1.telenet-ops.be.ntp: NTPv3, Client, length 48
17:35:59.631380 IP ntp1.telenet-ops.be.ntp > rack02.tresco.41820: NTPv3, Server, length 48
17:35:59.734705 IP rack02.tresco.41820 > daedalus.belnet.be.ntp: NTPv3, Client, length 48
17:35:59.758341 IP daedalus.belnet.be.ntp > rack02.tresco.41820: NTPv3, Server, length 48
17:35:59.758392 IP rack02.tresco.41820 > daedalus.belnet.be.ntp: NTPv3, Client, length 48
17:35:59.781458 IP daedalus.belnet.be.ntp > rack02.tresco.41820: NTPv3, Server, length 48
17:35:59.781514 IP rack02.tresco.41820 > daedalus.belnet.be.ntp: NTPv3, Client, length 48
17:35:59.805471 IP daedalus.belnet.be.ntp > rack02.tresco.41820: NTPv3, Server, length 48
17:35:59.805513 IP rack02.tresco.41820 > daedalus.belnet.be.ntp: NTPv3, Client, length 48
17:35:59.860138 IP daedalus.belnet.be.ntp > rack02.tresco.41820: NTPv3, Server, length 48
17:36:01.540042 IP rack02.tresco.ntp > ntp1.belbone.be.ntp: NTPv3, Client, length 48
17:36:03.861352 IP rack02.tresco.ntp > rumst.verbert.be.ntp: NTPv3, Client, length 48
17:36:03.861373 IP rack02.tresco.ntp > 79.132.231.104.static.edpnet.net.ntp: NTPv3, Client, length 48
17:36:03.861382 IP rack02.tresco.ntp > 79.132.231.103.static.edpnet.net.ntp: NTPv3, Client, length 48
17:36:03.861393 IP rack02.tresco.ntp > ssh2.ulyssis.student.kuleuven.be.ntp: NTPv3, Client, length 48
17:36:03.861401 IP rack02.tresco.ntp > daedalus.belnet.be.ntp: NTPv3, Client, length 48
17:36:03.861409 IP rack02.tresco.ntp > ntp1.telenet-ops.be.ntp: NTPv3, Client, length 48
17:36:03.894164 IP rumst.verbert.be.ntp > rack02.tresco.ntp: NTPv3, Server, length 48
17:36:03.894312 IP rack02.tresco.ntp > rumst.verbert.be.ntp: NTPv3, Client, length 48
17:36:03.927595 IP rumst.verbert.be.ntp > rack02.tresco.ntp: NTPv3, Server, length 48
17:36:03.927707 IP rack02.tresco.ntp > rumst.verbert.be.ntp: NTPv3, Client, length 48
17:36:03.962826 IP rumst.verbert.be.ntp > rack02.tresco.ntp: NTPv3, Server, length 48
17:36:03.963200 IP rack02.tresco.ntp > rumst.verbert.be.ntp: NTPv3, Client, length 48
17:36:03.993066 IP rumst.verbert.be.ntp > rack02.tresco.ntp: NTPv3, Server, length 48
17:36:03.993352 IP rack02.tresco.ntp > rumst.verbert.be.ntp: NTPv3, Client, length 48
17:36:04.024304 IP rumst.verbert.be.ntp > rack02.tresco.ntp: NTPv3, Server, length 48
17:36:09.540677 IP rack02.tresco.ntp > ntp1.belbone.be.ntp: NTPv3, Client, length 48
17:36:13.862246 IP rack02.tresco.ntp > 79.132.231.104.static.edpnet.net.ntp: NTPv3, Client, length 48
17:36:13.862260 IP rack02.tresco.ntp > 79.132.231.103.static.edpnet.net.ntp: NTPv3, Client, length 48
17:36:13.862267 IP rack02.tresco.ntp > ssh2.ulyssis.student.kuleuven.be.ntp: NTPv3, Client, length 48
17:36:13.862274 IP rack02.tresco.ntp > daedalus.belnet.be.ntp: NTPv3, Client, length 48
17:36:13.862281 IP rack02.tresco.ntp > ntp1.telenet-ops.be.ntp: NTPv3, Client, length 48
17:36:17.541421 IP rack02.tresco.ntp > ntp1.belbone.be.ntp: NTPv3, Client, length 48
17:36:21.863260 IP rack02.tresco.ntp > 79.132.231.104.static.edpnet.net.ntp: NTPv3, Client, length 48
17:36:23.863232 IP rack02.tresco.ntp > 79.132.231.103.static.edpnet.net.ntp: NTPv3, Client, length 48
17:36:23.863392 IP rack02.tresco.ntp > ntp1.telenet-ops.be.ntp: NTPv3, Client, length 48
17:36:25.863211 IP rack02.tresco.ntp > ssh2.ulyssis.student.kuleuven.be.ntp: NTPv3, Client, length 48
17:36:25.863359 IP rack02.tresco.ntp > daedalus.belnet.be.ntp: NTPv3, Client, length 48
17:36:25.863507 IP rack02.tresco.ntp > ntp1.belbone.be.ntp: NTPv3, Client, length 48
17:36:29.864156 IP rack02.tresco.ntp > 79.132.231.104.static.edpnet.net.ntp: NTPv3, Client, length 48
17:36:31.864115 IP rack02.tresco.ntp > 79.132.231.103.static.edpnet.net.ntp: NTPv3, Client, length 48
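The pattern in a trace like this is easier to see when requests and replies are tallied per server and per local source port. The Python sketch below is not part of chrony or tcpdump; the function names are hypothetical, and it assumes the trace was saved with one packet per line (the archive wraps them) and that the local host appears as `rack02.tresco`, as it does above.

```python
import re
from collections import Counter

# One unwrapped tcpdump line, e.g.
# "17:35:59.534560 IP rack02.tresco.41820 > ntp1.telenet-ops.be.ntp: NTPv3, Client, length 48"
# tcpdump prints "host.port"; the port is the last dot-separated field.
LINE_RE = re.compile(
    r"IP (?P<src>\S+)\.(?P<sport>\w+) > (?P<dst>\S+)\.(?P<dport>\w+): "
    r"NTPv\d, (?P<mode>Client|Server)"
)


def tally(lines, local_host="rack02.tresco"):
    """Count requests sent and replies received per (server, local port)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip tcpdump banners and anything unparsable
        if m["mode"] == "Client" and m["src"] == local_host:
            counts[(m["dst"], m["sport"], "sent")] += 1
        elif m["mode"] == "Server" and m["dst"] == local_host:
            counts[(m["src"], m["dport"], "received")] += 1
    return counts


def silent_servers(counts):
    """Return (server, local port) pairs that got requests but no replies."""
    pairs = {(server, port) for server, port, _ in counts}
    return sorted(p for p in pairs
                  if counts[(*p, "sent")] and not counts[(*p, "received")])
```

Applied to the full trace above, this should show that ntp1.telenet-ops.be and daedalus.belnet.be answered every request sent from local port 41820, while the only server answering requests sent from local port `ntp` (123) is rumst.verbert.be.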

We have another server at another location which uses the same version of
Debian with the same version of Chrony using the same servers; that one
still runs without problems.

Is it possible that a number of servers are actively blocking us? But then
why do "ntpdate -q" and chrony on our other server still work?
--
To unsubscribe email chrony-users-***@chrony.tuxfamily.org
with "unsubscribe" in the subject.
For help email chrony-users-***@chrony.tuxfamily.org
with "help" in the subject.
Trouble? Email ***@chrony.tuxfamily.org.
Roel Schroeven
2015-06-01 17:36:37 UTC
Post by Roel Schroeven
That setup has worked nicely for quite some time, but today suddenly it
failed: chrony can't connect to its upstream servers anymore, and I have no
idea why.
Everything works again now. It's a complete mystery to me.



Best regards,
Roel
--
"Life ain't no fairy tale
Just give me another ale
And I'll drink to Rock 'n Roll"
-- Barkeep (The Scabs)
Miroslav Lichvar
2015-06-02 07:45:23 UTC
Post by Roel Schroeven
Post by Roel Schroeven
That setup has worked nicely for quite some time, but today suddenly it
failed: chrony can't connect to its upstream servers anymore, and I have no
idea why.
Everything works again now. It's a complete mystery to me.
It's probably the NAT on your firewall giving you source ports below
123, which older ntpd versions reject as bogus:

https://bugs.ntp.org/show_bug.cgi?id=2174
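The suspected failure can be modelled in a few lines. This Python sketch is based only on the description above (older ntpd drops requests whose source port is below 123); it is not ntpd's code, and `hypothetical_nat` is an invented NAT policy used purely to illustrate how a client sending from port 123 could end up arriving at the server from a lower port.

```python
NTP_PORT = 123


def older_ntpd_accepts(src_port: int) -> bool:
    """Simplified model of the check in ntp bug 2174: drop requests from ports < 123."""
    return src_port >= NTP_PORT


def hypothetical_nat(src_port: int) -> int:
    """Hypothetical NAT: keeps high (ephemeral) ports, remaps low ports to 122.

    Real NAT port-allocation policies vary; this only illustrates the failure mode.
    """
    return src_port if src_port > 1023 else 122


def server_replies(client_port: int) -> bool:
    """Would an older ntpd answer a request sent from client_port through the NAT?"""
    return older_ntpd_accepts(hypothetical_nat(client_port))
```

With this model, a request from a random high port such as 41820 passes through unchanged and is answered, while a request from port 123 is rewritten to 122 and silently dropped, which matches the shape of the trace in the first message.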

ntpdate -q works because it uses a random source port. chronyd
in recent versions does that too by default. Here is what I see
when I fix the source port and query one of the servers from your
report:

# /usr/sbin/chronyd -Q 'acquisitionport 122' 'server ntp1.telenet-ops.be iburst'
2015-06-02T07:41:37Z chronyd version 2.0 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)
2015-06-02T07:41:37Z Initial frequency -22.698 ppm
2015-06-02T07:41:47Z No suitable source for synchronisation
2015-06-02T07:41:47Z chronyd exiting
# /usr/sbin/chronyd -Q 'acquisitionport 123' 'server ntp1.telenet-ops.be iburst'
2015-06-02T07:41:50Z chronyd version 2.0 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)
2015-06-02T07:41:50Z Initial frequency -22.698 ppm
2015-06-02T07:41:54Z System clock wrong by -0.002986 seconds (ignored)
2015-06-02T07:41:54Z chronyd exiting
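Without chronyd 2.0 at hand, the same comparison can be approximated with a single client request, similar in spirit to `ntpdate -q`. This Python sketch is an illustration under stated assumptions, not how ntpdate or chronyd are implemented: it sends one 48-byte NTPv3 client packet (like those in the trace) from either a kernel-chosen port or a fixed one, and it skips the reply sanity checks and era-rollover handling a real client needs. The function names are hypothetical, and binding to port 122 or 123 requires root.

```python
import socket
import struct

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP era 0) to 1970-01-01


def build_client_packet(version: int = 3) -> bytes:
    """48-byte client request: LI=0, VN=version, Mode=3 (client), rest zero."""
    first_byte = (0 << 6) | (version << 3) | 3
    return struct.pack("!B47x", first_byte)


def transmit_time(reply: bytes) -> float:
    """Server transmit timestamp (bytes 40..47) converted to Unix seconds."""
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query(server: str, source_port: int = 0, timeout: float = 2.0) -> float:
    """Send one request from source_port (0 = kernel-chosen) and return the
    server's transmit time. Raises socket.timeout if no reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", source_port))
        sock.settimeout(timeout)
        sock.sendto(build_client_packet(), (server, NTP_PORT))
        reply, _ = sock.recvfrom(512)
    return transmit_time(reply)
```

Calling `query("ntp1.telenet-ops.be")` corresponds to the ntpdate -q case. Running it as root with `source_port=122` corresponds to the first chronyd run above; if the server applies the check from bug 2174, that call would be expected to end in a timeout rather than a reply.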
--
Miroslav Lichvar
Roel Schroeven
2015-06-02 08:25:29 UTC
Post by Miroslav Lichvar
It's probably the NAT on your firewall giving you source ports below
123, which older ntpd versions reject as bogus:

https://bugs.ntp.org/show_bug.cgi?id=2174

ntpdate -q works because it uses a random source port. chronyd
in recent versions does that too by default.
Aha, that sounds plausible.

According to tcpdump, chrony initially sends requests from a random
port (41820 in this case) and does receive replies; after that it
switches to port 123 and gets no more replies from those servers. I
can't see what the NAT does with those packets, so I assume you're right.

Thanks for clearing up the mystery!


Best regards,
Roel
Roel Schroeven
2015-06-02 09:01:43 UTC
Post by Miroslav Lichvar
# /usr/sbin/chronyd -Q 'acquisitionport 122' 'server ntp1.telenet-ops.be iburst'
A question about the acquisitionport option: if I'm reading the
documentation on my system (old chrony version) correctly, it only
affects the rapid-fire measurements requested with the initstepslew
directive. The current documentation on the website doesn't mention
initstepslew, implying that acquisitionport affects all server
communication.

Has its behavior changed, or am I reading the documentation wrong?


Best regards,
Roel
Miroslav Lichvar
2015-06-02 09:23:18 UTC
Post by Roel Schroeven
Post by Miroslav Lichvar
# /usr/sbin/chronyd -Q 'acquisitionport 122' 'server ntp1.telenet-ops.be iburst'
A question about the acquisitionport option: if I'm reading the
documentation on my system (old chrony version) correctly, it only affects
the rapid-fire measurements requested with the initstepslew directive. The
current documentation on the website doesn't mention initstepslew, implying
that acquisitionport affects all server communication.
Has its behavior changed, or am I reading the documentation wrong?
It has changed. initstepslew used to be a separate NTP client inside
chrony, which used its own socket configured with the acquisitionport
directive. The port directive configured the main NTP socket, which was
used for both server and client communication. You couldn't change the
client port without also changing the server port.

The separate initstepslew NTP implementation was removed; initstepslew
now uses the main NTP code, which has separate client and server
sockets, configured with the acquisitionport and port directives
respectively.
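The difference between the two layouts can be modelled with plain UDP sockets. This Python sketch only illustrates the socket arrangement described above and is not chrony code; the function names are hypothetical, it binds to 127.0.0.1, and in the test it passes port 0 (kernel-chosen) instead of 123 so it runs without root.

```python
import socket


def open_sockets_old(port: int):
    """Old layout: one socket bound to 'port', used both to serve and to query."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("127.0.0.1", port))
    except OSError:
        sock.close()
        raise
    return sock, sock  # server socket and client socket are the same object


def open_sockets_new(port: int, acquisitionport: int):
    """New layout: server socket on 'port', separate client socket on 'acquisitionport'."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", port))
        client.bind(("127.0.0.1", acquisitionport))
    except OSError:
        server.close()
        client.close()
        raise
    return server, client
```

In the old layout the source port of outgoing requests is necessarily the server port; in the new one the client socket can be bound independently, which is what makes a test like `acquisitionport 122` possible without touching the port clients connect to.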
Roel Schroeven
2015-06-02 11:11:29 UTC
Post by Miroslav Lichvar
It has changed. initstepslew used to be a separate NTP client inside
chrony, which used its own socket configured with the acquisitionport
directive. The port directive configured the main NTP socket, which was
used for both server and client communication. You couldn't change the
client port without also changing the server port.
The separate initstepslew NTP implementation was removed; initstepslew
now uses the main NTP code, which has separate client and server
sockets, configured with the acquisitionport and port directives
respectively.
All clear now. Thanks!


Best regards,
Roel