[chrony-users] chrony and ntpd xleave interoperability
FUSTE Emmanuel
2018-01-23 10:31:38 UTC
Hello,

First, my apologies for fumbling on chrony-dev when I tried
to subscribe to chrony-users...

I'm doing some tests to replace ntpd with chrony on some groups of servers.
These servers use a peer association with the interleave option.

When I try to do the same with ntpd on one side and chrony on the other,
things go bad.
At best, chrony gets a working association with interleave status, but with
a very long response time.
On the ntpd side, the association never works. The chrony server never
gets to the "reach" state and the reach counter is stuck at zero.

As soon as I remove the xleave option on the ntpd side, everything
immediately starts to work as expected.

ntpd :
peer y.y.y.y minpoll 5 maxpoll 10 xleave
restrict y.y.y.y notrap nomodify noquery

chrony :
peer x.x.x.x xleave minpoll 5  maxpoll 10
allow x.x.x.0/24

Yesterday I removed the xleave option on the ntpd side.
All was good on both sides.
So I tried to re-enable the xleave option
-> Boom, it works!

I restarted chrony
-> ntpd logged "receive: KoD packet from 192.54.145.235 has a zero org
or rec timestamp. Ignoring."
and four minutes later "y.y.y.y 8613 83 unreachable"
The previously working association is now dead.
No working association from chrony.

So I restarted ntpd
-> chrony starts to see the other server (ntpdata) but never reaches a good
state.
-> ntpd does not reach the "reach" state.

Removed xleave from ntpd and restarted
-> all is still stuck
Restarted chrony
-> ntpd starts to see the chrony server, the reach counter increments, and
it reaches a "backup" condition. All is good on the chrony side.

Re-added the xleave option on the ntpd side.
The unreach counter increments, flash=1606 so pkt_bogus...
On the chrony side, "Total valid RX" no longer increments...

I'm lost.

chrony 3.2
ntp-4.2.8p8, ntp-4.2.8p10

Can I normally expect xleave interoperability between chrony and ntpd,
or is it something too "implementation specific"?

Emmanuel.
Miroslav Lichvar
2018-01-23 12:00:14 UTC
Post by FUSTE Emmanuel
When I try to do the same with ntpd on one side and chrony on the other,
things go bad.
At best, chrony gets a working association with interleave status, but with
a very long response time.
A long response time up to the polling interval of the peer is normal
in symmetric associations.
Post by FUSTE Emmanuel
On the ntpd side, the association never works. The chrony server never
gets to the "reach" state and the reach counter is stuck at zero.
Have you tried the same configuration and the timing of restarts,
between two ntpd servers? I suspect you would see some of the issues
in this case too.

There are probably multiple issues involved, which make it difficult
to see what's going on. I'm aware of the following:

- ntpd doesn't accept packets from peers that are not synchronized
(yet), so peers have to be configured with other sources in order
for the symmetric association (in both basic and interleaved modes)
to start. See https://bugs.ntp.org/show_bug.cgi?id=3445.
- interleaved mode in ntpd works only when the peers use the same
polling interval. If they have the same minpoll and maxpoll, but
minpoll != maxpoll, they should in theory both get to the maxpoll
if the association doesn't work, but there may be a bug that
prevents that.
- chrony switches to the basic mode when the polling intervals don't
match, but ntpd doesn't accept responses in the basic mode if the
interleaved mode is enabled
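
To make the polling-interval condition concrete: NTP poll values are exponents of two, so minpoll 5 is 32 seconds and maxpoll 10 is 1024 seconds. The following is only a sketch with hypothetical helper names (nothing here comes from ntpd or chrony); it checks whether two peers are pinned to a single, identical interval, which is the case in which the interleaved mode is expected to work.

```python
def poll_seconds(exponent: int) -> int:
    """Convert an NTP poll exponent to seconds (interval = 2**exponent)."""
    return 2 ** exponent


def pinned_to_same_interval(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Return True if both (minpoll, maxpoll) pairs allow exactly one
    interval and that interval is the same on both sides."""
    a_min, a_max = a
    b_min, b_max = b
    return a_min == a_max and b_min == b_max and a_min == b_min


# minpoll 5 / maxpoll 10: anything between 32 s and 1024 s is possible.
print(poll_seconds(5), poll_seconds(10))
# Same range on both sides, but each peer may settle on a different interval.
print(pinned_to_same_interval((5, 10), (5, 10)))
# minpoll == maxpoll on both sides removes that variable.
print(pinned_to_same_interval((5, 5), (5, 5)))
```

With equal minpoll and maxpoll on both peers, a polling mismatch can be ruled out while debugging the other issues.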
Post by FUSTE Emmanuel
chrony 3.2
ntp-4.2.8p8, ntp-4.2.8p10
Can I normally expect xleave interoperability between chrony and ntpd,
or is it something too "implementation specific"?
With the current versions, if you can avoid the issue with
unsynchronized sources, they should interoperate, at least when their
polling intervals match. If it doesn't work for you, I'd like to see a
tcpdump output.

Please note that the symmetric mode has some security issues and it's
generally recommended to use the client/server mode instead. Even if
authentication is enabled, it is possible to break a symmetric
association by replaying old packets. (chrony has a partial protection
against this attack, but it works only in the basic mode when the
polling intervals match and there are no packets with timestamps from
future that could be replayed. It's too fragile, don't rely on it!)

It is possible that support for symmetric associations will be dropped
from chrony in future.
--
Miroslav Lichvar
--
To unsubscribe email chrony-users-***@chrony.tuxfamily.org
with "unsubscribe" in the subject.
For help email chrony-users-***@chrony.tuxfamily.org
with "help" in the subject.
Trouble? Email ***@chrony.tuxfamily.org.
FUSTE Emmanuel
2018-01-23 13:44:56 UTC
Post by Miroslav Lichvar
Post by FUSTE Emmanuel
When I try to do the same with ntpd on one side and chrony on the other,
things go bad.
At best, chrony gets a working association with interleave status, but with
a very long response time.
A long response time up to the polling interval of the peer is normal
in symmetric associations.
Post by FUSTE Emmanuel
On the ntpd side, the association never works. The chrony server never
gets to the "reach" state and the reach counter is stuck at zero.
Have you tried the same configuration and the timing of restarts,
between two ntpd servers? I suspect you would see some of the issues
in this case too.
There are probably multiple issues involved, which make it difficult
to see what's going on. I'm aware of the following:
- ntpd doesn't accept packets from peers that are not synchronized
(yet), so peers have to be configured with other sources in order
for the symmetric association (in both basic and interleaved modes)
to start. See https://bugs.ntp.org/show_bug.cgi?id=3445.
- interleaved mode in ntpd works only when the peers use the same
polling interval. If they have the same minpoll and maxpoll, but
minpoll != maxpoll, they should in theory both get to the maxpoll
if the association doesn't work, but there may be a bug that
prevents that.
- chrony switches to the basic mode when the polling intervals don't
match, but ntpd doesn't accept responses in the basic mode if the
interleaved mode is enabled
Post by FUSTE Emmanuel
chrony 3.2
ntp-4.2.8p8, ntp-4.2.8p10
Can I normally expect xleave interoperability between chrony and ntpd,
or is it something too "implementation specific"?
With the current versions, if you can avoid the issue with
unsynchronized sources, they should interoperate, at least when their
polling intervals match. If it doesn't work for you, I'd like to see a
tcpdump output.
OK. I fixed the min/max polling interval to 5 for testing purposes.
Then I first restarted chrony and waited for it to sync to an online source.
Then I restarted ntpd and took a capture.
I will send you all the data.

ntpd is stuck in the unreachable state.
chrony is stuck with only one valid RX.
Post by Miroslav Lichvar
Please note that the symmetric mode has some security issues and it's
generally recommended to use the client/server mode instead. Even if
authentication is enabled, it is possible to break a symmetric
association by replaying old packets. (chrony has a partial protection
against this attack, but it works only in the basic mode when the
polling intervals match and there are no packets with timestamps from
future that could be replayed. It's too fragile, don't rely on it!)
Yes, I know. It is only used on "trusted" LAN segments and/or to try to
interoperate with ntpd xleave.
Post by Miroslav Lichvar
It is possible that support for symmetric associations will be dropped
from chrony in future.
I am only using it to transition from ntpd to chrony, so it will not be missed.
I hope my clock vendor will someday transition from ntpd to something
else (chrony) to get good xleave support (and much more).
In any case, I mainly use these clocks with PTP, so the NTP part only affects
fail-over scenarios.

Emmanuel.
Miroslav Lichvar
2018-01-23 15:58:06 UTC
Post by FUSTE Emmanuel
Post by Miroslav Lichvar
With the current versions, if you can avoid the issue with
unsynchronized sources, they should interoperate, at least when their
polling intervals match. If it doesn't work for you, I'd like to see a
tcpdump output.
OK. I fixed the min/max polling interval to 5 for testing purposes.
Then I first restarted chrony and waited for it to sync to an online source.
Then I restarted ntpd and took a capture.
I will send you all the data.
ntpd is stuck in the unreachable state.
chrony is stuck with only one valid RX.
Ok. I can reproduce this problem. It seems ntpd doesn't update its
state in the interleaved mode when it receives a packet with an
unexpected origin timestamp. There was a similar issue fixed for the
basic mode a few ntp releases ago:
https://bugs.ntp.org/show_bug.cgi?id=2952

As chronyd doesn't switch to the interleaved mode until it's receiving
valid responses and ntpd doesn't accept responses in the basic mode,
they are stuck waiting forever on each other.
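
The circular wait can be shown with a toy model. This is only a sketch under simplifying assumptions taken from this thread, not from either code base: chronyd stays in the basic mode until it has received a valid response, and ntpd with xleave enabled rejects every basic-mode packet. The function name and the poll limit are hypothetical.

```python
from typing import Optional


def polls_until_established(ntpd_xleave: bool, max_polls: int = 10) -> Optional[int]:
    """Return the poll number at which ntpd first accepts a packet,
    or None if that never happens within max_polls."""
    chrony_interleaved = False  # chronyd starts in the basic mode
    for poll in range(1, max_polls + 1):
        # ntpd with xleave only accepts interleaved packets (assumption).
        ntpd_accepts = chrony_interleaved or not ntpd_xleave
        if ntpd_accepts:
            return poll
        # Rejected: chronyd gets no valid response, so it never switches
        # to the interleaved mode and the next poll looks the same.
    return None


print(polls_until_established(ntpd_xleave=True))   # None: stuck forever
print(polls_until_established(ntpd_xleave=False))  # 1: works at once
```

This matches the observation that removing xleave on the ntpd side makes the association come up immediately.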

A similar thing seems to happen when trying to use the interleaved mode
between two 4.2.8p10 ntpds. You said it worked for you before, so I
assume one of the ntpds was an older version which didn't have this
bug?
--
Miroslav Lichvar
FUSTE Emmanuel
2018-01-23 16:42:22 UTC
Post by Miroslav Lichvar
Post by FUSTE Emmanuel
Post by Miroslav Lichvar
With the current versions, if you can avoid the issue with
unsynchronized sources, they should interoperate, at least when their
polling intervals match. If it doesn't work for you, I'd like to see a
tcpdump output.
OK. I fixed the min/max polling interval to 5 for testing purposes.
Then I first restarted chrony and waited for it to sync to an online source.
Then I restarted ntpd and took a capture.
I will send you all the data.
ntpd is stuck in the unreachable state.
chrony is stuck with only one valid RX.
Ok. I can reproduce this problem. It seems ntpd doesn't update its
state in the interleaved mode when it receives a packet with an
unexpected origin timestamp. There was a similar issue fixed for the
basic mode a few ntp releases ago:
https://bugs.ntp.org/show_bug.cgi?id=2952
As chronyd doesn't switch to the interleaved mode until it's receiving
valid responses and ntpd doesn't accept responses in the basic mode,
they are stuck waiting forever on each other.
OK !
Post by Miroslav Lichvar
A similar thing seems to happen when trying to use the interleaved mode
between two 4.2.8p10 ntpds. You said it worked for you before, so I
assume one of the ntpds was an older version which didn't have this
bug?
I have a platform with three ntpds in interleaved mode.
They were on 4.2.8p8.
They were upgraded today to 4.2.8p10 and are still working properly.
As I use authentication in this case, I added authentication to the test platform.
Mutual authentication validates, but the two get stuck as before.

Leap status : Not synchronised
Version : 4
Mode : Symmetric active
Stratum : 0
Poll interval : 5 (32 seconds)
Precision : -24 (0.000000060 seconds)
Root delay : 0.000000 seconds
Root dispersion : 0.000656 seconds
Reference ID : 494E4954 (INIT)
Reference time : Thu Jan 01 00:00:00 1970
Offset : +0.000000000 seconds
Peer delay : 0.000000000 seconds
Peer dispersion : 0.000000000 seconds
Response time : 0.000000000 seconds
Jitter asymmetry: +0.00
NTP tests : 111 101 0000
Interleaved : Yes
Authenticated : Yes
TX timestamping : Hardware
RX timestamping : Hardware
Total TX : 17
Total RX : 18
Total valid RX : 2

associd=3540 status=e011 conf, authenb, auth, sel_reject, 1 event, mobilize,
srcadr=y.y.y.y, srcport=123, dstadr=x.x.x.x,
dstport=123, leap=11, stratum=16, precision=-24, rootdelay=0.000,
rootdisp=0.000, refid=INIT,
reftime=00000000.00000000 Thu, Feb 7 2036 7:28:16.000,
rec=de11e02d.60d2f07f Tue, Jan 23 2018 17:24:13.378, reach=000,
unreach=10, hmode=1, pmode=0, hpoll=5, ppoll=5, headway=17,
flash=1606 pkt_bogus, pkt_unsync, peer_stratum, peer_dist, peer_unreach,
keyid=1, offset=0.000, delay=0.000, dispersion=15937.500, jitter=0.000,
xleave=0.028,
filtdelay= 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtoffset= 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtdisp= 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0


Emmanuel.
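
For readers following the ntpq output above, the flash word is a hexadecimal bit mask. The sketch below decodes it. The five names that appear for flash=1606 are confirmed by the output in this thread; the remaining entries follow the flash status word table in the ntpq documentation and should be treated as an assumption here. The function name is hypothetical.

```python
# Flash status bits TEST1..TEST13 (names as printed by ntpq).
FLASH_BITS = {
    0x0001: "pkt_dup",
    0x0002: "pkt_bogus",
    0x0004: "pkt_unsync",
    0x0008: "pkt_denied",
    0x0010: "pkt_auth",
    0x0020: "pkt_stratum",
    0x0040: "pkt_header",
    0x0080: "pkt_autokey",
    0x0100: "pkt_crypto",
    0x0200: "peer_stratum",
    0x0400: "peer_dist",
    0x0800: "peer_loop",
    0x1000: "peer_unreach",
}


def decode_flash(hex_word: str) -> list[str]:
    """Return the names of all bits set in a hexadecimal flash word."""
    value = int(hex_word, 16)
    return [name for bit, name in FLASH_BITS.items() if value & bit]


print(decode_flash("1606"))
# ['pkt_bogus', 'pkt_unsync', 'peer_stratum', 'peer_dist', 'peer_unreach']
```

The pkt_bogus bit (TEST2) is the one relevant to the interleaved-mode problem discussed in this thread.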
Miroslav Lichvar
2018-01-24 12:45:56 UTC
Post by FUSTE Emmanuel
Post by Miroslav Lichvar
A similar thing seems to happen when trying to use the interleaved mode
between two 4.2.8p10 ntpds. You said it worked for you before, so I
assume one of the ntpds was an older version which didn't have this
bug?
I have a platform with three ntpds in interleaved mode.
They were on 4.2.8p8.
They were upgraded today to 4.2.8p10 and are still working properly.
You are right. My test was bad (it hit the bug with unsynchronized
source).

The bug in the interleaved mode is a bit more subtle. The state is
updated from received packet, but only when one of the timestamps is
zero (i.e. it's the first packet of the association). This means two
ntpd 4.2.8p10 can interoperate, but I suspect the association will not
recover if there is a mismatch between the receive timestamps.

I'll send a bug report to the ntp maintainers.

In the meantime, if you are willing to patch ntp, this should fix it:

diff -up ntp-4.2.8p10/ntpd/ntp_proto.c.orig ntp-4.2.8p10/ntpd/ntp_proto.c
--- ntp-4.2.8p10/ntpd/ntp_proto.c.orig 2018-01-24 13:35:16.611488502 +0100
+++ ntp-4.2.8p10/ntpd/ntp_proto.c 2018-01-24 13:35:24.113505866 +0100
@@ -1774,7 +1774,6 @@ receive(
peer->bogusorg++;
peer->flags |= FLAG_XBOGUS;
peer->flash |= TEST2; /* bogus */
- return; /* Bogus packet, we are done */
}
--
Miroslav Lichvar
FUSTE Emmanuel
2018-01-24 15:23:47 UTC
Post by Miroslav Lichvar
Post by FUSTE Emmanuel
Post by Miroslav Lichvar
A similar thing seems to happen when trying to use the interleaved mode
between two 4.2.8p10 ntpds. You said it worked for you before, so I
assume one of the ntpds was an older version which didn't have this
bug?
I have a platform with three ntpds in interleaved mode.
They were on 4.2.8p8.
They were upgraded today to 4.2.8p10 and are still working properly.
You are right. My test was bad (it hit the bug with unsynchronized
source).
The bug in the interleaved mode is a bit more subtle. The state is
updated from received packet, but only when one of the timestamps is
zero (i.e. it's the first packet of the association). This means two
ntpd 4.2.8p10 can interoperate, but I suspect the association will not
recover if there is a mismatch between the receive timestamps.
I'll send a bug report to the ntp maintainers.
diff -up ntp-4.2.8p10/ntpd/ntp_proto.c.orig ntp-4.2.8p10/ntpd/ntp_proto.c
--- ntp-4.2.8p10/ntpd/ntp_proto.c.orig 2018-01-24 13:35:16.611488502 +0100
+++ ntp-4.2.8p10/ntpd/ntp_proto.c 2018-01-24 13:35:24.113505866 +0100
@@ -1774,7 +1774,6 @@ receive(
peer->bogusorg++;
peer->flags |= FLAG_XBOGUS;
peer->flash |= TEST2; /* bogus */
- return; /* Bogus packet, we are done */
}
Yes, it works!

Thank you.
Emmanuel.
Rob Janssen
2018-01-24 16:49:01 UTC
Post by Miroslav Lichvar
The bug in the interleaved mode is a bit more subtle. The state is
updated from received packet, but only when one of the timestamps is
zero (i.e. it's the first packet of the association). This means two
ntpd 4.2.8p10 can interoperate, but I suspect the association will not
recover if there is a mismatch between the receive timestamps.
I have seen problems like that, and stopped using symmetric peering.
As far as I know, just declaring "server" in each direction works OK (there is loop-detection code)
and appears a lot more stable. It is probably better debugged and tested.

Rob
Miroslav Lichvar
2018-01-24 17:08:42 UTC
Post by Rob Janssen
Post by Miroslav Lichvar
The bug in the interleaved mode is a bit more subtle. The state is
updated from received packet, but only when one of the timestamps is
zero (i.e. it's the first packet of the association). This means two
ntpd 4.2.8p10 can interoperate, but I suspect the association will not
recover if there is a mismatch between the receive timestamps.
I have seen problems like that, and stopped using symmetric peering.
As far as I know, just declaring "server" in each direction works OK (there is loop-detection code)
and appears a lot more stable. It is probably better debugged and tested.
Yes, the complexity of the symmetric mode is ridiculous when compared
to the client/server mode.

As far as I know the only good use case for the symmetric mode is that
it can be used to push time to a server if it supports ephemeral
associations (chrony does not). I have some stratum-1 servers which
are behind NAT and their address is dynamic, and also some public
servers that are synchronized to them. If the public servers accepted
ephemeral associations, they could be specified as peers on the
stratum-1 servers and it would work without forwarding ports on the
router and updating a DNS record with the dynamic IP.
--
Miroslav Lichvar
FUSTE Emmanuel
2018-01-23 16:50:22 UTC
Post by Miroslav Lichvar
Post by FUSTE Emmanuel
Post by Miroslav Lichvar
With the current versions, if you can avoid the issue with
unsynchronized sources, they should interoperate, at least when their
polling intervals match. If it doesn't work for you, I'd like to see a
tcpdump output.
OK. I fixed the min/max polling interval to 5 for testing purposes.
Then I first restarted chrony and waited for it to sync to an online source.
Then I restarted ntpd and took a capture.
I will send you all the data.
ntpd is stuck in the unreachable state.
chrony is stuck with only one valid RX.
Ok. I can reproduce this problem. It seems ntpd doesn't update its
state in the interleaved mode when it receives a packet with an
unexpected origin timestamp. There was a similar issue fixed for the
basic mode a few ntp releases ago:
https://bugs.ntp.org/show_bug.cgi?id=2952
As chronyd doesn't switch to the interleaved mode until it's receiving
valid responses and ntpd doesn't accept responses in the basic mode,
they are stuck waiting forever on each other.
A similar thing seems to happen when trying to use the interleaved mode
between two 4.2.8p10 ntpds. You said it worked for you before, so I
assume one of the ntpds was an older version which didn't have this
bug?
Here is the data from the working 4.2.8p10 platform, which is composed of
w.w.w.w, y.y.y.y, z.z.z.z

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 29450  f414   yes   yes   ok  candidate   reachable  1
  2 29451  f414   yes   yes   ok  candidate   reachable  1
  3 29452  f31f   yes   yes   ok    outlier              1
  4 29453  961a   yes   yes  none  sys.peer    sys_peer  1
  5 29454  931d   yes   yes  none   outlier              1
ntpq> lpe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+x.x.x.x         .MRS.            1 u    5    8  377    0.363    0.038   0.030
+y.y.y.y         .PTP0.           1 s   25   64  377    0.071    0.017   0.035
-z.z.z.z         .PTP0.           1 s   45   64  376    0.058    0.041   0.044
*SHM(0)          .PTP0.           0 l    2    8  377    0.000   -0.017   0.005
-ntp-gps-1.thale .GPS.            1 u    4    8  377    5.031   -0.435   0.020
ntpq> rv 29451
associd=29451 status=f414 conf, authenb, auth, reach, sel_candidate, 1
event, reachable,
srcadr=y.y.y.y, srcport=123, dstadr=w.w.w.w,
dstport=123, leap=00, stratum=1, precision=-23, rootdelay=0.000,
rootdisp=1.099, refid=PTP0,
reftime=de11e3d4.1850d73b  Tue, Jan 23 2018 17:39:48.094,
rec=de11e3db.18563cd1  Tue, Jan 23 2018 17:39:55.095, reach=376,
unreach=0, hmode=1, pmode=1, hpoll=6, ppoll=6, headway=51, flash=00 ok,
keyid=112, offset=0.017, delay=0.071, dispersion=1.719, jitter=0.035,
xleave=0.024,
filtdelay=     0.09    0.10    0.07    0.12    0.13    0.11 0.11    0.16,
filtoffset=   -0.01   -0.02    0.02    0.06    0.05   -0.01 -0.04    0.00,
filtdisp=      0.00    0.96    1.95    2.94    3.90    4.89 5.88    6.86
ntpq> rv 29452
associd=29452 status=f31f conf, authenb, auth, reach, sel_outlier, 1
event, interleave_error,
srcadr=z.z.z.z, srcport=123, dstadr=w.w.w.w,
dstport=123, leap=00, stratum=1, precision=-23, rootdelay=0.000,
rootdisp=1.099, refid=PTP0,
reftime=de11e4c0.a5c3751c  Tue, Jan 23 2018 17:43:44.647,
rec=de11e4c7.a5ca043a  Tue, Jan 23 2018 17:43:51.647, reach=377,
unreach=0, hmode=1, pmode=1, hpoll=6, ppoll=6, headway=13, flash=00 ok,
keyid=113, offset=0.041, delay=0.058, dispersion=5.542, jitter=0.062,
xleave=0.014,
filtdelay=     0.11    0.14    0.11    0.11    0.10    0.08 0.06    0.08,
filtoffset=    0.03   -0.05   -0.02   -0.02   -0.03   -0.02 0.04    0.09,
filtdisp=      0.00    0.98    1.92    2.87    3.84    4.83 5.78    6.75

Emmanuel.
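
A note on reading the reach values in the output above: ntpq prints reach as an octal number holding an 8-bit shift register, where the lowest bit is the most recent poll. The sketch below (hypothetical function name) expands it into one boolean per poll, newest first.

```python
def reach_history(reach_octal: str) -> list[bool]:
    """Decode an octal reach value into 8 poll results, newest first.

    True means a valid packet was received for that poll.
    """
    value = int(reach_octal, 8)
    return [bool((value >> i) & 1) for i in range(8)]


print(reach_history("377"))  # all eight recent polls answered
print(reach_history("376"))  # the most recent poll was missed
print(reach_history("000"))  # nothing received, as in the stuck association
```

So reach=376 on an otherwise healthy peer means only the latest poll went unanswered, while reach=000 corresponds to the association that never came up.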