Discussion:
[chrony-users] Two nearly identical boxes behaving differently
Paul J R
2017-11-09 16:39:13 UTC
Hi all,

Using chrony 3.1 with GPS and PPS.

I've got two little routers (AR9331-based boxes) that are identical except
for two things:

1) different IP addresses/hostnames

2) different GPS chips (one is a Quectel L80, the other is the MTK3339)

They're running an OpenWrt firmware that I've custom-compiled for my
needs, but they're both having issues I just can't figure out.

The config on both is identical:

    refclock SHM 0 offset 0.395 delay 0.2 refid NMEA noselect
    refclock PPS /dev/pps0 refid PPS poll 4 prefer lock NMEA
    server 10.69.69.1

One can't seem to poll PPS:

***@ntp03:~# chronyc sources
210 Number of sources = 3
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? NMEA                          0   4   377    10 -264ms[ -264ms] +/- 110ms
#? PPS                           0   4     0     - +0ns[   +0ns] +/-    0ns
^* 10.69.69.1                    2   6   177    46 -4689ns[  -14us] +/-   92ms

And the other can't poll the server:

***@ntp04:~# chronyc sources
210 Number of sources = 3
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? NMEA                          0   4   377    19    -24ms[ -24ms] +/- 105ms
#* PPS                           0   4   277    18    +39ns[ -5ns] +/-  56ns
^? 10.69.69.1                    0   7   377     -     +0ns[ +0ns] +/-    0ns

Both can happily ping the server (which is actually the gateway), both
seem to be able to poll the gpsd, both are connected to the same network
and are even in the same switch, and both have a working pps line:

***@ntp03:~# ppstest /dev/pps0

trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1510241218.002956935, sequence: 1525 - clear 0.000000000, sequence: 0
source 0 - assert 1510241219.002954396, sequence: 1526 - clear 0.000000000, sequence: 0

***@ntp04:~# ppstest /dev/pps0
trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1510241194.000000503, sequence: 1490 - clear 0.000000000, sequence: 0
source 0 - assert 1510241195.000000044, sequence: 1491 - clear 0.000000000, sequence: 0
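
(Editorial note.) The two ppstest captures differ in how far the assert timestamps sit from the top of the second. A minimal Python sketch for extracting that distance, assuming the "assert <seconds>.<nanoseconds>" layout shown in the output above; the function name assert_offset_ns is hypothetical and not part of ppstest:

```python
import re

# Matches the "assert <seconds>.<nanoseconds>" field of a ppstest line.
ASSERT_RE = re.compile(r"assert (\d+)\.(\d{9})")


def assert_offset_ns(line: str) -> int:
    """Return the signed distance (ns) of the assert edge from the nearest second."""
    match = ASSERT_RE.search(line)
    if match is None:
        raise ValueError("no assert timestamp found")
    nsec = int(match.group(2))
    # Values past the half second are closer to the next second boundary.
    return nsec if nsec < 500_000_000 else nsec - 1_000_000_000


ntp03 = "source 0 - assert 1510241218.002956935, sequence: 1525"
ntp04 = "source 0 - assert 1510241194.000000503, sequence: 1490"
print(assert_offset_ns(ntp03), assert_offset_ns(ntp04))
```

On the samples above this gives roughly 2.96 ms for ntp03 and about 0.5 us for ntp04, which would be consistent with ntp04 already tracking the PPS while ntp03 follows the network server.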

But I just can't find a reason why they're showing different
problems (though the issue polling the server may be the server itself
not responding to one of them, as other servers do work with both).

I should also say that the kernel uses my own PPS code, which is
in turn based on the GPIO interrupt code for the AR9331, though as yet I
can't find a fault with either of those patches.

Any hints on how to debug this would be really handy.
--
To unsubscribe email chrony-users-***@chrony.tuxfamily.org
with "unsubscribe" in the subject.
For help email chrony-users-***@chrony.tuxfamily.org
with "help" in the subject.
Trouble? Email ***@chrony.tuxfamily.org.
Miroslav Lichvar
2017-11-09 17:16:39 UTC
Post by Paul J R
    refclock SHM 0 offset 0.395 delay 0.2 refid NMEA noselect
    refclock PPS /dev/pps0 refid PPS poll 4 prefer lock NMEA
    server 10.69.69.1
210 Number of sources = 3
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? NMEA                          0   4   377    10 -264ms[ -264ms] +/- 110ms
#? PPS                           0   4     0     - +0ns[   +0ns] +/-    0ns
^* 10.69.69.1                    2   6   177    46 -4689ns[  -14us] +/-  92ms
The offset of the NMEA source is probably too large to be locked with
the PPS. You will need to adjust the offset value on the SHM line to
bring it closer to UTC and allow PPS to lock.
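
(Editorial note.) The reasoning above can be sketched in a few lines of Python. A PPS pulse only marks the start of a second, so the NMEA sample has to sit close enough to a whole second for the pulse to be paired with it. The thread does not say how close; the 0.2 s limit below is an assumed, illustrative value that should be checked against the chrony documentation or source, and the function name is hypothetical:

```python
ASSUMED_LOCK_LIMIT_S = 0.2  # assumption for illustration, not stated in the thread


def nmea_close_enough(nmea_offset_s: float,
                      limit_s: float = ASSUMED_LOCK_LIMIT_S) -> bool:
    """Check whether an NMEA offset is within limit_s of a whole second."""
    # Distance from the nearest whole second, since PPS is ambiguous modulo 1 s.
    distance = abs(nmea_offset_s - round(nmea_offset_s))
    return distance < limit_s


# Offsets taken from the two "chronyc sources" reports in this thread.
print(nmea_close_enough(-0.264))  # ntp03, where PPS shows Reach 0
print(nmea_close_enough(-0.024))  # ntp04, where PPS is selected
```

Under that assumed limit the ntp03 value falls outside the window and the ntp04 value falls inside it, which matches the behaviour reported for the two boxes.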
Post by Paul J R
210 Number of sources = 3
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? NMEA                          0   4   377    19    -24ms[ -24ms] +/- 105ms
#* PPS                           0   4   277    18    +39ns[ -5ns] +/-  56ns
^? 10.69.69.1                    0   7   377     -     +0ns[ +0ns] +/-   0ns
As a first step, it would be good to confirm that the server is
responding. tcpdump-mini would be a good tool for that. :) If it does
respond, then a second step would be to recompile chrony with
debugging enabled, run chronyd with -d -d and see if there is
anything interesting.
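
(Editorial note.) Where a packet capture is awkward, a rough complementary check is to send one NTP client request by hand and see whether anything comes back. The sketch below is a minimal Python version, assuming Python 3 is available on a host in the same network; the function names are hypothetical. It uses its own UDP socket, so it shows only whether the server answers this host at all, not what chronyd itself sends or receives.

```python
import socket
import struct

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    """Return a 48-byte NTP client request (LI=0, VN=4, Mode=3)."""
    return bytes([0x23]) + bytes(47)


def parse_response(data: bytes) -> dict:
    """Extract a few header fields from an NTP response packet."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    first, stratum = data[0], data[1]
    # The transmit timestamp occupies the last 8 bytes of the 48-byte header.
    tx_seconds, tx_fraction = struct.unpack("!II", data[40:48])
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
        "stratum": stratum,
        "transmit_unix": tx_seconds - NTP_EPOCH_OFFSET + tx_fraction / 2**32,
    }


def query(server: str, timeout: float = 2.0):
    """Send one request; return the parsed reply, or None if nothing came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, NTP_PORT))
        try:
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return None
    return parse_response(data)
```

Calling query("10.69.69.1") from each box would return a dictionary if the gateway answered and None if the request timed out. A reply with mode 4 is a normal server response.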
--
Miroslav Lichvar
Paul J R
2017-11-14 07:07:49 UTC
Post by Miroslav Lichvar
Post by Paul J R
    refclock SHM 0 offset 0.395 delay 0.2 refid NMEA noselect
    refclock PPS /dev/pps0 refid PPS poll 4 prefer lock NMEA
    server 10.69.69.1
210 Number of sources = 3
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? NMEA                          0   4   377    10 -264ms[ -264ms] +/- 110ms
#? PPS                           0   4     0     - +0ns[   +0ns] +/-    0ns
^* 10.69.69.1                    2   6   177    46 -4689ns[  -14us] +/- 92ms
The offset of the NMEA source is probably too large to be locked with
the PPS. You will need to adjust the offset value on the SHM line to
bring it closer to UTC and allow PPS to lock.
Indeed, that turned out to be the issue; I hadn't considered that. I
haven't tried many GPS/PPS implementations with chrony so far, so this is
somewhat new territory for me. This is also the first one I've built
with an MTK3339 chip in it.
Thanks for the pointer!
Post by Miroslav Lichvar
Post by Paul J R
210 Number of sources = 3
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? NMEA                          0   4   377    19    -24ms[ -24ms] +/- 105ms
#* PPS                           0   4   277    18    +39ns[ -5ns] +/- 56ns
^? 10.69.69.1                    0   7   377     -     +0ns[ +0ns] +/- 0ns
As a first step, it would be good to confirm that the server is
responding. tcpdump-mini would be a good tool for that. :) If it does
respond, then a second step would be to recompile chrony with
debugging enabled, run chronyd with -d -d and see if there is
anything interesting.
I can only guess that when I soldered in the GPS board I damaged the
router in some spectacularly unique way. Even down to the bit level, the
boxes are identical. I added a few more time sources, and one box has
issues with some time servers (but not all), while the other works without
issue. I set up a mirror port on the switch it's connected to and the data
looks fine. I also switched their configs around, changed the ports they're
plugged into, and tested their network latency and bandwidth; everything
checks out. The last step is to recompile the firmware with debug on.

With only GPS/PPS time sources and clients pointed at them, they perform
almost the same too, so why one refuses to talk to some time sources while
the other works fine is quite mystifying. Hopefully the debug output will
tell me a few things.

Thanks for the reply!