Discussion:
[chrony-users] Monitoring Chrony
Ben Kochie
2016-02-11 09:54:49 UTC
Hey,

I'm a systems engineer and a contributor to the Prometheus monitoring
system. I also maintain the servers for my company.

I've been following various ntpd replacement projects and I'm pretty
impressed with the progress of Chrony.

One of the things I would need to do is replace our existing
monitoring of ntpd. We currently parse the output of `ntpq -np` in order
to generate metrics.

Prometheus[0] uses a simple metric+labels combination format, similar to
things like OpenTSDB.

Here's an example of what `ntpq -np` turns into:

# TYPE node_ntpd_delay_milliseconds gauge
node_ntpd_delay_milliseconds{remote="130.149.17.8"} 17.092
node_ntpd_delay_milliseconds{remote="193.190.230.65"} 4.937
node_ntpd_delay_milliseconds{remote="82.95.215.61"} 11.726
# TYPE node_ntpd_jitter_milliseconds gauge
node_ntpd_jitter_milliseconds{remote="130.149.17.8"} 0.494
node_ntpd_jitter_milliseconds{remote="193.190.230.65"} 0.770
node_ntpd_jitter_milliseconds{remote="82.95.215.61"} 0.722
# TYPE node_ntpd_offset_milliseconds gauge
node_ntpd_offset_milliseconds{remote="130.149.17.8"} 1.675
node_ntpd_offset_milliseconds{remote="193.190.230.65"} 0.135
node_ntpd_offset_milliseconds{remote="82.95.215.61"} -0.645
# TYPE node_ntpd_peer_status gauge
node_ntpd_peer_status{remote="130.149.17.8",reference=".GPS.",stratum="1",type="unicast"} 3
node_ntpd_peer_status{remote="193.190.230.65",reference=".MRS.",stratum="1",type="unicast"} 4
node_ntpd_peer_status{remote="82.95.215.61",reference=".PPS.",stratum="1",type="unicast"} 6

This allows us to keep running timeseries metrics for peers, and write
rules for things like "node_ntpd_peer_status < 4" to find unsynced servers.
See here[1] for the code to status value map.
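
As a rough illustration of the format above, here is a minimal Python sketch that renders one sample line and applies the same "status < 4" check to a set of peers. The function names are made up for this example; this is not the bash script and not a Prometheus client library.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, labels, value):
    """Render one sample as: name{key="value",...} value"""
    if labels:
        body = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
        return f"{name}{{{body}}} {value}"
    return f"{name} {value}"


def unsynced(peer_status, threshold=4):
    """Return the remotes whose status code is below the threshold."""
    return [remote for remote, status in peer_status.items() if status < threshold]


print(format_sample("node_ntpd_delay_milliseconds", {"remote": "130.149.17.8"}, 17.092))
print(unsynced({"130.149.17.8": 3, "193.190.230.65": 4, "82.95.215.61": 6}))
```

With the peer status values from the example above, only 130.149.17.8 (status 3) falls under the threshold.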

The above metrics are generated by a bash script, which works but isn't my
favorite way to deal with getting metrics from software.

So far, I haven't been able to find a good programmatic way to extract
stats with chronyc. There are a bunch of annoying parsing issues with
things like the sourcestats command. The offset includes a precision, so I
have to parse the precision and convert that to be all in one precision. I
haven't seen much documentation on the protocol between chronyc and chronyd.
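
To make the precision problem concrete, here is a small Python sketch of the kind of conversion needed: turning a value with a unit suffix into plain seconds. The set of suffixes (ns, us, ms, s) is an assumption based on what the human-readable output typically shows, and the helper name is hypothetical.

```python
import re

# Unit suffixes assumed from typical human-readable output; adjust if others appear.
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_VALUE_RE = re.compile(r"^([+-]?\d+(?:\.\d+)?)(ns|us|ms|s)$")


def to_seconds(text):
    """Convert a string such as '+31us' or '-2ms' to a float number of seconds."""
    match = _VALUE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised value: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]


for raw in ("+31us", "-2ms", "450ns", "1.5s"):
    print(raw, "->", to_seconds(raw))
```

Every such column has to go through a step like this before the values can be exported in a single unit.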

A couple of specific questions.
* Would chrony be interested in supporting the Prometheus metrics format?
* Is there a mode for the various metrics outputs to be more machine
readable? (json?)
* Is there documentation for the chronyc protocol outside the code?
* Are there any non-C chronyc client implementations? (python/ruby/whatever)

[0]: http://prometheus.io/
[1]: https://www.eecis.udel.edu/~mills/ntp/html/decode.html#peer

- Ben Kochie
Bryan Christianson
2016-02-11 10:35:30 UTC
So far, I haven't been able to find a good programmatic way to extract stats with chronyc. There are a bunch of annoying parsing issues with things like the sourcestats command. The offset includes a precision, so I have to parse the precision and convert that to be all in one precision. I haven't seen much documentation on the protocol between chronyc and chronyd.
Take a look at the chronyd log files. The data is more amenable to machine reading than the chronyc output.
Bryan Christianson
***@whatroute.net
--
To unsubscribe email chrony-users-***@chrony.tuxfamily.org
with "unsubscribe" in the subject.
For help email chrony-users-***@chrony.tuxfamily.org
with "help" in the subject.
Trouble? Email ***@chrony.tuxfamily.org.
Ben Kochie
2016-02-11 11:14:32 UTC
Log parsing is possible; there is a tool called mtail that can parse logs
to collect metrics. It is generally recommended to ask services directly
about their current state rather than regexping logs.

This way the stats are read in a more on-demand way.
Miroslav Lichvar
2016-02-11 12:58:21 UTC
Post by Ben Kochie
So far, I haven't been able to find a good programmatic way to extract
stats with chronyc. There are a bunch of annoying parsing issues with
things like the sourcestats command. The offset includes a precision, so I
have to parse the precision and convert that to be all in one precision.
Yeah, I've struggled with that too. I like the human readable format
when inspecting the chrony state, but it does complicate parsing quite
a bit.
Post by Ben Kochie
A couple of specific questions.
* Would chrony be interested in supporting the Prometheus metrics format?
I looked at the page describing the architecture, but it's not clear to
me what support in chrony would look like. Would chronyd or something
using the chronyc protocol be listening on a port for requests? Or
would it periodically push data over a socket somewhere? The page
listing client libraries doesn't include a C library.
Post by Ben Kochie
* Is there a mode for the various metrics outputs to be more machine
readable? (json?)
No, not yet. I'd like to add a raw mode to chronyc that would print
the values in something easily parseable. I'm not sure about json, I'd
probably prefer something usable even from shell using just sed or
awk.
Post by Ben Kochie
* Is there documentation for the chronyc protocol outside the code?
No, unfortunately not. FWIW, the protocol is quite simple, almost all
information you would need to implement a new client is contained in
candm.h.
Post by Ben Kochie
* Are there any non-C chronyc client implementations? (python/ruby/whatever)
Probably not, at least I've not seen anything. At some point I'd like
to split chronyc into a library and a client application. Bindings for
other languages could then be easily created.
--
Miroslav Lichvar
Ben Kochie
2016-02-11 13:12:36 UTC
Post by Miroslav Lichvar
Post by Ben Kochie
So far, I haven't been able to find a good programmatic way to extract
stats with chronyc. There are a bunch of annoying parsing issues with
things like the sourcestats command. The offset includes a precision, so I
have to parse the precision and convert that to be all in one precision.
Yeah, I've struggled with that too. I like the human readable format
when inspecting the chrony state, but it does complicate parsing quite
a bit.
Post by Ben Kochie
A couple of specific questions.
* Would chrony be interested in supporting the Prometheus metrics format?
I looked at the page describing the architecture, but it's not clear to
me what support in chrony would look like. Would chronyd or something
using the chronyc protocol be listening on a port for requests? Or
would it periodically push data over a socket somewhere? The page
listing client libraries doesn't include a C library.
Typically we do this one of a few ways.
#1 - The application listens on a port for HTTP requests; the default path is
/metrics. It can then respond with text/plain in the format I posted
above, or it will content-negotiate and use a compact protobuf
format. The protobuf format is the most efficient, but we've had few problems
collecting text metrics at scale.

#2 - We run a side-car exporter. We do this quite a lot for existing open
source software, like mysql, that would never listen on http, but can
provide metrics with their own protocol.

#3 - The way we collect metrics for ntpd is that we have a loop script, or cron
script, that parses the output and writes it in Prometheus format into a
text file. Then we access these metrics via the node_exporter's textfile
reader.

#4 - We use something like mtail[0] and parse log files. This is what I do
for things like apache[1] that have minimal useful internal metrics.
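
For option #3, one detail worth sketching is writing the text file atomically, so the reader never sees a half-written file. This is a minimal Python sketch under the assumption that the reader only picks up files ending in .prom; the function name and the file path in the usage note are hypothetical.

```python
import os
import tempfile


def write_prom_file(path, text):
    """Write metrics text to `path` atomically via a temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file does not end in .prom, so a *.prom reader should skip it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        # mkstemp creates the file as 0600; relax it so another user can read it.
        os.chmod(tmp_path, 0o644)
        # Rename is atomic when source and destination are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A cron job or loop script would then call something like write_prom_file("/path/to/textfile/dir/ntpd.prom", metrics_text) on each run.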

[0]: https://github.com/google/mtail
[1]: https://github.com/google/mtail/blob/master/examples/apache_metrics.mtail
Post by Miroslav Lichvar
Post by Ben Kochie
* Is there a mode for the various metrics outputs to be more machine
readable? (json?)
No, not yet. I'd like to add a raw mode to chronyc that would print
the values in something easily parseable. I'm not sure about json, I'd
probably prefer something usable even from shell using just sed or
awk.
One idea I had would be to add a "metrics" command to chronyc. Then you
could run a loop/cron job that would be basically "chronyc metrics >
chrony_metrics.prom"

The output format would be sed/awk friendly as you always get one metric
key and value per line.
Post by Miroslav Lichvar
Post by Ben Kochie
* Is there documentation for the chronyc protocol outside the code?
No, unfortunately not. FWIW, the protocol is quite simple, almost all
information you would need to implement a new client is contained in
candm.h.
Ok, I will take a look.
Post by Miroslav Lichvar
Post by Ben Kochie
* Are there any non-C chronyc client implementations? (python/ruby/whatever)
Probably not, at least I've not seen anything. At some point I'd like
to split chronyc into a library and a client application. Bindings for
other languages could then be easily created.
This would be pretty nice.
Miroslav Lichvar
2016-02-12 09:05:28 UTC
Post by Ben Kochie
Post by Miroslav Lichvar
I looked at the page describing the architecture, but it's not clear to
me what support in chrony would look like. Would chronyd or something
using the chronyc protocol be listening on a port for requests? Or
would it periodically push data over a socket somewhere? The page
listing client libraries doesn't include a C library.
Typically we do this one of a few ways.
#2 - We run a side-car exporter. We do this quite a lot for existing open
source software, like mysql, that would never listen on http, but can
provide metrics with their own protocol.
This one seems most reasonable to me. A separate service that uses the
chronyc protocol to read the metrics from chronyd.
Post by Ben Kochie
#3 - The way we collect metrics for ntpd is that we have a loop script, or cron
script, that parses the output and writes it in Prometheus format into a
text file. Then we access these metrics via the node_exporter's textfile
reader.
This is probably the easiest way :).
Post by Ben Kochie
#4 - We use something like mtail[0] and parse log files. This is what I do
for things like apache[1] that have minimal useful internal metrics.
The chrony logs are good at showing exactly when the state
changed, but if you are interested in metrics like root dispersion,
which are constantly changing (in a deterministic way), you would have
to calculate their current value.
Post by Ben Kochie
Post by Miroslav Lichvar
Post by Ben Kochie
* Is there a mode for the various metrics outputs to be more machine
readable? (json?)
No, not yet. I'd like to add a raw mode to chronyc that would print
the values in something easily parseable. I'm not sure about json, I'd
probably prefer something usable even from shell using just sed or
awk.
One idea I had would be to add a "metrics" command to chronyc. Then you
could run a loop/cron job that would be basically "chronyc metrics >
chrony_metrics.prom"
Which metrics would it print? With the "clients" command, for instance,
there can be megabytes of data, which in most cases probably wouldn't be
useful to collect, but in some cases I think it might, e.g. monitoring
whether clients are alive from the server in a small network.
Post by Ben Kochie
The output format would be sed/awk friendly as you always get one metric
key and value per line.
If there was just one key/value per line, wouldn't it be more
difficult for a simple sed/awk parser to group data by source, as in
sourcestats?

I was considering something like CSV, which can be parsed in shell
with a single "read" command and can be easily converted to more
verbose formats like json.

$ chronyc -r tracking
#refid,address,stratum,...
10.16.255.1,10.16.255.1,2,...

$ chronyc -r sources | grep -v '^#' | while IFS=, read mode state ...
do
echo $mode $state ...
done
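
The same kind of output would also be easy to consume outside the shell. Below is a minimal Python sketch that turns CSV text with a '#'-prefixed header line into one dictionary per row. The sample text only mirrors the shape of the tracking example above with the elided columns left out; the function name is hypothetical and the raw mode itself is still just a proposal.

```python
import csv
import io


def parse_raw_output(text):
    """Parse CSV text whose header line starts with '#' into a list of dicts."""
    header = None
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        if record[0].startswith("#"):
            # Header line: strip the leading '#' from the first column name.
            header = [record[0].lstrip("#")] + record[1:]
            continue
        if header is None:
            raise ValueError("data row seen before a '#' header line")
        rows.append(dict(zip(header, record)))
    return rows


# Made-up sample in the proposed shape; only the first three columns are shown.
SAMPLE = "#refid,address,stratum\n10.16.255.1,10.16.255.1,2\n"
print(parse_raw_output(SAMPLE))
```

From a list of dictionaries like this, converting to json or to another metrics format is a short step.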
--
Miroslav Lichvar
Ben Kochie
2016-02-27 14:08:12 UTC
So I started work on adding a "metrics" command to client.c. It's pretty
hacky, but works.

https://github.com/SuperQ/chrony/pull/1

Comments welcome.

- Ben Kochie
Miroslav Lichvar
2016-02-29 09:10:14 UTC
(this discussion would better fit the chrony-devel list)
Post by Ben Kochie
So I started work on adding a "metrics" command to client.c. It's pretty
hacky, but works.
https://github.com/SuperQ/chrony/pull/1
Comments welcome.
Ok, so you implemented the metrics command as a new function which
does the same as the serverstats command, but uses a different output
format. I assume you would extend it later to include also the
tracking, sources and sourcestats data. That would be a lot of
duplicated code.

As I said in the previous mail, I'd rather see it implemented as a
different output format for the existing commands. A new chronyc
option could be added to select the format, with default being the
currently used human-readable output. A new printf-like function would
be added, which would support printing hostnames or IP addresses, time
intervals, offsets, and all other data that need to be printed.
Depending on what output mode chronyc was running in, it would print
the labels, align the columns, print the values with units, print end
of lines, etc. All functions that implement the individual commands
would then be modified to use this new function.
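
A rough Python sketch of the idea, only to show the shape of such a function: each command describes its fields once, and a single formatter decides how to print them for the selected mode. The real change would be in C inside chronyc; the function name, the mode names and the field layout here are all assumptions for illustration.

```python
def format_report(fields, mode="human"):
    """Format (label, value, unit) tuples according to the selected output mode."""
    if mode == "csv":
        # Machine-readable: values only, no labels or units.
        return ",".join(str(value) for _label, value, _unit in fields)
    if mode == "human":
        # Human-readable: aligned labels, values followed by their units.
        lines = []
        for label, value, unit in fields:
            lines.append(f"{label:<16}: {value} {unit}".rstrip())
        return "\n".join(lines)
    raise ValueError(f"unknown output mode: {mode}")


fields = [("Stratum", 2, ""), ("Root delay", 0.5, "seconds")]
print(format_report(fields, "human"))
print(format_report(fields, "csv"))
```

Adding another format would then mean adding one more branch in the formatter rather than touching every command.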

I'm planning to look into this in the next few weeks. At this point
I'm mainly interested in adding the CSV format to allow easy parsing
in shell, but I think the Prometheus format could be added too.

Does this make sense?
--
Miroslav Lichvar
Ben Kochie
2016-02-29 09:40:49 UTC
It makes sense, but the real problem is that there are ordering issues with
the way things are processed in most of the stats outputs.

For example in the sources/sourcestats, the code walks each source and
outputs all the stats for each source.

Prometheus expects metrics to be output in order of metric. For example
all of the offsets for each source.
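
To illustrate the reordering, here is a minimal Python sketch that takes per-source records (the order in which the code walks the sources) and emits them grouped by metric. The metric names and sample values are invented for the example and are not names used by chrony or by any existing exporter.

```python
def group_by_metric(sources):
    """Turn [(remote, {metric: value}), ...] into text grouped by metric name."""
    grouped = {}
    for remote, stats in sources:
        for metric, value in stats.items():
            grouped.setdefault(metric, []).append((remote, value))

    lines = []
    for metric, samples in grouped.items():
        lines.append(f"# TYPE {metric} gauge")
        for remote, value in samples:
            lines.append(f'{metric}{{remote="{remote}"}} {value}')
    return "\n".join(lines) + "\n"


# Hypothetical per-source data, in the order a source-by-source walk would yield it.
per_source = [
    ("10.0.0.1", {"example_offset_seconds": 0.001, "example_stddev_seconds": 0.0002}),
    ("10.0.0.2", {"example_offset_seconds": -0.003, "example_stddev_seconds": 0.0004}),
]
print(group_by_metric(per_source), end="")
```

The point is that all values have to be collected first and only then printed, which does not fit a function that prints each source as it goes.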

I had a discussion with another person over the weekend, and I think what
we're going to do is to abandon the text metrics output idea and implement
the chronyd protocol (probably in Go) so that we can build a direct
Prometheus exporter.

- Ben Kochie