Chrony Rate Fluctuations

The following graphs show the fluctuations in the rates of the system clock and of the real time clock on a variety of computers on the theory network. All synchronize against the same system, ntp.ubc.ca, a stratum 2 NTP server on campus (the time delay to that machine from any of these computers is on the order of hundreds of microseconds).

The graphs plot the rate of the system clock vs the NTP server (red line, left-hand scale) and the rate of the RTC (the real time clock, i.e. the CMOS clock) vs the system clock (dotted lines, right-hand scale) against the time in days after 00:00 on the date shown. The rates are in units of microseconds per second. They are determined by chrony, which compares the reading on the system clock with the time reported by the NTP server in order to adjust the rate of the system clock, and which likewise compares the RTC with the system clock. Note that the strong correlation between the rate fluctuations suggests that the system clock is the primary source of noise, and that in general the RTC has better stability than does the system clock.
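
To make the units concrete, here is a minimal sketch of the simplest possible rate estimate: the change in measured offset between two comparisons, divided by the time between them. The function name and the sample numbers are made up for illustration. chrony's own estimator works over a history of samples and is more elaborate than this.

```python
def rate_ppm(t_a, offset_a, t_b, offset_b):
    """Rate of one clock relative to another, in microseconds per second (ppm).

    t_a, t_b           -- times of the two comparisons, in seconds
    offset_a, offset_b -- measured offsets at those times, in seconds
    (All names here are hypothetical, chosen for this sketch.)
    """
    elapsed = t_b - t_a
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # seconds of offset gained per second, scaled to microseconds per second
    return (offset_b - offset_a) / elapsed * 1e6


# Illustrative numbers only: the offset grows from 1.0 ms to 1.2 ms in 1000 s,
# which is a rate of about 0.2 microseconds per second.
example_rate = rate_ppm(0.0, 0.0010, 1000.0, 0.0012)
print(f"{example_rate:.3f} ppm")
```

A rate of 1 microsecond per second is the same thing as 1 ppm, which is why the two units are used interchangeably on this page.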

In the graphs for the week ending Feb 11, the huge instability of one of the machines, info, and of the other machines after they were restarted on Feb 9, is unexplained. There seems to be an instability in the operation of chrony. A semblance of order was restored after the 10th by decreasing maxupdateskew to 1/5 (from unlimited). Dilaton had the smallest rate fluctuations of any of the clocks before that restart, but not afterwards.
Well, I have finally tracked down the problem. That stratum 2 server ntp.ubc.ca stinks. I got a GPS device with a PPS output, which I hooked up to a couple of the machines. The most interesting is string, which had some of the most unstable behaviour with chrony and ntp.ubc.ca. In the following graph, I have plotted the response of string (with chrony switched off) to the GPS clock, to ntp.ubc.ca and to tick.usask.edu, a stratum 1 server. The huge regular sawtooth waves come from ntp.ubc.ca. Not only is that system on average about 3 ms fast, its offset also varies regularly.
tick.usask.edu is very much better behaved: considering that it is almost 10 msec away (peer delay), its time differs from the GPS time by only a few tens of microseconds. (The "line" across the top is the GPS time, with a width, i.e. a jitter, of about 3 microseconds. The jagged line starting at 24 hr is tick.usask.ca, while the huge oscillation is ntp.ubc.ca, a supposed stratum 2 source.) It may be that because ntp.ubc.ca is running SunOS, its kernel cannot regulate the system clock properly, leading to this behaviour.
(Note that in each case exactly the same overall drift has been removed from the data, i.e. the drift was determined from the GPS clock and then the same drift was removed from each of the other graphs.)
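
The drift removal described in the note above can be sketched roughly as follows. This is only an illustration under stated assumptions: the drift is taken to be the least-squares slope of the GPS offsets against time, and the function names and sample data are invented for the example, not taken from the scripts that made the graph.

```python
def fit_drift(times, offsets):
    """Least-squares slope (seconds of offset per second) of offsets vs times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    return sxy / sxx


def remove_drift(times, offsets, drift):
    """Subtract the same linear drift from a series, leaving any constant offset."""
    return [o - drift * t for t, o in zip(times, offsets)]


# Made-up data: the free-running clock drifts at 1 ppm. The hypothetical NTP
# series sees the same drift plus a constant 3 ms offset of its own.
times = [0.0, 1000.0, 2000.0, 3000.0]
gps = [1e-6 * t for t in times]
ntp = [1e-6 * t + 0.003 for t in times]

drift = fit_drift(times, gps)                # determined from the GPS data only
gps_flat = remove_drift(times, gps, drift)   # GPS series with drift removed
ntp_flat = remove_drift(times, ntp, drift)   # same drift removed from the NTP series
```

Because the slope comes from the GPS series alone, whatever remains in the other series after subtraction is their disagreement with the GPS time rather than the drift of the local clock.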
What is interesting is that while the GPS spikes are all late (by a few microseconds), both of the NTP sources are early. This seems to imply that the outbound NTP packets take slightly longer than the inbound packets. On Apr 14 all of the machines except dilaton and string were changed to get their primary time from string, which gets its time from tick.usask.edu. Dilaton got its time from time-nw.nist.gov, a time server located at Microsoft, but was switched to string on Apr 15.
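
The remark above about outbound and inbound packets follows from the standard NTP on-wire calculation, which assumes the two directions take equal time. Here is a small sketch of that calculation. The delays are invented for illustration (a 10 msec round trip split unevenly) and the helper names are hypothetical. Any difference between the two one-way delays shows up as half that difference in the estimated offset.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP estimates from the four timestamps of one exchange.

    t1 -- client transmit (client clock)   t2 -- server receive (server clock)
    t3 -- server transmit (server clock)   t4 -- client receive (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0   # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)            # round-trip time on the network
    return offset, delay


def simulate_exchange(true_offset, d_out, d_in, t1=0.0):
    """Timestamps for one exchange with given one-way delays (hypothetical helper)."""
    t2 = t1 + d_out + true_offset     # arrival, read on the server clock
    t3 = t2                           # server replies immediately
    t4 = (t3 - true_offset) + d_in    # arrival, read on the client clock
    return t1, t2, t3, t4


# Clocks actually agree (true offset 0), but the outbound leg is 0.4 ms slower.
est_offset, est_delay = ntp_offset_and_delay(*simulate_exchange(0.0, 0.0052, 0.0048))
print(f"estimated offset {est_offset * 1e6:.0f} us, delay {est_delay * 1e3:.1f} ms")
```

In this sketch the estimate is off by (d_out - d_in) / 2 even though the clocks agree, and nothing in the four timestamps reveals it. Only an independent reference such as the GPS PPS signal can show the bias.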

The instability is especially obvious in the week ending Feb 18. Some of the machines have huge (10 ppm) fluctuations in the rate while, at exactly the same time, others (e.g. charge) have fluctuations in the 0.2 ppm range. That is, these fluctuations are not coming from the source, ntp.ubc.ca. They seem to be inherent in the way chrony is setting the rates.

Since the time between comparisons of the system clock with the NTP server is of the order of 100-1000 sec (peer delay is typically 0.6 ms), the rate noise in the case of the best system would correspond to a drift of less than a millisecond.
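
As a quick check of that arithmetic, the sketch below multiplies a rate error by the time between comparisons. Pairing the 0.2 ppm figure with "the best system" and the 10 ppm figure with the worst is an assumption taken from the paragraph above, and the function name is made up for the example.

```python
def accumulated_drift_ms(rate_noise_ppm, interval_s):
    """Time error, in milliseconds, built up by a constant rate error.

    1 ppm is 1 microsecond per second, so ppm * seconds gives microseconds.
    """
    return rate_noise_ppm * interval_s / 1000.0


# 0.2 ppm and 10 ppm are the fluctuation sizes quoted above,
# 100 s and 1000 s the range of times between comparisons.
for ppm in (0.2, 10.0):
    for interval in (100, 1000):
        drift_ms = accumulated_drift_ms(ppm, interval)
        print(f"{ppm:5.1f} ppm over {interval:4d} s -> {drift_ms:.2f} ms")
```

At 0.2 ppm the drift stays at or below about 0.2 ms over the whole range of intervals, while 10 ppm over 1000 s would amount to roughly 10 ms.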

inflaton
    One 450MHz Intel Pentium III Processor, 128M RAM, 903.19 Bogomips Total
doublet
    Two 450MHz Intel Pentium III Processors, 256M RAM, 1805.35 Bogomips Total
dilaton
    One 750MHz Intel Pentium III Processor, 256M RAM, 1498.05 Bogomips Total
gauge
    One 750MHz Intel Pentium III Processor, 256M RAM, 1498.00 Bogomips Total
monopole
    One 750MHz Intel Pentium III Processor, 384M RAM, 1498.05 Bogomips Total
charge
    One 935MHz Intel Pentium III Processor, 384M RAM, 1872.92 Bogomips Total
orbit
    One 935MHz Intel Pentium III Processor, 256M RAM, 1872.86 Bogomips Total
string
    One 1.6GHz Intel Pentium 4 Processor, 512M RAM, 3194.28 Bogomips Total
fluxon
    One 2.67GHz Intel Pentium 4 Processor, 0.99GB RAM, 5339.53 Bogomips Total
boson
    Two 2.8GHz Intel Pentium 4 Processors, 0.98GB RAM, 11179.02 Bogomips Total
info
    Two 3GHz Intel Pentium 4 Processors, 0.99GB RAM, 12008.29 Bogomips Total
flory
    Two 3GHz Intel Pentium D Processors, 1GB RAM, 12008.64 Bogomips Total

These rate fluctuations do not represent the actual clock accuracy (in general chrony keeps the clocks to within a millisecond or less), but they do represent the stability of the onboard system clock (driven from the bus frequency) and to some extent of the real time clock. Chrony measures the real time clock against the system clock, so an unstable system clock would produce an apparently unstable real time clock. In general the RTC seems to be more stable than the system clock (the correlated fluctuations in the system clock and RTC rates suggest that a fair amount of the apparent RTC instability comes from the system clock rather than from the RTC itself).