Comment by cdfuller

5 days ago

Can anybody expand on the implications of this?

Being unfamiliar with it, it's hard to tell whether this is a minor blip that happens all the time or a potentially major issue that could cause cascading errors living up to the Y2K hype.

Time travel is extremely dangerous right now. I highly recommend deferring time travel plans except for extreme temporal emergencies.

  • Same for database transaction rollback and roll-forward actions.

    And most enterprises, including banks, use databases.

    So by bad luck, you may get a couple of transactions replayed in the wrong time order: say, a $20 debit incorrectly happening before a $10 credit when your bank balance was only $10 prior to both transactions. Your balance temporarily goes negative (see the sketch below).

    Now imagine if all those amounts were ten thousand times higher ...
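
    As a toy sketch (the account, amounts, and clock skew are all invented for illustration), here's that reordering in code:

        # Hypothetical replay of a transaction log ordered by (skewed) timestamps.
        # The credit was stamped by a clock running slightly behind, so on
        # roll-forward the debit sorts first and the balance dips negative.
        transactions = [
            {"amount": +10, "timestamp": 1000.05},  # credit, stamped by a slow clock
            {"amount": -20, "timestamp": 1000.00},  # debit, stamped by a fast clock
        ]

        balance = 10  # balance before both transactions
        for tx in sorted(transactions, key=lambda t: t["timestamp"]):
            balance += tx["amount"]
            print(f"t={tx['timestamp']}: {tx['amount']:+d} -> balance {balance}")
        # t=1000.0: -20 -> balance -10   (temporarily negative)
        # t=1000.05: +10 -> balance 0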

  • Uhh, here's the problem, I'm sort of stuck travelling into the future at a more or less constant rate. I don't know how to stop doing that...

  • Would traveling to the past in order to put in place a preemptive fix for this outage be wise or dangerous?

    Asking for a friend.

    • I couldn't comment on the causal hazards but since time is currently having an outage they've got an improved shot at getting away with it. I say go for it.

    • Very unproblematic. Your travelling back will land you in a freshly branched universe with no way back to the one you came from, so no worries there.

Google has their own fleet of atomic clocks and time servers. So does AWS. So does Microsoft. So does Ubuntu. They're not going to drift enough to cause trouble for months. So the Internet can ride through this, mostly.

The main problem will be services that assume at least one of the NIST time servers is up. Somewhere, there's going to be something that won't work right when all the NIST NTP servers are down. But what?

  • Ubuntu using atomic clocks would surprise me. Sure, they could, but it's not obvious to me why they would spend $$$$ on them. It seems more plausible that they would use GPSDOs (GPS-disciplined oscillators) as reference clocks (in this context, about as good as your own atomic clock), iff they were running their own time servers. Google turns up only that they use servers from the NTP Pool Project, which draws on a variety of reference clocks.

    If you have information on what they actually are using internally, please share.

    • I think people have the wrong idea of what a modern atomic clock looks like. They are readily available commercially; Microchip, for example, will happily sell you hydrogen, cesium, or rubidium atomic clocks. Hydrogen masers are rather unwieldy, but you can get a rubidium clock in a 1U format, and cesium ones are not much bigger. I believe their cesium frequency standards come from a former HP business they acquired.

      Example: https://www.microchip.com/en-us/products/clock-and-timing/co...

      5 replies →

    • Atomic clocks are not expensive; they come in different grades. A module-level atomic clock costs only about $3,500.

      The NIST hydrogen clock, by contrast, is very expensive and sophisticated.

  • Atomic clock non-expert here, what does having a fleet of atomic clocks entail and why would the hyperscalers bother?

    • Having clocks synchronized between your servers is extremely useful. For example, having a guarantee that the timestamp of arrival of a packet (measured by the clock on the destination) is ALWAYS bigger than the timestamp recorded by the sender is a huge win, especially for things like database scaling.

      For this, though, you need to go beyond NTP to PTP, which is still usually based on GPS time and atomic clocks.
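
      For illustration, a loose sketch of the "commit-wait" trick TrueTime-style systems use (the EPSILON error bound here is invented): if every clock is within ±EPSILON of true time, waiting out the uncertainty before publishing a timestamp guarantees every other correct clock has already passed it.

          import time

          EPSILON = 0.0005  # assumed worst-case clock error bound: +/-0.5 ms (illustrative)

          def now_interval():
              """True time lies somewhere in [earliest, latest]."""
              t = time.time()
              return t - EPSILON, t + EPSILON

          def commit_timestamp():
              """Pick the latest possible 'now', then wait until even the
              slowest correct clock must read past it before publishing."""
              _, latest = now_interval()
              while now_interval()[0] < latest:  # wait out the uncertainty window
                  time.sleep(EPSILON / 10)
              return latest

          print(f"safe to publish at {commit_timestamp():.6f}")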

      2 replies →

    • There's a lot of focus in this thread on the atomic clocks, but in most datacenters they're not actually that important, and I'm dubious that the hyperscalers actually maintain a "fleet" of them, in the sense of hundreds or thousands of these clocks in their datacenters.

      The ultimate goal is usually to have a bunch of computers all around the world run synchronised to one clock, within some very small error bound. This enables fancy things like [0].

      Usually, this is achieved by having some master clock(s) for each datacenter, which distribute time to other servers using something like NTP or PTP. These clocks, like any other clock, need two things to be useful: an oscillator, to provide ticks, and something by which to set the clock.

      In standard off-the-shelf hardware, like the Intel E810 network card, you'll have an OCXO, like [1], paired with a GPS module. The OCXO provides the ticks; the GPS module provides a timestamp to set the clock with and a pulse for when to set it.

      As long as you have GPS reception, even this hardware is extremely accurate. The GPS module provides a new timestamp, potentially accurate to within single-digit nanoseconds ([2] datasheet), every second. These timestamps can be used to adjust the oscillator and/or how its ticks are interpreted, such that you maintain accuracy between the timestamps from GPS.

      The problem comes when you lose GPS. Once this happens, you become dependent on the accuracy of the oscillator. An OCXO like [1] can hold to within 1µs accuracy over 4 hours without any corrections, but if you need better than that (either more time below 1µs, or better than 1µs over the same time), you need a better oscillator.

      The best oscillators are atomic oscillators. [3], for example, can maintain better than 200ns accuracy over 24h.

      So for a datacenter application, I think the main reason for an atomic clock is simply to retain extreme accuracy in the event of a GPS outage. For quite reasonable accuracy, a more affordable OCXO works perfectly well.
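
      Back-of-the-envelope, assuming a constant frequency offset (a simplification; real oscillators also age and wander with temperature), holdover error is just the fractional frequency error times elapsed time:

          def holdover_error_ns(frac_freq_error, seconds):
              """Time error accumulated in holdover, assuming the oscillator's
              fractional frequency error stays constant."""
              return frac_freq_error * seconds * 1e9

          # An OCXO holding 1 us over 4 h implies a fractional error near 7e-11:
          ocxo = 1e-6 / (4 * 3600)
          print(f"OCXO after 24 h: {holdover_error_ns(ocxo, 24 * 3600):,.0f} ns")  # ~6,000 ns

          # An atomic clock holding 200 ns over 24 h implies roughly 2.3e-12:
          atomic = 200e-9 / (24 * 3600)
          print(f"atomic after 7 d: {holdover_error_ns(atomic, 7 * 24 * 3600):,.0f} ns")  # ~1,400 ns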

      [0]: https://docs.cloud.google.com/spanner/docs/true-time-externa...

      [1]: https://www.microchip.com/en-us/product/OX-221

      [2]: https://www.u-blox.com/en/product/zed-f9t-module

      [3]: https://www.microchip.com/en-us/products/clock-and-timing/co...

      3 replies →

  • Can't they point these DNS records at working servers in the meantime to avoid degradation?

    • My understanding is that people who connect specifically to the NIST ensemble in Boulder (often via a direct fiber hookup rather than over the internet) are doing so because they are running a scientific experiment that relies on that specific clock. When your use case is sensitive enough, it's not directly interchangeable with other clocks.

      Everyone else is already connecting to load-balanced services that rotate through many servers, or has set up their own load balancing / fallbacks. The mistakenly hardcoded configurations should probably be shaken loose anyway.

    • If you use a general-purpose hostname like time.nist.gov, that should resolve to an operational server, and it makes sense to adjust it during an incident. If you use a specific server hostname like time-a-b.nist.gov, that should resolve to that specific server, and you're expected to have multiple hosts specified; it doesn't make sense to adjust it during an incident, IMHO. You wanted Boulder; you're getting Boulder, faults and all.
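
      For illustration, a minimal SNTP client that tries a specific server first and falls back through a list (the hostnames are just examples; a real deployment would configure multiple servers in ntpd or chrony instead):

          import socket
          import struct
          import time

          NTP_DELTA = 2208988800  # seconds between the NTP epoch (1900) and Unix epoch (1970)

          def sntp_query(host, timeout=2.0):
              """Send a minimal SNTP request and return the server's Unix time."""
              packet = b'\x1b' + 47 * b'\0'  # LI=0, VN=3, Mode=3 (client)
              with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
                  s.settimeout(timeout)
                  s.sendto(packet, (host, 123))
                  data, _ = s.recvfrom(48)
              seconds = struct.unpack('!I', data[40:44])[0]  # transmit timestamp, integer part
              return seconds - NTP_DELTA

          # Prefer the specific host, then fall back to the general-purpose names.
          for host in ['time-a-b.nist.gov', 'time.nist.gov', 'pool.ntp.org']:
              try:
                  print(host, time.ctime(sntp_query(host)))
                  break
              except OSError as exc:
                  print(f'{host} unreachable ({exc}), trying next')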

  • I know this is HN, but the internet is pretty low on the list of things NIST time standards are important for.

    • could you list 3 things that you think are more important than the internet? (I know the internet is going to be fine; I just want to understand what you think ranks higher globally...)

      12 replies →

NIST maintains several time standards. Gaithersburg, MD is still up, and I assume Hawaii is as well. Other than potential damage to equipment from loss of power (turbomolecular vacuum pumps and oil diffusion pumps might fail in interesting ways if not shut down properly), it will just take some time for the clocks to be recalibrated against the other NIST standards.

Time engineers are very paranoid. I expect that no large problems can occur from a single provider misbehaving.

No noteworthy impact at all. The NTP network has hundreds to thousands of redundant servers and hundreds of redundant reference clocks.

The network will route around the damage with no real effects. Maybe a few microseconds of jitter as you have to ask a more distant server for the time.

>Can anybody expand on the implications of this?

The answer is no. Anyone claiming this will have an impact on infrastructure has no evidence backing it up. Tabletop exercises at best.

Songs with lyrics such as, "What time is it?" will have no clap back.

Perhaps, "We don't know." will become popular?

If your computer was using it as its time server and you didn't have alternatives configured, your clock may have drifted a few seconds.

  • I never checked, but how much does a typical PC's or server's clock actually drift over a week or a month? I always thought it was well under a second.

    • Clocks do drift; seconds per week is definitely possible. Electronic devices contain internal clocks of varying quality, and the cheaper the clock, the more it drifts. I think small, cheap microcontrollers can drift seconds per day.

      1 reply →

    • I've seen some new ThinkPads lose a minute a month and others (the old ThinkPads) keep within a second of NTP over an entire year. It depends.

    • Several seconds per week is normal. Oscillator accuracy is roughly on the order of 10 PPM, which would correspond to 6 seconds per week.
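
      The arithmetic, for anyone who wants to plug in their own oscillator spec:

          ppm = 10                      # fractional frequency error, parts per million
          week = 7 * 24 * 3600          # 604,800 seconds
          drift = week * ppm * 1e-6
          print(f"{drift:.2f} s/week")  # 6.05 s/week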

    • I have an extremely cheap and extremely low-power automatic cat feeder; it's been running on 2 D batteries for 18 months. I just reset it after it had drifted 19 minutes, so about 1 minute a month, or 15 seconds a week!