Comment by obelos

2 years ago

Not every time. Sometimes it's DNS.

Once it was a failing line card in a router zeroing the last bit of IPv4 addresses, resulting in a ticket about "only even IPv4 addresses are accessible" ...
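To make the symptom concrete, here is a rough sketch of what clearing the lowest bit does to an address. The function name `zero_last_bit` is made up for illustration, and this only models the effect described above, not the actual hardware fault.

```python
import ipaddress


def zero_last_bit(addr: str) -> str:
    """Model the faulty line card: clear the lowest bit of an IPv4 address."""
    value = int(ipaddress.IPv4Address(addr))
    # 0xFFFFFFFE keeps the upper 31 bits and forces the last bit to 0.
    return str(ipaddress.IPv4Address(value & 0xFFFFFFFE))


# Even addresses pass through unchanged; odd ones are rewritten to the
# even address just below them, so the odd host never sees the packet.
for original in ("192.0.2.10", "192.0.2.11"):
    print(original, "->", zero_last_bit(original))
```

Traffic for an odd address ends up at its even neighbour, which is why only even addresses looked reachable.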

  • For some reason this reminded me of the "500mi email" bug [1], maybe a similar level of initial apparent absurdity?

    [1] https://www.ibiblio.org/harris/500milemail.html

    • The most absurd thing to me about the 500 mile email situation is that sendmail just happily started up and soldiered on after being given a completely alien config file. Could be read as another example of "be liberal in what you accept" going awry, but sendmail's wretched config format is really a volume of war stories all its own...


    • I can definitely confirm our initial reaction was "WTF", followed by the idea that the dev team was making fun of us... but we went in and ran traceroutes and there it was :O

      It was fixed by an incredible coincidence, too: the CTO of the network link provider was in their offices (in the same building as me) and was bored. Having apparently worked his way up through every level from hauling cables in the datacenter to CTO, he took a short look at the traceroutes, picked up a phone, called the NOC, and ordered a line card replacement on the router :D

One time for me it was: the glass was dirty.

Some router near a construction site had dust settle into the gap between the laser and the fiber, and it attenuated the signal enough to cause 40-50% packet loss.

We figured out where the loss was and had our NOC email the relevant transit provider. A day later we got an email back from the tech they dispatched with the story.

Once every 50 years and 2 billion kilometers, it's a failing memory chip. But you can usually just patch around them, so no big deal.

When it fails, it's DNS. When it just stops moving, it's either TCP_NODELAY or stream buffering.
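For anyone who hasn't hit the TCP_NODELAY case: small writes can sit in the kernel while Nagle's algorithm waits for an ACK, which looks like a stalled connection. A minimal sketch of turning that off with the standard `socket` module follows; the helper name `make_low_latency_socket` is hypothetical, and whether disabling Nagle is the right fix depends on the traffic pattern.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 asks the kernel to send small segments immediately
    # instead of coalescing them while waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
# A non-zero value means the option is set on this socket.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

The stream buffering half of the problem is in user space: a buffered writer holds data until it is flushed, so no socket option helps there.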

Really complex systems (the Web) also fail because of caching.

I chuckle whenever I see this meme, because in my experience, the issue is usually DHCP.

  • But it's usually DHCP that sets the wrong DNS servers.

    It's funny that some folks claim a DNS outage is a legitimate issue in systems where they control both ends. I get it; reimplementing functionality is rarely a good sign, but since you already know your own addresses in the first place, you should also have an internal mechanism for sharing them.
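One very simple shape such an internal mechanism could take is a static table consulted before DNS. This is only a sketch: `KNOWN_HOSTS`, the host names, and the addresses are all invented for illustration, and a real setup would need some way to distribute and update the table.

```python
import socket

# Hypothetical table of addresses for services you run yourself.
# In practice this might be shipped with the deployment config.
KNOWN_HOSTS = {
    "db.internal.example": "10.0.0.12",
    "cache.internal.example": "10.0.0.13",
}


def resolve(name: str) -> str:
    """Return a known internal address, falling back to DNS otherwise."""
    try:
        return KNOWN_HOSTS[name]
    except KeyError:
        # Only names outside our own systems depend on the resolver.
        return socket.gethostbyname(name)


print(resolve("db.internal.example"))
```

With this, a resolver outage only affects lookups for names that are not in the table.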