We can't send email more than 500 miles (2002)

6 years ago (web.mit.edu)

I've had enough years to become wiser, become a fanatic for configuration management, and get over the embarrassment: I'm the consultant that screwed things up. Some background: the Stat department was running a variety of systems besides the Solaris workstations, and there was, within UNC-CH, a separate support organization that was cheaper and more comfortable with Microsoft products where Stat was sending their support dollars. When that organization needed Unix support, they called my employer, Network Computing Solutions, and I showed up.

There was effectively no firewall at UNC-CH at the time (something something academic freedom something something), and the Stat Solaris machines were not being regularly patched. Uninvited guests had infested them, and it appeared the most likely entry point was sendmail - at the time, it was the most notorious vulnerability on the internet. Since my preference to wipe and reload was unacceptable - too much downtime and too many billable hours - the obvious thing to do was update sendmail. The rest is history.

  • This is absolutely one of the key formative stories that helped me to think about systems at light speed scales.

    I'm currently at the very early stages of building a science museum and will eventually try to incorporate this story into an exhibit about light speed. This along with "Nanoseconds", foot long pieces of wire like what Grace Hopper handed out, can truly help to bring this topic to life.

    I'm also attempting to use this as the basis for a blockchain-based "proof of proximity", in which a very high number of round-trip encryptions of the previous block's hash are stored in a bitcoin block. The number of round trips would be high enough that devices even a few hundred feet apart couldn't complete the task before the next block.

  • I read this story years ago and thought it was hilarious. Could've happened to anyone. In my book you're a near-celebrity and it's great that you can verify the story! Thanks for making things a little more interesting and a lot more fun :)

  • I’ve read this story many times, it’s hilarious (and could happen to anyone). Thanks for filling in that background - HN is so great for these kinds of moments!

  • This should be added as an appendix to the original story. The original has been a favorite of mine for years and I love the addition.

  • > and get over the embarrassment

    It's not that bad, these things happen.

    It makes an interesting story though

I love that this took a perfect storm of having a statistician and sys admin both bent on finding the cause of a weird intermittent problem in their own way.

This could have happened a million times where the story was a lot less interesting:

"Hey, I'm having weird intermittent problems sending email."

"Hmm, we're using the wrong version of Sendmail. All fixed, case closed."

Best part of reading this is coming away having learned of the existence of the `units` CLI. How did I spend 20 years in the shell and never need or discover this?

  • One thing I got bitten by was the handling of Fahrenheit/Celsius, because it's a non-linear conversion between the two. When you ask to convert `10 degC` to `degF` you get 18, which is the delta in ºF corresponding to an increment of 10ºC. To get the absolute temperature, you have to ask to convert `tempC(10)` to `tempF`, which gives 50, as expected.

    https://www.gnu.org/software/units/

    • "Non-linear" threw me off for a second - I almost never see the mathematically correct definition of linear in computer science spaces. For anyone wondering, Celsius to Fahrenheit is an affine transform, technically not linear, because you have to add an offset, not just multiply.


    • FWIW, units on macOS (not GNU) handles conversion of `10 degC` to `degF` correctly, although it dates back to 1993.

      It seems that GNU units at some point added support for several non-linear units, which may have prompted them to rethink their syntax.

  • Be aware that currencies are stuck with rates from several years ago and don’t update.

    • Running `sudo units_cur` does the trick for me.

        $ units
        Currency exchange rates from FloatRates (USD base) on 2020-05-12
        $ sudo units_cur
        $ units
        Currency exchange rates from FloatRates (USD base) on 2020-07-09
      

      (GNU units, packaged by Debian)


    • Looking at the source of the default configuration (cat /usr/share/misc/units.lib), I believe it only defines conversions for currencies that are pegged to another one (mainly to EUR or USD).

          You have: 10 franc
          You want: dollar
          conformability error
           1.5244902 euro
           1 usdollar
          You have: 10 franc
          You want: euro
           * 1.5244902
           / 0.655957

    • I'm tempted to say it shouldn't even attempt to support currency conversion, as constantly in flux as it is.

  • I also discovered `units` because of this tale... but I was lucky enough to read it back in the early days (pre-2005 at least).

  • 'units' was new to me too. The version I have on my Mac wouldn't accept 'millilightseconds' but it would take 'milli-c-seconds' - presumably the units.lib database is a little different from one in the original article.

  • Though sadly millilightseconds is not supported on macOS, at least, so you have to go:

        3 millilightyears / 365 / 86400
    

    Of course, round 365 to whatever average number of days you believe in :-)

  • units

        You have: mph
        You want: kph
                * 1.609344
                / 0.62137119

    • I have

        alias units='units --verbose'
      

      in my shell rc which makes the output much more understandable:

        You have: mph
        You want: kph
                mph = 1.609344 kph
                mph = (1 / 0.62137119) kph

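Picking up the affine-vs-linear point from the temperature discussion above, here is a minimal sketch of the distinction (the helper names are mine, not part of `units`):

```python
# The map f(x) = (9/5)x + 32 is affine, not linear: a linear map must
# satisfy f(a + b) = f(a) + f(b), and the +32 offset breaks that.
def c_to_f(c):
    """Absolute Celsius temperature -> absolute Fahrenheit temperature."""
    return c * 9 / 5 + 32

def c_delta_to_f_delta(dc):
    """Temperature *difference* in Celsius -> difference in Fahrenheit."""
    return dc * 9 / 5

print(c_to_f(10))              # 50.0 (what converting tempC(10) to tempF reports)
print(c_delta_to_f_delta(10))  # 18.0 (what converting 10 degC to degF reports)
```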

I used to collect these kinds of stories:

When I flush my toilet my computer reboots: http://www.techtales.com/tftechs.php?m=199712#66 (the first story on the page)

If I buy vanilla ice-cream my car won't start: http://www.netscrap.com/netscrap_detail.cfm?scrap_id=501

A specific cargo routing crashes system: https://www.jakepoz.com/debugging-behind-the-iron-curtain/

Tape-drive failure only within large print jobs: http://patrickthomson.tumblr.com/post/2499755681/the-best-de...

Interplanetary debugging with the Mars Rover: https://www.eetimes.com/the-trouble-with-rover-is-revealed/#

  • In my intern days some time around 10 years ago, a PI at the NASA GRC facility told me about a problem of this flavor an old grad student of his had.

    The guy was working on an optical sensor in a light-tight lab. Every morning, he came in, calibrated the sensor, and performed measurements. All morning, it held calibration with negligible drift. But when he came back from lunch, each time, the calibration had drifted off.

    Could it be related to the time of day? He tried taking his lunch an hour earlier and an hour later. Each time, the calibration was rock solid until right after lunch.

    In spite of protocol, he tried eating lunch in the lab, no one else in or out. Before lunch: good calibration. After lunch: bad calibration.

    He tried not eating lunch at all. That day, the calibration held all day.

    How could an optical sensor have any concept of whether its user had eaten lunch? It turned out, it only had to do with the lunch box. The sensor was fiber-coupled, and it was sensitive to changes in transmission losses generated by changes to the local radii of the patch cord. Every morning, the grad student set his lunch box down on the lab bench, nudging the fiber into some path. After eating, he’d replace his lunch box on the bench, nudging the fiber into a different path.

    After that, the fiber was secured with fixed conduit, and lunch boxes no longer entered the lab.

Previously:

2018: https://news.ycombinator.com/item?id=9338708

The ending of this makes it sound super clean. 3 ms * speed of light => ~560 miles. "It all makes sense!"

But ... isn't the speed of light through fiber actually like 2/3 of the speed of light in a vacuum? And that fiber isn't going to be laid out in a straight line exactly to the destination. So I think really there must have been a fair bit of uncertainty around that ~3ms to abort a connection.
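A rough sanity check of both figures, treating the 3 ms as a one-way, straight-line trip (the 2c/3 velocity factor is the commonly quoted approximation for fiber, not a measured value for this route):

```python
# Distance light covers in 3 ms, in vacuum vs. at the ~2c/3 speed
# typical of optical fiber.
C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum, km/s
MILES_PER_KM = 0.621371

t = 0.003  # seconds: the effective "zero" timeout from the story

vacuum_miles = C_VACUUM_KM_PER_S * t * MILES_PER_KM
fiber_miles = vacuum_miles * 2 / 3

print(round(vacuum_miles))  # ~559 miles, the figure in the story
print(round(fiber_miles))   # ~373 miles at fiber speed
```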

  • Was gonna say. Speed of light through cables or fiber optics is roughly 2/3 the speed of light through vacuum. Also I don't see how it would know it has established a connection until the round-trip has happened. All in all, it probably waited more like 10 ms, if this story were true, which it probably isn't.

  • I am surprised that signals travel faster in copper (3c/4) than in fiber (2c/3) - does anyone have an explanation?

    • The number for copper is not a fixed quantity, it varies with the type of cable. Electric fields are outside of the copper wire, not inside. The copper conductor acts as a wave guide for the electromagnetic wave. So it turns out that things around the cable have an effect on the speed of propagation [1], particularly the insulator. Bare copper wire in a vacuum would be very close to c. In the case of fibre, the issue is index of refraction.

      [1] https://en.wikipedia.org/wiki/Velocity_factor

  • And the time it takes to get to that fiber. It's probably a piece of fiction?
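The copper-vs-fiber relationship above can be sketched numerically: a cable's velocity factor is roughly 1/sqrt(epsilon_r), where epsilon_r is the relative permittivity of the dielectric, and a fiber's is 1/n for refractive index n. The example values below are typical textbook figures, not measurements of any particular cable:

```python
import math

# Velocity factor: the fraction of c at which a signal propagates.
def velocity_factor_cable(epsilon_r: float) -> float:
    """v/c ~= 1/sqrt(epsilon_r) for a cable with dielectric permittivity epsilon_r."""
    return 1 / math.sqrt(epsilon_r)

def velocity_factor_fiber(n: float) -> float:
    """v/c = 1/n for a fiber core with refractive index n."""
    return 1 / n

print(velocity_factor_cable(2.25))   # ~0.67: solid-polyethylene coax
print(velocity_factor_cable(1.0))    # 1.0: bare wire in vacuum
print(velocity_factor_fiber(1.468))  # ~0.68: typical silica fiber core
```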

Truly this is a worldly gem. Thank you for submitting this. :)

It's easy to forget that, even though transmissions travel at near lightspeed, it still takes more than an instant for them to reach their destinations, even digitally. I should keep this in mind, I think.

IMO even though this has been posted a bunch of times it’s important to repost these sort of campfire ghost stories so future developers can think more creatively about strange errors.

  • It should be recommended reading for new IT support staff. While you get unhelpful "its not working" requests all the time, when the user does give you information on the working / not working scenarios, you should always consider them, even if it doesn't make any sense.

  • Is there a book written about these campfire programming stories?

    This one and the Microsoft Bedlam one are some of my favorites.

    • Not sure, but there are plenty of stories out there from which one could probably be compiled.

Only the other week we were doing some testing on a new HCI (hyper-converged infrastructure) I'm doing the network for. At the end of the test period, we were having some storage sync issues. Everything seemed to ping OK, except my colleague happened to notice that large jumbo frames over 8000 bytes were getting dropped. I double-checked that we hadn't inadvertently changed the network configuration. It was only by chance that we had another test looking at transceiver signal levels, and a customer engineer saw an alarm on RX level. It was then we remembered one test was to remove a module. I then noticed some error counts. We shut down that particular link until we could visit the site. Sure enough, that fibre wasn't quite clicked in anymore. There was enough of a bridge across the fibre air gap for small packets, but it was just wide enough that large packets statistically couldn't be corrected enough to work.

  • Made me think about the precision required for some errors to occur. I've had sort of similar things occur, where it's almost impossible to reproduce when you try!

    As a hobbyist sound engineer, usually regular cables are the first I check, but maybe I should extend that to fibre?

    Interesting error nevertheless, and honestly, checking fibre cabling for those kinds of errors would probably be a bit lower on my list, unless I saw a lot of transceiver errors.

If somebody is tracking apt/yum downloads of packages, they might see a sudden spike for "units". Just installed it, and it is a nifty, useful little tool.

Love this story.

I'm certain it will continue to be reposted to this website until it ceases operations or the heat death of the universe, whichever comes first.

Love this story - funnily enough, it was the first story I ever read on HN, on my first day of work as a junior developer.

Can someone please explain to me the POP reference? I do not understand what this author means by that.

I'd also like help understanding what `$ units` does. The command looks to be "units", but where do the numbers he entered come from? I would appreciate the extra context.

  • $ is the shell prompt; he's not typing it. "3 millilightseconds" is the distance light travels in 3 milliseconds, the time a "zero" timeout would take to actually timeout. (This comes directly from the definition of the lightsecond: how far light travels in one second) "miles" is what he wants to see that distance converted to. Turns out it's 558 miles; one mile is 0.00179 of 3 millilightseconds.

  • POP, I believe, is "point of presence" - the point at which our network connects to the internet.

Edit: Found it - it's /usr/share/units/definitions.units (on Pop!_OS, so probably the same on Ubuntu/Debian).

The FAQ[0] mentions:

> units on SunOS doesn't know about "millilightseconds."

> Yes. So? I used to populate my units.dat file with tons of extra prefixes and units. And actually, I think I was using AIX to run units; I don't know if it knew about millilightseconds. Take a look at the units.dat shipped with Linux these days. It definitely knows about millilightseconds.

I tried locate for units.dat but couldn't find it. Anyone knows where is it? Not keen on running a system-wide find.

[0]: https://www.ibiblio.org/harris/500milemail-faq.html

Hat tip to @nfriedly from back in 2011:

FYI units on OS X doesn't recognize millilightseconds, but you can do this:

  You have: 3 lightyears / 365 / 24 / 60 / 60 / 1000
  You want: miles
   * 559.21802
   / 0.0017882113

This story makes me smile every time it comes up. It's fascinating how many arbitrarily coded limits we keep breaking as we make our tech go faster without re-assessing the original assumptions :)

Does anyone know of stories similar to this? I would like to read more.

Haha, this was interesting. I posted the same story with more or less the same title a couple of months ago [1], and no one saw it - no comments. This time it got almost 1000 points and lots of comments. What I think is interesting is how the same thing can get such different traction. I wonder what factors make something take off or not?

[1] https://news.ycombinator.com/item?id=22164691

Since I've worked with Linux email servers (sendmail, qmail, postfix, exim, etc.) practically my whole professional life (since around 1997, though I'd used BBSes since '91 - I'm 41 now), this story really amused me and got my attention! I love this kind of email debugging! LOL

I've heard this story before, but I didn't realize it was as recent as 2002. It feels like something from a much earlier, bygone era, like the early 90s.

The one thing that throws a wrench in this story for me: Lattes.

Lattes in 1994? In North Carolina? No way. Maybe on the West Coast, but I moved to Cali in 1989 and they were a rarity until the mid-to-late 90s. There were only 425 Starbucks in the US in 1994 (from their site). The "fancy coffee" craze was just a blip on the radar in the mid 90s, but it was gaining momentum.

;-)

  • > The "fancy coffee" craze was just a blip on the radar in the mid 90's but gaining momentum.

    Friends premiered in 1994, with Central Perk being a major set piece of the show. I mean, yeah, it's New York City and not North Carolina, but college towns anywhere are going to be early to trends.

    A latte in 1994 seems plausible to me. I remember getting them from a Gloria Jeans in my local suburban mall around 1990 or so.

    You're not wrong about the shape of the trajectory, but all throughout the 80s the coffee shop/latte trend was slowly building steam (heh) before it went hockey stick in the mid-90s.

  • In ‘94 (if not earlier), I was drinking lattes at a mom and pop coffee shop in a tiny town in the Midwest. And at another indie coffee shop at the nearest major university campus. That place was open 24 hours and busy at all hours. I didn’t even know what Starbucks was, but I sure knew lattes and cappuccinos.

    So yeah, lattes in ‘94 in a major college town seems totally plausible.

    • Agreed, I graduated high school in 1995, and I was dating a college girl in Rome, Georgia. Our favorite hangout was a coffee shop that served, among other things, lattes and frappes.

  • > Lattes in 1994? In North Carolina? No way

    Definitely possible. Chapel Hill isn't like the rest of North Carolina, so I'd expect something like that to appear there before other parts of the state. And I remember the Books-a-Million in Wilmington started adding a cafe / Starbucks-like area for "fancy coffee" around 1996 or so. I have no problem believing there were shops serving lattes in Chapel Hill during the era this story is described as happening in. And to be fair, the author even says in the FAQ that he's not sure about the exact date(s). It could have been as late as 1997.

  • From the FAQ

    > My guess, from the office I remember being in, the coworkers I remember speaking about this to, and some other such irrelevant but timely details, place it somewhere between 1994 and 1997.

  • > Lattes in 1994? In North Carolina?

    We drank lattes in Louisiana in the 80's. Time to upgrade your stereotypes.