> And also being a good system administrator, I had written a sendmail.cf [...]
Say what? Nobody writes a sendmail.cf from scratch, unless they are crazy.
> ... that used the nice long self-documenting option and variable names available in Sendmail 8 rather than the cryptic punctuation-mark codes that had been used in Sendmail 5
Good system administrators stick to conservative, portable subsets of configuration and scripting languages, rather than bleeding edge stuff.
When they deviate, they have a clear plan. They document their choice to use something new and shiny, and they keep it separated from the default system configuration.
Since SunOS came with Sendmail 5, the upgraded Sendmail 8 should have been installed in some custom location with its own path so that it coexists with the stock Sendmail, and is not perturbed if the OS happens to upgrade that.
A good sysadmin would stick that in some /usr/local/bin-type local directory, and not overwrite /usr/bin/sendmail.
The consultant was not wrong to update the OS. People have reasons to do that. The consultant should have consulted with the sysadmin, of course. But even in that event, it might not have immediately occurred to the sysadmin what the implication would be to the sendmail setup.
Goodness, you're determined to find fault, aren't you? (For the record in re your comment later about my "basis to call [myself] a good system admin", those claims were a) jokey, and b) fairly well-substantiated by my reputation by that time, I should think. I was published by that point and had been on several conference committees along with many who'd be reading that mailing list; I hardly needed to peacock like you seem to think I was doing.)
But I think your criticisms seem a little uninformed (or possibly over-informed by later practice to the point where you aren't considering this in the context of mid-1990's practice). Let's see...
> > And also being a good system administrator, I had written a sendmail.cf [...]
> Say what? Nobody writes a sendmail.cf from scratch, unless they are crazy.
I didn't say "from scratch". I used the m4 macros to create a cf, like everyone did at the time. Using the default file would only work if you still used email programs that read raw mbox files, had no email lists, and needed no interesting aliasing or vacation script behavior. Oh, and ran in an environment where it was reasonable to assume someone's canonical email address could be found via the equivalent of echo "${USER}@${HOST#.}".
Very few production systems could get away with that; writing a sendmail.cf was standard practice. And with m4, you usually spoke of "writing" a file where today we'd call it "configuring" a file; either way it was taking boilerplate and replacing bits with things that were right for your situation. I assume you wouldn't have had an issue with my writing that I'd "configured" the sendmail.cf. That's all I did.
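For readers who never touched this layer: you wrote a short .mc file of m4 macro calls and expanded it against the cf templates shipped with the Sendmail source. A minimal illustrative sketch (the domain name is hypothetical, and exact macro availability varied by Sendmail 8 minor version):

```m4
divert(-1)
dnl Minimal illustrative sendmail.mc for a SunOS 4 host
divert(0)dnl
OSTYPE(`sunos4.1')dnl
define(`confDOMAIN_NAME', `stats.example.edu')dnl
MAILER(`local')dnl
MAILER(`smtp')dnl
```

You would then expand it with something like `m4 ../m4/cf.m4 sendmail.mc > sendmail.cf` from the cf/ directory of the Sendmail distribution.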
> > ... that used the nice long self-documenting option and variable names available in Sendmail 8 rather than the cryptic punctuation-mark codes that had been used in Sendmail 5
> Good system administrators stick to conservative, portable subsets of configuration and scripting languages, rather than bleeding edge stuff.
Hmm, you either weren't administering SunOS in the mid-90's or you're forgetting some details. SunOS still came with Sendmail 5 years after best practice was to use Sendmail 8. Check out the page count of the O'Reilly Sendmail book from that time: it was longer than both the prior and the later editions because it had to document both versions. I'm not entirely certain SunOS (as opposed to Solaris) was ever upgraded to Sendmail 8 in the distribution; obviously the people still using SunOS so late were change-averse.
"Bleeding edge" != "the version that all but the most conservative holdouts are using". Also, remember that this was the same period we were doing the rsh/rlogin conversion to SSH. Sendmail 5 still had known security issues that were fixed in Sendmail 8. We were used to replacing system components when what the OS vendor was shipping us was literally dangerous to run.
And Sendmail 8's Sendmail 5 compatibility mode was simply there for testing; it was never intended to be used in production long-term, so using a least-common-denominator sendmail.cf wouldn't have been "conservative and portable"; it would have been risky, bordering on malpractice.
> Since SunOS came with Sendmail 5, the upgraded Sendmail 8 should have been installed in some custom location with its own path so that it coexists with the stock Sendmail, and is not perturbed if the OS happens to upgrade that.
> A good syadmin would stick that in some /usr/local/bin type local directory, and not overwrite /usr/bin/sendmail.
Again, either you didn't run this installation in the mid-90's or you're forgetting some details. /usr/lib/sendmail (notice the "lib"! Your referring to "/usr/bin/sendmail" suggests to me you definitely weren't running SunOS 4 or have forgotten details; sendmail was never in /usr/bin) couldn't be left alone, as other tools hardcoded that path. The actual executable was there, so symlinking couldn't be used to get around that.
> Say what? Nobody writes a sendmail.cf from scratch, unless they are crazy.
The point, moreover, was that he had a custom version of the config file (not just the default).
Yes, sites have necessary customizations in sendmail.cf. These do not have to be rewrites that use shiny new syntax.
My biggest problem with the author was not that he used his admin blunders as a basis to call himself a good sysadmin, but that he assumed that the stats people were idiots who don't know anything about `puters or networks.
I was not surprised by the 500 mile claim. It strikes me as obvious that the 500 miles has to do with some combination of network topology and propagation delays, those being approximately the same in every direction.
Yes, networking does work "that way": farther places take more time to reach than nearer ones, broadly speaking. (Of course, it's faster to reach something 12,000 km away with no packet switch in between than something 50 miles away with switching. That doesn't eliminate the generality.)
It was also obvious why they didn't report the problem instantly; you cannot instantly know that mail isn't reaching beyond 500 miles without gathering data and correlating it to a map, which takes time. Instantly, you can only know data points like "I can't mail to users@example.com". You know that if a stats person gives you a number, it was based on data, and not just a couple of data points. The head of the stats department isn't going to give you a number that isn't factual and backed by science. Of course stats people pride themselves on their data analysis; they aren't going to relay a couple of raw data points with no analysis attached.
This one comes up every 3-4 years or so in sysadmin communities, and I read it every single time, because it's worth it.
It's one of those things that I highly doubt would have occurred to me to check, or to give even a moment's thought, under normal circumstances.
This and the story of Mel never get old.
http://www.catb.org/jargon/html/story-of-mel.html
I was looking for another famous sysadmin story, where the guy who also happens to be a top Linux developer (so maybe Alan Cox?) rescues a deeply broken Linux system where even glibc is no longer accessible by manipulating inodes in a running process. Or something.
Over the years, my Google-fu has failed me. Any clue? :)
http://www.lug.wsu.edu/node/414 is what you are looking for.
That's a classic.
Best I can claim is zmodem transfers of uuencoded packages over a PLIP link as I tried to get ethernet support up on an old but fairly reliable box.
Another email incident at Microsoft worth reading [1].
[1] http://blogs.technet.com/b/exchange/archive/2004/04/08/10962...
My favorite version of this tale: "Free Bananas in the Kitchen!"
http://www.metafilter.com/78177/PLEASE-UNSUBSCRIBE-ME-FROM-T...
I remember it happened at NYU a couple of years ago and they turned it into a kind of ad-hoc social network/partyline. I wonder if anyone archived those emails? I suppose they deserve to remain "private."
One listserv (can't remember which) made up a list for people who complained like this instead of following the unsubscribe instructions. The admins would remove complainers from the normal lists and add them all to one mailing list, where the only emails they got were each other's demands to be taken off the mailing list, with unsubscribe instructions added to the beginning and the end of every single email.
Ha. There is no explanation of why the mailing lists were named "Bedlam" though, and I doubt non-native readers know what it refers to. To quote Wikipedia [0]:
"Bedlam may refer to:
Bethlem Royal Hospital, London hospital first to specialise in the mentally ill and origin of the word "bedlam" describing chaos or madness"
[0] http://en.wikipedia.org/wiki/Bedlam
I'm a non-native speaker and I know what Bedlam means. Thanks to Ultima Online and Diablo :)
I also found that to be evidence of pretty horrific architecture in Exchange. Two actual recipient lists with a secret internal one? Bloating headers to 13K? At the very least, it seems to me like they chose to put the distribution logic at the wrong layer...
> Two actual recipient lists with a secret internal one?
How else do you propose handling BCC and mailing lists?
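For context on why separate recipient lists exist at all: SMTP delivery is driven by the envelope (the RCPT TO commands), not by the To:/Cc: headers inside the message, which is exactly how Bcc and distribution lists avoid exposing recipients. A sketch of a session (all addresses hypothetical):

```text
S: 220 mx.example.com ESMTP
C: MAIL FROM:<announcer@example.com>
C: RCPT TO:<hidden-member@example.org>    <- envelope recipient; never appears in headers
C: DATA
C: From: announcer@example.com
C: To: bedlam-dl3@example.com             <- header recipient; purely informational
C: Subject: Me too!
C:
C: Please remove me from this list.
C: .
S: 250 OK
```

The question in the thread is essentially where that envelope expansion should live: in the MTA at delivery time, or baked into the message as extra recipient lists.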
Thanks for the link. I was surprised that it was written by Larry Osterman. I enjoy listening to his stories about Microsoft. Have you seen his Channel 9 videos [0]? I really enjoy the Checking In videos with Erik Meijer.
[0] http://channel9.msdn.com/tags/Larry+Osterman
Literally the exact same thing happened at Case Western this past weekend.
Ah yes, the age old "reply-all" email storm.
The bit about the recipient processing bug is novel though, ouch.
If only every bug report that I received had been processed by a geostatistician... Usually I get a "hey, I can't get X to work". One of three responses from me usually fixes it: "Is your computer on?", "Are you online?", and "Try hitting refresh".
I am actually surprised the sysadmin in this scenario thought it was a bad thing that the statistics department did their research and presented a well documented error.
Well, technically, the geostatistician (Did I spell that right?) was doing research that was orthogonal to the actual problem and its symptoms. In this case, the results were sufficiently odd that they sort of pointed in the right direction, but I've been sent off on wild goose chases by people skillfully applying their own particular set of skills before.
On the other hand, there's the Word document with nothing but a screenshot showing half of a useless error message.
Reminds me... when I post a support request to Google Apps, the issue description header says "in as much detail as possible"... but the field is limited to 1000 characters. When you're dealing with anything other than simple first-level support issues, a user simply can't put in a usefully descriptive amount of detail...
I didn't know about the units program. Is there any resource out there that lists these little *nix utility programs?
That's the thing about unix-like systems. No matter how much you have learned there's always some command you don't know.
paste was my most recent "holy shit this saves so much time" discovery. I blame it on the not quite intuitive name.
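Since paste is obscure enough to count as a discovery, a tiny illustration (file names made up):

```shell
# Join corresponding lines of two files; -d sets the delimiter (tab by default)
printf 'alice\nbob\n' > names.txt
printf '501\n502\n'   > uids.txt
paste -d: names.txt uids.txt    # one "name:uid" pair per line

# -s serializes a single file onto one line instead
paste -s -d, uids.txt           # 501,502
```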
Unless `units` doesn't happen to be installed by default, which is the case at least for Arch Linux.
Though it doesn't contain `units` either, here's a Wikipedia list of the standardized (IEEE 1003.1-2008) unix commands. http://en.wikipedia.org/wiki/List_of_Unix_commands
info coreutils is a great place to start.
units is nice, but there isn't much help, and the syntax isn't always easy to remember. It was fun to play with for a while, but wolframalpha.com is better.
Shouldn't this account for a round trip, and the speed through copper (~2/3 of the speed of light)? That would lower the radius to much less than 500 miles.
I had this thought when reading this before as well. I imagine that the "3 milliseconds" they determined from testing was a typical number, maybe the median/mean, and that the actual timeout varied considerably depending on CPU load at that particular moment. Add in a number of retries for the server to attempt sending each email, and the effective timeout might have been a few milliseconds more... or at least it must have been, because `(2 * 500 miles) / (2/3 speed of light)` works out to about 8 milliseconds (where the 2X is for the round trip, and 2/3 is a rough multiplier for the speed of light traveling in either copper or optical fiber).
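The parent's arithmetic checks out; here it is as a quick sanity check (the 2/3 factor is the usual rough rule of thumb for propagation speed in copper or fibre, not a measured constant):

```python
C = 299_792_458          # speed of light in a vacuum, m/s
MILE = 1609.344          # metres per statute mile

# One-way distance light covers in the ~3 ms timeout, at full vacuum c:
one_way_miles = 0.003 * C / MILE                 # about 559 miles

# Round-trip time to a host 500 miles away at roughly 2/3 c:
rtt_ms = 1000 * (2 * 500 * MILE) / (C * 2 / 3)   # about 8 ms

print(round(one_way_miles), round(rtt_ms, 1))
```

So a naive one-way, vacuum-speed calculation lands near the story's 500-plus miles, while the physically realistic round trip would need roughly 8 ms.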
The FAQ answers this question. Basically: it was a long time ago, and the point of the story isn't in the detail. :)
http://www.ibiblio.org/harris/500milemail-faq.html
Another of the 10,000 here - this is such a delightful story.
Also just discovered the "units" conversion program and disappointed that the default Mac library has only 586 units. And shockingly there don't seem to be compatible libraries out there.
`brew install gnu-units` should do it :)
Edit: You'll then want to run it with `gunits` rather than `units`
Now I know where the rapper got his name.
Awesome, thank you!
Yay! Works.
I was so happy to discover that units command line program, then I realized that Google already does this; it just wasn't as fun.
Google has units, but not detailed commentary on unit definitions.
See https://futureboy.us/frinkdata/units.txt for that.
Yeah, and units in the Mac terminal doesn't recognise "3 millilightseconds", whereas Google works for "0.003 light seconds to miles".
Works exactly as in the blog post on Fedora.
Forgot to account for the difference between the traditional speed of light (in a vacuum) and the speed of light traveling through copper or fiber. :)
And the time it takes to make a round trip.
Better link that contains more headers (showing the email's date, and linking to a FAQ):
http://www.ibiblio.org/harris/500milemail.html
This always reminds me of the email around the world: http://phrack.org/issues/41/4.html
Thanks for a good read. It's strange to think about a time when there were a myriad of incompatible networks, and their different capabilities could be exploited.
I've seen a few comments about units not having lightseconds, so here are a few ways to add the missing unit if you don't have it.
1) Add this line under the lightyear definition in /usr/share/misc/units.lib (or wherever `man units` says the standard units library is, under the FILES section):
lightsecond lightyear / 365.25 / 24 / 60 / 60
2) If you're on a mac and use homebrew, just `brew install gnu-units` and then run `gunits`
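A quick sanity check that a lightyear-based definition like this unwinds back to one light-second and reproduces the story's number (assuming the units library derives the lightyear from the 365.25-day Julian year):

```python
C = 299_792_458                               # m/s; one light-second is C metres
lightyear = C * 365.25 * 24 * 60 * 60         # metres in a Julian light-year
lightsecond = lightyear / 365.25 / 24 / 60 / 60

# The definition line reduces to c (up to float rounding):
assert abs(lightsecond - C) < 1e-3

# 3 millilightseconds in statute miles:
print(round(0.003 * lightsecond / 1609.344, 1))   # about 558.8
```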
That's the speed of light in a vacuum ... through fiber-optic cable the speed of light is about two-thirds that value.
I did #2, then:
sudo mv units macunits
sudo ln -s $(which gunits) units
Or use the brew install option --with-default-names and put your Homebrew bin directory at the start of your PATH.
Damn statisticians. They do know their job quite well.
It was a seriously accurate bug report. If only all users were so thoughtful.
> If only all users were so thoughtful.
But then it sent him off in a direction not worth going. He literally started to map out how far emails would go when they succeeded, while the whole time the error was in the timeout.
Who thought it was going to be a TTL issue before finishing the story? :)
You probably mean something else (RTT?) but definitely not TTL, which is a completely different thing :)
TTL is involved when dealing with routed networks. The farther the destination, the more hops you normally get on the way. If the starting TTL is low, you won't reach the destination. So TTL values can cause problems like this, although the radius wouldn't be so precise. Damn statisticians!
Not I. I don't know why, but I was thinking it was an IP version issue, which makes no sense.
Why do I get:
> unknown unit 'millilightseconds'
Is this one of the embellishments that just makes the story more entertaining?
Not an embellishment at all.
Via 'man units': "The conversion information is read from a units data file that is called 'definitions.units' and is usually located in the '/usr/share/units' directory."
Via definitions.units (L. 223), you can see the milli- prefix: https://gist.github.com/anonymous/f06769de95e0c7f9e658#file-...
Via definitions.units (L. 1060), you can see the lightsecond unit: https://gist.github.com/anonymous/f06769de95e0c7f9e658#file-...
Maybe check it for completeness?
Edit: Spelling
Some distributions only support lightyear, so adding this line to your units file (which you can find with man units) will give you support for lightseconds:
lightsecond lightyear / 365.25 / 24 / 60 / 60
I don't know about what "units" support, but if you ever need picolightseconds in your web design, CSS got you covered:
http://dev.w3.org/csswg/css-egg/#astro-units
I had the same thing happen to me. From the manpage I gathered that units uses the definitions in /usr/share/misc/units.lib. By running cat /usr/share/misc/units.lib | grep light, I found I only had lightyear and its shortcut ly defined. I added lightsecond, and since the milli prefix is already defined, it worked a treat.
Here's the line you'll want to add:
lightsecond lightyear / 365.25 / 24 / 60 / 60
If you're on a mac, try $ brew install gnu-units - it's probably using a very incomplete library of units.
A more complete units library. Note how the original author's units has 1311 units and 63 prefixes, while OS X only has 586 and 56.
I see this story every so often, and it's a good one, but I haven't thought to verify it. Has anyone else?
This FAQ was posted in the comments of a previous posting: https://webcache.googleusercontent.com/search?q=cache:http:/...
Absolutely a good read. Sometimes this kind of story can help with a completely different problem. Sometimes you're dealing with another problem, then you remember this story, and you figure out what's wrong because there are some similarities. I remember fixing a problem with PostgreSQL by remembering a story about Unicode and Postfix: different domain, but a similar problem.
That was great out-of-the-box thinking, and I wonder if that could be used as one of these job interview questions:
Q: "Your email server for some reason is only working for addresses within 500 miles of the server. What may go wrong?"
And let the candidate think logically and reach some sane answer, even if not 100% accurate (e.g. check routers first, then connectivity, DNS, timeouts...)
That's one of those interview questions that tests for someone reading hacker news and pretending that they figured it out all on their own...
"culture fit"
Anyone who short circuits and brings up this story should get a +1.
If you're a sysadmin and someone brings in a consultant who gets root access and upgrades to a whole new operating system which then almost takes out email... wouldn't that be a problem?
If I were the sysadmin and that happened, I would need to have a meeting with some people. What's the point of being a sysadmin if the operating system is randomly going to be completely changed without someone telling you?
I have a fair amount of built up rage. This seems like one of those situations where it is actually your responsibility to rip people a new one.
A perfect answer to the YC application question - "Tell us something surprising or amusing that one of you has discovered" :)
I tried the 'units' program on OSX. It seems that it does not recognise the 'millilightseconds' unit.
Try one L in mili?
Didn't work.
I'm wondering how many hits that email address got at the bottom of the page :)
Was this just a clever way to let people know he was looking for a job?
Every time I read this I am reminded of the units(1) util, which is super useful and which I always forget about, falling back to Google. But yeah, that connect-timeout-to-500-miles correlation is fun too.
1 year ago: https://news.ycombinator.com/item?id=123489
Reposts are fine when a story hasn't had significant attention in the past year or so.
https://news.ycombinator.com/newsfaq.html
Once a year is about the right frequency. Recurring stories are one way in which a community shares and perpetuates its culture with newcomers. Some of them are a delight to read on that yearly cadence, like the SR-71 story about a pilot and his copilot becoming a crew.
That said, it's wise to consider the frequency with which such things appear, individually and in total. Too much repetition and focus on memes becomes dysfunctionally self-obsessive. Not sure what the right answer is, but I can probably deal with once per year, short time on front page, and small % of total content.
This is an interesting idea. Have a system where a community can mark something as important, and have it automatically reposted at preset intervals. Community members could be allowed to additionally repost, or the system can politely say it's already archived and will be shared again on such-and-such date. Use it as a way to reinforce community history.
Would you mind linking the SR-71 story? Somehow I never saw that.
Maybe the submitter is one of the ten thousand: https://xkcd.com/1053/
Bet there are a few more that will find this submission too.
Ha, I'm one of the ten thousand for both the XKCD and this post. Lucky me!
I learned about Diet Coke and Mentos from that cartoon. It was wonderful.
This doesn't apply very well... HN is heavily archived... this comic is about being rude to people for not knowing about something, not justifying shoving the same cyclical content in people's faces repeatedly.
And the thing is, every time I see this story (I saw it years ago before I discovered HN), I read it in its entirety, and love the shit out of it.
It really makes me sad the BOFH series of stories is over, I loved those too.
The Register still carries them[1]. I don't know their exact relation to the original but I think they're official.
[1] http://www.theregister.co.uk/data_centre/bofh/
Wow, I must have bad timing. I've had an account here for almost all of those, and I think I was probably lurking for the 1 or 2 occurrences when I did not have an account, but don't remember seeing it before.
Or perhaps senility is setting in early. :-D
The only significant discussion was almost five years ago. Or about the time the first iPads went on sale. And before either of us were members.
I missed it all the other times and am glad it was reposted.
I don't think this is a bad thing. It was either 1 or 2 years ago when I first read about this - newcomers to the community have to find out about things in one way or another.
/me is unsure whether this is condemnation or validation. Anyway, let's ride the wave.
old but gold
Can we get a nice "HN Classic" tag to put beside annual stories like this? I'm fine if stories like this pop up every year, actually.
> And also being a good system administrator, I had written a sendmail.cf [...]
Say what? Nobody writes a sendmail.cf from scratch, unless they are crazy.
> ... that used the nice long self-documenting option and variable names available in Sendmail 8 rather than the cryptic punctuation-mark codes that had been used in Sendmail 5
Good system administrators stick to conservative, portable subsets of configuration and scripting languages, rather than bleeding edge stuff.
When they deviate, they have a clear plan. They document their choice to use something new and shiny, and they keep it separated from the default system configuration.
Since SunOS came with Sendmail 5, the upgraded Sendmail 8 should have been installed in some custom location with its own path so that it coexists with the stock Sendmail, and is not perturbed if the OS happens to upgrade that.
A good sysadmin would stick that in some /usr/local/bin type local directory, and not overwrite /usr/bin/sendmail.
The consultant was not wrong to update the OS. People have reasons to do that. The consultant should have consulted with the sysadmin, of course. But even in that event, it might not have immediately occurred to the sysadmin what the implication would be to the sendmail setup.
Goodness, you're determined to find fault, aren't you? (For the record in re your comment later about my "basis to call [myself] a good system admin", those claims were a) jokey, and b) fairly well-substantiated by my reputation by that time, I should think. I was published by that point and had been on several conference committees along with many who'd be reading that mailing list; I hardly needed to peacock like you seem to think I was doing.)
But I think your criticisms seem a little uninformed (or possibly over-informed by later practice to the point where you aren't considering this in the context of mid-1990's practice). Let's see...
> > And also being a good system administrator, I had written a sendmail.cf [...]
> Say what? Nobody writes a sendmail.cf from scratch, unless they are crazy.
I didn't say "from scratch". I used the m4 macros to create a cf, like everyone did at the time. Using the default file would only work if you still used email programs that read raw mbox files, had no email lists, and needed no interesting aliasing or vacation script behavior. Oh, and ran in an environment where it was reasonable to assume someone's canonical email address could be found via the equivalent of 'echo "${USER}@${HOST#.}"'.
Very few production systems could get away with that; writing a sendmail.cf was standard practice. And with m4, you usually spoke of "writing" a file where today we'd call it "configuring" a file; either way it was taking boilerplate and replacing bits with things that were right for your situation. I assume you wouldn't have had an issue with my writing that I'd "configured" the sendmail.cf. That's all I did.
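For readers who never touched Sendmail's m4 layer, the "writing" in question looked roughly like this. (This is an illustrative sketch, not the author's actual file; the domain name is made up, and the exact include path varied by installation.)

```
divert(-1)
dnl Hypothetical .mc source; m4 expands this into the actual sendmail.cf
divert(0)
include(`../m4/cf.m4')
OSTYPE(`sunos4.1')
DOMAIN(`generic')
define(`confDOMAIN_NAME', `stats.example.edu')
MAILER(`local')
MAILER(`smtp')
```

You'd run this through m4 to generate the multi-hundred-line sendmail.cf, so "writing a sendmail.cf" in practice meant picking the right macros, not hand-authoring ruleset syntax.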
> > ... that used the nice long self-documenting option and variable names available in Sendmail 8 rather than the cryptic punctuation-mark codes that had been used in Sendmail 5
> Good system administrators stick to conservative, portable subsets of configuration and scripting languages, rather than bleeding edge stuff.
Hmm, you either weren't administering SunOS in the mid-90's or you're forgetting some details. SunOS still came with Sendmail 5 years after best practice was to use Sendmail 8. Check out the page count of the O'Reilly Sendmail book from that era: it was longer than both the prior and later editions because it had to document both versions. I'm not entirely certain SunOS (as opposed to Solaris) ever got Sendmail 8 in the distribution; obviously the people still using SunOS that late were change-averse.
"Bleeding edge" != "the version that all but the most conservative holdouts are using". Also, remember that this was the same period we were doing the rsh/rlogin conversion to SSH. Sendmail 5 still had known security issues that were fixed in Sendmail 8. We were used to replacing system components when what the OS vendor was shipping us was literally dangerous to run.
And Sendmail 8's Sendmail 5 compatibility mode was simply there for testing; it was never intended to be used in production long-term, so using a least-common-denominator sendmail.cf wouldn't have been "conservative and portable"; it would have been risky, bordering on malpractice.
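To make the version difference concrete, here is roughly what the same setting looked like in each syntax (the specific option names are from memory and should be treated as illustrative):

```
# Sendmail 5 style: cryptic single-letter option codes
OT5d                      # "T" = queue return timeout, five days

# Sendmail 8 style: long, self-documenting names
O Timeout.queuereturn=5d  # same setting, readable name
```

Per the story, this is exactly where the failure mode lived: a Sendmail 5 binary fed the Sendmail 8 file couldn't parse the long-form options and silently fell back to defaults of zero, including the connect timeout.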
> Since SunOS came with Sendmail 5, the upgraded Sendmail 8 should have been installed in some custom location with its own path so that it coexists with the stock Sendmail, and is not perturbed if the OS happens to upgrade that.

> A good sysadmin would stick that in some /usr/local/bin type local directory, and not overwrite /usr/bin/sendmail.
Again, either you didn't run this installation in the mid-90's or you're forgetting some details. /usr/lib/sendmail (notice the "lib"! Your referring to "/usr/bin/sendmail" suggests to me you definitely weren't running SunOS 4 or have forgotten details; sendmail was never in /usr/bin) couldn't be left alone, as other tools hardcoded that path. The actual executable was there, so symlinking couldn't be used to get around that.
> Say what? Nobody writes a sendmail.cf from scratch, unless they are crazy.

Moreover, the point was that he had a custom version of the config file (not just the default).
Yes, sites have necessary customizations in sendmail.cf. These do not have to be rewrites that use shiny new syntax.
My biggest problem with the author was not that he uses his admin blunders as a basis to call himself a good sysadmin, but that he assumed that the stats people were idiots who don't know anything about `puters or networks.
I was not surprised by the 500 mile claim. It strikes me as obvious that the 500 miles has to do with some combination of network topology and propagation delays, those being approximately the same in every direction.
Yes, networking does work "that way": farther places take more time to reach than nearer ones, broadly speaking. (Of course, it's faster to reach something 12,000 km away with no packet switch in between than something 50 miles away with switching. That doesn't eliminate the generality.)
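That back-of-the-envelope intuition is easy to check. A minimal sketch, using the vacuum speed of light and the roughly 3 ms of connect() slack described in the original story (light in fiber is slower, which only tightens the radius):

```python
# Speed of light in miles per second (vacuum value)
C_MILES_PER_SEC = 186_282

def max_mail_radius(timeout_seconds: float) -> float:
    """Farthest distance a signal can travel before a connect() times out."""
    return C_MILES_PER_SEC * timeout_seconds

# The effectively-zero timeout in the story left ~3 ms of processing slack
print(round(max_mail_radius(0.003)))  # ≈ 559 miles
```

So "a little over 500 miles" is just the light-travel distance of a few milliseconds, which is why the stats department's number was so eerily consistent.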
It was also obvious why they didn't report the problem instantly; you cannot instantly know that mail isn't reaching beyond 500 miles without gathering data and correlating to a map, which takes time. Instantly, you can only know data points like "I can't mail to users@example.com". You know that if a stats person gives you a number, it was based on data, and not just a couple of data points. The head of the stats department isn't going to give you a number that isn't factual and backed by science. Of course stats people pride themselves on their data analysis; they are not just going to relay a couple of data points with no analysis attached.
1 reply →