
Comment by NhanH

10 years ago

I'd second jacquesm's post: you should write a book or a blog post on these.

One of the reasons I've noticed for everyone wanting to use "distributed" and "cluster" stuff is that we have no intuition or experience for how much can be processed within the limits of a single machine: when someone starts designing a data pipeline, even if they know how big the dataset is (in terms of GBs/TBs/whatever the criterion), they still don't know whether it will fit on one machine or not. So the safe solution is to design a distributed system: if it doesn't fit, you throw more hardware at it. It's somewhat similar to "no one got fired for buying IBM."
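A quick back-of-envelope check answers most of that question before any architecture gets chosen. A minimal sketch, where every number is a made-up assumption you'd replace with your own:

    # Rough feasibility check: can one box handle this dataset?
    # All figures below are illustrative assumptions, not measurements.
    dataset_gb = 2_000           # e.g. a 2 TB dataset
    ram_gb = 512                 # RAM on a single large server
    nvme_read_gb_per_s = 2.0     # sustained sequential read from local NVMe

    fits_in_ram = dataset_gb <= ram_gb
    full_scan_minutes = dataset_gb / nvme_read_gb_per_s / 60

    print(f"fits in RAM: {fits_in_ram}")
    print(f"one streaming pass over the data: ~{full_scan_minutes:.0f} minutes")

Even when the answer is "no, it doesn't fit in RAM," a streaming pass measured in tens of minutes is often still cheaper than standing up a cluster.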

We probably need to collectively start giving the same answer people get when they ask "How can I speed up my code?"... "Profile, profile, profile."

"How can I make my code cloud-scale?" "Profile, profile, profile." First make it fast. Not hyper-ultra fast, optimized to within an inch of its life with embedded ASM and crazy data structures, but as far as you can get it while still writing simple, sensible code. Then, only if you have a problem do you even worry about whether it should go cloud-scale, and by the time you're done with this you'll probably already have a good idea how to partition the problem better because you learned a lot more about it while profiling.

I strongly suspect a lot of this is a problem of the "no one got fired for buying IBM" type. Surely many if not most organizations have much greater faith in their ability to go high and wide than to go deep. The former only requires opening their checkbook; the latter is "magic" from wizards/non-managers who are notoriously difficult for many to manage, assuming they even care that much.

You're also looking at this from the viewpoint of the good of the organization, whereas we know that's generally not how things play out in the long term, e.g. Pournelle's Iron Law of Bureaucracy: http://www.jerrypournelle.com/reports/jerryp/iron.html

The deep approach results in a different balance of resources in an organization, which inevitably produces losers, like those responsible for those big fleets of machines. I can't help but notice that half of your examples were done on surplus desktop machines, and if I remember correctly jacquesm's latest reported experience was with a company that was desperate.

This is not to pour cold water on the deep approach, just to suggest that in many situations you should think about how to target your advocacy and opportunities, and be prepared for blowback if you embarrass people who are expending massive resources on what you can fit on one idle surplus desktop, etc.