Comment by kstrauser
1 day ago
My experience was a bit different. I first saw a Starfire when we were deploying a bunch of Linux servers in the DC. The Sun machine was brilliant, fast, enormous, and far more expensive per unit of work than these little x86 boxes we were carting in.
The Starfire started at around $800K. Our Linux servers started at around $1K. The Sun box was not 800x faster at anything than a single x86 box.
It was an impressive example of what I considered the wrong road. I think history backs me on this one.
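To make "per unit of work" concrete, here's a toy back-of-envelope sketch; the list prices echo the ones above, but the throughput figures are invented for illustration, not actual Starfire or x86 benchmarks:

```python
# Toy price/performance comparison: one big SMP box vs. commodity boxes.
# Prices mirror the ones mentioned above; the throughput figures are
# hypothetical, purely for illustration.

def dollars_per_req_per_sec(price_usd: float, reqs_per_sec: float) -> float:
    """Capital cost per request/second of sustained capacity."""
    return price_usd / reqs_per_sec

# Suppose the big box really is 40x faster per node -- impressive, but far
# short of the 800x it would need to match the commodity price tag.
big_box = dollars_per_req_per_sec(price_usd=800_000, reqs_per_sec=20_000)
commodity = dollars_per_req_per_sec(price_usd=1_000, reqs_per_sec=500)

print(f"big box:   ${big_box:.2f} per req/s")    # $40.00 per req/s
print(f"commodity: ${commodity:.2f} per req/s")  # $2.00 per req/s
```

The exact numbers don't matter; the point is that unless the big box is hundreds of times faster, the commodity fleet wins on price/performance for any work that can be spread across boxes.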
> It’s a bit sad that nobody gives a shit about performance any more.
Everyone gives a shit about performance at some point, but the answer is horizontal scaling. You can’t vertically scale a single machine to run a FAANG. At a certain vertical scale, it starts to look a helluva lot like horizontal scaling (“how many CPUs for this container? How many drives?”), except in a single box with finite and small limits.
> The Sun box was not 800x faster at anything than a single x86 box.
You don't buy enterprise gear because it's economical for bulk number-crunching... You buy enterprise gear when you have a critical SPOF application (typically the database) that has to be super-reliable, or that requires greater resources than you can get in commodity boxes.
RAS (reliability, availability, serviceability) is an expensive proposition. Commodity servers often don't have it, or have much less of it than enterprise gear. Proprietary Unix systems offered RAS as a major selling point. IBM mainframes still have a strong market today.
It wasn't until the late 2000s that x86 went 64-bit, so if your application wanted to gobble more than 2GB/4GB of RAM, you had to go with something proprietary.
It was even more recently that the world collectively put in a huge amount of effort and figured out how to parallelize many number-crunching problems that had previously been limited to a single thread.
There have been many situations like these through the history of computing... Going commodity is always cheaper, but if you have needs commodity systems don't meet, you pay the premium for proprietary systems that do.
As a technical nit, the 64-bit AMD Opteron was released in 2003, not late 2000s. It almost immediately took over the low- to mid-range server market and HPC market because nothing could touch its performance and scalability for the price. It was a state-of-the-art design for the time and relatively cheap, same vibes as the Apple M1 release.
People still used the big mainframe-y UNIX servers but their usage shrank and you could see the writing on the wall. I was already replacing SPARC database servers with Opterons in 2004. The hardware wasn’t as gold-plated but they were fast, and workloads were already outgrowing the biggest mainframe-y servers.
TBH, a lot of the gold-plated “enterprise” hardware failed far more often in practice than you would expect, including unrecoverable hard failures. That was a common enough experience that it probably detracted from the sales pitch for that extremely expensive hardware.
First, yes, everything you said is true. And especially when you’re supporting an older application designed around such SPOFs, you need those to be bulletproof. That’s completely reasonable. That said, a fair chunk of my work since the 90s has been in building systems that try to avoid SPOFs in the first place. Can we use sharded databases such that upgrading one doesn’t take the others down? Shared-nothing backend servers? M-to-N meshes so we’re not shoving everything through a single load balancer or switch? Redundant data centers? The list goes on.
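As a minimal sketch of the sharding idea (the shard count and connection strings here are made up):

```python
# Minimal sketch of key-based sharding: each user maps deterministically to
# one of N independent databases, so maintenance on one shard only affects
# the users who live on it. Shard count and DSNs are hypothetical.
import hashlib

SHARDS = [
    "postgresql://db-shard-0.internal/app",
    "postgresql://db-shard-1.internal/app",
    "postgresql://db-shard-2.internal/app",
    "postgresql://db-shard-3.internal/app",
]

def shard_for(user_id: str) -> str:
    """Pick the shard that owns this user, deterministically."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

print(shard_for("user-1234"))  # same user always routes to the same shard
```

A real setup would usually use consistent hashing or a directory service so that adding shards doesn't remap most keys, but the shape is the same: upgrading or losing one shard only affects the users mapped to it.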
I don’t think that approach is inherently better than what you described. Each has its own tradeoffs and there’s a time and place for both of them. I absolutely did see a lot of Big Iron companies marketing their giant boxes as the “real, proven” alternative to a small cluster of LAMP servers, though. I don’t blame them for wanting to be big players in that market, too, but that wasn’t a good reason to use them (unless you already had their stuff installed and wanted to add a web service next to your existing programs).
I wouldn’t run a bank on an EC2 instance, but neither would I ever buy a mainframe to host Wordpress at any scale.
True, but even then the actual redundancy fell short of what was advertised.
You didn't need imaginary 64-bit PCs because a rack full of smaller 64-bit SPARC systems would have been much cheaper than a single E10k. Something that large in a single system was only necessary for people with irreducible memory requirements, i.e., not delivering mail.
I don’t disagree. But most also don’t give a shit, and then scale horizontally endlessly and spend too much money to deal with their crappy code.
As a dev it isn’t your problem if the company you work for just happily provisions and sucks it up.
That's my experience as well at two different companies. At one we went from two E15Ks to two E25Ks because it was "cheaper" than rewriting who knows how much code, for who knows how long and at what cost.
At the other one, we jumped from two E25Ks to two M9000-64s for the same reasons...
That’s a thing, to be sure. The calculus gets a little complicated when that developer’s pay is far more than the EC2 bill. There’s a spectrum between a small shop wasting $1,000 a year hosting inefficient code and Google scale, where SRE teams would love to put “saved 0.3% on our cloud bill!” on their annual reviews.
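The same calculus as rough arithmetic (every number below is invented just to show the shape of the tradeoff):

```python
# Rough break-even math for "spend developer time optimizing vs. just pay
# the hosting bill". Every figure is invented for illustration.

dev_cost_per_week = 4_000         # fully loaded weekly cost of one developer
weeks_of_optimization = 3         # time spent making the code faster
monthly_hosting_bill = 1_500
expected_savings_fraction = 0.25  # bill reduction the work is expected to buy

monthly_savings = monthly_hosting_bill * expected_savings_fraction
payback_months = (dev_cost_per_week * weeks_of_optimization) / monthly_savings

print(f"payback period: {payback_months:.0f} months")  # 32 months
```

At small-shop numbers the payback period is measured in years; multiply the bill by a few orders of magnitude and the same three weeks of work pays for itself almost immediately.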
> ... to deal with their crappy code
written in an interpreted language.
> Everyone gives a shit about performance at some point, but the answer is horizontal scaling. You can’t vertically scale a single machine to run a FAANG.
You might be surprised by how many companies think they're FAANG (but aren't), though.
That’s a whole other story, to be sure! “We absolutely must have multi-region simultaneous writes capable of supporting 300,000,000 simultaneous users!” “Your company sells door knobs and has 47 customers. Throw it on PostgreSQL and call it solved.”
Yup. And if you reach 94 users, just get a larger machine for PostgreSQL and call it a day, if you really think you need it.