Comment by lelanthran

3 years ago

> The long term solutions end up being difficult to implement and can be high risk because now you have real customers (maybe not so happy because now slow db) and probably not much in house experience for dealing with such large scale data; and an absolute lack of ability to hire existing talent as the few people that really can solve for it are up to their ears in job offers.

This is a problem of having succeeded beyond your expectations, which is a problem only unicorns have.

At that point you have all this income from having fully saturated the One Big Server (which, TBH, has unimaginably large capacity when everything is local with no network requests), so you can use that money to expand your capacity.

Any reason why the following won't work:

Step 1: Move the DB onto its own DBOneBigServer[1]. Warn your customers of the downtime in advance. Keep the monolith as-is on the current OriginalOneBigServer.

Step 2: OriginalOneBigServer still saturated? Put copies of the monolith on separate machines behind a load-balancer.

Step 3: DBOneBigServer is still saturated, in spite of being the biggest Oxide rack there is? Okay, now go ahead and make RO instances, shards, etc. Monolith needs to connect to RO instances for RO operations, and business as usual for everything else.

Okay, so Step 3 is not as easy as you'd like, but until you get to the point that your DBOneBigServer cannot handle the loads, there's no point in spending the dev effort on sharding. Replication doesn't usually require a team of engineers f/time, like a distributed DB would.
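To make the Step 3 change to the monolith concrete, here is a minimal sketch of routing RO operations to replicas and everything else to the primary. It is only an illustration under assumptions: the `RoutingDB` class and its `read`/`write` methods are hypothetical names, and any DB-API style connection with `execute` and `commit` is assumed (in-memory `sqlite3` connections work as stand-ins).

```python
import itertools
import sqlite3  # only used for stand-in connections; any DB-API connection would do


class RoutingDB:
    """Hypothetical helper: reads go to RO replicas, writes go to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas or [primary])

    def read(self, sql, params=()):
        # RO operation: pick the next replica in the rotation.
        conn = next(self._replicas)
        return conn.execute(sql, params).fetchall()

    def write(self, sql, params=()):
        # Everything else: business as usual on the primary.
        cur = self.primary.execute(sql, params)
        self.primary.commit()
        return cur.rowcount
```

One caveat with this kind of split: if replication is asynchronous, a read issued right after a write may not see that write yet, so reads that must observe their own writes would still need to go to the primary.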

If, after Step 3, you're still saturated, then it might be time to hire the f/time team of engineers to break up everything into microservices. While they get up to speed you're making more money than god.

Competitors who went the distributed route from day one have long since gone out of business because while they were still bugfixing in month 6, and solving operational issues for half of each workday (all at a higher salary) in month 12, and blowing their runway cash on AWS for the first 24 months, you had already deployed in month 2, spending less than they did.

I guess the TLDR is "don't architect your system as if you're gonna be a unicorn". It's the equivalent of you, personally, setting your two-year budget to include the revenue from winning a significant lottery.

You don't plan your personal life "just in case I win the lottery", so why do it with a company?

[1] backedup/failover as needed

^ This. Not so long ago, I worked in the finance department of a $350M company as one of the five IT guys, and we had just begun implementing Step 2 after OriginalOneBigServer had shown its limits. DBOneBigServer was really big though: 256 GB RAM and 128 cores, if I remember correctly. So big, in fact, that I implemented some of my ETL tasks as SQL stored procedures to be run directly on the server. The result? A task that would easily take a big fraction of OneBigServer memory and 15 hours (expected to grow along with revenue) now runs in 30 minutes.

It's worth noting that when I left we were still nowhere close to saturating DBOneBigServer.
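For anyone wondering what moving ETL onto the DB server looks like, here is a rough sketch of the general idea, not the actual procedures from that job. The table names and both function names are made up, and since SQLite has no stored procedures, a single `INSERT ... SELECT` stands in for one. The point is only that the second version never pulls the rows into application memory.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    CREATE TABLE totals_app (customer TEXT, total INTEGER);
    CREATE TABLE totals_db (customer TEXT, total INTEGER);
    INSERT INTO orders VALUES ('a', 10), ('a', 5), ('b', 7);
""")


def etl_in_app(conn):
    # Pull every row out of the database and aggregate in application memory.
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0) + amount
    conn.executemany("INSERT INTO totals_app VALUES (?, ?)", totals.items())
    conn.commit()


def etl_in_db(conn):
    # Let the database aggregate and write the result without the rows leaving it.
    conn.execute("""
        INSERT INTO totals_db
        SELECT customer, SUM(amount) FROM orders GROUP BY customer
    """)
    conn.commit()
```

Both functions should leave the same totals in their target tables; the difference is where the memory and CPU get spent.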

> This is a problem of having succeeded beyond your expectations, which is a problem only unicorns have.

Nope. I've worked on a few projects that are not "unicorns" yet have legitimately hit that wall. Particularly around online gaming and gambling.

  • Maybe unicorn is not the right word? If your app has millions of DAUs choking your DB, you should at least be tackling your next big investment round or some other success milestone.

    Otherwise, your product is on its way to failure, so good thing you did One Big DB...

    • You’re thinking too Silicon Valley.

      These services didn’t need additional rounds of funding and aren't the kind of thing that would scale like a unicorn.

      Some services might only be transient (like services based around a particular sports league or TV series) or be regional (like government sites or, also, sports leagues).

      Not every service out there has aspirations to “change the world”. Some exist to fill a niche. But sometimes that “niche” still covers millions of people.