Comment by chatmasta

1 year ago

I mostly agree with you, but a single server won’t solve redundancy and disaster recovery. That doesn’t mean you need to adopt a fully distributed system – a read replica or even periodic backup should be sufficient – but it’s not as simple as “just use a single server.”

Stack Overflow is famously powered by a cluster of three vertically scaled database servers.

bigiain 1 year ago

The complexity you avoid by choosing this 'scale vertically all the way' approach is _huge_.

Like you say, deploy read replicas or automate regular backups (with regularly tested restore automation), whatever you need to meet your RTOs and RPOs.
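To make the "regularly tested restore automation" part concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a real production database. The `backup_and_verify` helper, the `users` table, and the in-memory restore target are all assumptions for illustration; a real setup would restore to a separate host and run far more thorough checks.

```python
import sqlite3

def backup_and_verify(source: sqlite3.Connection, table: str) -> bool:
    """Copy the database, then check that the copy is actually usable.

    `table` is assumed to be a trusted, hard-coded name (it is interpolated
    into SQL), not user input.
    """
    restored = sqlite3.connect(":memory:")  # hypothetical restore target
    try:
        source.backup(restored)  # SQLite online backup API (Python 3.7+)
        # integrity_check yields a single row containing "ok" when healthy
        healthy = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        count_sql = f"SELECT COUNT(*) FROM {table}"
        same_rows = (
            source.execute(count_sql).fetchone()[0]
            == restored.execute(count_sql).fetchone()[0]
        )
        return healthy and same_rows
    finally:
        restored.close()

# Tiny illustrative "primary" database
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
primary.executemany("INSERT INTO users (name) VALUES (?)", [("a",), ("b",), ("c",)])
primary.commit()

restore_ok = backup_and_verify(primary, "users")
```

The point of the sketch is only that a backup counts once something has restored it and looked at the result; a scheduled job that fails loudly when `restore_ok` is false is one way to get there.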

You need to be able to ignore any ignorant sales/marketing/growth management who try to tell you "we can't afford any downtime at all!!!", and you need enough internal political power on the tech side to do so. Tech leadership should be able to document the costs of each extra nine of uptime to senior management and have them acknowledge that occasional seconds or perhaps even minutes of downtime during failover are acceptable.
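For anyone who has to have that conversation, the arithmetic behind "each extra nine" can be sketched as below. The function name and the 365-day year are assumptions for illustration; this computes only the downtime budget, not what it costs to meet it.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years

def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (2 means 99%)."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability

budgets = {n: allowed_downtime_seconds(n) for n in range(2, 6)}
for n, seconds in budgets.items():
    print(f"{n} nines: {seconds / 60:.1f} minutes per year")
```

Each extra nine cuts the budget by a factor of ten: roughly 3.65 days a year at 99%, about 8.8 hours at 99.9%, about 53 minutes at 99.99%, and a little over 5 minutes at 99.999%. A failover that takes a minute or two fits comfortably inside everything but the last of those.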

By the time you start approaching the limits of your server with two 64-core EPYC CPUs, 3TB of RAM, and 24 NVMe flash SSDs, hopefully you no longer need to take advice from randoms on HN, because you should by then have 100+ database and network engineers working follow-the-sun shifts with some _really_ smart and deeply experienced dedicated database leadership managing them.

Or, you need to get someone on board to teach all your junior devs or "vibe coders" about database indexes and how to construct queries that don't do multiple full table scans just to render some user profile widget on every page load... "But it works fine on my laptop!!!" (with a whole 7 user accounts and 47 rows in the user_activity table...)
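A quick way to show the difference is to ask the database for its query plan before and after adding an index. This is a minimal sketch using Python's built-in `sqlite3`; the `user_activity` columns, the index name, and the row counts are assumptions for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_activity (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT)"
)
conn.executemany(
    "INSERT INTO user_activity (user_id, action) VALUES (?, ?)",
    [(i % 100, "login") for i in range(10_000)],
)

QUERY = "SELECT action FROM user_activity WHERE user_id = ?"

def query_plan(sql: str) -> str:
    """Return the human-readable plan SQLite would use for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(row[3] for row in rows)  # fourth column is the detail text

plan_before = query_plan(QUERY)  # no index on user_id: a scan of the whole table
conn.execute("CREATE INDEX idx_user_activity_user_id ON user_activity (user_id)")
plan_after = query_plan(QUERY)   # now a search using the new index

print("before:", plan_before)
print("after: ", plan_after)
```

With 47 rows the scan is invisible; with tens of millions it runs on every page load. Checking the plan is cheap, and most databases have an equivalent (`EXPLAIN` in PostgreSQL and MySQL, for example).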

stavros 1 year ago

Honestly, so few companies do this that you'll have a massive execution advantage by just avoiding the network where possible.
The "scalability by default" mindset has wasted billions of hours of productivity globally, yet we're still doing it.

UltraSane 1 year ago

Realistically, for a production database you would use a clustering technology like Oracle RAC, SQL Server Always On Failover Cluster Instances, or Always On Availability Groups.