Comment by bigiain

1 year ago

The complexity you avoid by choosing this 'scale vertically all the way' approach is _huge_.

Like you say, deploy read replicas or automate regular backups (with regularly tested restore automation), whatever you need to meet your RTO and RPOs.

You need to be able to ignore any ignorant sales/marketing/growth management that try to tell you "we can't afford any downtime at all!!!" and have enough internal political power on the tech side. Tech leadership should be able to document the costs of each extra nine of uptime to senior management and have them acknowledge that occasional second or perhaps even minutes of downtime during failover is acceptable.

By the time you start approaching the limits of your server with two 64 core epyc CPUs and 3TB of RAM and 24 NVMe flash SSDs - hopefully you no longer need to take advice from randoms on HN because you should by then have 100+ database and network engineers working follow the sun shifts with some _really_ smart and deeply experienced dedicated database leadership managing them.

Or, you need to get someone on board to teach all your junior devs or "vibe coders" about database indexes and how to construct queries that don't do multiple fulltable scans just to render some user profile widget on every page load... "But it wrks fine on my laptop!!!" (with a whole 7 user accounts and 47 rows in the user_activity table...)

1 comment

bigiain

stavros 1 year ago

Honestly, so few companies do this that you'll have a massive execution advantage by just avoiding the network where possible.

The "scalability by default" mindset has wasted billions of hours of productivity globally, yet we're still doing it.