
Comment by misswaterfairy

3 hours ago

Part of the uptime solution is keeping as much of your app and infrastructure as possible within your control, rather than being at the mercy of mega-providers, as we've witnessed in the past month with Cloudflare and AWS.

Probably:

- a couple of tower servers running Linux or FreeBSD, backed up by a UPS and an auto-start generator with 24 hours' worth of diesel (depending on where you are, and the local area's propensity for natural disasters - maybe 72 hours),

- Caddy as a reverse proxy, Apache as the web server, PostgreSQL as the database;

- behind a router with sensible security settings that can also load-balance between the two servers (for availability rather than scaling);

- on static WAN IPs,

- with dual redundant (different ISPs/network provider) WAN connections,

- a regular and strictly followed patch and hardware maintenance cycle,

- located in an area resistant to wildfire, civil unrest, and riverine or coastal flooding.

I'd say that'd get you close to five 9s (no more than ~5 minutes of downtime per year). I'd pretty much guarantee five 9s - maybe even six 9s (no more than ~32 seconds of downtime per year) - if the two machines were physically separated by a few hundred kilometres, each with its own copy of the supporting infrastructure above, sans the load balancing (see below), reached through two separate network routes.
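Back-of-the-envelope, assuming the two sites fail independently (which is the whole point of separating them), the downtime budgets and the combined figure look like this:

```python
# Back-of-the-envelope availability maths. The 99.9% per-site figure is an
# assumption for illustration; the "independent failures" assumption is
# doing all the heavy lifting here.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960

for label, availability in [("three 9s", 0.999),
                            ("five 9s", 0.99999),
                            ("six 9s", 0.999999)]:
    downtime_min = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label}: ~{downtime_min:.1f} min/year (~{downtime_min * 60:.0f} s)")

# Two independently failing sites: the service is only down when both are.
per_site = 0.999                    # assume each site alone manages "only" three 9s
combined = 1 - (1 - per_site) ** 2  # 0.999999 -> six 9s
print(f"two independent {per_site:.1%} sites -> {combined:.4%} combined")
```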

Load balancing would become human-driven in this 'physically separate' example (cheaper, less complex): if your-site-1.com fails, simply re-point your browser to your-site-2.com, which routes to the other redundant server on a different network.
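The "check and re-point" step could even be scripted rather than remembered. A minimal sketch, using the hostnames from the example above (the timeout and the 200-status check are assumptions):

```python
# Minimal "which site should I be using?" check for the human-driven
# failover described above.
import urllib.request
import urllib.error

SITES = ["https://your-site-1.com/", "https://your-site-2.com/"]

def first_healthy(sites, timeout=5):
    for url in sites:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except (urllib.error.URLError, OSError):
            continue  # unreachable or erroring -> try the next site
    return None

healthy = first_healthy(SITES)
print(f"use {healthy}" if healthy else "both sites look down")
```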

The hard part now will be picking network providers that don't share the same pipes/cables or upstream dependencies, e.g. both ultimately relying on Cloudflare or AWS...

Keep the WAN IPs written down in case DNS fails.

PostgreSQL can do master-master replication, but I understand it's a pain to set up.
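To make that concrete, here's a rough, untested sketch of what the built-in route looks like with two PostgreSQL 16+ nodes and logical replication. The hostnames, credentials and database name are made up; both servers need wal_level = logical, and conflict handling is still entirely on you, which is most of the pain:

```python
# Sketch of two-way ("master-master") logical replication between two
# PostgreSQL 16+ nodes using built-in publications/subscriptions.
# Placeholder hosts/credentials; not a production recipe.
import psycopg2

NODES = {
    "site1": "host=site1.example dbname=app user=repl password=secret",
    "site2": "host=site2.example dbname=app user=repl password=secret",
}

def run(node: str, sql: str) -> None:
    conn = psycopg2.connect(NODES[node])
    conn.autocommit = True  # CREATE SUBSCRIPTION can't run inside a transaction
    with conn.cursor() as cur:
        cur.execute(sql)
    conn.close()

# Each node publishes all of its tables...
for node in NODES:
    run(node, f"CREATE PUBLICATION pub_{node} FOR ALL TABLES;")

# ...and subscribes to the other node. origin = none (PostgreSQL 16+)
# stops changes from looping back and forth between the two nodes.
for local, remote in (("site1", "site2"), ("site2", "site1")):
    run(local,
        f"CREATE SUBSCRIPTION sub_from_{remote} "
        f"CONNECTION '{NODES[remote]}' "
        f"PUBLICATION pub_{remote} "
        f"WITH (origin = none, copy_data = false);")
```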