Comment by misswaterfairy
5 hours ago
Part of the uptime solution is keeping as much of your app and infrastructure as possible within your control, rather than being at the mercy of mega-providers, as we've seen in the past month with Cloudflare and AWS.
Probably:
- a couple of tower servers, running Linux or FreeBSD, backed up by a UPS and an auto-start generator with 24 hours' worth of diesel (depending on where you are and the local area's propensity for natural disasters, maybe 72 hours);
- Caddy for the reverse proxy, Apache for the web server, PostgreSQL for the database (see the Caddyfile sketch after this list);
- behind a router with sensible security settings that can also load-balance between the two servers (for availability rather than scaling);
- on static WAN IPs;
- with dual redundant WAN connections (different ISPs/network providers);
- a regular and strictly followed patch and hardware maintenance cycle;
- located in an area resistant to wildfire, civil unrest, and riverine or coastal flooding.
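To make the Caddy-in-front-of-Apache piece concrete, here's a minimal Caddyfile sketch. The backend addresses and the /healthz path are made up for illustration, and it assumes Caddy itself (rather than the router) handles the balancing and health checks:

```
your-site-1.com {
    # Proxy to Apache on both tower servers; Caddy's active health
    # checks skip an upstream that stops answering.
    reverse_proxy 10.0.0.11:8080 10.0.0.12:8080 {
        lb_policy first          # prefer the first box, fail over to the second
        health_uri /healthz      # hypothetical health endpoint served by Apache
        health_interval 10s
    }
}
```

With `lb_policy first` you get failover rather than load spreading, which matches the "availability rather than scaling" goal.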
I'd say that would get you close to five 9s (no more than ~5 minutes of downtime per year). I'd pretty much guarantee five 9s, and maybe even six 9s (no more than ~32 seconds of downtime per year), if the two machines were physically separated by a few hundred kilometres, each with its own supporting infrastructure as above, sans the load balancing (see below), and reachable through two separate network routes.
Load balancing would become human-driven in this 'physically separate' example (cheaper, less complex): if your-site-1.com fails, simply re-point your browser at your-site-2.com, which routes to the other redundant server on a different network.
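A rough sketch of what that human-driven check could look like from any machine (the /healthz path is hypothetical; any page that only works when the full stack is up would do):

```
# Is site 1 answering? If not, switch to site 2 manually.
curl -fsS --max-time 5 https://your-site-1.com/healthz \
  || echo "site 1 looks down; use https://your-site-2.com instead"
```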
The hard part now will be picking network providers that don't share the same pipes/cables or upstream dependencies, e.g. both ultimately relying on Cloudflare or AWS...
Keep the WAN IPs written down in case DNS fails.
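One low-tech way to keep those written-down IPs usable is a commented-out hosts-file entry you can enable when DNS is misbehaving. The addresses below are RFC 5737 documentation IPs, so substitute your real static WAN IPs:

```
# /etc/hosts (C:\Windows\System32\drivers\etc\hosts on Windows)
# Uncomment only while DNS is down, then re-comment afterwards.
# 203.0.113.10    your-site-1.com
# 198.51.100.20   your-site-2.com
```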
PostgreSQL can do master-master replication (not out of the box, though; it needs third-party tooling or carefully arranged bidirectional logical replication), but from what I understand it's a pain to set up.
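If master-master turns out to be more pain than it's worth, PostgreSQL's built-in streaming replication (a single primary with a hot standby at the other site) is much simpler and fits the manual-failover model above. A rough sketch, assuming a replication role named `replicator` and a standby at 198.51.100.20 (both made up):

```
# postgresql.conf on the primary (these are at or near the defaults)
wal_level = replica
max_wal_senders = 10

# pg_hba.conf on the primary: let the standby connect for replication
host  replication  replicator  198.51.100.20/32  scram-sha-256

# On the standby: clone the primary and start it in standby mode
# (-R writes standby.signal and primary_conninfo for you)
pg_basebackup -h your-site-1.com -U replicator \
  -D /var/lib/postgresql/data -R -X stream -P
```

Failover is then a matter of promoting the standby (pg_ctl promote, or SELECT pg_promote();) and pointing the app at it, which suits the human-driven approach above.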