
Comment by Carrok

6 hours ago

I have been self-hosting for years. The maintenance is minimal to nonexistent. You are conflating modern SaaS with a stable OSS Docker image.

Keeping services running is fairly trivial. Getting to parity with the operationalization you get from a cloud platform takes more ongoing work.

I have a homelab that supports a number of services for my family. I have offsite backups (rsync.net for most data, a server sitting at our cottage for our media library), alerting, and some redundancy for hardware failures.

Right now, I have a few things I need to fix:

  - one of the nodes didn't boot back up after a power outage last fall; I need to hook up a KVM to troubleshoot
  - cottage internet has been down since a power outage, so those backups are behind (I'm assuming it's something stupid, like I forgot to change the BIOS to power on automatically on the new router I just put in)
  - various services occasionally throw alerts at me

I have a much more complex setup than necessary (k8s in a homelab is overkill), but even the simplest system still needs backups if you care at all about your data. To be fair, cloud services aren't immune to this, either (the failure mode is more likely to be something like your account getting compromised, rather than a hardware failure).

  • A hidden cost of self-hosting.

    I love self-hosting and run tons of services that I use daily. The thought of random hardware failures scares me, though: troubleshooting hardware is hard and time-consuming, and keeping spare mini PCs around is expensive. The biggest impact, though, would come from my NAS server failing.

    • Other than the firewall (itself a minipc), I only have one server where a failure would cause issues: it's connected to the HDDs I use for high-capacity storage, and has a GPU that Jellyfin uses for transcoding. That would only cause Jellyfin to stop working—the other services that have lower storage needs would continue working, since their storage is replicated across multiple nodes using Longhorn.

      Kubernetes adds a lot of complexity initially, but it does make it easier to add fault tolerance for hardware failures, especially in conjunction with a replicating storage provider like Longhorn. I only knew that I had a failed node because some services didn't come back up until I cordoned and drained the node (it looks like there are various projects to automate this—I should look into those).
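      The manual failover step described above is a short command sequence; this is a hedged sketch, with `node2` as a placeholder node name rather than anything from the comment:

      ```shell
      # Mark the dead node unschedulable so nothing new lands on it
      kubectl cordon node2

      # Evict its pods so they reschedule onto healthy nodes.
      # --ignore-daemonsets: DaemonSet pods can't be evicted normally
      # --delete-emptydir-data: allow evicting pods that use emptyDir volumes
      kubectl drain node2 --ignore-daemonsets --delete-emptydir-data

      # Confirm workloads came back up elsewhere
      kubectl get pods -A -o wide
      ```

      Kubernetes does evict pods from an unreachable node on its own after the default toleration window for the `node.kubernetes.io/unreachable` taint expires, but some workloads (notably StatefulSet pods) can stay stuck until the node is drained or force-deleted, which is likely why services only recovered after the manual drain.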