Comment by imiric

3 hours ago

> How often do you test your OS replication script?

Not often. It's mostly outdated, and I spend a lot of time bringing it up to date when I have to rely on it.

BUT I can easily understand what it does, and the tools it uses. In practice I use it rarely, so spending a few hours a year updating it is not a huge problem. I don't have the sense of urgency you describe, and when things do fail, it's an extraordinary event where everything else can wait for me to be productive again. I'm not running a critical business, these are my personal machines. Besides, I have plenty of spare machines I can use while one is out of service.

This is the tradeoff I have decided to make, which works for me. I'm sure that using ZFS and a reproducible system has its benefits, and I'm trying to adopt better practices at my own pace, but all of those have significant drawbacks as well.

> Have you actually tested your backup by doing a sudden, unplanned restore without thinking about it for three days before?

No, but again, I'm not running a critical business. Things can wait. I would argue that even in most corporate environments the obsession over HA comes at the cost of operational complexity, which has a greater negative impact than using boring tools and technology. Few companies need Kubernetes clusters and IaC tools, and even fewer people need ZFS and NixOS for personal use. It would be great if the benefits of these tools were accessible to more people with fewer drawbacks, but the technology is not there yet. You shouldn't gloss over these issues just because they're not issues for you.

Most companies have terrible infrastructure; they're hardly ever examples to follow. But they also have it because there's a certain widespread mentality among those who work there, which starts on the average student's desktop, where students play with Docker without understanding what they're using. This is the origin of many modern software problems: the lack of proper IT training in universities.

MIT came up with "The Missing Semester of Your CS Education" to compensate, but it's nothing compared to what's actually needed. It's assumed that students will figure it out on their own, but that almost never happens, at least not in recent decades. It's also assumed that it's something easy to do on your own, that it can be done quickly, which is certainly not the case and I don't think it ever has been. But the teacher who doesn't know is the first to have that bias.

The exceptional event, even if it doesn't require such a rapid response, still reveals a fundamental problem in your setup. So the question should be: why maintain this complex script when you can do less work with something else? NixOS and Guix are tough nuts to crack at first: NixOS because of its language and its poor, often outdated documentation; Guix because its development is centered away from the desktop and it lacks some elements common in modern distros. But once you learn them, solving problems and keeping everything updated takes much less overhead than maintaining custom scripts.