Comment by kkfx
5 hours ago
How often do you test your OS replication script? I used to do that too, and every time there was something broken, outdated, or needing modification, often right when I desperately needed a restore because I was about to leave on a business trip and had a flight to catch with a broken laptop disk.
How much time do you spend setting up a desktop and maintaining it with mdraid+LUKS+LVM+your choice of filesystem, replacing a disk and doing the resilvering, or making backups with SnapRAID/Kopia etc? Again, I used to do that. I stopped after finding better solutions, also because I always had issues during restores, maybe small ones, but they were there, and when it's not a test but a real restore, the last thing you want is problems.
Have you actually tested your backup by doing a sudden, unplanned restore without thinking about it for three days before? Do you do it at least once a year to make sure everything works, or do you just hope that since computers rarely fail and restores take a long time, everything will work when you need it? When I did things the way you do, and among the people I know who still do, practically no one ever tested their restores, and the recovery script was always one distro major release behind. You had to modify it every few releases when doing a fresh install. In the meantime, it's "hope everything goes well or spend a whole day scrambling to fix things."
Maybe a student is okay with that risk and enjoys fixing things, but generally, it's definitely not best practice, and that's why most people end up on someone else's computer, called the cloud, as protection from their own IT choices...
> How often do you test your OS replication script?
Not often. It's mostly outdated, and I spend a lot of time bringing it up to date when I have to rely on it.
BUT I can easily understand what it does, and the tools it uses. In practice I use it rarely, so spending a few hours a year updating it is not a huge problem. I don't have the sense of urgency you describe, and when things do fail, it's an extraordinary event where everything else can wait for me to be productive again. I'm not running a critical business, these are my personal machines. Besides, I have plenty of spare machines I can use while one is out of service.
This is the tradeoff I have decided to make, which works for me. I'm sure that using ZFS and a reproducible system has its benefits, and I'm trying to adopt better practices at my own pace, but all of those have significant drawbacks as well.
> Have you actually tested your backup by doing a sudden, unplanned restore without thinking about it for three days before?
No, but again, I'm not running a critical business. Things can wait. I would argue that even in most corporate environments the obsession over HA comes at the expense of operational complexity, which has a greater negative impact than using boring tools and technology. Few companies need Kubernetes clusters and IaC tools, and even fewer people need ZFS and NixOS for personal use. It would be great if the benefits of these tools were accessible to more people with fewer drawbacks, but the technology is not there yet. You shouldn't gloss over these issues just because they're not issues for you.