Comment by gchamonlive

2 days ago

> I come from the old days of striping 16 HDDs together (at a minimum number) to get 1GB/s throughput

Woah, how long would that last before you'd start having to replace the drives?

If you're interested in some hard data, Backblaze publishes their HD failure numbers[1]. These disks are storage optimized, not performance optimized like the parent comment, but they have a pretty large collection of various hard drives, and it's pretty interesting to see how reliability can vary dramatically across brand and model.

---

1. https://www.backblaze.com/cloud-storage/resources/hard-drive...

  • The Backblaze reports are impressive. It would have been very handy to know which models to buy. They break the data down by capacity within the same family of drives, so a 2TB might be solid while the 4TB is more flaky. That information is very useful when it comes time to think about upgrading capacity in the arrays. When someone has gone through these battles and then gives away the data they learned, it would be dumb not to take advantage of their generosity.

It depended on the HDD vendor/model. We had hot spares and cold spares. On one build, we had a bad batch of drives. We built the array on a Friday and left it running for burn-in over the weekend. On Monday, we came in to a bunch of alarms and a >50% failure rate. At least they died during burn-in, so there was no data loss, but it was an extreme example. That was across multiple 16-bay rack mount chassis. It was an infamous case, though; we were not alone.

More typically, you'd have a drive die much less frequently, but it was something you absolutely had to be prepared for. With RAID-6 and a hot spare, you could be okay with a single drive failure. Theoretically, you could lose two, but it would be a very nervy day getting the array to rebuild without issue.
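As a rough back-of-the-envelope model, dual parity means the array survives as long as no more than two drives fail before a rebuild finishes. A minimal sketch using a binomial model, assuming independent failures and an illustrative per-drive failure probability (real drives are neither independent nor identical, so treat the numbers as hypothetical):

```python
from math import comb


def p_survival(n_drives: int, p_fail: float, parity: int = 2) -> float:
    """P(at most `parity` of n drives fail), assuming independent failures.

    parity=2 corresponds to RAID-6: data survives up to two concurrent losses.
    """
    return sum(
        comb(n_drives, k) * p_fail**k * (1 - p_fail) ** (n_drives - k)
        for k in range(parity + 1)
    )


# Illustrative only: 2% chance each drive fails within the window at risk.
for n in (8, 16, 24):
    print(f"{n:2d}-drive RAID-6 survival: {p_survival(n, 0.02):.3%}")
```

The model also shows why a third failure mid-rebuild is the scary case: the survival probability drops as the array grows, which is exactly why that rebuild day is so nervy.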

  • I asked because I built a makeshift NAS for myself with three 4TB IronWolf drives, but they died before the third year. I didn't investigate much, but it was most likely because of power outages and the lack of a UPS at the time. It's still quite a bit of work to maintain physical hard drives, and as I understand it, the probability of failure tends to increase with the number of units in the array: it's not the likelihood of any one drive failing, but the likelihood of none of them surviving unscathed over a period of time, which compounds.
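The compounding effect described above is just one minus the probability that every drive survives. A minimal sketch, with an assumed (made-up) 2% annualized failure rate per drive and independence between drives:

```python
def p_any_failure(n_drives: int, afr: float) -> float:
    """P(at least one drive fails) = 1 - P(no drive fails) = 1 - (1 - afr)^n."""
    return 1.0 - (1.0 - afr) ** n_drives


# Hypothetical 2% AFR: the chance of seeing at least one failure per year
# grows quickly with array size, even though each drive is individually fine.
for n in (1, 3, 16, 24):
    print(f"{n:2d} drives -> {p_any_failure(n, 0.02):.1%} chance of >=1 failure/yr")
```

With those assumptions, three drives already give roughly a 6% chance of at least one failure per year, and a 24-drive array is closer to 40%, which matches the intuition that big arrays see failures routinely.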

    • Any electronic gear that you care about must be connected to a UPS. HDDs are very susceptible to power issues. Good UPSes are also line conditioners, so you get a clean sine wave rather than whatever comes straight from the mains. If you've never seen it, connect a meter to an outlet in your home and watch how much fluctuation you get throughout the day. Most people think about spikes/surges while forgetting that dips and under-volting are damaging as well. Most equipment has a range of acceptable voltage, but you'd be amazed at the number of times mains will dip below that range. Obviously location will have an effect on quality of service, but I hear my UPSes kick in multiple times a week to cover a dip, if only for a couple of seconds.

      The fun thing about storage pools is that they can lull you into thinking they're set-it-and-forget-it. You have to monitor SMART messages. Most drives will give you a heads-up if you know where to look. Having the fortitude to keep a hot spare instead of just adding it to the storage pool goes a long way toward avoiding data loss.

I run 24x RAID at home. I’m replacing disks 2-3 times per year.

  • Are your drives under heavy load, or primarily just spinning waiting for use? Are they dying unexpectedly, or are you watching the SMART messages and prepared when it happens?

    • They’re idle most of the time. Powered on 24/7 though, with maybe a few hundred megabytes written every day, plus a few dozen gigabytes now and then. Mostly long-term storage. SMART has too much noise; I wait for ZFS to kick a drive out of the pool before replacing it. With triple redundancy, I've never come close to data loss.

      To be clear, I should have said replacing 2-3 disks per year.
