
Comment by dylan604

2 days ago

Depends on the HDD vendor/model. We had hot spares and cold spares. On one build, we had a bad batch of drives. We built the array on a Friday and left it running burn-in over the weekend. On Monday, we came in to a bunch of alarms and a >50% failure rate. At least they died during the burn-in, so no data loss, but it was an extreme example. That was across multiple 16-bay rack mount chassis. It was an infamous case, though; we were not alone.

More typically, you'd have a drive die much less frequently, but it was something you absolutely had to be prepared for. With RAID-6 and a hot spare, you could be okay with a single drive failure. Theoretically, you could lose two, but it would be a very nervy day getting the array to rebuild without issue.

I asked because I built a makeshift NAS for myself with three 4 TB IronWolf drives, but they died before the third year. I didn't investigate much, but it was most likely because of power outages and the lack of a UPS at the time. It's still quite a bit of work to maintain physical hard drives, and the probability of a failure, as I understand it, tends to increase with the number of drives in the array because of the complement rule: what matters is not the likelihood of one particular drive failing but the likelihood of none of them failing over a period of time, and that shrinks with every drive you add.
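To put rough numbers on that, here is a minimal sketch. It assumes every drive fails independently with the same probability `p_single` over the period, which the bad-batch story above shows is not always true. The function name and the 3% figure are made up for illustration, not measured rates.

```python
def p_any_failure(p_single: float, n_drives: int) -> float:
    """Chance that at least one of n_drives fails during the period.

    Assumes independent drives with an identical failure probability.
    The chance that none fail is (1 - p_single) ** n_drives, so the
    chance of at least one failure is its complement.
    """
    return 1.0 - (1.0 - p_single) ** n_drives


if __name__ == "__main__":
    # Hypothetical 3% per-drive failure probability over the period.
    for n in (1, 3, 8, 16):
        print(f"{n:2d} drives: {p_any_failure(0.03, n):.1%} chance of at least one failure")
```

The per-drive risk doesn't change as the array grows; only the odds of seeing at least one failure somewhere in the array do.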

  • Any electronic gear that you care about must be connected to a UPS. HDDs are very susceptible to power issues. Good UPSes are also line conditioners, so you get a clean sine wave rather than whatever comes straight from the mains. If you've never seen it, connect a meter to an outlet in your home and watch how much fluctuation you get throughout the day. Most people think about spikes/surges while forgetting that dips and under-volting are damaging as well. Most equipment has a range of acceptable voltage, but you'd be amazed at the number of times mains will dip below that range. Obviously location will have an effect on quality of service, but I hear my UPSes kick in multiple times a week to cover a dip, if only for a couple of seconds.

    The fun thing about storage pools is that they can lull you into thinking they are set-it-and-forget-it. You have to monitor SMART messages. Most drives will give you a heads up if you know where to look. Having the fortitude to keep a hot spare instead of just adding it to the storage pool goes a long way toward not losing data.
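    As a rough idea of "where to look", here is a sketch that reads the attribute table printed by `smartctl -A` (from smartmontools) and flags non-zero raw counts on a few attributes. The `WATCHED` set and the "anything above zero" rule are my assumptions, not vendor guidance, and the parsing assumes the classic ten-column ATA attribute table; NVMe drives report differently.

```python
import subprocess

# Attributes often treated as early-warning signs. This particular set is an
# assumption for illustration; adjust it for your drives.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_attributes(text: str) -> dict:
    """Map attribute name to raw value from `smartctl -A` table output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def failing_signs(attrs: dict) -> list:
    """Return the watched attributes whose raw count is above zero."""
    return sorted(name for name in WATCHED if attrs.get(name, 0) > 0)


def read_attributes(device: str) -> dict:
    """Run smartctl on a device such as /dev/sda (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_attributes(result.stdout)
```

    Run something like `failing_signs(read_attributes("/dev/sda"))` from cron and alert on a non-empty list, or let `smartd` do the watching for you.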