Comment by jiggawatts
9 days ago
A key metric for recovery is the time it takes to read or write an entire drive (or drive array) in full. This is simply capacity divided by throughput, and it has been getting steadily worse: drive capacities have grown exponentially while throughput hasn't kept up at anywhere near the same pace.
A typical 2005-era drive might have been 0.5 TB with a throughput of 70 MB/s, for a full-drive transfer time (FDTT) of about 2 hours. A modern 32 TB drive is 64x bigger but has a throughput of only 270 MB/s, which is less than 4x higher. Hence the FDTT is about 33 hours!
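Back-of-the-envelope, in Python (a quick sketch using the round numbers above; the helper name is just illustrative):

    # Full-drive transfer time (FDTT) = capacity / sustained throughput.
    def fdtt_hours(capacity_tb, throughput_mb_s):
        capacity_mb = capacity_tb * 1_000_000  # TB -> MB (decimal units)
        return capacity_mb / throughput_mb_s / 3600

    print(fdtt_hours(0.5, 70))   # 2005-era 0.5 TB drive: ~2.0 hours
    print(fdtt_hours(32, 270))   # modern 32 TB drive:   ~32.9 hours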
And that's the optimal scenario. Things get worse in modern high-density disk arrays that may have 50 drives in a single enclosure sharing as little as 8-32 Gbps (1-4 GB/s) of effective bandwidth. That can push FDTT times out to many days or even weeks.
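The same arithmetic at the enclosure level, assuming the figures above (50 x 32 TB drives behind a shared 1-4 GB/s link; these are illustrative numbers, not a specific product):

    # Whole-enclosure transfer time over a shared uplink.
    def array_fdtt_days(drives, drive_tb, shared_gb_s):
        total_mb = drives * drive_tb * 1_000_000
        seconds = total_mb / (shared_gb_s * 1000)  # GB/s -> MB/s
        return seconds / 86400

    print(array_fdtt_days(50, 32, 4))  # best case:  ~4.6 days
    print(array_fdtt_days(50, 32, 1))  # worst case: ~18.5 days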
I've seen storage arrays where the drive trays were daisy-chained, which meant that while the individual ports were fast, the bandwidth per drive dropped precipitously as capacity was expanded.
It's a very easy mistake to just keep buying more drives, plugging them in, and never going back to the whiteboard to rethink the HA/DR architecture and timings. The team doing this kind of business-as-usual (BAU) upgrade/maintenance is not the team that designed the thing originally!