Comment by alexfoo
24 days ago
> This stat is also complete bullshit. If it were true, your scrubs of any 20+TB pool would get at least corrected errors quite frequently. But this is not the case.
I would expect the ZFS code is written with the expected BER in mind. If it reads something, computes the checksum and goes "uh oh", it will probably first re-read the block/sector, see that the result is different, possibly re-read it a third time, and if all is then OK, continue on without even bothering to log an obvious BER-related error. I would expect it only bothers to log or warn when it repeatedly reads the same data that fails the checksum.
Caveat Reddit but https://www.reddit.com/r/zfs/comments/3gpkm9/statistics_on_r... has some useful info in it. The OP starts off with a similar premise that a BER of 10^-14 is rubbish but then people in charge of very large pools of drives wade in with real world experience to give more context.
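To make the "scrubs of any 20+TB pool" claim concrete, here's my own back-of-the-envelope arithmetic (the pool size and BER figure are just the numbers from this thread, not measurements): if the 10^-14 spec were a literal uniform error rate, a full scrub of a 20 TB pool should hit roughly 1-2 unrecoverable errors every time.

```python
# Back-of-the-envelope check of the quoted 10^-14 BER figure.
# Assumes a full scrub reads every byte exactly once.
pool_bytes = 20e12            # 20 TB pool, fully scrubbed
bits_read = pool_bytes * 8    # 1.6e14 bits read per scrub
ber = 1e-14                   # commonly quoted unrecoverable bit error rate
expected_errors = bits_read * ber
print(expected_errors)        # ~1.6 expected errors per scrub if the spec were literal
```

The fact that scrubs of pools this size routinely complete with zero errors is exactly why people suspect the spec sheet number is a conservative floor rather than an observed rate.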
That's some very old data. I'm curious how things have changed with all the new advancements like helium drives, HAMR, etc. From the stats Backblaze helpfully publish, I feel like the huge variance between models far outweighs this specific stat when it comes to weighing failure risks.
I also thought that it's "URE", i.e. unrecoverable even after all the drive's internal correction mechanisms. I'm aware that drives use various ways to protect against bitrot internally.